Monthly Archives: January 2009

Mainstreaming Multimedia Terms

I was catching up on about 3 months' worth of QuickTime movie trailers when I viewed the trailer for a movie called Miss March. The part that stood out to me was that the main characters meet a rapper whom they address as “Dot MPEG”. The IMDb page for the movie actually lists the rapper's name as, ahem, Horsedick.MPEG.


Craig Robinson portrays rapper "Horsedick.MPEG" in the movie Miss March

No big meaning to this post; I just thought it would be interesting to the multimedia nerd readership of this blog.

Lossless Audio Anomalies

Some years ago, I took a 1-minute sample of a song from a CD and compressed it with every lossless audio coder I knew of at the time, a dozen of them in all. I put the samples here. This effort predated FATE by a number of years. Nowadays, FFmpeg contains native decoding support for 7 of those lossless audio algorithms. And since lossless decoders are, by definition, supposed to produce bitexact results, it should be easy to add automated tests for them to FATE.

My pragmatic rule for FATE samples is to keep them under 2 MB where feasible. Since most of the lossless samples I created weigh in at 6-7 MB, I set about slicing off the first megabyte of each supported sample. And that's when I noticed that the output was not bitexact across configurations, at least not for all of the algorithms. This struck me as odd.
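The slicing itself is trivial; a quick Python sketch, with hypothetical file names:

    # keep only the first megabyte of a sample for FATE
    data = open("full_sample.tta", "rb").read(1024 * 1024)
    open("truncated_sample.tta", "wb").write(data)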

Five of the 7 supported algorithms still produced identical output across all configurations, even with an incomplete final chunk. The other 2 did not:

  • Apple Lossless generated 4 distinct results: all Linux/x86_32 configurations agreed with each other, as did all Linux/x86_64 configurations and all Linux/PPC configurations, while Mac OS X x86_64 and PPC agreed with each other.

  • True Audio generated 19 different results: every configuration disagreed with every other, except for Mac OS X x86_64 and PPC, which agreed with each other.

Digging deeper using '-f framecrc' demonstrates that all frames agree across configurations until the last, incomplete frame (and, of course, decoding the complete files is bitexact). No big emergency; I just thought it was interesting. I will be able to contrive a new, smaller ALAC sample using iTunes (or FFmpeg, using Jai's encoder), and it looks like True Audio's encoder is still around as well, and available for Linux to boot.
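(For the record, the framecrc check amounts to something like 'ffmpeg -i truncated_sample.tta -f framecrc -', with the file name here being hypothetical; the muxer prints a CRC per decoded frame, which makes the point of divergence easy to spot.)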

Science Into Engineering

I modified my distributed RPC test staging utility to implement my imprecise audio testing idea. This is the output under typical conditions:

There was 1 unique stdout blob collected
all successful configurations agreed on this stdout blob:
pass

So, it worked. Yeah, I’m surprised too. That result means that all the configurations (20 total) produce an audio waveform in which no individual PCM sample deviates from the reference wave by more than 1. Since I had to choose some configuration to generate the reference sample, I used Linux / x86_32 / gcc 2.95.3.

BTW, the general Python algorithm I am using to compare the waves is sketched below. It takes a full minute, give or take a second, to compare two 33 MB samples.
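Roughly, assuming raw 16-bit little-endian PCM unpacked one sample at a time with the struct module (names illustrative):

    import struct

    def compare_waves(path1, path2, tolerance=1):
        # compare two raw 16-bit little-endian PCM streams sample by sample;
        # fail as soon as any sample deviates by more than 'tolerance'
        # (assumes both files contain the same number of samples)
        f1 = open(path1, "rb")
        f2 = open(path2, "rb")
        while True:
            bytes1 = f1.read(2)
            bytes2 = f2.read(2)
            if len(bytes1) < 2 or len(bytes2) < 2:
                break
            # unpack one signed 16-bit sample from each stream
            sample1 = struct.unpack("<h", bytes1)[0]
            sample2 = struct.unpack("<h", bytes2)[0]
            if abs(sample1 - sample2) > tolerance:
                return False
        return True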

I replaced abs() with a branch that checks whether the diff is < -1 or > 1, but that didn't improve speed measurably. I think the constant per-sample unpacking might have something to do with it. Better solutions welcome. (By comparison, using 'cmp' to compare 2 identical files of the same size as the test above, living on a network share, takes less than 2 seconds.)
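For what it's worth, one direction to try: unpack the samples in bulk with the array module instead of making one struct call per sample. A rough sketch (untested; note that array uses the host's native byte order, so this assumes a little-endian machine):

    import array

    def compare_waves_bulk(path1, path2, tolerance=1):
        # unpack all 16-bit samples at once instead of 2 bytes at a time
        samples1 = array.array("h", open(path1, "rb").read())
        samples2 = array.array("h", open(path2, "rb").read())
        if len(samples1) != len(samples2):
            return False
        for s1, s2 in zip(samples1, samples2):
            if abs(s1 - s2) > tolerance:
                return False
        return True

On a big-endian PPC box, the samples would need a byteswap() first.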

For a 10-second sample of a .m4a stereo AAC file (882,000 samples), these are the numbers of PCM samples that deviated by exactly 1 (first number) and by more than 1 (second number). You will notice that no samples deviated by more than 1, which was my hypothesis at the start and the basis on which I devised this plan:

Mac OS X / PPC / gcc 4.0.1
432691, 0

Linux / x86_32 / icc
238, 0

Linux / x86_32 / gcc 2.95.3
0, 0

Linux / PPC / gcc 4.0.4
Linux / PPC / gcc 4.1.2
Linux / PPC / gcc 4.2.4
Linux / PPC / gcc 4.3.2
Linux / PPC / gcc svn
432701, 0

Linux / x86_64 / gcc 4.0.4
Linux / x86_64 / gcc 4.1.2
Linux / x86_64 / gcc 4.2.4
Linux / x86_64 / gcc 4.3.2
Linux / x86_64 / gcc svn
248, 0

Linux / x86_32 / gcc 3.4.6
Linux / x86_32 / gcc 4.0.4
Linux / x86_32 / gcc 4.1.2
Linux / x86_32 / gcc 4.2.4
Linux / x86_32 / gcc 4.3.2
Linux / x86_32 / gcc svn
237, 0

Mac OS X / x86_64 / gcc 4.0.1
244, 0

I have thrown RealAudio Cooker and 28.8 samples at this, and both work. I am still testing against more audio samples to make sure this idea holds water.

Improving The Science

Apparently, Aurel thought my proposed imprecise audio testing method for FATE had some merit, even if everyone else thinks I'm crazy. He proposed and prototyped a new PCM encoder called pcm_s16le_trunc, which normalizes a sequence of signed, 16-bit PCM samples according to the formula

sample[n] = (sample[n] + 1) & ~0x03

So it sort of smooths out the rough edges of a PCM wave in the hopes of making it possible to compare against reference waves in a bit-exact manner. Unfortunately, deploying the method on my distributed RPC testing system yields no success. For example, testing a particular AAC file across 18 different FFmpeg configurations still results in 7 unique sets of output, which group the same way as without '-f pcm_s16le_trunc', even though the actual bits differ.
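For concreteness, the same normalization in Python, with a few of the values it produces (a sketch; this ignores 16-bit wraparound at the extremes):

    def trunc(sample):
        # pcm_s16le_trunc normalization: snap each sample to a multiple of 4
        return (sample + 1) & ~0x03

    # trunc(1000) == 1000, trunc(1001) == 1000, trunc(1002) == 1000,
    # trunc(1003) == 1004, trunc(1004) == 1004

This also hints at why the groups might not collapse: two samples that straddle a rounding boundary (1002 vs. 1003, say) still normalize to different values, so a deviation of 1 in the raw output can survive the smoothing.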

I still think it’s an interesting idea. Any other refinements to throw out there?

To reiterate a comment I left on the last post: I do not wish to completely supplant the official validation methods that are already in place for various audio coding standards, most notably the MPEG audio standards.

I envision 2 stages for these audio tests. Really, this parallels what I described when I discussed how I enter tests into FATE in the first place. The first stage is to validate that a given sample produces the correct results. For many test specs in FATE, this involves manually consuming the contents of a media sample with my own eyes and ears to judge whether it looks and sounds minimally sane. The first stage for these audio tests will be in the same spirit, only with more mathematically rigorous qualifications for getting into FATE: beyond just "sounding okay", files will have to meet certain PSNR thresholds. The second stage is to create reference waves that the audio decoders can be continuously tested against.
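PSNR itself is nothing exotic; a minimal sketch in Python, assuming the two waves are equal-length sequences of signed 16-bit sample values:

    import math

    def psnr(reference, test, peak=32767.0):
        # mean squared error between the two waves
        mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / float(len(reference))
        if mse == 0:
            return float("inf")  # identical waves
        return 10.0 * math.log10((peak * peak) / mse)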

What thresholds need to be met, and what files should be used? Well, that varies depending on codec:

  • MP2/MP3/AAC: Plenty of official conformance samples are available, and I'm pretty sure I have been sitting idly on most of them for many years. I need to do a little research to determine how close the decoder's final output needs to be.
  • Vorbis: The savior of open source multimedia (at least in the audio domain). While the format is exhaustively documented, I don't see any specifications regarding quality thresholds that encoders and decoders need to meet, nor is there any conformance suite that I know of. This is particularly troubling since there have been numerous revisions to the format over the years and older variations are undoubtedly still "in the wild". It would be nice to assemble a sample collection that represents the various generations of the format.

    In any case, it will be reasonable to generate our own samples for testing this format.

  • RealAudio codecs such as Cooker and 28_8; also AC3, ATRAC3, QDesign, QCELP, and assorted others: I'm not especially motivated to track down the software that creates these formats in order to contrive my own encodings from known source material. There are no known conformance suites, and there is no way for us to know the intended, reasonable thresholds. I think we'll just have to take the "sounds good enough; make a new reference wave" approach for these.

One last thought: one prospect I appreciate about this reference wave testing idea is that I wouldn't have to update the test specs in the database if the output changes significantly; I would just have to update the reference waves (as long as they pass whatever thresholds we put in place).
