
The Standard, Like It Or Not

I have been studying multimedia technology since 2000. It has been a pretty chaotic technological landscape. People who wanted to publish video on the web wondered what format to use (and occasionally sought my advice). Various fiefdoms arose around Microsoft, Apple, and Real, all hoping to claim the mantle of the standard web video format. Somewhere along the line, Macromedia “accidentally” established a standard web video format via the Flash Player (now Adobe’s).

A few years ago, Adobe (my employer, BTW) upgraded the video support in Flash Player to use the same video format that happened to sit at the top of QuickTime’s codec heap: QT-MP4/H.264/AAC. A few days ago, Microsoft announced the beta of Silverlight 3, which contains support for the same formats. After absorbing that information, it took a few days for the next thought to coalesce in my mind:

We have a standard multimedia format.

All the big players support the same multimedia stack (I think even Real Player supports the same stack). I know that’s dismaying to certain elements of the free software community who insist that Xiph’s multimedia stack is the “standard” (really! there are blessed RFCs to back it up and everything); you may not like it, but that’s the way it is:

  • QT-MP4 is the standard container format, not Ogg
  • H.264 is the standard video codec, not Theora (or Dirac)
  • AAC (and also MP3, for historical purposes) is the standard audio codec, not Vorbis

Sure, you may, in principle, have to send a dollar or two over to the Patent Illuminati (though that is highly unlikely). But it’s either that, or, you know, not have a standard video format. (And remember, the HTML5 video tag is not coming to save you.)

At least the free software enthusiast can take comfort in knowing that open source (L/GPL) efforts such as FFmpeg and x264 aim to create the very best tools that anyone can possibly use to create these formats.

Addendum: Now that I think about it, I don’t necessarily know if Silverlight 3 will transport H.264 and AAC inside of a QT-MP4 container or somehow pack it into an ASF file. That would be interesting to find out, though I have read (possibly uninformed) blog chatter excited about being able to stream the same file through Flash and Silverlight.

FFmpeg Perceptual Audio Test Plan

There have been some problems with FATE audio testing. First off, the qt-ima4-stereo test spec was testing against the wrong file for the past year. Stereo IMA ADPCM decoding could have broken in QuickTime and we might have never been alerted. Sloppy.

More seriously, I found out that many of my existing bitexact audio tests have not been constructed properly, because these two commands:

ffmpeg -i file.ext file.wav
ffmpeg -i file.ext -f wav - > file.wav

do not yield equivalent sets of bytes inside file.wav. Part of the reason is that, after writing out all the audio samples, the muxer needs to rewind to the header so that it can write the data payload length. When writing data to stdout, the program does not have the option to rewind the output stream. However, I don’t understand the entire discrepancy. Using the file qt-ima4-mono with the above command lines:

1156652 surge.wav
1146924 surge-stdout.wav

The file that is routed through stdout is notably smaller (9728 bytes smaller). I was going to write this off as the stdout file failing to be flushed. However, the behavior is consistent across all machines and platforms.
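
One way to dig into the discrepancy is to compare what each file's header claims against the bytes that are really there. The following is only a sketch: describe_wav is a hypothetical helper, it assumes a plain RIFF/WAVE layout, and it makes no assumption about which placeholder value the muxer leaves in the size fields when it cannot rewind. Note that 1156652 - 1156608 = 44, so the on-disk WAV is exactly the raw s16le output listed later in this post plus a canonical 44-byte PCM WAV header.

```python
import struct

def describe_wav(data):
    """Compare declared and actual sizes in a RIFF/WAVE byte string.

    Hypothetical diagnostic helper; assumes a plain RIFF/WAVE layout.
    """
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    riff_size = struct.unpack_from("<I", data, 4)[0]
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        body = pos + 8
        if chunk_id == b"data":
            # Stop here: if the size field was never patched, walking
            # further would misread audio samples as chunk headers.
            return {
                "header_bytes": body,
                "riff_declared": riff_size,
                "riff_actual": len(data) - 8,
                "data_declared": size,
                "data_actual": len(data) - body,
            }
        # Chunks are word-aligned: an odd size carries one pad byte.
        pos = body + size + (size & 1)
    raise ValueError("no data chunk found")
```

Running this on both surge.wav and surge-stdout.wav (for example, describe_wav(open("surge.wav", "rb").read())) would show whether the missing 9728 bytes are samples or whether only the header fields differ.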

My proposed solution is to update all of the audio tests to use this raw format target:

ffmpeg -i file.ext -f s16le -

Since the output is equivalent to:

ffmpeg -i file.ext -f s16le file.s16le

1156608 surge.s16le
1156608 surge-stdout.s16le

Moving right along, there is the much bigger task of testing perceptual audio decoders. Working down the FATE Test Coverage list, these perceptual audio codecs will get the naive, one-off wave reference treatment in lieu of a proper conformance suite: ATRAC3, RealAudio Cooker, DCA (DTS), IMC, Nellymoser, QCELP, QDesign, RealAudio 28.8, Truespeech, Vorbis, and WMA v1.

Then there is the matter of MPEG audio codecs for which we have access to extensive conformance suites. Thanks to Kostya and Benjamin for furnishing pointers to precise information discussing how to verify if your MP1/2/3 or AAC audio decoder is up to snuff. This page at Underbit describes exactly how the spec describes conformance for MPEG 1, layers 1, 2, and 3, and also evaluates the conformance of various implementations. The comparison ostensibly predates FFmpeg. This Mp4-tech mailing list post shows the way regarding AAC conformance.

So I need to automate the MP1/2/3 and AAC test entries. I estimate the automated process will work something like this:

  • Decode encoded file
  • Run comparison of decoded wave against original wave
    • For MP1/2/3, this seems to entail converting both the FFmpeg output and the original wave to floating point numbers normalized to the range -1.0..1.0, computing the root mean square (RMS) of the difference signal, and verifying that the RMS is less than 1 / (32768 * sqrt(12))
    • For AAC, well, I’m still researching the precise criteria
  • If the decoded wave is within tolerance, add a new test
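
The MP1/2/3 comparison step above can be sketched in plain Python. This is a rough illustration rather than the eventual FATE code: rms_difference and within_tolerance are hypothetical names, and the separate full-scale parameters are an assumption that lets a 16-bit decode be compared against a 24-bit reference.

```python
import math

# Threshold quoted above: RMS of the difference must stay below this.
RMS_LIMIT = 1.0 / (32768.0 * math.sqrt(12.0))

def rms_difference(decoded, reference,
                   decoded_scale=32768.0, reference_scale=32768.0):
    """RMS of the difference after normalizing both signals to -1.0..1.0.

    decoded and reference are sequences of integer PCM samples; each
    scale is the full-scale magnitude for that signal (32768 for
    16-bit samples, 8388608 for 24-bit samples).
    """
    if len(decoded) != len(reference) or not decoded:
        raise ValueError("signals must be non-empty and of equal length")
    total = 0.0
    for d, r in zip(decoded, reference):
        diff = d / decoded_scale - r / reference_scale
        total += diff * diff
    return math.sqrt(total / len(decoded))

def within_tolerance(decoded, reference, **scales):
    """True if the decoded signal meets the RMS criterion."""
    return rms_difference(decoded, reference, **scales) < RMS_LIMIT
```

The same arithmetic vectorizes naturally with numpy; the loop form is used here only to keep the sketch dependency-free.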

The part where I get a bit fuzzy is: what should the test spec be? Should I generate a reference wave and test future decoded waves against it using my one-off wave reference method? Or, should I just go ahead and compute the RMS of the difference signal? I assume that if I use the nifty numpy library for the task, it couldn’t possibly make any measurable difference in the performance of FATE testing vs. using the one-off wave reference method (computing absolute value of the difference signal and checking that no discrete points exceed 1).
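
For comparison, the one-off wave reference method described in the parenthetical amounts to very little code. A minimal sketch, assuming both signals are already available as sequences of integer samples (one_off_match is a hypothetical name):

```python
def one_off_match(decoded, reference):
    """True if no decoded sample differs from the reference by more than 1.

    A length mismatch counts as a failure rather than an error, so a
    truncated decode simply fails the test.
    """
    if len(decoded) != len(reference):
        return False
    return all(abs(d - r) <= 1 for d, r in zip(decoded, reference))
```

Unlike the RMS check, this passes a decode that is off by one on every sample and fails a decode that is off by two on a single sample, so the two methods are not interchangeable.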

One trade-off is that I would need to store the full 24-bit reference waves in order to properly compute RMS, which is 50% more data than I would need with the one-off method. And I’m still not sure how to process the 24-bit data in any event.
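
On the question of processing the 24-bit data: if the reference waves store samples as packed 3-byte little-endian signed integers (an assumption; that is the usual layout for 24-bit PCM in WAV files, but I have not checked the conformance files themselves), Python can unpack them without any extra library. unpack_s24le is a hypothetical helper name.

```python
def unpack_s24le(raw):
    """Unpack packed 24-bit little-endian signed PCM into a list of ints.

    Assumes 3 bytes per sample with no padding; values fall in the
    range -8388608..8388607.
    """
    if len(raw) % 3:
        raise ValueError("length is not a multiple of 3 bytes")
    return [int.from_bytes(raw[i:i + 3], "little", signed=True)
            for i in range(0, len(raw), 3)]
```

Dividing each result by 8388608.0 gives the normalized -1.0..1.0 range needed for the RMS comparison.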