Encoding And Muxing FATE

One weak spot in FATE's architecture has to do with encoding and muxing formats. So far, I have been content to let the master regression suite handle the encode/mux tests for the most important formats that FFmpeg supports. But the suite doesn't handle everything. Plus, I still have the goal of eventually breaking up all of the regression suite's functionality into individual FATE test specs.

At first, the brainstorm was to encode things directly to stdout so that nothing ever really has to be written to disk. The biggest problem with this approach is that stdout is non-seekable. For formats that require seeking backwards, this is a non-starter (e.g., a QuickTime muxer will always have to seek back to the start of the file and fill in the total size of the mdat atom that was just laid down on disk).
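To illustrate the seeking requirement, here is a small Python sketch (not FFmpeg's actual muxer code, and the file name is just for demonstration): a QuickTime-style muxer writes a placeholder size for the mdat atom, emits the payload, then seeks back to patch in the real size; that final seek is exactly what a pipe cannot do.

import struct

# Sketch only: write the mdat atom header with a placeholder size,
# write the payload, then seek back and patch in the real size.
def write_mdat(f, payload):
    header_pos = f.tell()
    f.write(struct.pack(">I4s", 0, b"mdat"))  # 32-bit size placeholder + atom type
    f.write(payload)
    end_pos = f.tell()
    f.seek(header_pos)                        # impossible on stdout/a pipe
    f.write(struct.pack(">I", end_pos - header_pos))
    f.seek(end_pos)

# hypothetical output file, purely for demonstration
with open("output.mov.part", "wb") as f:
    write_mdat(f, b"\x00" * 1024)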

So it’s clear that an encode/mux test needs to commit bytes to some seekable media. Where is it okay to write to disk? I think $BUILD_PATH should be okay, since the build step already writes data there.

The natural flow of the master regression suite is to encode/mux a test stream, run a hash of the resulting stream, then demux/decode the encoded stream and run a hash on the result. In FATE, I imagine these 2 operations would be split into 2 separate tests. But how can I guarantee that one will run before the other? Right now, there’s no official guarantee of the order in which the test specs run. But I can — and plan to — change the policy via fate-script.py so that the tests are always run in order of their database IDs. That way, I can always guarantee that the encode/mux test is run and the correct sample file is waiting on disk before the corresponding demux/decode test executes.
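A hypothetical sketch of that ordering policy (the names below are placeholders, not fate-script.py's actual internals): sort the fetched test specs by database ID before dispatching them, so an encode/mux test with a lower ID always runs before the demux/decode test that consumes its output file.

def run_test(spec):
    # stand-in for the real test runner
    print("running test %d: %s" % (spec["id"], spec["command"]))

# placeholder specs; in reality these come from the FATE database
test_specs = [
    {"id": 421, "command": "demux/decode $BUILD_PATH/output"},
    {"id": 420, "command": "encode/mux to $BUILD_PATH/output"},
]

# running in database ID order guarantees the encode/mux test (lower ID)
# finishes before the demux/decode test that depends on its output
for spec in sorted(test_specs, key=lambda s: s["id"]):
    run_test(spec)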

I also need a way to verify the contents of the encoded file on disk. I think this can be handled via a new special directive along the lines of:

{MD5FILE,$BUILD_PATH/output} $BUILD_PATH/ffmpeg -i $SAMPLES_PATH/input/input -f format -y $BUILD_PATH/output

This will read the bytes of the file 'output' and compute their MD5 hash. This seems simple enough in a local environment, but it is another item that may pose challenges in the cross-FATE architecture I am working on with Mans, which will support automated testing on less powerful or differently-targeted platforms.
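A rough sketch of how such a directive might be processed (the helper name, command line, and paths are placeholders, not FATE's actual implementation): run the command as usual, then hash the named output file rather than the command's stdout.

import hashlib
import subprocess

def run_md5file_test(command, output_path):
    # run the ffmpeg command line as usual...
    subprocess.check_call(command, shell=True)
    # ...then hash the file it wrote to disk instead of its stdout
    md5 = hashlib.md5()
    with open(output_path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            md5.update(chunk)
    return md5.hexdigest()

# placeholder invocation; the real paths come from the test spec
digest = run_md5file_test("ffmpeg -i input -f format -y output", "output")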

FFmpeg Perceptual Audio Test Plan

There have been some problems with FATE audio testing. First off, the qt-ima4-stereo test spec was testing against the wrong file for the past year. Stereo IMA ADPCM decoding could have broken in QuickTime and we might have never been alerted. Sloppy.

More seriously, I found out that many of my existing, bitexact audio tests have not been constructed properly. This is due to the fact that these 2 commands:

ffmpeg -i file.ext file.wav
ffmpeg -i file.ext -f wav - > file.wav

do not yield equivalent sets of bytes inside file.wav. Part of the reason is that, after writing out all the audio samples, the muxer needs to rewind to the header so that it can write the data payload length. When writing data to stdout, the program does not have the option to rewind the output stream. However, I don’t understand the entire discrepancy. Using the file qt-ima4-mono with the above command lines:

1156652 surge.wav
1146924 surge-stdout.wav

The file that is routed through stdout is notably smaller (9728 bytes smaller). I was going to write this off as the stdout file failing to be flushed. However, the behavior is consistent across all machines and platforms.
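One quick way to investigate what the non-seekable path leaves behind (a diagnostic sketch using the file names from the listing above) is to compare the RIFF chunk size field in the WAV header against the actual size on disk. It won't account for the whole 9728-byte gap, but it shows whether the size field went stale.

import os
import struct

def riff_size(path):
    # the 4 bytes after the "RIFF" tag should equal the file size minus 8
    with open(path, "rb") as f:
        header = f.read(8)
    assert header[:4] == b"RIFF"
    return struct.unpack("<I", header[4:8])[0]

for path in ("surge.wav", "surge-stdout.wav"):
    print(path, os.path.getsize(path), riff_size(path) + 8)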

My proposed solution is to update all of the audio tests to use this raw format target:

ffmpeg -i file.ext -f s16le -

With a raw format target, the output piped through stdout is byte-for-byte equivalent to the file written by:

ffmpeg -i file.ext -f s16le file.s16le

1156608 surge.s16le
1156608 surge-stdout.s16le

Moving right along, there is the much bigger task of testing perceptual audio decoders. Working down the FATE Test Coverage list, these perceptual audio codecs will get the naive, one-off wave reference treatment in lieu of a proper conformance suite: ATRAC3, RealAudio Cooker, DCA (DTS), IMC, Nellymoser, Qcelp, QDesign, RealAudio 28.8, Truespeech, Vorbis, and WMA v1.

Then there is the matter of MPEG audio codecs for which we have access to extensive conformance suites. Thanks to Kostya and Benjamin for furnishing pointers to precise information on how to verify whether your MP1/2/3 or AAC audio decoder is up to snuff. This page at Underbit explains exactly how the spec defines conformance for MPEG-1 layers 1, 2, and 3, and also evaluates the conformance of various implementations; the comparison ostensibly predates FFmpeg. This Mp4-tech mailing list post shows the way regarding AAC conformance.

So I need to automate the MP1/2/3 and AAC test entries. I estimate the automated process will work something like this:

  • Decode encoded file
  • Run comparison of decoded wave against original wave
    • For MP1/2/3, this seems to entail converting both the FFmpeg output and the original reference wave to floating point numbers in a normalized range of -1.0..1.0, computing the root mean square of the difference signal, and verifying that the RMS is less than 1 / (32768 * sqrt(12)) (see the sketch after this list)
    • For AAC, well, I’m still researching the precise criteria
  • If the decoded wave is within tolerance, add a new test
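Here is a minimal numpy sketch of that comparison step, assuming for simplicity that both the FFmpeg output and the reference have already been decoded to raw 16-bit signed little-endian PCM (the file names are placeholders; the proper procedure uses a higher-precision reference, as discussed below):

import numpy as np

# load both signals as 16-bit little-endian samples and trim to a common length
decoded = np.fromfile("ffmpeg-output.s16le", dtype="<i2")
reference = np.fromfile("reference.s16le", dtype="<i2")
n = min(len(decoded), len(reference))
decoded, reference = decoded[:n], reference[:n]

# MP1/2/3-style criterion: normalize to -1.0..1.0 and check the RMS of
# the difference signal against the threshold quoted above
diff = decoded.astype(np.float64) / 32768.0 - reference.astype(np.float64) / 32768.0
rms = np.sqrt(np.mean(diff ** 2))
print("RMS conformance:", rms < 1.0 / (32768.0 * np.sqrt(12.0)))

# the one-off wave reference check: no individual sample may differ
# from the reference by more than 1
max_dev = np.max(np.abs(decoded.astype(np.int32) - reference.astype(np.int32)))
print("one-off check:", max_dev <= 1)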

The part where I get a bit fuzzy is: what should the test spec be? Should I generate a reference wave and test future decoded waves against it using my one-off wave reference method? Or should I just go ahead and compute the RMS of the difference signal? I assume that using the nifty numpy library for the task couldn't possibly make any measurable difference in the performance of FATE testing versus the one-off wave reference method (computing the absolute value of the difference signal and checking that no discrete points exceed 1).

One trade-off is that I would need to store the full 24-bit reference waves in order to properly compute RMS, which is 50% more data than I would need with the one-off method. And I’m still not sure how to process the 24-bit data in any event.
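For what it's worth, one possible way to get the 24-bit samples into numpy (an assumption on my part, not an established FATE method) is to group the raw bytes into 3-byte little-endian samples, widen them to 32 bits, and sign-extend:

import numpy as np

def read_s24le(path):
    # read raw bytes and group them into 3-byte little-endian samples
    raw = np.fromfile(path, dtype=np.uint8)
    raw = raw[: len(raw) - len(raw) % 3].reshape(-1, 3)
    samples = (raw[:, 0].astype(np.int32)
               | (raw[:, 1].astype(np.int32) << 8)
               | (raw[:, 2].astype(np.int32) << 16))
    # sign-extend from 24 bits to 32 bits
    samples -= (samples & 0x800000) << 1
    return samples

# hypothetical usage: normalize to -1.0..1.0 for the RMS computation
# reference = read_s24le("reference.s24le").astype(np.float64) / 8388608.0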

Lossless Audio Tests

I feel a little more confident about the lossless audio decoders in FFmpeg — some of them, at least — so I activated the FATE tests that I had staged for them.

Further, I have added some tests related to Sierra game formats.

Lossless Audio Anomalies

Some years ago, I took a 1-minute sample of a song from a CD and compressed it using every lossless audio coder I knew of at the time — a dozen of them in all. I put the samples here. This effort predated FATE by a number of years. Nowadays, FFmpeg contains native decoding support for 7 of those lossless audio algorithms. And since lossless decoders, by definition, are supposed to produce bitexact results, it should be an easy task to add automated tests to FATE.

My pragmatic rule for FATE samples is to try to keep them under 2 MB where feasible. Since most of the lossless samples I created weigh in at 6-7 MB, I set about slicing off the first megabyte of each supported sample. And that's when I noticed that the output was not bitexact across configurations, at least not for all the algorithms. This struck me as odd.

Most of the supported lossless algorithms produced identical output across platforms, even with an incomplete final chunk. Two did not:

Apple Lossless generates 4 results — all Linux/x86_32 agreed, all Linux/x86_64 agreed, all Linux/PPC agreed, and Mac OS X x86_64 and PPC agreed.

True Audio Lossless generates 19 different results — all configurations disagreed except for Mac OS X x86_64 and PPC, which agreed with each other.

Digging deeper using ‘-f framecrc’ demonstrates that all frames agree across configurations until the last, incomplete frame (and, of course, the complete files are bitexact). No big emergency; I just thought it was interesting. I will be able to contrive a new, smaller ALAC sample using iTunes (or FFmpeg using Jai’s encoder), and it looks like True Audio’s encoder is still around as well, and available for Linux to boot.