Monthly Archives: February 2009

Towards The Next FFmpeg Release

The FFmpeg team is still very much committed to making a formal release, and soon. Originally, the release was slated for this weekend, but some problems with the bug database made it difficult to host the major bug-fixing initiative planned for last weekend. So the current plan is to go on a bug-fixing binge this weekend and, hopefully, release next weekend. The release has waited this long, so what’s one more week?

Meanwhile, things are going great with automated testing. Thanks to much discussion and determination from quite a few people, the entire regression suite passes, at long last, on more configurations than ever before, giving FATE a more solid baseline for continuous testing. Most notably, the regressions pass on 32- and 64-bit Mac OS X, 32-bit icc (Intel’s C compiler), and PowerPC/Linux when using gcc 4.0, 4.1, or 4.2. gcc 4.3 still presents a problem, while the SVN versions of gcc for the PowerPC have been messed up for months. I’m really not sure what to do about that. Further, I see that gcc on PowerPC 64 suffers from a colorful variety of random problems (sometimes the compiler even comes up with an internal error, and we’re not even talking about the SVN versions of gcc here).

Still, things are looking up. Also, according to my tally on the FATE test coverage page, FFmpeg supports 501 muxers, demuxers, encoders, and decoders. I don’t have to tell you that nothing else comes close.

Silverlight Codecpack

I was visiting ossguy’s blog today when I noticed that he took a small break from the usual “Free Software Über Alles” rhetoric to post a useful investigation of the Microsoft binary codec pack that corresponds to Moonlight, Linux’s free implementation of Silverlight. At first, I was surprised to hear that this codec pack was finally available– I didn’t think it was going to be generally available until Moonlight’s official release. A little digging revealed that Moonlight 1.0 was officially released yesterday. I wondered why I hadn’t seen anything about this on any major Linux news sites yet.

Apparently, no one cares.

Well, I care, insofar as this is another way to study some codecs. I think it’s really slick that the codec pack is one monolithic, relatively small, binary blob that contains all the proprietary codecs needed to support Silverlight. ossguy’s post details 2 more-or-less direct download links.

I see that compn is already on the case, identifying the precise codec formats that this blob is designed to handle: wma1, wma2, wma3, wmv1, wmv2, wmv3, wmav, wvc1, and mp3. So, really, nothing interesting for our cause. Almost all of those formats are already supported in FFmpeg. The one that isn’t — WMA3 — is in progress via a Summer of Code project. Who knows? Perhaps this codec pack could yield some new intelligence. But I tend to think the previous binary decoder released for Linux — packaged with Linspire — was pretty thorough in its presentation of symbols.

This codec pack is pretty thorough in the symbols department as well. Run ‘strings’ against the blob to see the ASCII strings. Filter the output through ‘c++filt’ as this will demangle (official technical jargon) the C++-style names:

  strings <codec-pack-binary> | c++filt

And if you want to disassemble the binary, here is a little something I wrote up regarding the bare essentials of ‘objdump’ as applied to reverse engineering work.

At the very least, I can see this codec pack being useful for hooking up to FFmpeg in order to gather a baseline for profiling to see how FFmpeg’s decoders stack up. Further, I should be able to use it to decode reference samples to verify how closely FFmpeg’s decoding of, e.g., WMA 1/2 data matches the original.

Whatever the case, I have started a MultimediaWiki page to describe the API.

Numerical Gymnastics Redux

Remember in my last post when I described that the reference encodings in the MPEG-1 audio conformance suite were stored as a list of 32-bit hex numbers in ASCII format? I just thought I would mention that that was only for the layer I encodings. The layer II encodings, for whatever reason, only have 1 byte per line in ASCII format. The layer III encodings are in a proper binary form, however.
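As a rough sketch of what converting those ASCII dumps to binary might look like: the helper below is hypothetical, and it assumes that every non-blank line holds exactly one hex number with no prefix and that the digits appear in stream (big-endian) order. Neither assumption is confirmed by the conformance suite itself, so check a file or two by hand first.

```python
def hex_lines_to_bytes(lines, bytes_per_line):
    """Convert lines of ASCII hex into raw bytes.

    Assumptions (not confirmed by the conformance suite): one hex
    number per non-blank line, no "0x" prefix, digits in stream order.
    """
    width = bytes_per_line * 2  # two hex digits per byte
    out = bytearray()
    for line in lines:
        token = line.strip()
        if not token:
            continue  # skip blank lines
        token = token.zfill(width)  # left-pad short tokens with zeros
        if len(token) != width:
            raise ValueError("token too wide for %d bytes: %r"
                             % (bytes_per_line, token))
        out += bytes.fromhex(token)
    return bytes(out)
```

Under those assumptions, a layer I file would be converted with bytes_per_line=4 and a layer II file with bytes_per_line=1.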


Now that I am confident that the root mean square (RMS) tests pass, I need to decide how to store the samples and in which numerical precision and format the RMS will be computed. At first I reasoned that, since the 24-bit integer, 32-bit integer, and 32-bit float precisions all yielded passing results, any should work. However, before I got enough precision in the FFmpeg output, the 32-bit float precision failed where the other 2 still succeeded. This leads me to believe that the 32-bit float space would be the best precision to work with.

However, some tests reveal that either I’m doing something wrong, or FFmpeg has a bug in which it flips sign on individual samples when converting to a floating point format. My money is on the former (i.e., my mistake). I then realized that there is really no reason to ask FFmpeg to output floating point data from its various MPEG-1 audio decoders, since they are all decoding to integers anyway. I do, though, need to perform some configuration rework so that FFmpeg can be compiled to output 32-bit precision integers via a configuration option rather than manual hacking.

So my proposed testing process for the MPEG-1 audio conformance vectors is the following:

  • patch FFmpeg to allow for a --enable-audio-long configure option that will allow audio decoders to output higher precision audio (only applies to MPEG-1 decoders right now)
  • convert all of the encoded samples to proper binary files
  • convert all of the conformance vectors to s32le raw format; while this is 33% more data than is strictly necessary, I think it will be easier to process chunks of 32-bit data vs. 24-bit
  • stage the encoded samples and reference waves in the formal FATE suite
  • modify the FATE test script to honor a new command in the form of {RMS,$SAMPLES_PATH/wave-n-ref.s32le,37837.0} $BUILD_PATH/ffmpeg -i $SAMPLES_PATH/wave-n.mpg -f s32le -; the first parameter of the RMS special directive is the file of raw, 32-bit, signed, little-endian data against which the command output must be compared, while the second parameter is the RMS threshold not to be exceeded (in this case, 2^(32-15) / sqrt(12) ≈ 37837; see last entry for explanation)
  • enter new FATE test specs
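The proposed {RMS} directive could be handled along these lines. This is only a sketch, not the actual FATE script: the function names and the regular expression are made up for illustration, and the comparison runs over the shorter of the two inputs.

```python
import math
import re
import struct

# Hypothetical pattern for "{RMS,<reference file>,<threshold>} <command>".
RMS_DIRECTIVE = re.compile(r"^\{RMS,([^,}]+),([^,}]+)\}\s*(.*)$")

def parse_rms_directive(spec):
    """Split a test spec into (reference_path, threshold, command)."""
    match = RMS_DIRECTIVE.match(spec)
    if match is None:
        raise ValueError("not an RMS directive: %r" % spec)
    reference_path, threshold, command = match.groups()
    return reference_path, float(threshold), command

def read_s32le(data):
    """Unpack raw bytes as signed 32-bit little-endian samples."""
    count = len(data) // 4  # ignore any trailing partial sample
    return struct.unpack("<%di" % count, data[:count * 4])

def rms_difference(decoded, reference):
    """RMS of the sample-wise difference, over the shorter input."""
    a = read_s32le(decoded)
    b = read_s32le(reference)
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("no samples to compare")
    # Python integers do not overflow, so the squared sum is exact.
    total = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(total / n)
```

A test would then pass when rms_difference(command_output, reference_data) does not exceed the parsed threshold.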

Now that I write it all out, however, I realize that it is not strictly necessary to get FFmpeg modified to output higher precision numbers since the 16-bit numbers, scaled up, will pass the 24- or 32-bit thresholds, per my empirical findings. This makes me wonder if I should store and read the data as 32-bit integers (and enable high precision from FFmpeg), but then convert the numbers to floating point for the RMS calculation. The performance impact would be negligible (getting all the numbers lined up in arrays still takes longer than doing floating point ops on them), and the test would be stricter and conceivably catch more problems. Then again, it may have been a math error on my part that caused the floating point test to fail while the 24- and 32-bit tests worked.

One more stipulation I (may) need to make in the final test: The reference wave always has considerably more samples (e.g., 65536) than FFmpeg decodes (e.g., 37632). I have been performing RMS along the length of the shorter wave and the test has been meeting threshold. I still don’t know if this is a discrepancy to worry about, but at the very least, I think I should add a provision in the ad-hoc {RMS} method that the decoded wave has to be at least half as long as the reference wave.
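That length provision is simple to express. The function name and the min_fraction parameter below are hypothetical, with the one-half ratio taken from the paragraph above:

```python
def decoded_is_long_enough(decoded_samples, reference_samples,
                           min_fraction=0.5):
    """True if the decoded wave covers enough of the reference wave.

    Both arguments are sample counts; min_fraction=0.5 encodes the
    "at least half as long" provision.
    """
    return decoded_samples >= min_fraction * reference_samples
```

With the example counts mentioned above, 37632 decoded samples against 65536 reference samples clears the bar.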

Gymnastics Routine

Pommel horse

You would not believe how many numerical gymnastics I have to perform in order to test these MPEG-1 audio conformance vectors. It seems straightforward enough– a conformance vector, at least for layers 1 and 2, consists of a .MPG file and a .PCM file. The MPG file is supposed to contain an encoded MPEG audio stream while the PCM file has the output after the corresponding MPG file has been run through the official reference decoder. The root mean square (RMS) of the difference between that reference PCM file and, say, the output of the FFmpeg decoder needs to be less than 1 / (32768 * sqrt(12)). So what’s the big deal?
