Breaking Eggs And Making Omelettes

Topics On Multimedia Technology and Reverse Engineering


Gymnastics Routine

February 9th, 2009 by Multimedia Mike

Pommel horse

You would not believe how much numerical gymnastics I have to perform in order to test these MPEG-1 audio conformance vectors. It seems straightforward enough: a conformance vector, at least for layers 1 and 2, consists of a .MPG file and a .PCM file. The MPG file is supposed to contain an encoded MPEG audio stream, while the PCM file holds the output produced by running the corresponding MPG file through the official reference decoder. The root mean square (RMS) of the difference between that reference PCM file and, say, the output of the FFmpeg decoder needs to be less than 1 / (32768 * sqrt(12)). So what’s the big deal?

First off, the reference output PCM is stored as a textual list of 24-bit hex numbers, e.g.:


How often do you have to sort out 24-bit signed integers? I thought of a generalized method to do this directly in Python:
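One general way to handle this, sketched under assumptions: parse the hex text as an unsigned value, then fold the upper half of the range into negative numbers by flipping and subtracting the sign bit. The function name hex_to_signed and its bits parameter are made up for this illustration, and the sketch may not match the original method line for line.

```python
def hex_to_signed(text, bits=24):
    """Parse a hex string as a two's-complement integer of the given width.

    Hypothetical helper for illustration; works for any bit width.
    """
    value = int(text, 16)
    if value >= (1 << bits):
        raise ValueError("%r does not fit in %d bits" % (text, bits))
    sign_bit = 1 << (bits - 1)
    # Flipping the sign bit and then subtracting it maps
    # [0, 2^bits) onto [-2^(bits-1), 2^(bits-1)).
    return (value ^ sign_bit) - sign_bit
```

For 24-bit input, "7FFFFF" is the largest positive sample and "800000" the most negative one; the same function handles 16-bit or 32-bit text by changing bits.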

So now it’s time to see if FFmpeg can decode data in a way that can be compared directly to the reference 24-bit PCM samples. I tried to decode the first sample, fl1.mpg, with ‘ffmpeg -i fl1.mpg out.wav’. FFmpeg declined to recognize the file. Huh? It turns out that the reference MPG files are also stored as text, this time as 32-bit hex numbers in big endian format. See if you can identify the start code:


The MPEG standard is quite old and who knows how old this sample suite is. I suppose that this is just how they did things back in the day. Though things changed a bit with layer 3 — they didn’t package the PCM reference output, opting instead for Solaris/Sparc and Win32 binary reference decoder programs that can generate the reference output.

Anyway, this program was my solution for converting the text lines into a binary file that FFmpeg is able to handle:
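A minimal sketch of such a converter, assuming each line of the text file holds exactly one 32-bit hex word. The helper name hex_lines_to_bytes is hypothetical, and this illustrates the idea rather than reproducing the original program.

```python
import struct


def hex_lines_to_bytes(lines):
    """Convert text lines of 32-bit hex words into big-endian binary data.

    Assumes one hex word per line; blank lines are skipped.
    """
    out = bytearray()
    for line in lines:
        word = line.strip()
        if not word:
            continue
        # ">I" packs an unsigned 32-bit integer in big-endian byte order.
        out += struct.pack(">I", int(word, 16))
    return bytes(out)
```

Typical use would be to open the text file, pass the file object (which iterates line by line) to hex_lines_to_bytes, and write the result to a new file opened in "wb" mode.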

Though I’m certain that there must be simpler, shell-based one-liners to accomplish the same (objcopy?).

So now I’m ready to decode the reference MPG file with FFmpeg. But into what format? FFmpeg supports output to a raw, signed, 24-bit format. But by the time I got to this phase, I was on a train working on my Eee PC without my original notes from above, couldn’t remember my method for processing the reference 24-bit textual data and didn’t feel like deriving and testing it anew. So instead, I decided to work directly in 32-bit by converting the 24-bit hex ASCII numbers to integers, multiplying by 256, and converting via 2’s complement method if the top bit was set. Then I decoded the MPG file as s32le.
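The 32-bit shortcut described above can be sketched as follows. The function name hex24_to_s32 is invented for this example, and the input is assumed to be a single 24-bit hex string with no prefix.

```python
def hex24_to_s32(text):
    """Scale a 24-bit hex sample up to a signed 32-bit integer."""
    # Multiplying by 256 moves the 24-bit value into the top 24 bits
    # of a 32-bit word, leaving the low 8 bits as zero.
    value = int(text, 16) * 256
    # If the top bit of the 32-bit word is set, the sample is negative:
    # apply the two's complement rule by subtracting 2^32.
    if value & 0x80000000:
        value -= 1 << 32
    return value
```

The results can then be compared sample for sample against FFmpeg output decoded as s32le.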

Great, so I now have two 32-bit waves and can proceed to compute RMS on the difference wave. Per my understanding, the RMS threshold assumes computation based on a floating point range of -1.0..1.0. Do I really need to do this entirely in the float space? I mean, I think I know how to normalize the integers down to that range: divide each one by 2^(n-1), where n = the number of bits comprising the integer. However, if I multiply both the numerator and denominator of the RMS constant by MAXINT for the given number of bits, it seems plausible to do the calculation in integer space:

(MAXINT / MAXINT) * (1 / (32768 * sqrt(12)))

32768 = 2^15. If operating with 32-bit numbers, the RMS threshold constant becomes:

2^(32-15) / sqrt(12)
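As a rough sketch of the arithmetic, the RMS of the difference wave and the scaled threshold could be computed like this. The function names are hypothetical, and rms_threshold_as_written simply encodes the constant exactly as it is stated above, 2^(bits-15) / sqrt(12).

```python
import math

# Threshold for samples normalized to the -1.0..1.0 range.
FLOAT_THRESHOLD = 1.0 / (32768 * math.sqrt(12))


def rms_of_difference(ref, test):
    """RMS of (ref - test) over the samples both sequences share."""
    n = min(len(ref), len(test))
    if n == 0:
        raise ValueError("no overlapping samples")
    # zip() stops at the shorter sequence, so exactly n pairs are summed.
    total = sum((r - t) ** 2 for r, t in zip(ref, test))
    return math.sqrt(total / n)


def rms_threshold_as_written(bits):
    """Integer-space threshold as stated in the text: 2^(bits-15) / sqrt(12)."""
    return 2.0 ** (bits - 15) / math.sqrt(12)
```

One caveat about the scaling: dividing samples by 2^(n-1) corresponds to multiplying the float threshold by 2^(n-1), which gives 2^(n-16) / sqrt(12). The constant as written is twice that, so an integer-space check using it is more lenient than the float-space check by a factor of two.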

So I set out to write a Python script that can load both 32-bit int waves and compute the RMS using 32-bit ints, 24-bit ints, and floating point numbers. This was my first batch of results using fl1.pcm/mpg:

RMS = 26549.415250, 32-bit threshold = 37837.227242
RMS = 103.708653, 24-bit threshold = 147.801669
RMS = 0.000012, float32 threshold = 0.000009

It took some massaging (debugging, actually) to get the floating point result that close. I was about to blame this all on floating point precision until I remembered an option in FFmpeg’s libavcodec/mpegaudio.h named CONFIG_AUDIO_NONSHORT. Without it, any integer audio higher than 16 bits is just being shifted. Sure enough, a quick test validated that none of FFmpeg’s 32-bit PCM had any of the lower 16 bits set in any individual sample.
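A quick check along those lines might look like the sketch below, which counts how many samples in a buffer of raw s32le data have any of their low 16 bits set. The function name is made up for illustration, and the input is assumed to be headerless little-endian signed 32-bit PCM.

```python
import struct


def count_samples_with_low_bits(raw, low_bits=16):
    """Count s32le samples that have any of the low `low_bits` bits set."""
    usable = len(raw) - (len(raw) % 4)  # ignore a trailing partial sample
    mask = (1 << low_bits) - 1
    count = 0
    for (sample,) in struct.iter_unpack("<i", raw[:usable]):
        # Python's & on negative ints behaves like two's complement,
        # so the mask works for negative samples as well.
        if sample & mask:
            count += 1
    return count
```

A result of zero over a whole decoded file suggests the data is really 16-bit audio shifted up into 32-bit containers.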

After changing the option, recompiling, re-decoding, and validating that the output samples have a little more precision, this is what my RMS tool produces:

RMS = 1753.516427, 32-bit threshold = 37837.227242
RMS = 7.252516, 24-bit threshold = 147.801669
RMS = 0.000001, float32 threshold = 0.000009

So, congratulations to FFmpeg for meeting specification on at least one MPEG-1, layer I conformance sample. I will keep you posted on the others after I refine the testing process and hopefully make it part of FATE soon.

Posted in FATE Server, Python | 4 Comments »

4 Responses

  1. Robert Swain Says:

    Nice. :)

  2. Mans Says:

    Perl versions:
    32-bit: perl -ne 'print pack "N", hex'
    24-bit: perl -ne 'print pack "N", hex() << 8'

    The () are needed in the second case for reasons of semantic obscurity.

  3. Multimedia Mike Says:

    “reasons of semantic obscurity” … doesn’t that pretty much describe Perl? Thanks for the tip, though.

  4. StefanG Says:

    When I had to do some conformance testing of a 3rd party decoder at work once, I also wanted to compare to a few other decoders, and found that some of them had an inexplicable offset of a few samples in the output. This made it necessary (or at least that was my solution) to first do a least-squares fit to find that offset, then cut off the samples that did not exist in both output and reference, and run the conformance test on the rest. If that is not true for ffmpeg, then IIRC it was the lame decoder putting some silence in.
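The offset search StefanG describes could be approximated with a brute-force scan over integer shifts, keeping the shift with the smallest mean squared difference. This is only a sketch of that idea and not his actual method; the name best_offset and the max_shift parameter are hypothetical.

```python
def best_offset(ref, test, max_shift=32):
    """Find the shift of `test` against `ref` with the lowest mean squared error.

    A positive shift means `test` has that many extra leading samples.
    Returns (shift, mean_squared_error).
    """
    best_shift, best_err = None, float("inf")
    for shift in range(-max_shift, max_shift + 1):
        if shift >= 0:
            a, b = ref, test[shift:]
        else:
            a, b = ref[-shift:], test
        n = min(len(a), len(b))
        if n == 0:
            continue
        # zip() truncates to the overlap, so only shared samples count.
        err = sum((x - y) ** 2 for x, y in zip(a, b)) / n
        if err < best_err:
            best_shift, best_err = shift, err
    return best_shift, best_err
```

With real audio it is safer to keep max_shift small relative to the file length, since a very short overlap can produce a misleadingly low error.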