Testing MPEG Audio Conformance Vectors | Breaking Eggs And Making Omelettes

You would not believe how much numerical gymnastics I have to perform in order to test these MPEG-1 audio conformance vectors. It seems straightforward enough: a conformance vector, at least for layers 1 and 2, consists of a .MPG file and a .PCM file. The MPG file is supposed to contain an encoded MPEG audio stream while the PCM file has the output after the corresponding MPG file has been run through the official reference decoder. The root mean square (RMS) of the difference between that reference PCM file and, say, the output of the FFmpeg decoder needs to be less than 1 / (32768 * sqrt(12)). So what’s the big deal?

First off, the reference output PCM is stored as a textual list of 24-bit hex numbers, e.g.:

006219 
006219 
FF0D63 
FF0D63
...

How often do you have to sort out 24-bit signed integers? I thought of a generalized method to do this directly in Python:
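
A minimal sketch of one way to do it, not necessarily the original method. The function name parse_signed_hex is made up for illustration, and the only assumption is that each line holds one two's complement value of a known bit width:

```python
def parse_signed_hex(text, bits=24):
    """Interpret a hex string as a two's complement integer of `bits` width."""
    value = int(text.strip(), 16)
    # If the top bit is set, the value is negative: subtract 2^bits.
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


print(parse_signed_hex("006219"))  # positive sample
print(parse_signed_hex("FF0D63"))  # negative sample (top bit set)
```

Because the width is a parameter, the same function would also cover 16-bit or 32-bit text dumps.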

So now it’s time to see if FFmpeg can decode data in a way that can effectively be compared to the reference 24-bit PCM samples. So I try to decode the first sample, fl1.mpg, with ‘ffmpeg -i fl1.mpg out.wav’. FFmpeg declines to recognize the file. Huh? It seems that these reference samples are also stored as text, but with 32-bit hex numbers in big endian format. See if you can identify the start code:

FFFEC804
93ECED99
88888888
...

The MPEG standard is quite old and who knows how old this sample suite is. I suppose that this is just how they did things back in the day. Things changed a bit with layer 3, though: they didn’t package the PCM reference output, opting instead for Solaris/Sparc and Win32 binary reference decoder programs that can generate the reference output.

Anyway, this hex2bin.py program was my solution for converting the text lines into a binary file that FFmpeg is able to handle:
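
What follows is only a sketch of how such a converter could look, not the original hex2bin.py. It assumes one 32-bit hex word per line, written out in big endian order; the names hex_lines_to_bytes and convert_file are hypothetical:

```python
import struct


def hex_lines_to_bytes(lines):
    """Pack each non-empty line of hex text as one big-endian 32-bit word."""
    out = bytearray()
    for line in lines:
        line = line.strip()
        if line:
            out += struct.pack(">I", int(line, 16))
    return bytes(out)


def convert_file(src_path, dst_path):
    """Read a text file of hex words and write the packed binary stream."""
    with open(src_path) as src, open(dst_path, "wb") as dst:
        dst.write(hex_lines_to_bytes(src))
```

Something like convert_file("fl1.mpg", "fl1.bin") would then produce a binary file to hand to FFmpeg (the output name is just an example).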

Though I’m certain that there must be simpler, shell-based one-liners to accomplish the same (objcopy?).

So now I’m ready to decode the reference MPG file with FFmpeg. But into what format? FFmpeg supports output to a raw, signed, 24-bit format. But by the time I got to this phase, I was on a train working on my Eee PC without my original notes from above, couldn’t remember my method for processing the reference 24-bit textual data and didn’t feel like deriving and testing it anew. So instead, I decided to work directly in 32-bit by converting the 24-bit hex ASCII numbers to integers, multiplying by 256, and converting via 2’s complement method if the top bit was set. Then I decoded the MPG file as s32le.
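
The 24-bit to 32-bit step described above can be sketched roughly like this. The function name hex24_to_s32 is invented for the example, and it assumes each input line is exactly one 24-bit hex sample:

```python
def hex24_to_s32(text):
    """Scale a 24-bit hex sample into the signed 32-bit range."""
    # Multiplying by 256 moves the 24 bits into the top of a 32-bit word.
    value = int(text.strip(), 16) * 256
    # If the top bit of that word is set, apply two's complement.
    if value & 0x80000000:
        value -= 1 << 32
    return value


print(hex24_to_s32("006219"))
print(hex24_to_s32("FF0D63"))
```

The low 8 bits of every converted sample are zero, so the reference data carries no more precision than it did as 24-bit values.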

Great, so I now have two 32-bit waves and can proceed to compute RMS on the difference wave. Per my understanding, the RMS threshold assumes computation based on a floating point range of -1.0..1.0. Do I really need to do this entirely in the float space? I mean, I think I know how to normalize the integers down to that range: divide each one by 2^(n-1), where n = the number of bits comprising the integer. However, by multiplying both the numerator and denominator of the RMS constant by MAXINT for the given number of bits, it seems plausible to do the calculation in integer space:

(MAXINT / MAXINT) * (1 / (32768 * sqrt(12)))

32768 = 2¹⁵. If operating with 32-bit numbers, the RMS threshold constant becomes:

2^(32-15) / sqrt(12)
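
As a sanity check on that arithmetic, here is a small sketch that scales the float threshold by 2^bits. The helper name rms_threshold is made up, and the choice of 2^bits as the scale factor is an assumption taken from the formula above:

```python
import math


def rms_threshold(bits=None):
    """RMS limit 1 / (32768 * sqrt(12)), optionally scaled by 2^bits."""
    base = 1.0 / (32768 * math.sqrt(12))
    if bits is None:
        return base  # float range -1.0..1.0
    return base * 2 ** bits


print("32-bit threshold = %f" % rms_threshold(32))
print("24-bit threshold = %f" % rms_threshold(24))
print("float threshold  = %f" % rms_threshold())
```

Note that scaling by 2^(bits-1) instead, which is full scale for signed samples, would give integer thresholds half this size.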

So I set out to write a Python script that can load both 32-bit int waves and compute the RMS using 32-bit ints, 24-bit ints, and floating point numbers. This was my first batch of results using fl1.pcm/mpg:

RMS = 26549.415250, 32-bit threshold = 37837.227242
pass
RMS = 103.708653, 24-bit threshold = 147.801669
pass
RMS = 0.000012, float32 threshold = 0.000009
fail
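
For the curious, the core of such a comparison might look like the sketch below. It is not the actual script. It assumes both waves are raw little endian signed 32-bit data, and the names read_s32le and rms_difference are made up for illustration:

```python
import math
import struct


def read_s32le(data):
    """Unpack raw bytes into a tuple of signed 32-bit little endian samples."""
    count = len(data) // 4
    return struct.unpack("<%di" % count, data[:count * 4])


def rms_difference(reference, decoded):
    """RMS of the sample-by-sample difference over the common length."""
    n = min(len(reference), len(decoded))
    if n == 0:
        raise ValueError("no samples to compare")
    # zip() stops at the shorter sequence, matching n above.
    total = sum((r - d) ** 2 for r, d in zip(reference, decoded))
    return math.sqrt(total / n)
```

Python integers do not overflow, so the sum of squares stays exact even for full-scale 32-bit differences; only the final division and square root happen in floating point.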

It took some massaging (debugging, actually) to get the floating point result that close. I was about to blame this all on floating point precision until I remembered an option in FFmpeg’s libavcodec/mpegaudio.h named CONFIG_AUDIO_NONSHORT. Without it, any integer audio higher than 16 bits is just being shifted. Sure enough, a quick test validated that none of FFmpeg’s 32-bit PCM had any of the lower 16 bits set in any individual sample.
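
A quick test of that kind can be sketched in a few lines, assuming the samples are already loaded as Python integers (the function name low_bits_used is hypothetical):

```python
def low_bits_used(samples, bits=16):
    """Return True if any sample has a nonzero bit among its lowest `bits`."""
    mask = (1 << bits) - 1
    # Python's & on negative ints behaves like two's complement,
    # so negative samples are handled without extra work.
    return any(s & mask for s in samples)


print(low_bits_used([0x10000, -0x20000]))  # only upper bits in use
print(low_bits_used([0x10001]))            # lowest bit set
```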

After changing the option, recompiling, re-decoding, and validating that the output samples have a little more precision, this is what my RMS tool produces:

RMS = 1753.516427, 32-bit threshold = 37837.227242
pass
RMS = 7.252516, 24-bit threshold = 147.801669
pass
RMS = 0.000001, float32 threshold = 0.000009
pass

So, congratulations to FFmpeg for meeting specification on at least one MPEG-1, layer I conformance sample. I will keep you posted on the others after I refine the testing process and hopefully make it part of FATE soon.

See also:

FFmpeg Perceptual Audio Test Plan

4 thoughts on “Gymnastics Routine”

Robert Swain February 10, 2009 at 12:56 am

Nice. :)
Mans February 10, 2009 at 5:37 am

Perl versions:
32-bit: perl -ne 'print pack "N", hex'
24-bit: perl -ne 'print pack "N", hex() << 8'

The () are needed in the second case for reasons of semantic obscurity.
Multimedia Mike (Post author) February 10, 2009 at 7:06 am

“reasons of semantic obscurity” … doesn’t that pretty much describe Perl? Thanks for the tip, though.
StefanG February 10, 2009 at 7:21 am

When I had to do some conformance test of a 3rd party decoder at work once, I also wanted to compare to a few other decoders, and found that some of them had an inexplicable offset of a few samples in the output. This made it necessary (or at least that was my solution) to first make a least-squares-fit to find that offset, then cut off the samples which did not exist in both output and reference, and do the conformance test with the rest. If that is not true for ffmpeg, then IIRC it was the lame decoder putting some silence in.
