Numerical Gymnastics Redux

Remember in my last post when I mentioned that the reference encodings in the MPEG-1 audio conformance suite are stored as a list of 32-bit hex numbers in ASCII format? I just thought I would mention that that only applies to the layer I encodings. The layer II encodings, for whatever reason, store only 1 byte per line in ASCII format. The layer III encodings, however, are in proper binary form.
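
For the record, converting either ASCII flavor to a proper binary file only takes a few lines of Python. This is just a sketch under my assumptions about the files (plain hex values, one per line, most significant byte first); the filenames are hypothetical:

    def ascii_hex_to_binary(infile, outfile, word_bytes):
        # Each line of the ASCII file holds one hex number; the layer I
        # files use 32-bit words (word_bytes=4) while the layer II files
        # use single bytes (word_bytes=1).
        with open(infile, "r") as fin, open(outfile, "wb") as fout:
            for line in fin:
                line = line.strip()
                if not line:
                    continue
                fout.write(int(line, 16).to_bytes(word_bytes, "big"))

    # hypothetical usage:
    # ascii_hex_to_binary("layer1_ref.hex", "layer1_ref.bin", 4)
    # ascii_hex_to_binary("layer2_ref.hex", "layer2_ref.bin", 1)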

Anyway…

Now that I am confident that the root mean square (RMS) tests pass, I need to decide how to store the samples, and in which numerical precision and format to compute the RMS. At first I reasoned that, since the 24-bit integer, 32-bit integer, and 32-bit float precisions all yielded passing results, any of them should work. However, back before FFmpeg's output had enough precision, the 32-bit float comparison failed where the other two still succeeded. In other words, the float comparison is the most sensitive of the three, which leads me to believe that 32-bit float would be the best precision to work with.
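
For reference, the comparison at the heart of all of this is nothing more than the root mean square of the difference signal. A minimal sketch using numpy, with the storage-precision question isolated in the array dtypes:

    import numpy as np

    def rms(reference, decoded):
        # RMS of the difference signal; do the math in float64 so the
        # question of storage precision (int vs. float32) stays separate
        # from the precision of the calculation itself.
        diff = reference.astype(np.float64) - decoded.astype(np.float64)
        return np.sqrt(np.mean(diff * diff))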

However, some tests reveal that either I’m doing something wrong, or FFmpeg has a bug in which it flips the sign on individual samples when converting to a floating point format. My money is on the former (i.e., my mistake). Then I realized that there is really no reason to ask FFmpeg to output floating point data from its various MPEG-1 audio decoders, since they all decode to integers anyway. I do, however, need to perform some configuration rework so that FFmpeg can be compiled to output 32-bit precision integers via a configure option rather than manual hacking.
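
If I wanted to confirm (or refute) the sign-flip theory, a quick check along these lines ought to do it. This is only a sketch, assuming the integer decode is 16-bit and both decodes cover the same samples (the function name is my own invention):

    import numpy as np

    def find_sign_flips(int16_samples, float_samples, tol=2.0 / 32768):
        # Normalize the 16-bit integer decode to the [-1.0, 1.0) range,
        # then flag positions where the two decodes are close in magnitude
        # but opposite in sign (ignoring near-silent samples, where the
        # sign is meaningless).
        norm = int16_samples.astype(np.float64) / 32768.0
        flipped = (np.abs(norm + float_samples) < tol) & (np.abs(norm) > tol)
        return np.nonzero(flipped)[0]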

So my proposed testing process for the MPEG-1 audio conformance vectors is the following:

  • patch FFmpeg to add a --enable-audio-long configure option that lets audio decoders output higher precision audio (only applies to the MPEG-1 decoders right now)
  • convert all of the encoded samples to proper binary files (per the hex conversion sketch above)
  • convert all of the conformance vectors to s32le raw format; while this is 33% more data than is strictly necessary, I think it will be easier to process chunks of 32-bit data than 24-bit (see the conversion sketch after this list)
  • stage the encoded samples and reference waves in the formal FATE suite
  • modify fate-script.py to honor a new command in the form of {RMS,$SAMPLES_PATH/wave-n-ref.s32le,37837.0} $BUILD_PATH/ffmpeg -i $SAMPLES_PATH/wave-n.mpg -f s32le -; the first parameter of the RMS special directive is the file of raw, 32-bit, signed, little endian data against which the command output must be compared, while the second parameter is the RMS threshold not to be exceeded (in this case, 2^(32-15) / sqrt(12) ≈ 37837; see the last entry for an explanation)
  • enter new FATE test specs
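
Regarding the s24le-to-s32le conversion mentioned above, the operation is simple enough. A sketch, assuming the reference data is packed, signed, little endian 24-bit:

    def s24le_to_s32le(infile, outfile):
        # Expand each 3-byte little endian sample to 4 bytes. Padding a
        # zero byte on the low end scales the samples up to 32-bit full
        # scale, which is what the 2^(32-15) threshold assumes.
        with open(infile, "rb") as fin, open(outfile, "wb") as fout:
            while True:
                sample = fin.read(3)
                if len(sample) < 3:
                    break
                fout.write(b"\x00" + sample)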

Now that I write it all out, however, I realize that it is not strictly necessary to modify FFmpeg to output higher precision numbers, since the 16-bit numbers, scaled up, will pass the 24- or 32-bit thresholds, per my empirical findings. This makes me wonder if I should store and read the data as 32-bit integers (and enable high precision output from FFmpeg), but then convert the numbers to floating point for the actual RMS calculation. The performance impact would be negligible (getting all the numbers lined up in arrays still takes longer than doing floating point ops on them), and the test would be stricter and could conceivably catch more problems. Then again, it may have been a math error on my part that caused the floating point test to fail while the 24- and 32-bit tests passed.

One more stipulation I (may) need to make in the final test: the reference wave always has considerably more samples (e.g., 65536) than FFmpeg decodes (e.g., 37632). I have been computing the RMS along the length of the shorter wave, and the result has been meeting the threshold. I still don’t know if this discrepancy is anything to worry about, but at the very least, I think I should add a provision to the ad-hoc {RMS} method that the decoded wave must be at least half as long as the reference wave.
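
Putting the whole proposed check together, the {RMS} handler in fate-script.py might look something like the following sketch; the function and parameter names are hypothetical, and it assumes the decoder output arrives as raw s32le bytes:

    import numpy as np

    def check_rms(ref_path, decoded_bytes, threshold):
        # Load the s32le reference wave and the raw decoder output.
        ref = np.fromfile(ref_path, dtype="<i4")
        dec = np.frombuffer(decoded_bytes, dtype="<i4")
        # Provision: require the decoded wave to be at least half as long
        # as the reference wave before the comparison counts.
        if len(dec) < len(ref) // 2:
            return False
        # Compare along the length of the shorter wave, in float.
        n = min(len(ref), len(dec))
        diff = ref[:n].astype(np.float64) - dec[:n].astype(np.float64)
        return np.sqrt(np.mean(diff * diff)) <= threshold

    # hypothetical usage: check_rms("wave-n-ref.s32le", output, 37837.0)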