A big shortcoming of FATE so far has been its inability to test perceptual audio and video codecs. When FATE runs a test, it compares the output to a known value in its own database, and the output needs to match that value precisely, i.e., be “bit-exact”. The problem with perceptual codecs is that they are not specified to decode in a bit-exact manner. For example, decoding the Ogg Vorbis audio file abc.ogg on x86_64 and on PowerPC will produce two waves that, though they may sound identical to most listeners, are not precisely the same down to the PCM sample level; minor variations exist (generally +/- 1).
I have a plan for adapting FATE to handle this. It may seem a little (or a lot) crazy, but hear me out.
For now, I am only thinking about perceptual audio codecs. These include Vorbis, MP3, AAC, WMA, QDesign, Real-cook, Real-28_8, and a bunch of others I am forgetting at the moment.
The big idea is to store a reference decoded wave for each perceptual audio decoding test. Each test then decodes its file, compares the output to the reference wave, and fails if the absolute difference between any pair of corresponding PCM samples is greater than 1.
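A minimal Python sketch of that pass/fail rule follows. It assumes both waves have already been reduced to raw signed 16-bit little-endian PCM (WAV headers stripped), and `waves_match` is a hypothetical name, not an existing FATE function:

```python
import array
import struct
import sys


def waves_match(reference: bytes, decoded: bytes, tolerance: int = 1) -> bool:
    """Return True if every decoded sample is within `tolerance` of its
    reference sample. Both inputs are assumed to be raw signed 16-bit
    little-endian PCM with no WAV header."""
    # A length mismatch (or a stray odd byte) counts as a failure.
    if len(reference) != len(decoded) or len(reference) % 2:
        return False
    ref = array.array("h", reference)
    out = array.array("h", decoded)
    if sys.byteorder == "big":
        # array uses native byte order; the PCM data is little-endian.
        ref.byteswap()
        out.byteswap()
    # Compare corresponding samples; all() stops at the first failure.
    return all(abs(r - o) <= tolerance for r, o in zip(ref, out))


if __name__ == "__main__":
    ref = struct.pack("<4h", 0, 100, -200, 32767)
    print(waves_match(ref, struct.pack("<4h", 1, 99, -200, 32766)))  # True
    print(waves_match(ref, struct.pack("<4h", 0, 102, -200, 32767)))  # False
```

Treating a length mismatch as a failure is a design choice: a decoder that drops or adds samples should not pass, no matter how close the overlapping samples are.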
How to perform the comparison? I have a few ideas:
- Craft a default Python algorithm that painstakingly unpacks the bytes of both waves into samples, iterates along both in lockstep, and calculates the absolute difference at each sample.
- Allow a FATE installation to call out to a more efficient helper program, preferably one written with SIMD instructions that read 16 bytes (eight 16-bit samples) at a time from each wave and process them in parallel. I’m thinking a parallel saturating subtract (so that extreme differences cannot wrap around), followed by a parallel absolute value, followed by a bitwise AND with a mask that clears each sample’s lowest bit, should reveal whether any of the eight samples is outside of tolerance: any nonzero result means some difference exceeds 1.
- Any other tricks would be appreciated, especially regarding the default algorithm. Are there any special numerical tricks for determining the information I need from 4 bytes in parallel, packed in a 32-bit integer, without SIMD?
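One trick that fits the default-algorithm question is SIMD within a register (SWAR), layered on Python’s arbitrary-precision integers. Every sample of a wave gets its own lane in one huge integer, so a handful of big-integer additions and ANDs, executed in the interpreter’s C code, check all samples at once instead of looping in Python. With 16-bit samples the lanes must be at least sample-sized rather than byte-sized: a difference of 1 can ripple into a sample’s high byte (0x00FF versus 0x0100), so 4 bytes packed in a 32-bit integer does not work directly. The sketch below gives each sample a 32-bit lane, leaving headroom for a guard bit. It assumes raw signed 16-bit little-endian PCM, the function names are hypothetical, and I have not measured whether it actually beats a plain per-sample loop:

```python
import struct

LANE_BYTES = 4            # each 16-bit sample gets its own 32-bit lane
GUARD = 1 << 31           # top bit of every lane, used as a comparison flag


def _repeat(lane_value: int, lanes: int) -> int:
    """Integer holding `lane_value` in each of `lanes` 32-bit lanes."""
    return int.from_bytes(lane_value.to_bytes(LANE_BYTES, "little") * lanes, "little")


def _widen(pcm: bytes) -> int:
    """Spread signed 16-bit little-endian samples into 32-bit lanes of one
    integer, biased to the unsigned range 0..65535 by flipping the sign bit."""
    lanes = len(pcm) // 2
    buf = bytearray(LANE_BYTES * lanes)
    buf[0::LANE_BYTES] = pcm[0::2]   # low byte of each sample
    buf[1::LANE_BYTES] = pcm[1::2]   # high byte of each sample
    return int.from_bytes(buf, "little") ^ _repeat(0x8000, lanes)


def waves_match_swar(reference: bytes, decoded: bytes, tolerance: int = 1) -> bool:
    """True if every sample pair differs by at most `tolerance` (0..65535)."""
    if not 0 <= tolerance <= 65535:
        raise ValueError("tolerance must be in 0..65535")
    if len(reference) != len(decoded) or len(reference) % 2:
        return False
    lanes = len(reference) // 2
    if lanes == 0:
        return True
    # Per lane: z = decoded - reference + 65536, which lies in 1..131071,
    # so no carry or borrow ever crosses into a neighboring lane.
    z = _widen(decoded) + _repeat(65536, lanes) - _widen(reference)
    guards = _repeat(GUARD, lanes)
    # A sample passes iff 65536 - tolerance <= z <= 65536 + tolerance.
    # Adding GUARD - k to a lane sets its guard bit exactly when z >= k.
    low_ok = ((z + _repeat(GUARD - (65536 - tolerance), lanes)) & guards) == guards
    high_ok = ((z + _repeat(GUARD - (65537 + tolerance), lanes)) & guards) == 0
    return low_ok and high_ok


if __name__ == "__main__":
    ref = struct.pack("<3h", 0, -200, 32767)
    print(waves_match_swar(ref, struct.pack("<3h", 1, -201, 32767)))  # True
    print(waves_match_swar(ref, struct.pack("<3h", 0, -202, 32767)))  # False
```

The price is memory: each sample is widened to 4 bytes, and several temporary integers the size of the whole wave are alive at once, so very long references might need to be processed in chunks.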
The reference waves could add up to a lot of sample data. It occurred to me to use FLAC to mitigate the storage problem. My first impulse was to store the reference waves as FLAC files in a FATE installation’s sample suite and decode them as needed during each build/test cycle; decoding FLAC is reasonably fast, after all. However, the more I think about it, the sillier it seems to decode the same references over and over. As a compromise, I may store the reference waves as FLAC in the central MPlayerhq.hu FATE suite archive to reduce storage and transfer requirements, and have each installation decode them once. It will also be time to create a small, standard syncing script that both performs the samples rsync and decompresses any new FLAC wave references in the archive.
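A rough sketch of such a syncing script in Python follows. The rsync source URL is a placeholder, and decoding each `foo.flac` to a sibling `foo.wav` is an assumed naming convention. Only documented `rsync` and `flac` command-line options are used (`flac -d` decodes, `-s` silences progress output, `-f` overwrites an existing output file, `-o` names it):

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Placeholder source (hypothetical); substitute the real FATE suite location.
SAMPLES_SOURCE = "rsync://example.invalid/fate-suite/"


def pending_references(root: Path) -> list[Path]:
    """FLAC references whose decoded .wav is missing or older than the .flac."""
    pending = []
    for flac in sorted(root.rglob("*.flac")):
        wav = flac.with_suffix(".wav")
        if not wav.exists() or wav.stat().st_mtime < flac.stat().st_mtime:
            pending.append(flac)
    return pending


def sync(root: Path) -> None:
    """Mirror the sample suite, then decode only new or updated references."""
    # -a preserves the archive's timestamps, which pending_references() relies on.
    subprocess.run(["rsync", "-a", SAMPLES_SOURCE, str(root)], check=True)
    for flac in pending_references(root):
        wav = flac.with_suffix(".wav")
        # flac: -d decode, -s silent, -f overwrite existing output, -o output name
        subprocess.run(["flac", "-d", "-s", "-f", "-o", str(wav), str(flac)],
                       check=True)
```

I deliberately left out rsync’s `--delete` option: it would remove the locally decoded .wav files, since those do not exist in the central archive.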
All of this is highly speculative at this point. I don’t know how much storage these hypothetical reference waves are going to require, nor how long it’s going to take in practice to perform all the comparisons. And of course, I don’t know whether the +/- 1 tolerance idea will hold up, although cursory tests have been positive.
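The storage question at least has an easy upper bound for the decoded references: raw PCM size is simply duration × sample rate × channels × bytes per sample. A tiny helper (hypothetical, for estimation only) makes the arithmetic explicit:

```python
def pcm_bytes(seconds: float, sample_rate: int = 44100,
              channels: int = 2, bits_per_sample: int = 16) -> int:
    """Size in bytes of raw PCM (no WAV header) for the given duration."""
    # Round to a whole number of sample frames, then scale to bytes.
    frames = round(seconds * sample_rate)
    return frames * channels * bits_per_sample // 8


if __name__ == "__main__":
    # One minute of 44.1 kHz stereo 16-bit audio: 10,584,000 bytes (about 10 MiB).
    print(pcm_bytes(60))
```

FLAC would shrink the copy stored and transferred from the central archive, but the decoded references on each installation would still occupy roughly this much.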
I know it’s a mathematically “impure” solution, but we need something, and I wanted to get this possibly workable idea out there.