Apparently, Aurel thought my proposed imprecise audio testing method for FATE had some merit, even if everyone else thinks I’m crazy. He proposed and prototyped a new PCM encoder called pcm_s16le_trunc, which normalizes a sequence of signed, 16-bit PCM samples according to the formula
sample[n] = (sample[n] + 1) & ~0x03
So it sort of smooths out the rough edges of a PCM wave in the hopes of making it possible to compare against reference waves in a bit-exact manner. Unfortunately, deploying the method on my distributed RPC testing system yields no success. For example, testing a particular AAC file across 18 different FFmpeg configurations still results in 7 unique sets of output, and those sets group the same way as they do without ‘-f pcm_s16le_trunc’; only the actual output bytes differ.
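For clarity, here is a minimal sketch of what that truncation step might look like in C. The function name and the in-place buffer interface are my own invention for illustration, not Aurel’s actual patch:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration of the formula above: nudge each signed 16-bit
 * sample up by 1, then clear the two lowest bits so every sample lands on a
 * multiple of 4. (A sample at exactly 32767 would wrap when stored back; a
 * real encoder would presumably clamp first.) */
static void truncate_pcm_s16(int16_t *samples, size_t count)
{
    for (size_t i = 0; i < count; i++)
        samples[i] = (samples[i] + 1) & ~0x03;
}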
I still think it’s an interesting idea. Any other refinements to throw out there?
To reiterate a comment I left on the last post: I do not wish to completely supplant the official validation methods that are already in place for various audio coding standards, most notably the MPEG audio standards. I envision 2 stages for these audio tests. Really, this parallels what I described when I discussed how I enter tests into FATE in the first place– the first stage is to validate that a given sample produces the correct results. For many test specs in FATE, this involves manually consuming the contents of a media sample with my own eyes and ears to judge whether it looks and sounds minimally sane. The first stage for these audio tests will be in the same spirit, except that there will be more mathematically rigorous qualifications for getting into FATE. Beyond just “sounding okay”, files will have to meet certain PSNR thresholds. The second stage is to create reference waves that the audio decoders can be continuously tested against.
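As a rough illustration of what a stage-one threshold check could look like, the sketch below computes PSNR between a decoded wave and a reference wave. The function, the 32767 peak value, and any pass/fail threshold applied to the result are my own assumptions for illustration, not criteria taken from any conformance spec:

#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of a PSNR measurement between decoded and reference 16-bit PCM. */
static double pcm_psnr(const int16_t *dec, const int16_t *ref, size_t count)
{
    double mse = 0.0;
    for (size_t i = 0; i < count; i++) {
        double d = (double)dec[i] - (double)ref[i];
        mse += d * d;
    }
    mse /= (double)count;
    if (mse == 0.0)
        return INFINITY;  /* bit-exact match */
    return 10.0 * log10(32767.0 * 32767.0 / mse);
}

A sample would only graduate to the second stage (generating its reference wave) if a check along these lines cleared whatever threshold we settle on for that codec.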
What thresholds need to be met, and what files should be used? Well, that varies depending on codec:
- MP2/MP3/AAC: Plenty of official conformance samples are available, and I’m pretty sure I have been sitting idly on most of them for many years. I need to do a little research to determine how close the decoder’s output needs to be to the reference output.
- Vorbis: The savior of open source multimedia (at least in the audio domain). While the format is exhaustively documented, I don’t see any specifications regarding quality thresholds that encoders and decoders need to meet, nor is there any conformance suite that I know of. This is particularly troubling since there have been numerous revisions to the format over the years and older variations are undoubtedly still “in the wild”. It would be nice to assemble a sample collection that represents the various generations of the format. In any case, it will be reasonable to generate our own samples for testing this format.
- RealAudio codecs such as Cooker and 28_8; also AC3, ATRAC3, QDesign, QCELP, assorted others: I’m not especially motivated to track down the software that creates these formats in order to contrive my own encodings from known source material. There are no known conformance suites, and there is no way to know what the intended, reasonable thresholds are. I think we’ll just have to take the “sounds good enough; make a new reference wave” approach for these.
One last thought: one aspect I appreciate about this reference wave testing idea is that I wouldn’t have to update the test specs in the database if the output changes significantly– just update the reference waves (as long as they pass whatever thresholds we put in place).
See Also:
- Not An Exact Science, my original proposal