
Investigating Alternate Continuous Integration Software

Ever since I figured out that FATE falls into the category of continuous integration (CI) software, I have been evaluating alternate, established, mature software packages that might be able to supplant FATE for managing FFmpeg automated building and testing. I don’t think it’s fair to say that I have been disappointed with the offerings I have found out there; it’s just that many other software packages have apparently approached this problem from a slightly different angle, with a slightly different set of assumptions, and slightly different resources on which to deploy a solution.

For starters, when I read the overviews for a number of these packages, I am never quite sure if they actually do what FATE already does. Many packages brag about their amazing abilities to perform continuous builds on many platforms. I might be missing something crucial here, but that fact alone doesn’t impress me. From my perspective, and in my experience, automating the build among multiple independent machines is the easy part. The difficult part is managing tests and aggregating the results back to a central location.

Aside: Sometimes I back up and wonder: Why aren’t our tests integrated into the build process? Oh yeah, they are: ‘make test’. But why can’t that general test be made to cover all the functionality? Because the ‘make test’ regression suite depends on the existence of both an encoder and a decoder for a particular format. FFmpeg covers a large number of formats that it can only decode, not encode (and no one is going to step up to write even naive corresponding encoders for these formats in the near future, nor would I argue that they should). Therefore, it becomes necessary to test asymmetric decoder (and demuxer) functionality using individual tests, as I designed FATE to do.
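
For illustration, an individual decode-only test boils down to checking FFmpeg’s deterministic output against a stored reference. A minimal sketch in Python (the sample and reference file names are hypothetical):

import subprocess

# Hypothetical one-off test: decode a sample that FFmpeg can only decode,
# serialize the decoded frames with the framecrc muxer, and compare the
# result against a previously blessed reference. No encoder is required.
def run_decode_test(sample, reference):
    result = subprocess.run(
        ["ffmpeg", "-i", sample, "-f", "framecrc", "-"],
        capture_output=True)
    with open(reference, "rb") as f:
        return result.stdout == f.read()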

I designed FATE such that a client program would run on diverse platforms, constantly check for and rebuild new FFmpeg code, run a battery of tests, and log the results of the build and tests (including status, stdout, stderr, and performance stats) back to a central server. I wanted the client to be written in Python and use only the (expansive) standard Python library, if possible. More stringent were the server requirements: I needed something that operated via HTTP backed by PHP or Perl — because that’s what my web host provides, at least if I want to talk to MySQL (Ruby is also an option; so is Python as long as the CGI script doesn’t need to talk to MySQL).
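
As a rough sketch of that client design, here is what the main loop might look like in Python, using only the standard library (the server URL, poll interval, and use of ‘svn update’ are illustrative assumptions, not the actual implementation):

import json, subprocess, time, urllib.request

SERVER_URL = "http://fate.example.com/log.php"  # hypothetical endpoint

def build_and_test():
    # Update the source, rebuild, and run the test battery, posting one
    # record (status, stdout, stderr) per step back to the central server.
    for cmd in (["svn", "update"], ["make"], ["make", "test"]):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        record = {
            "command": " ".join(cmd),
            "status": proc.returncode,
            "stdout": proc.stdout,
            "stderr": proc.stderr,
        }
        urllib.request.urlopen(SERVER_URL, json.dumps(record).encode())

while True:
    build_and_test()
    time.sleep(15 * 60)  # check for new code periodically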

“But you could host the central server at home on your own broadband and use whatever configuration you want.” Actually, administering things like web and email services is something I would rather pay a few dollars per month for someone else to do on my behalf. I am quite adamant about this. I feel the same way about laundry.

“If FATE ain’t broke, why fix it?” Well, it is sort of broken, or at least incomplete. Many of these packages already boast features like email notifications and even IRC notifications, in addition to better web interfaces: features that I would like to implement when time permits. And at the very least, surveying other CI software might give me ideas about how to improve FATE.

Can existing CI packages solve my problems?

Second Class Citizens

Not all builds should be treated equally. Some are more important than others. I propose that FATE should distinguish between important and less important configurations. My motivation for this is that I want to implement a meter that indicates the health of the overall code base. While it would be ideal for all FATE configurations to be 100% green at all times, I don’t think it’s fair to penalize the entire FFmpeg codebase just because some less prevalent platforms aren’t performing up to spec.

What platforms should be considered first class? I’m thinking latest gcc 4.3 and 4.2 series for Linux on x86_32, x86_64, and PowerPC, at a minimum.

In other FATE news, I have started computing percentages of test coverage. According to my numbers, FATE currently tests 58% of FFmpeg’s total mux/demux/encode/decode features. It’s a start, I suppose.

FFmpeg Perceptual Audio Test Plan

There have been some problems with FATE audio testing. First off, the qt-ima4-stereo test spec was testing against the wrong file for the past year. Stereo IMA ADPCM decoding could have broken in QuickTime and we might have never been alerted. Sloppy.

More seriously, I found out that many of my existing, bitexact audio tests have not been constructed properly. This is because these two commands:

ffmpeg -i file.ext file.wav
ffmpeg -i file.ext -f wav - > file.wav

do not yield equivalent sets of bytes inside file.wav. Part of the reason is that, after writing out all the audio samples, the muxer needs to rewind to the header so that it can write the data payload length. When writing data to stdout, the program does not have the option to rewind the output stream. However, I don’t understand the entire discrepancy. Using the file qt-ima4-mono with the above command lines:

1156652 surge.wav
1146924 surge-stdout.wav

The file routed through stdout is notably smaller (by 9728 bytes). I was going to write this off as the stdout stream failing to be flushed, but the behavior is consistent across all machines and platforms.
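
To make the rewind mechanics concrete, here is a toy WAV writer in Python; the final two seeks are exactly what a muxer cannot do when its output is a pipe like stdout:

import struct

def write_wav(f, pcm, channels=1, rate=44100, bits=16):
    # Write placeholder size fields first; the RIFF and data chunk sizes
    # are not known until all of the samples have been written.
    f.write(b"RIFF" + b"\x00" * 4 + b"WAVE")
    f.write(b"fmt " + struct.pack("<IHHIIHH", 16, 1, channels, rate,
            rate * channels * bits // 8, channels * bits // 8, bits))
    f.write(b"data" + b"\x00" * 4)
    f.write(pcm)
    # Only possible on a seekable stream: rewind and patch the sizes.
    f.seek(4)
    f.write(struct.pack("<I", 36 + len(pcm)))
    f.seek(40)
    f.write(struct.pack("<I", len(pcm)))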

My proposed solution is to update all of the audio tests to use this raw format target:

ffmpeg -i file.ext -f s16le -

Since raw PCM has no header that needs to be rewritten after the fact, the stdout output is byte-identical to the file output:

ffmpeg -i file.ext -f s16le file.s16le

1156608 surge.s16le
1156608 surge-stdout.s16le

Moving right along, there is the much bigger task of testing perceptual audio decoders. Working down the FATE Test Coverage list, these perceptual audio codecs will get the naive, one-off wave reference treatment in lieu of a proper conformance suite: ATRAC3, RealAudio Cook, DCA (DTS), IMC, Nellymoser, QCELP, QDesign, RealAudio 28.8, TrueSpeech, Vorbis, and WMA v1.

Then there is the matter of MPEG audio codecs for which we have access to extensive conformance suites. Thanks to Kostya and Benjamin for furnishing pointers to precise information on how to verify whether your MP1/2/3 or AAC audio decoder is up to snuff. This page at Underbit explains exactly how the spec defines conformance for MPEG 1, layers 1, 2, and 3, and also evaluates the conformance of various implementations. The comparison ostensibly predates FFmpeg. This Mp4-tech mailing list post shows the way regarding AAC conformance.

So I need to automate the MP1/2/3 and AAC test entries. I estimate the automated process will work something like this:

  • Decode encoded file
  • Run comparison of decoded wave against original wave
    • For MP1/2/3, this seems to entail converting both the FFmpeg output and the original wave to floating point numbers normalized to the range -1.0..1.0, computing the root mean square (RMS) of the difference signal, and verifying that the RMS is less than 1 / (32768 * sqrt(12))
    • For AAC, well, I’m still researching the precise criteria
  • If the decoded wave is within tolerance, add a new test

The part where I get a bit fuzzy is: what should the test spec be? Should I generate a reference wave and test future decoded waves against it using my one-off wave reference method? Or, should I just go ahead and compute the RMS of the difference signal? I assume that if I use the nifty numpy library for the task, it couldn’t possibly make any measurable difference in the performance of FATE testing vs. using the one-off wave reference method (computing absolute value of the difference signal and checking that no discrete points exceed 1).
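
As a sketch of how both candidate checks might look with numpy (hypothetical helper functions; rms_check assumes the signals have already been normalized to floats, one_off_check assumes integer samples):

import numpy as np

# Conformance threshold from the MPEG spec, as described above
RMS_LIMIT = 1.0 / (32768.0 * np.sqrt(12.0))

def rms_check(decoded, reference):
    # The RMS of the difference signal must stay under the limit
    diff = decoded - reference
    return np.sqrt(np.mean(diff * diff)) < RMS_LIMIT

def one_off_check(decoded, reference):
    # No sample may deviate by more than one quantization step
    diff = decoded.astype(np.int32) - reference.astype(np.int32)
    return np.max(np.abs(diff)) <= 1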

One trade-off is that I would need to store the full 24-bit reference waves in order to properly compute RMS, which is 50% more data than I would need with the one-off method. And I’m still not sure how to process the 24-bit data in any event.
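
For the 24-bit samples, one plausible approach (untested, just a sketch) is to sign-extend the packed bytes into a wider integer type before normalizing:

import numpy as np

def unpack_s24le(raw):
    # Interpret packed little-endian signed 24-bit samples as int32
    b = np.frombuffer(raw, dtype=np.uint8).reshape(-1, 3).astype(np.int32)
    samples = b[:, 0] | (b[:, 1] << 8) | (b[:, 2] << 16)
    # Manually sign-extend from bit 23
    samples -= (samples & 0x800000) << 1
    # Normalize to -1.0..1.0 for the RMS computation
    return samples / 8388608.0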

Less Frequent Tasks

Michael suggested on the FFmpeg-devel list that Doxygen documentation ought to be continuously generated so that any errors and warnings during documentation generation can be caught, logged, analyzed, and minimized. However, the consensus was that it’s not especially useful to add this to the master FATE suite of test specs.

Another item that came up in the discussions of a possible release is that one of our tests should be the processing of an entire DVD-length movie to catch any problems (like memory leaks) that only manifest over a long runtime. Obviously, that’s not especially appropriate for a normal FATE test spec.

And another type of test that I envisioned when I was originally brainstorming the system (for a year and a half) is a way to continuously fuzz-test FFmpeg. But, like the previous two items, it does not need to be performed on every code commit.
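
One iteration of such a fuzzer could look something like this sketch (the corruption policy and file names are invented for illustration):

import random, subprocess

def fuzz_once(sample_path):
    # Corrupt a few random bytes of a known-good sample...
    data = bytearray(open(sample_path, "rb").read())
    for _ in range(8):
        data[random.randrange(len(data))] = random.randrange(256)
    with open("fuzzed.bin", "wb") as f:
        f.write(data)
    # ...then verify that the decoder fails gracefully instead of crashing.
    proc = subprocess.run(["ffmpeg", "-i", "fuzzed.bin", "-f", "null", "-"],
                          capture_output=True)
    return proc.returncode >= 0  # a negative code means death by signal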

I realized that all of these tasks (and probably more; be creative) can be run on a less frequent basis, say once per day, and on one machine (like the fastest machine on my farm). It can be set up as an adjunct project to FATE.

Now I need a good FFmpeg command line for converting a ripped DVD image to another format that will maximally stress the program, in a multithreaded manner, no less.