Monthly Archives: December 2008

FATE Comes Through

I was away on a brief Christmas vacation for the last week or so. I was completely absent from ffmpeg-devel list traffic during that time, as well as all email. Occasionally, I would sneak a look at FATE’s main page when I was in the vicinity of an unguarded web browser and clench my fists when I saw that various FFmpeg items were broken, powerless to alert anyone to these facts via email. My tensions eased, however, and my spirits lifted when I would check back a little while later and see that the problems were gone and that the SVN log messages indicated that code was being submitted specifically to address the breakages.

Go team.

However, being away from the action and having FATE as my only window into FFmpeg development reminded me of the various ways the web interface desperately needs to improve. The main fate.multimedia.cx page needs to have a brutally concise summary of the health of the FFmpeg codebase as tested by FATE. I envision something in the spirit of, “Everything is okay… except for the following problems: …”, followed by a list of problems, sorted from newest to oldest.

How to do this? For starters, I have had it on my TODO list for several months now to experiment with some popular PHP web frameworks. I have done enough homework to get over my distaste for the notion of such frameworks and I’m thinking (hoping, really) that one can be of substantial benefit.

Of course, another oft-requested feature is email notification when something breaks. Believe me, I want this too. As you can imagine, though, this is something I need to get correct from the start and verify that all the bugs are stamped out; specifically, I’m paranoid about accidentally spamming a mailing list due to a stupid bug. I also want to make sure that such a notification service reports useful information, where I’m currently defining “useful” as state transitions: report when a build or a test goes from working to broken or vice versa. Beyond that, I want to be able to aggregate information about breakages. Example: there are presently 20 FATE configurations. If an SVN commit breaks a test on all of them, it would be better for an alert email to report concisely that a recent commit broke a particular test for ALL supported FATE configurations, rather than 20 individual emails representing each configuration, or even one email listing each individual configuration.

It’s a real concern. If I don’t get this right, it will only irritate people right out of the gate and defeat the whole purpose once its messages end up largely filtered and ignored. I speak from experience, having worked with similar notification systems on other large codebases.
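To make the aggregation idea concrete, here is a minimal sketch, in Python, of the kind of logic I have in mind. None of this is actual FATE code; the data structures and the printed alerts are hypothetical placeholders.

# Hypothetical sketch: detect pass/fail flips and collapse per-configuration
# breakages into a single alert. 'previous' and 'current' both map a
# (configuration, test) tuple to True (passed) or False (failed).

def state_transitions(previous, current):
    """Yield (configuration, test, new_state) for every pass/fail flip."""
    for key, passed in current.items():
        if key in previous and previous[key] != passed:
            yield key[0], key[1], "fixed" if passed else "broken"

def report_breakages(previous, current, total_configs):
    """Group new breakages by test so one bad commit yields one alert."""
    broken_by_test = {}
    for config, test, state in state_transitions(previous, current):
        if state == "broken":
            broken_by_test.setdefault(test, []).append(config)
    for test, configs in broken_by_test.items():
        if len(configs) == total_configs:
            print("test '%s' now broken on ALL %d configurations" % (test, total_configs))
        else:
            print("test '%s' now broken on %d configuration(s): %s" %
                  (test, len(configs), ", ".join(configs)))

An actual notification service would send email rather than print, of course; the grouping step is the part I most want to get right before anything goes out to a mailing list.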

New and Reinstated Tests

I have extended the RPC testing tool so that, on top of the 12 configurations it already queried, it now also includes all 8 Linux/x86_32 configurations as well as the Mac/PPC configuration. So when I launch the RPC tool, I am testing a command line across 21 different combinations of platform/compiler configurations for FFmpeg.

That, my friends, is power.

Thanks to this new tool — nay, superpower — I have efficiently updated and reinstated the following test specs:

Even though I am asking for input from the Mac/PPC build, it’s probably a good thing that it is not part of the overall test environment yet. There are more than a few tests (mostly RGB colorspace video decoders) for which the Mac/PPC build’s output differs from all the other platforms, even the Linux/PPC configurations. I don’t know where the discrepancy lies.

Further, I have finally gotten back to adding new test specs. Predictably, the major bottleneck now is the web administration interface. Working from my notes in the FATE Test Coverage MultimediaWiki page, I added these tests:

In case I haven’t adequately articulated my case, let me reiterate that this RPC test staging tool is really neat. When testing a spec, I craft the most unremarkable command line (ffmpeg -i file -f framecrc -) and see the results. If there is an endian clash — i.e., all the big endian configurations hold one opinion about the stdout vs. the little endian configurations — I check the native colorspace of the video decoder. If it’s an RGB-based video codec, I refine the command line with a “-pix_fmt rgb24” to normalize the colorspace and dispatch the command again. If the video codec is YUV-based and I know or suspect it involves a DCT, I refine the command with “-idct simple” and send it out again.
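For example, a staging session for a hypothetical RGB-based decoder might go something like this (the sample path and the parenthetical annotations are invented for illustration; only the command-line forms come from the real tool):

$ ./rpc-dist-test.py "FFMPEG -i SAMPLES_PATH/some-rgb-codec/sample.avi -f framecrc -"
  (big endian and little endian configurations disagree on the stdout)
$ ./rpc-dist-test.py "FFMPEG -i SAMPLES_PATH/some-rgb-codec/sample.avi -pix_fmt rgb24 -f framecrc -"
  (all successful configurations agree; time to write up the test spec)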

Implementing The RPC Idea

About that RPC-based distributed test staging idea I brainstormed yesterday, I’ll have you know that I successfully implemented the thing today. I used the fraps-v4 test spec for verification because it is known to work correctly right now, and because it only has 7 lines of stdout text. This is what the script looks like in action:

$ ./rpc-dist-test.py "FFMPEG -i 
  SAMPLES_PATH/fraps/WoW_2006-11-03_14-58-17-19-nosound-partial.avi 
  -f framecrc -" 
asking for results from 12 configurations...
testing config 0: Linux / x86_64 / gcc 4.0.4
testing config 1: Linux / x86_64 / gcc 4.1.2
testing config 2: Linux / x86_64 / gcc 4.2.4
testing config 3: Linux / x86_64 / gcc 4.3.2
testing config 4: Linux / x86_64 / gcc svn
testing config 5: Linux / PPC / gcc 4.0.4
testing config 6: Linux / PPC / gcc 4.1.2
testing config 7: Linux / PPC / gcc 4.2.4
testing config 8: Linux / PPC / gcc 4.3.2
testing config 9: Linux / PPC / gcc svn
testing config 10: Mac OS X / x86_32 / gcc 4.0.1
testing config 11: Mac OS X / x86_64 / gcc 4.0.1

1 configuration(s) failed
  configuration Mac OS X / x86_32 / gcc 4.0.1 returned status 133

There was 1 unique stdout blob collected
all successful configurations agreed on this stdout blob:
0, 0, 491520, 0x68ff12c0
0, 3000, 491520, 0x22d36f0d
0, 6000, 491520, 0xce6f877d
0, 9000, 491520, 0x85d6744c
0, 12000, 491520, 0x1aa85794
0, 15000, 491520, 0x528d1274
0, 18000, 491520, 0x357ec61c

A few notes about the foregoing:

RPC-Based Distributed Test Staging

FATE needs to have more tests. A lot more tests. It has a little over 200 test specs right now and that only covers a fraction of FFmpeg’s total functionality, not nearly enough to establish confidence for an FFmpeg release.

Here’s the big problem: it’s a really tedious process to initiate a new test into the suite. Sure, I sometimes write special scripts that do the busywork for me for a large set of known conformance samples. But my record for manually entering tests seems to be a whopping 11 test specs in one evening.

The manual process works something like this:

1. Given a sample that I think is suitable to test a certain code path in FFmpeg, place the sample in a shared location where my various FATE installations can reach it.
2. Get the recent FFmpeg source from SVN (in repositories separate from where FATE keeps its code).
3. Compile the source on each platform, using whichever compiler I feel like for each.
4. On a platform that has SDL installed, run the sample through ffplay to verify that the data at least sort of looks and sounds correct (e.g., nothing obviously wrong like swapped color planes or static for audio).
5. Run a command which outputs CRC data per the ‘-f framecrc’ output target.
6. Visually compare the CRC data (at least the first and last lines) to verify that the output is consistent across a few platforms (say, PPC, x86_32, and x86_64).
7. Go through the process of writing up the test in my FATE administration panel.
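Concretely, the verification steps in the middle of that list boil down to commands along these lines, repeated on each platform (the sample path is made up for illustration):

$ ffplay /shared/samples/new-codec/sample.avi
$ ffmpeg -i /shared/samples/new-codec/sample.avi -f framecrc -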

I’m constantly thinking about ways to improve processes, particularly processes as tortuously tedious as this. The process has already seen a good deal of improvement (before making a basic web admin form, I had to add and edit the test specs from a MySQL console). I intend to address the inadequacy of the basic web form at a later date when I hopefully revise the entire web presentation. What I want to do in the shorter term is address the pain of verifying consistent output across platforms.

I got the idea that it would be nice to be able to ask a FATE installation — remotely — to run a test and pass back the framecrc output. This way, I could have one computer ask several others to run a test and quickly determine if all the machines agree on the output. But would I have to write a special server to make this possible? Sounds like a moderate amount of work. Wait, what about just SSH’ing into a remote machine and running the test? Okay, but would I still have to recompile the source code to make sure the FFmpeg binary exists? No; if these are FATE installations, they are constantly building FFmpeg day and night. Just be sure to save off a copy of the ‘ffmpeg’ binary and its shared libraries in a safe place. But where would such saving take place? Should I implement a post-processing facility in fate-script.py to be executed after a build/test cycle? That shouldn’t be necessary; just copy off the relevant binaries at the end of a successful build mega-command.

So the pitch is to modify all of my FATE configurations to copy ‘ffmpeg’ and 4 .so files to a safe place. As a bonus, I can store the latest builds for all configurations; e.g., my x86_32 installation will have 8 different copies, one for each of the supported compilers. The next piece of the plan is a Python script! Create a configuration file that is itself a Python file containing a data structure which maps out all the configurations, the machines they live on, the directory where their latest binaries live, and where they can find the shared samples. The main Python script takes an argument in the form of (with quotes) “FFMPEG_BIN -i SAMPLES_PATH/sample.avi -an -t 3 -f framecrc -”, iterates through the configurations, builds SSH remote calls by substituting the right paths into the command line, and processes the returned output.
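As a rough illustration, that configuration file might look something like this; the file name, field names, host names, and paths are all invented for the sketch:

# rpc_dist_config.py: hypothetical configuration data for the staging tool
CONFIGS = [
    {
        "name": "Linux / x86_64 / gcc 4.3.2",
        "host": "fate-x86-64.example.net",
        "bin_dir": "/home/fate/latest-builds/gcc-4.3.2",
        "samples": "/shared/samples",
    },
    {
        "name": "Linux / PPC / gcc svn",
        "host": "fate-ppc.example.net",
        "bin_dir": "/home/fate/latest-builds/gcc-svn",
        "samples": "/mnt/samples",
    },
    # ... one entry per supported platform/compiler configuration
]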

Simple! Well, one part that I’m not sure about is exactly how to parse the output. I think I might use the whole of the returned stdout string as a dictionary key that maps to an array of configurations. If the dictionary winds up with only one key in the end, that means that all the configurations agreed on the output; add a new test spec!
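A minimal sketch of that dispatch-and-compare loop, assuming the hypothetical rpc_dist_config.py above and passwordless SSH access to each machine, might look like this (again, none of it is the real rpc-dist-test.py):

import subprocess
from rpc_dist_config import CONFIGS  # hypothetical configuration file sketched above

def run_remote(config, command_template):
    """Substitute this configuration's paths into the command and run it over SSH."""
    # (a real version would also point LD_LIBRARY_PATH at the saved .so files)
    command = command_template.replace("FFMPEG_BIN", config["bin_dir"] + "/ffmpeg")
    command = command.replace("SAMPLES_PATH", config["samples"])
    proc = subprocess.Popen(["ssh", config["host"], command], stdout=subprocess.PIPE)
    stdout, _ = proc.communicate()
    return proc.returncode, stdout

def stage_test(command_template):
    """Key results by their stdout; a single key means every configuration agreed."""
    results = {}
    for config in CONFIGS:
        status, stdout = run_remote(config, command_template)
        if status != 0:
            print("configuration %s returned status %d" % (config["name"], status))
            continue
        results.setdefault(stdout, []).append(config["name"])
    print("%d unique stdout blob(s) collected" % len(results))
    return results

# e.g. stage_test("FFMPEG_BIN -i SAMPLES_PATH/sample.avi -an -t 3 -f framecrc -")

If stage_test() comes back with exactly one key, the output is consistent everywhere and the command line is ready to become a test spec.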

Thanks for sitting through another of my brainstorming sessions.
