Monthly Archives: January 2008

Zombie Artifacts

I was monitoring the processes on a build machine via ‘top’ during the testing phase of a FATE build/test cycle. At the top of the list was ‘ffmpeg <defunct>’. I was a bit concerned about FFmpeg zombie processes until I noticed that the PID attached to the zombie was steadily increasing at each refresh.

Zombies from Capcom's Ghosts N Goblins game

It turns out that these zombies are merely an artifact of the current infrastructure. According to the profiling information from my build/test script, the ‘test’ phase always seems to take 71 seconds to execute, give or take a second, regardless of platform. Incidentally, there are presently 71 active tests in the FATE suite. This led me to recognize that the build/test script is comically inefficient in this respect and that it should be possible to blaze through the tests much more quickly, and perhaps during the build phase as well, provided that not much has changed in the source (the build machines leverage ccache).

At issue is the way in which the script runs commands. It uses the Python subprocess module to spin off a process, monitor stdout and stderr on separate pipes, and kill the process if it runs too long. The upshot of the current method is that the script always waits at least 1 second before first checking whether the child has finished, which is why 71 tests take about 71 seconds. It also explains the zombies: the child FFmpeg process has already exited, but it lingers in the process table until its parent calls wait() to collect its exit status. I am working on revising this algorithm to be considerably more efficient, particularly since I anticipate eventually having many hundreds of individual tests in the suite.
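For illustration, here is a minimal sketch of one way to run a command without the fixed 1-second wait. It is not the actual FATE script: the function name run_with_timeout is hypothetical, and it assumes a modern Python 3, where Popen.communicate() accepts a timeout argument. The call returns as soon as the child exits and reaps it, so no defunct entry is left behind.

```python
import subprocess
import sys


def run_with_timeout(args, timeout_seconds):
    """Run a command, capture stdout and stderr on separate pipes,
    and kill the child if it runs longer than timeout_seconds.

    Returns (return_code, stdout_bytes, stderr_bytes, timed_out).
    """
    proc = subprocess.Popen(
        args, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    try:
        # communicate() drains both pipes and returns as soon as the
        # child exits; it also waits on the child, which reaps it.
        out, err = proc.communicate(timeout=timeout_seconds)
        return proc.returncode, out, err, False
    except subprocess.TimeoutExpired:
        # The child ran too long: kill it, then collect whatever
        # output it produced and its exit status.
        proc.kill()
        out, err = proc.communicate()
        return proc.returncode, out, err, True
```

A test that finishes in a few milliseconds then costs a few milliseconds rather than a full second, while a runaway process is still killed at the deadline.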

Here’s another curious artifact I have observed regarding profiling. Python’s os module provides a nifty times() function that returns a bunch of useful timing data. Among the 5 values returned is the cumulative time that child processes of the main process have spent running on the CPU. I thought this would be perfect for profiling since it only accounts for CPU runtime, not I/O time. In reality, I am thinking that the OS simply counts the number of times that a process gets to run on the CPU and multiplies that by 10 milliseconds. At least, empirical evidence suggests that to be the case since every test seems to complete in a time evenly divisible by 10 msec. I suppose this is good enough for the time being. Fortunately, there are some tests that run long enough for substantial differences to be observed between platforms. For example, the test designated h264-conformance-ba1_ft_c takes on the order of 1280 ms on PPC, 160 ms on x86_32, and 400 ms on x86_64 (all with gcc 4.2.2 compilation on Linux). Of course, those numbers should not be compared with each other, but with the same test run over time on the same CPU.
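As a rough sketch of this measurement (not the actual FATE code; the function names are hypothetical), the snippet below reads os.times() before and after running a child and takes the difference in the children's user and system CPU time. It assumes Python 3 on a POSIX system, where the result has named fields. The second function reports the clock tick length via os.sysconf("SC_CLK_TCK"); on many Linux systems that value is 100 ticks per second, which would be consistent with the 10 msec steps described above, though that link is my assumption rather than something the measurements prove.

```python
import os
import subprocess
import sys


def child_cpu_seconds(args):
    """Return the CPU time (user + system) consumed by one child run.

    os.times() only counts children that have been waited for;
    subprocess.run() waits for the child before returning.
    """
    before = os.times()
    subprocess.run(args, check=True)
    after = os.times()
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return user + system


def clock_tick_ms():
    """Length of one clock tick in milliseconds (POSIX only)."""
    return 1000.0 / os.sysconf("SC_CLK_TCK")
```

For example, child_cpu_seconds([sys.executable, "-c", "sum(range(10**7))"]) measures a short CPU-bound child, and the result should come out as a whole number of ticks.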

I’m open to more profiling ideas. Perhaps FFmpeg could include new command line options for fine-grained testing of certain modules, or come with separate test programs to achieve the same. E.g., push a few hundred test vectors through DCT/IDCT and log the nominal timing from the timestamp counter for later graphing. For all I know, FFmpeg already has some options to achieve this (usually when I propose a new FFmpeg testing feature to Michael, he helpfully advises that said feature has been in the codebase for years).

64-bit Builds Are A Go

Please join me in welcoming the newest member of the FATE build farm: a 64-bit Ubuntu Linux session for pure 64-bit builds. The machine is actually a Mac Mini Core 2 Duo 2.0 GHz running VMware Fusion. Ideally, the same machine would also handle Mac OS X autobuilding and testing. Per my understanding, however, the base FFmpeg tree is not immediately build-ready due to a conflict with the gcc version shipped with Apple’s default Xcode environment.

Apple Mac Mini

A pristine, dignified, stylized piece of Mac hardware, and what do I have it doing? Farm work. I also got the Mac Mini to try to delve into this Mac OS environment and see if it could possibly win me over as a full time user. That part isn’t looking too hopeful at this point, but I’m sort of committed to the FATE Server aspect now.

Rejected Ideas

People have provided a lot of useful and constructive feedback, both publicly and privately, regarding the developing FATE Server. I’m not taking every idea seriously, however. For example, several people have suggested that the build machines should trigger off of emails to the mailing list that monitors FFmpeg SVN commit messages and only check out new code and kick off new build/test cycles in response to such mails. Since the SVN server is responsible for sending the emails, the flow looks something like this:

A complicated protocol loop

Really, this post is merely an excuse to make more illustrations using OpenOffice’s Draw component. Anyway, the current FATE system operates by polling the SVN server periodically for updated revisions (where period=15min); it also checks again immediately after completing a full build/test cycle. In this case, I can’t justify adding the coding, debugging, and maintenance complexities of having the script poll a mail server somehow (or even have emails pushed via IMAP), parse the emails, and determine when to ask SVN for new source, when the periodic poll process performs peachy. Thus, one box and two arrows are eliminated from the drawing entirely.
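The polling scheme is simple enough to sketch in a few lines. This is an illustration, not the real FATE script: poll_loop, parse_revision, and the get_revision and run_cycle callables are all hypothetical names. It assumes the revision is read from the "Revision:" line that svn info prints, and that the caller supplies the code that actually runs svn and the build/test cycle.

```python
import re
import time

POLL_PERIOD_SECONDS = 15 * 60  # period=15min, as described above


def parse_revision(svn_info_output):
    """Extract the number from the 'Revision:' line of `svn info` text.

    Returns None if no such line is present.
    """
    match = re.search(r"^Revision:\s*(\d+)\s*$", svn_info_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def poll_loop(get_revision, run_cycle, sleep=time.sleep, max_iterations=None):
    """Poll for new revisions and run a build/test cycle for each one.

    get_revision() returns the latest revision number (or None on error);
    run_cycle(revision) performs a full build/test cycle.
    max_iterations exists only so the loop can be exercised in a test.
    """
    last_built = None
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        current = get_revision()
        if current is not None and current != last_built:
            run_cycle(current)
            last_built = current
            continue  # check again immediately after a full cycle
        sleep(POLL_PERIOD_SECONDS)
    return last_built
```

The whole mechanism is one loop with one piece of state, which is the point of the argument: there is no mailbox to monitor and no message format to parse.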

One person who has a lot more experience in web database apps than I do was appalled that I was going the simple route by using the Python MySQLdb module to access the FATE Server’s MySQL database directly and insert new build records and test results. Hey, it’s the simplest solution, and my web provider allows me to do it.

Straight MySQL Protocol

Apparently, it’s trendier for modern apps to travel along a more circuitous route. This means, rather than use the most direct protocol, package the data in some intermediate format (often an XML-based one) and send it along an HTTP transport. Common candidates for the task include XML-RPC, REST and SOAP. Ad-hoc protocols are also possible.

A more roundabout protocol

At first, this went right into the ‘reject’ bin as well. I may need to rethink that, though. I am lining up helpful souls who wish to lend their own custom hardware resources to this effort so that FATE (and FFmpeg, in turn) can enjoy broader platform support. It turns out that Python-MySQLdb does not work equally well on all operating systems. Hopefully, Python’s built-in HTTP libraries will work well enough that I could build my own ad-hoc protocol (no XML, thanks) if need be.
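To give a sense of how small such an ad-hoc protocol could be, here is a sketch using only Python’s standard urllib modules (Python 3 names). Everything specific in it is an assumption for illustration: the function names, the field names, and the idea of sending results as a form-encoded POST. The FATE Server does not necessarily accept anything like this.

```python
import urllib.parse
import urllib.request


def build_result_request(url, fields):
    """Build an HTTP POST carrying the fields as a URL-encoded form body.

    No XML involved: the body looks like 'test=foo&status=0'.
    """
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def send_result(url, fields, timeout_seconds=30):
    """Send the request and return (HTTP status, response body bytes)."""
    request = build_result_request(url, fields)
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return response.status, response.read()
```

A build machine would call something like send_result(server_url, {"test": "h264-conformance-ba1_ft_c", "status": "0"}), where server_url is a placeholder, and a small script on the web server would perform the database insert. The client then needs nothing beyond a stock Python install.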

Working With Git

I want to be responsible and organized as I develop the FATE Server. To that end, I thought it would be good to get all of the project source code into a source control system. Since Git is building momentum, I thought this would be a great opportunity to get my feet wet (similar to how this exercise has been a good reason to learn Python).

Git logo

I’m pleased to report that Git is performing admirably. It’s important to remember, however, that I have low standards when it comes to source control. Indeed, any SCM is equally adequate when you’re working by yourself on one machine. Git still keeps the easy things easy: git init, git add, git commit, git diff, git log; that’s as deep as I have delved thus far. At least I will have a baseline of experience for when I get actively involved with a project that uses Git, which is where many would like FFmpeg to go one day.