FATE needs to have more tests. A lot more tests. It has a little over 200 test specs right now and that only covers a fraction of FFmpeg‘s total functionality, not nearly enough to establish confidence for an FFmpeg release.
Here’s the big problem: It’s a really tedious process to initiate a new test into the suite. Sure, I sometimes write special scripts that do the busywork for me for a large set of known conformance samples. But my biggest record for entering tests manually seems to be a whopping 11 test specs in one evening.
The manual process works something like this: Given a sample that I think is suitable to test a certain code path in FFmpeg, place the sample in a shared location where my various FATE installations can reach it. Then, get the recent FFmpeg source from SVN (in repositories separate from where FATE keeps its code). Compile the source on each platform, using whichever compiler I feel like for each. On a platform that has SDL installed, run the sample through ffplay to verify that the data at least sort of looks and sounds correct (e.g., nothing obviously wrong like swapped color planes or static for audio). Then, run a command which will output CRC data per the ‘-f framecrc’ output target. Visually compare the CRC data (at least the first and last lines) to verify that the output is consistent across a few platforms (say, PPC, x86_32, and x86_64). Then go through the process of writing up the test in my FATE administration panel.
I’m constantly thinking about ways to improve processes, particularly processes as tortuously tedious as this. The process has already seen a good deal of improvement (before making a basic web admin form, I had to add and edit the test specs from a MySQL console). I intend to address the inadequacy of the basic web form at a later date when I hopefully revise the entire web presentation. What I want to do in the shorter term is address the pain of verifying consistent output across platforms.
I got the idea that it would be nice to be able to ask a FATE installation — remotely — to run a test and pass back the framecrc output. This way, I could have one computer ask several others to run a test and quickly determine if all the machines agree on the output. But would I have to write a special server to make this possible? Sounds like a moderate amount of work. Wait, what about just SSH’ing into a remote machine and running the test? Okay, but would I still have to recompile the source code to make sure the FFmpeg binary exists? No, if these are FATE installations, they are constantly building FFmpeg day and night. Just be sure to save off a copy of the ‘ffmpeg’ binary and its shared libraries in a safe place. But where would such saving take place? Should I implement a post processing facility in fate-script.py to be executed after a build/test cycle? That shouldn’t be necessary– just copy off the relevant binaries at the end of a successful build mega-command.
So the pitch is to modify all of my FATE configurations to copy ‘ffmpeg’ and 4 .so files to a safe place. As a bonus, I can store the latest builds for all configurations; e.g., my x86_32 installation will have 8 different copies, one for each of the supported compilers. The next piece of the plan is Python script! Create a configuration file that is itself a Python file that has a data structure which maps out all the configurations, the machines they live on, the directory where their latest binaries live, and where they can find the shared samples. The main Python script takes an argument in the form of (with quotes) “FFMPEG_BIN -i SAMPLES_PATH/sample.avi -an -t 3 -f framecrc -“, iterates through the configurations, builds SSH remote calls by substituting the right paths into the command line, and processes the returned output.
Simple! Well, one part that I’m not sure about is exactly how to parse the output. I think I might use the whole of the returned stdout string as a dictionary key that maps to an array of configurations. If the dictionary winds up with only one key in the end, that means that all the configurations agreed on the output; add a new test spec!
Thanks for sitting through another of my brainstorming sessions.
See also: