
Ambitious Testing Effort

When the FATE initiative went live, I asked the guru how to handle the H.264 conformance suite samples– should I just dump all of them into the database whether they were working or not, or should I only enter the samples that worked with the current version of FFmpeg? His answer was far more complicated than I could have anticipated:

  1. Enter all currently working samples
  2. If a particular H.264 conformance vector used to work with FFmpeg, add the sample and enter a new issue in the tracker
  3. Otherwise, don’t add the test yet

Whoa. As you know, I got task #1 accomplished relatively easily. Now I’m back to take on task #2.

Hypothesis: most of the code that can make or break the H.264 decoding process lives in files named libavcodec/h264*. Thus, test the sample suite against every single FFmpeg revision in which one of those files was touched.

  # collect every SVN revision that touched any of the H.264 decoder files
  for file in libavcodec/h264*; do svn log "$file"; done |
  grep "^r.*lines$" |
  sed -e 's/^r\([0-9]*\).*$/\1/' |
  sort -n |
  uniq

That produces just over 400 different FFmpeg revisions that need testing. I had better get started early.

Algorithm outline (a rough Python sketch follows the list):

  • create a script that takes the above revision list and the directory full of H.264 conformance vectors
  • create a list of standard test names based on the convention already in the database
  • query the database to obtain a complete list of all tests known to work currently
  • remove the working tests from the list of all tests
  • for each revision:
    • get the FFmpeg code corresponding to that revision
    • build FFmpeg, and use ccache to hopefully gain a little speedup in the process
    • test FFmpeg against all of the non-working samples, output results in a CSV format: “revision, 0, 0, 0, 1, 0, 0, 0, 0, 0,…”; this should facilitate analysis and serve to illustrate that the non-working samples have been broken from the get-go
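
In rough terms, the script might look something like this Python sketch; the helper names, checkout/build commands, and spec format are all assumptions for illustration, not the actual FATE code:

  # Rough sketch only; the build commands and spec format are assumptions.
  import csv
  import subprocess

  def run(command):
      # run a shell command; return True if it exited with status 0
      return subprocess.call(command, shell=True) == 0

  def brute_force(revisions, broken_specs, csv_path):
      # broken_specs: list of (test_name, ffmpeg_command) pairs for the samples
      # that the database says do not currently work
      writer = csv.writer(open(csv_path, 'w'))
      for revision in revisions:
          # fetch and build the historical revision; ccache keeps rebuilds cheap
          if not run('svn update -r %s ffmpeg' % revision):
              continue
          if not run('cd ffmpeg && ./configure --cc="ccache gcc" && make'):
              # this revision does not even build: record an all-zero row
              writer.writerow([revision] + [0] * len(broken_specs))
              continue
          # 1 = the sample decoded cleanly under this revision, 0 = it did not
          results = [int(run(command)) for (name, command) in broken_specs]
          writer.writerow([revision] + results)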

Hey, computing cycles are cheap, right? Perhaps the same ambitious strategy can be employed as a one-time brute force method to learn when other FFmpeg components broke so that they can be fixed and subsequently tested via FATE. And there’s no reason I have to do this on my own; I know certain FFmpeg developers who like to brag about their cumulative 27 or so underworked CPU cores lying around their flats (you devs know who you are).

Growing Pains Of FATE

When I upgraded my web hosting plan last summer from 800 MB of online storage to 1/2 TB, I wondered what I could possibly use all that extra space for. The FATE Server is stepping up to the task and presently — somehow — occupies 1/2 GB of space. This is not a problem in and of itself since it’s only 1/1000 of my total allotment. However, it makes a regular, responsible backup schedule difficult to keep. I have toyed with the idea of hosting the database operation on my own hardware and bandwidth. I’m pretty sure I’m the primary user of this database anyway. Having the database under local authority would also likely allow for greater flexibility and configurability for the underlying engine.

As always, I have plans to add many, many, many more tests. There are various public MPEG conformance suites for different codecs, each consisting of tens or hundreds of samples. There is FFmpeg’s internal regression suite, which ought to be run and verified for each build. By my accounting, ‘./ffmpeg_g’ is invoked over 300 times when running ‘make test’; I suspect each of those invocations would become a separate test in the database. Whenever I think of getting down to it and entering individual test specs into the database with my custom PHP tool, I step back, glance at the magnitude of the task, and instead start outlining a script that will process the test series for me automatically, and with fewer mistakes.
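
One low-tech way to harvest those invocations automatically (purely a hypothetical sketch, not anything FATE does today) would be to temporarily rename ./ffmpeg_g, install a small logging shim in its place, run ‘make test’ once, and convert the captured command lines into test specs:

  #!/usr/bin/env python
  # Hypothetical shim: rename the real binary to ffmpeg_g.real, save this script
  # as ffmpeg_g, run 'make test' once, and every command line the regression
  # suite uses ends up in the log for later conversion into test specs.
  import os
  import sys

  LOG_PATH = '/tmp/ffmpeg_g-invocations.log'
  REAL_BINARY = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'ffmpeg_g.real')

  log = open(LOG_PATH, 'a')
  log.write(' '.join(sys.argv[1:]) + '\n')
  log.close()

  # hand control to the real binary so the regression suite still runs normally
  os.execv(REAL_BINARY, [REAL_BINARY] + sys.argv[1:])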

However, there are yet more problems. There are only 76 tests currently, and logging the 76 individual test result records nominally takes 10-15 seconds. Yes, I use a single MySQL connection, but with 76 separate INSERT queries. It would be more efficient to concatenate them into one INSERT query with 76 rows. Better still would be to parameterize the data, compress it, and POST it via HTTP to a custom CGI script on the server that could, ideally, uncompress it and perform the INSERT locally and more efficiently. This would also solve the firewall and library problems outlined in a previous post and allow for more diverse platform expansion in the future.
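
As a sketch of the client half (the URL, payload format, and server-side CGI script are all made up for illustration), the reporting step could shrink to a single compressed POST:

  # Client-side sketch only; the URL and payload format are hypothetical, and
  # the CGI script on the server would decompress the body and perform one
  # multi-row INSERT locally.
  import json
  import urllib.request
  import zlib

  def post_results(build_record, results, url='http://example.com/fate-log.cgi'):
      # send one compressed blob instead of issuing 76 separate INSERT queries
      payload = json.dumps({'build': build_record, 'results': results})
      request = urllib.request.Request(
          url,
          data=zlib.compress(payload.encode('utf-8')),
          headers={'Content-Type': 'application/octet-stream'})
      response = urllib.request.urlopen(request)
      return response.read()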

Finally, I also need to be able to test tests before deploying them. That’s right — test tests. I.e., enter a new test, or a series of new tests, into a staging area and be able to run a special script to verify that I got all the basic details right, such as sample paths and FFmpeg command line parameters. None of this nonsense about dumping in a new test spec and waiting until the next SVN commit to see if I got it all correct. Or, worse yet, artificially starting a new build/test cycle with a documentation-update SVN commit. Out of all the problems examined in this post, this should be the easiest to take care of.
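
A minimal vetting pass might look something like this; the spec fields and the null-output invocation are assumptions about how such a staging script could work, not how FATE’s PHP tool actually stores tests:

  # Staging-area sketch; the spec dictionary fields are assumptions.
  import os
  import subprocess

  def vet_staged_specs(specs, ffmpeg='./ffmpeg_g'):
      # return a list of (test name, reason) pairs for specs that look broken
      failures = []
      for spec in specs:
          if not os.path.exists(spec['sample']):
              failures.append((spec['name'], 'sample path does not exist'))
              continue
          # decode the sample to the null muxer just to confirm the command runs
          command = [ffmpeg, '-i', spec['sample']] + spec['extra_args'] + ['-f', 'null', '-']
          devnull = open(os.devnull, 'w')
          status = subprocess.call(command, stdout=devnull, stderr=devnull)
          devnull.close()
          if status != 0:
              failures.append((spec['name'], 'ffmpeg exited with status %d' % status))
      return failures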

Thanks for putting up with yet another edition of Multimedia Mike’s research notepad.

Per-Frame Metadata

Someone asked me a question in email today that I thought I would pass on to the broader group. Are there any general methods for attaching arbitrary metadata — e.g., EXIF data — to individual video frames?


[Image: video frame metadata]

At first thought, this strikes me as a container-level matter. However, one could imagine a video codec that embeds metadata in each frame, so that the data survives storage in any generalized container format.

Personally, I have always been more attuned to issues surrounding content playback than content creation, which I admit is something of a weakness in my overall multimedia knowledge. The submitter had in mind data along the lines of absolute timestamps and GPS coordinates to be included with every frame. This may seem a tad excessive, but you can never underestimate other people’s requirements.

Zombie Artifacts

I was monitoring the processes on a build machine via ‘top’ during the testing phase of a FATE build/test cycle. At the top of the list was ‘ffmpeg <defunct>’. I was a bit concerned about FFmpeg zombie processes until I noticed that the PID attached to the zombie was steadily increasing at each refresh.


[Image: Zombies from Capcom's Ghosts N Goblins game]

It turns out that these zombies are merely an artifact of the current infrastructure. According to the profiling information from my build/test script, the ‘test’ phase always seems to take 71 seconds to execute, give or take a second, regardless of platform. Incidentally, there are presently 71 active tests in the FATE suite, which works out to roughly one second per test. This led me to recognize that the build/test script is comically inefficient in this respect and that it should be possible to blaze through the tests much more quickly, and perhaps through the build phase as well, provided that not much has changed in the source (the build machines leverage ccache).

At issue is the way in which the script runs commands. It uses the Python subprocess module to spin off a process, monitor its stdout and stderr on separate pipes, and kill the process if it runs too long. The upshot of the current method is that the script always waits at least 1 second before first checking whether the child has finished. This leads to the zombies: the child FFmpeg process has finished, but it lingers until its parent wait()s on it to collect the final status code. I am working on revising this algorithm to be considerably more efficient, particularly since I anticipate eventually having many hundreds of individual tests in the suite.
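
For reference, here is roughly the shape of the loop I have in mind (a sketch, not the actual FATE script; for brevity it only reads the pipes after the child exits, which assumes the output fits in the pipe buffers):

  # Sketch of a tighter child-management loop.
  import subprocess
  import time

  def run_with_timeout(command, timeout=60, poll_interval=0.05):
      child = subprocess.Popen(command, shell=True,
                               stdout=subprocess.PIPE, stderr=subprocess.PIPE)
      deadline = time.time() + timeout
      # poll frequently instead of sleeping a full second before the first check
      while child.poll() is None:
          if time.time() > deadline:
              child.kill()   # give up on runaway processes
              child.wait()   # reap it so it does not linger as a zombie
              return None, '', ''
          time.sleep(poll_interval)
      # poll() has already reaped the child; collect whatever it wrote
      stdout, stderr = child.communicate()
      return child.returncode, stdout, stderr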

Here’s another curious artifact I have observed regarding profiling. Python’s os module provides a nifty times() function that returns a bunch of useful timing data. Among the 5 values returned is the cumulative time that child processes of the main process have spent running on the CPU. I thought this would be perfect for profiling since it only accounts for CPU runtime, not I/O time. In reality, I suspect the OS simply counts the number of times that a process gets to run on the CPU and multiplies that by 10 milliseconds. At least, empirical evidence suggests that to be the case, since every test seems to complete in a time evenly divisible by 10 msec. I suppose this is good enough for the time being. Fortunately, there are some tests that run long enough for substantial differences to be observed between platforms. For example, the test designated h264-conformance-ba1_ft_c takes on the order of 1280 ms on PPC, 160 ms on x86_32, and 400 ms on x86_64 (all compiled with gcc 4.2.2 on Linux). Of course, those numbers should not be compared with each other, but with the same test run over time on the same CPU.
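
For the curious, the measurement itself amounts to bracketing the child process with two calls; indices 2 and 3 of the os.times() tuple are the accumulated user and system CPU time of child processes:

  # Minimal illustration of the measurement; the granularity is whatever the OS
  # reports, which on this box appears to be 10 ms ticks.
  import os
  import subprocess

  def time_child(command):
      before = os.times()
      subprocess.call(command, shell=True)
      after = os.times()
      # fields 2 and 3 are the children's cumulative user and system CPU time
      return (after[2] - before[2]) + (after[3] - before[3])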

I’m open to more profiling ideas. Perhaps FFmpeg could include new command line options for fine-grained testing of certain modules, or come with separate test programs to achieve the same. E.g., push a few hundred test vectors through the DCT/IDCT and log the nominal timing from the timestamp counter for later graphing. For all I know, FFmpeg already has some options to achieve this (usually when I propose a new FFmpeg testing feature to Michael, he helpfully advises that said feature has been in the codebase for years).