Yearly Archives: 2008

51 H.264 Tests

I finally got the tool outlined in this post operational and doing the right thing. This allowed me to automatically test 136 H.264 draft conformance vectors and then add them to the FATE test suite if they are presently working. Even more useful, I can rerun the same tool at any time and it will skip over any files that already have corresponding tests in the database and re-test samples that weren’t known to work before.
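
The skip/re-test behavior can be sketched roughly like this. Table, column, and helper names are made up for illustration, and sqlite3 stands in for the actual MySQL database so the example is self-contained:

```python
# Sketch of the skip/re-test logic described above. Table, column, and
# helper names are illustrative, not the real FATE schema; sqlite3
# stands in for MySQL so the example is self-contained.
import os
import sqlite3

def make_test_name(zip_path):
    # e.g. AUD_MW_E.zip -> h264-conformance-aud_mw_e
    base = os.path.splitext(os.path.basename(zip_path))[0]
    return "h264-conformance-" + base.lower()

def update_tests(db, samples, decode_is_bit_exact):
    """Skip samples with an existing test spec; re-test the rest and
    record any that have started working."""
    added = []
    for sample in samples:
        name = make_test_name(sample)
        cur = db.execute("SELECT 1 FROM test_spec WHERE name = ?", (name,))
        if cur.fetchone():
            continue                      # already known to work; skip
        if decode_is_bit_exact(sample):   # formerly failing sample now passes
            db.execute("INSERT INTO test_spec (name) VALUES (?)", (name,))
            added.append(name)
    return added
```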

Out of those 136 tests, 51 of them presently work in FFmpeg. “Work” is defined rather strictly as FFmpeg decoding data that matches precisely, bit for bit, to the reference output provided. So at the very least, the FATE Server will be tracking regressions on this set from this point on.

At the time of this writing, the configuration involving gcc 2.95.3 on x86_32 is acting up. I am not sure why, but when the test is done and the results are ready to be logged, the script always stalls while talking to the MySQL server and eventually bails out. Now that I think about it, I hypothesize that the absolutely inordinate volume of warnings 2.95.3 emits while compiling FFmpeg (880 Kbytes for 2.95.3 vs. 32 Kbytes for 4.2.2) might be causing trouble. This is a very repeatable failure and does not occur with any other configuration. I don’t blame this on FFmpeg’s warnings; the FATE tools should be resilient enough to deal with this.
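
If the warning volume really is the culprit, one classic mechanism is a child process filling an unread pipe buffer: run the compiler with separate stdout/stderr pipes, read only one at a time, and a sufficiently chatty compiler wedges itself, leaving the script stuck later on. This is a guess at the failure, not a confirmed diagnosis, but the fix is cheap:

```python
# Hypothesis only: if the build step reads stdout and stderr one at a
# time, 880 Kbytes of warnings can fill the unread pipe's OS buffer and
# wedge the child process. communicate() drains both streams together.
import subprocess

def run_build(cmd):
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()  # no deadlock, however noisy the compiler
    return proc.returncode, out, err
```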

Another curious but minor artifact in the database: there are brief windows in time when I can refresh the main FATE page and see a particular build configuration with a “Tests passed” stat of, say, 16/17. Wow, weird. But when I refresh again immediately, I see the full 55/55 tests passed that I expect for the current state of the database. The problem is pretty straightforward: one of the build machines is in the middle of inserting its results when I happen to refresh the page. For the first time in any database project, I am entertaining the idea of using transactions. Per my reading of the documentation, however, I am not sure if my MySQL installation supports transactions (it requires a certain configuration of the underlying storage tables). Plus, I’m not sure if it’s all that big of a deal, especially since it would stall the database for all other uses in the meantime. Perhaps I’ll just add a static note advising users to reload the page, since strange results are probably transient.
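
If I do go the transaction route, the idea would be to commit a whole run’s results at once, so a reader refreshing the page sees either none or all of a machine’s inserts. A rough sketch, with sqlite3 as a self-contained stand-in for MySQL (which would need transaction-capable storage tables such as InnoDB for the same effect; table names are illustrative):

```python
# Sketch: commit a whole run's results atomically so a page refresh
# never observes a half-inserted run. sqlite3 is only a stand-in here;
# MySQL needs transaction-capable tables (e.g. InnoDB) for this.
import sqlite3

def insert_results(db, config_id, results):
    try:
        for test_id, status in results:
            db.execute(
                "INSERT INTO test_result (config, test, status) VALUES (?, ?, ?)",
                (config_id, test_id, status))
        db.commit()    # all of the run's rows become visible at once
    except Exception:
        db.rollback()  # a partial run never reaches other readers
        raise
```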

Keeping Up To Date With gcc

You might have noticed that the FATE Server includes configurations for building the latest FFmpeg source using the latest SVN builds of the actual compiler, gcc. This was suggested by several people as a way to monitor how the development of such a crucial piece of software affects another crucial piece. But it has also led me to wonder how to keep the gcc-svn version reasonably up to date.


gcc egg logo

First idea: Pre-built binaries from another source. I know of nothing of the sort, presently.

Next idea: Periodically building the compiler myself. This has a lot of issues, not the least of which is the fact that on both of the current build machines, the compiler takes at least 4 hours to build. And that’s with just ‘--enable-languages=c’, and without any FATE build/test cycles occurring.

Solution: Offload the periodic gcc builds to another machine. I can build the C compiler in just under an hour on a multi-core x86_32 Linux machine, rather than the single-CPU VMware session that currently serves x86_32 build duty on the FATE farm. I have another PowerPC machine that should also be able to take over building the PPC compiler.

So the next problem becomes: how often to update the gcc SVN compiler in use? Every day? Every 2 days? Every week? I don’t have a good answer for this, but it leads to the next question…

How to keep track of these new gcc SVN builds? Should there be a new configuration for each new SVN build? (A configuration in FATE parlance is a combination of a machine and a compiler version.) Or should I update one master configuration with the latest compiler path and name information (moving from gcc-svnABC-date1 to gcc-svnXYZ-date2)? The former solution would be more pure but the latter might yield superior performance data over an extended period of time. At least, it will once I get more tests into the system, which should happen soon.

The Server of Fate

Pursuant to the last post’s naming contest, SvdB had a novel entry of “FFmpeg Make ‘n’ Break”. However, Kostya’s entry of FATE was destined for victory due to its sheer simplicity. And so it comes to pass:

FATE – FFmpeg Automated Testing Environment

Some may have observed that there still are not very many tests yet. I’m being slow and deliberate with these, at least at the outset. My first impulse was to start manually adding tests to validate a bunch of the fringe formats that I’m most familiar with (since I implemented them), as I have done with this test for the FILM system. However, the guru recommended that I put the H.264 conformance suite to the test.

The base directory has 136 samples. Yeah, I’m leaning towards an automated tool on this one.

This FATE project is prompting me to craft a variety of special tools to both make my life easier and ensure fewer errors. I could just make a tool to dump all the samples into the database, pass or fail, and let the test failure count tell the story. However, that might not be useful in the same way that it’s not useful to have hundreds of warnings in a compilation — it distracts from real problems (i.e., we know that 100 or so tests are supposed to fail and we don’t notice when a formerly working test just broke).

I also figured out that it’s not so straightforward to dump all the tests in at once, at least not with correct results. Each archive has, at a minimum, a raw H.264-encoded file and the raw YUV file. A decode of the H.264 file is supposed to be bit exact when compared to the raw file. You can feed the raw YUV image into FFmpeg (and encode to the framecrc target for concise stdout text), but only if you know the file’s resolution. The samples usually have readme files included, and they usually mention the resolution, but I’m not going through that much trouble to pick it out. I’ve already worked out the regexps to figure out what the encoded, raw, and readme files can possibly be named.
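
For illustration, patterns along these lines would cover the common cases. These are my guesses at reasonable regexps, not the exact ones the tool uses:

```python
# Guesses at the filename-matching rules; illustrative patterns only,
# not the tool's exact regexps. Conformance archives typically contain
# an encoded stream (.264/.26l/.jvt), a raw .yuv reference, and a readme.
import re

ENCODED_RE = re.compile(r'\.(264|26l|jvt|avc|h264)$', re.IGNORECASE)
RAW_RE     = re.compile(r'\.yuv$', re.IGNORECASE)
README_RE  = re.compile(r'readme|\.txt$', re.IGNORECASE)

def classify(filenames):
    files = {}
    for name in filenames:
        if ENCODED_RE.search(name):
            files['encoded'] = name
        elif RAW_RE.search(name):
            files['raw'] = name
        elif README_RE.search(name):
            files['readme'] = name
    return files
```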

So here is my current strategy, applied to each .zip file in the conformance suite:

  • create a short name for the database in the form of, e.g., “h264-conformance-aud_mw_e” for the file AUD_MW_E.zip
  • query the FATE database to see if a test spec already has that name
  • if the name is taken, the test is already known to have been working in FFmpeg, skip to next file
  • unzip the archive
  • find the encoded, raw, and readme files
  • using the latest build of ‘ffmpeg’, decode the encoded file: ‘ffmpeg -f h264 -i encoded_file decoded.yuv’
  • run ‘diff --brief’ against decoded.yuv and the expected output
  • if the files are identical, craft a new test spec using the readme file for much of the description, and set the expected stdout text to the output of ‘ffmpeg -f h264 -i encoded_file -f framecrc -’
  • delete files and move on to next archive
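
Put together, the steps above might be sketched like this. The helpers (make_test_name, test_exists, find_files, add_test_spec) and the work directory are assumptions for illustration, not actual FATE code; only the comparison step is fully defined here:

```python
# Sketch of the per-archive loop above. make_test_name, test_exists,
# find_files, and add_test_spec are hypothetical helpers, not real FATE
# code; process_archive is not meant to run without them.
import subprocess
import zipfile

def bit_exact(path_a, path_b):
    # the 'diff --brief' step: byte-for-byte comparison
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        return a.read() == b.read()

def process_archive(zip_path, db, workdir="work"):
    name = make_test_name(zip_path)            # e.g. h264-conformance-aud_mw_e
    if test_exists(db, name):
        return False                           # already passing; skip
    zipfile.ZipFile(zip_path).extractall(workdir)
    encoded, raw, readme = find_files(workdir)
    subprocess.call(["ffmpeg", "-f", "h264", "-i", encoded, "decoded.yuv"])
    if not bit_exact("decoded.yuv", raw):
        return False                           # not bit exact yet; re-test later
    crc = subprocess.check_output(
        ["ffmpeg", "-f", "h264", "-i", encoded, "-f", "framecrc", "-"])
    add_test_spec(db, name, readme, crc)       # readme supplies the description
    return True
```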

That’s the basic idea. Oh yeah, and general sanity considerations, like testing this on a throwaway table first. The point of building the script this way is to make it easy to re-run it again as H.264 fixes are introduced, and add the newly working tests to the test suite that will be run on each build. Currently, 51/136 of the conformance vectors decode in a bit exact manner.

This will be good practice for when it’s time to add conformance suites such as AAC where there is an added challenge that the output will not necessarily be bit exact.
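
For such suites, the comparison would have to tolerate small deviations, e.g. an RMS error bound against the reference output rather than a byte-for-byte diff. A sketch of that idea; the threshold value below is purely illustrative, since the real acceptance criterion comes from the conformance spec itself:

```python
# Sketch of a tolerance-based comparison for suites like AAC, where
# decoded output need not match the reference bit for bit. The default
# threshold is illustrative only, not the spec's actual criterion.
import math

def rms_error(decoded, reference):
    """RMS error between two equal-length sequences of samples
    normalized to [-1.0, 1.0]."""
    assert len(decoded) == len(reference) and decoded
    total = sum((d - r) ** 2 for d, r in zip(decoded, reference))
    return math.sqrt(total / len(decoded))

def close_enough(decoded, reference, threshold=1.0 / (1 << 15)):
    return rms_error(decoded, reference) <= threshold
```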

Catchy Name

I’ve been working hard on the FFmpeg automated build/test server in the last few weeks. I’m planning new test configurations, plotting the specs for hundreds of different automated tests, and stabilizing the general infrastructure. But I’m missing something key: I don’t want to keep having to refer to it as the FFmpeg automated build/test server. I need a catchy name for it.


Baby Names Book Cover

Any ideas? I realize that free software types are not the most creative lot, but it’s worth throwing the question out there anyway.