Yearly Archives: 2008

Good FATE Housekeeping

The front page of FATE may not look all that different, save for some new tests every few days. But I assure you I am doing a lot of work on the backend to improve the whole thing. I finally deployed the optimizations outlined in this post, which put the brakes on the unrestrained growth of the database (2.5 GB in as many months). I also thought of another –ahem– groundbreaking optimization: if the client-side test script compresses the stdout or stderr for a build record and the compressed size is larger than 64K, there’s really no reason to send it over and store it in the database– the database will truncate the blob to 64K, after which it can’t be decompressed anyway. This is the case for every single icc build; Intel’s C compiler insists on operating in a pedantic mode that I can’t seem to disable, despite the options documentation. Fortunately, I have been logging the total stderr line count for several weeks now (though I haven’t made it available via the web interface yet).

So what to do about this enormous database? At first, I suspected that all those specious icc stderr blobs had a significant impact. But no: that’s only occupying 13 MB. Still, that useless data is now gone.

The real space pigs are the cumulative stderr/stdout blobs logged prior to my space-saving epiphany:

mysql> SELECT 
  SUM(LENGTH(stdout)), 
  SUM(LENGTH(stderr)) 
  FROM test_result;
+---------------------+---------------------+
| SUM(LENGTH(stdout)) | SUM(LENGTH(stderr)) |
+---------------------+---------------------+
|          1337031227 |          1096300800 |
+---------------------+---------------------+

Whoa. So that’s where all the space went. That’s right: ~1.3 GB and ~1.1 GB, respectively. Getting rid of the stderr blobs is pretty straightforward. I don’t care about stderr if the test was successful:

mysql> UPDATE test_result
  SET stderr=NULL
  WHERE test_passed=1;

The stdout blobs are a bit trickier. If the test failed, the stdout is always a keeper, just like stderr. If the test succeeded, I still want the stdout data if the corresponding test spec listed the expected stdout as NULL– that means the stdout is supposed to be retained for logging purposes. This is presently used for tracking the assorted binaries’ filesizes over time, and I don’t want to drop that data. So it seems I need data from another table in order to make this query work. Sounds like a job for a JOIN operation, which didn’t seem to be possible per the raw UPDATE syntax spec. I’m glad I turned out to be wrong, as indicated by this blog post. So I cloned a small test table and went to work prototyping the query backwards and forwards, lest I lose 3 months of precious filesize data.

mysql> UPDATE test_result result 
  INNER JOIN test_spec spec
  ON result.test_spec=spec.id
  SET result.stdout=NULL 
  WHERE result.test_passed=1
  AND spec.expected_stdout IS NOT NULL;
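One way to prototype a destructive UPDATE like this is to dry-run the same JOIN and WHERE clause as a SELECT COUNT(*) first, to see exactly how many rows would be hit. Here is that idea against a toy sqlite3 stand-in for the schema — the column names follow the post, the sample data is made up:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE test_spec (id INTEGER PRIMARY KEY, expected_stdout TEXT);
    CREATE TABLE test_result (test_spec INTEGER, test_passed INTEGER,
                              stdout TEXT);
    -- spec 2 has NULL expected_stdout, i.e. its stdout is kept for logging
    INSERT INTO test_spec VALUES (1, 'OK'), (2, NULL);
    INSERT INTO test_result VALUES
        (1, 1, 'OK'),             -- passed, stdout droppable
        (2, 1, 'filesize 1234'),  -- passed, but stdout must be retained
        (1, 0, 'boom');           -- failed, stdout always kept
""")

# same JOIN and WHERE as the destructive UPDATE, as a harmless dry run
(doomed,) = db.execute("""
    SELECT COUNT(*) FROM test_result result
    JOIN test_spec spec ON result.test_spec = spec.id
    WHERE result.test_passed = 1
      AND spec.expected_stdout IS NOT NULL
""").fetchone()
print(doomed)  # 1: only the first row's stdout would be NULLed
```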

Things are a bit more manageable now and I’m happy to report that I have a fighting chance of implementing a regular, automated backup schedule for FATE:

+---------------------+---------------------+
| SUM(LENGTH(stdout)) | SUM(LENGTH(stderr)) |
+---------------------+---------------------+
|             2046425 |             4292739 |
+---------------------+---------------------+

I hope you have enjoyed yet another installment of The New Adventures Of The Amazingly Inept DBA.

2008 GSoC Qualification Showcase #1

The application process for Google’s Summer of Code 2008 season is not even open yet, but people interested in participating with FFmpeg are already busy on their qualification tasks.

Ramiro Polla submitted a system that can demux captured MSN video streams and play them back using a video codec known as Mimic. Here is the system in action:


I [heart] FFmpeg

Sascha Sommer has completed a playback system for the RL2 file format, which figured heavily into a CD-ROM game called Voyeur:


Voyeur FMV title

There are still some qualification tasks left unclaimed on the FFmpeg GSoC page. And if we run out, we can make more, so don’t despair.

GSoC Multimedia Competition

The 2008 Google Summer of Code participating organizations have been selected. Like last year, I wanted to survey what other multimedia-type projects are doing. Fortunately, instead of clicking on each individual project in the official Google listing to figure out which ones might be vaguely multimedia-related, the GenMAPP project (also a participating organization) has organized the various projects by category.

Multimedia category projects include BBC Research, FFmpeg, GStreamer, Neuros Technology, XBMC, and Xiph.org. Tangential to multimedia are the 2 TV & Video category projects, Dirac Schroedinger (separate from BBC) and VideoLan.

Check out XBMC’s SoC project page. If you have been active with FFmpeg’s own SoC page, it should seem charmingly familiar. Eh, it’s all GFDL, I guess.

Then there is the Audio & Music category: Atheme, Audacity, CLAM, Mixxx, and XMMS2. I hadn’t heard of Atheme before. I can’t quite nail down what it is they do, but they seem to have a number of ambitious software projects under their umbrella. And one of their proposed SoC endeavors is “Support for RealAudio: Implement an input plugin for the RealAudio codec, preferably with support for streaming as well as files.” Seems a bit understated.

These are some other projects that caught my eye at a cursory glance:

Speaking of Neuros Technology, this is the first time I have heard of them. They produce an open-platform digital media center, and GSoC participants working with them will receive one of the devices. Tantalizing. No such perks for working on FFmpeg. But I would like to remind prospective GSoC participants that FFmpeg offers valuable real world experience in the form of working long, thankless hours for a set of abusive, anti-social, impossible-to-please bosses on a rarely acknowledged piece of backend software. This is training you don’t get in school.

The application process begins bright and early on Monday morning (March 24). And don’t forget your qualification task.

Feigning Transactions

There is some collateral damage showing up on FATE due to the indexing solution deployed last week, a side effect that should be obvious to anyone with one or two computer science courses behind them: indexing a column of a table means that the index must be updated during each insert, which makes the insert operation slower. The net effect is a wider window during which the FATE page cache generator might erroneously report too few tests for a build record. A box that ought to be yellow (meaning that one or more tests failed) shows up green, and closer inspection reveals a report of only 21/21 tests succeeding rather than the full suite.

What is happening is that, when a FATE build/test cycle is complete for a configuration, the script enters a new build record and, if the build succeeded, proceeds to enter a new test result record for each of the (currently) 111 tests. When the page cache update script kicks in on its usual 15-minute interval, a client script might still be inserting build records, leading to a race condition. I mitigate this possibility with the following algorithm:

  query the test result set corresponding to a build record
  current_count = items in test result set
  loop (at most 10 times):
    wait 4 seconds
    query the test results again
    new_count = items in test result set
    if new_count == current_count:
      break    # no inserts during the interval; the record has settled
    current_count = new_count
  if the retries ran out, quit the script
    (cron will kick it off again anyway)

The heuristic actually works quite well. However, sometimes the server is so bogged down that the insert operations occur more than 4 seconds apart, or the client loses its connection before it can enter all the test results; either way, the count settles prematurely and an incomplete result set gets cached.

The proper solution to this would be database transactions. MySQL is not renowned for its transactional support. True, version 5 is supposed to support them, and I am on version 5. But it requires special configuration that I don’t know how to perform and am not sure if I even have the access privileges to set up. But I have determined empirically that transactions are not supported with my current database and configuration (method: create a new table; insert a new record; start a transaction; update the record; query the record from a different session; the record has been updated, ergo, the START TRANSACTION was not honored).
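For contrast, the same probe can be replayed against sqlite3, which does honor transactions: the uncommitted UPDATE stays invisible to a second connection until COMMIT. This is exactly the behavior my MySQL setup fails to exhibit. (Table and column names here are made up for the demo.)

```python
import os
import sqlite3
import tempfile

# Session A: create a table, insert a record, then update it inside an
# open (uncommitted) transaction.
path = os.path.join(tempfile.mkdtemp(), "probe.db")
a = sqlite3.connect(path, isolation_level=None)  # autocommit; manual txns
a.execute("CREATE TABLE t (v INTEGER)")
a.execute("INSERT INTO t VALUES (1)")

a.execute("BEGIN")
a.execute("UPDATE t SET v = 2")  # not yet committed

# Session B ("a different session"): the update is not visible yet.
b = sqlite3.connect(path)
before = b.execute("SELECT v FROM t").fetchall()[0][0]  # still 1
a.execute("COMMIT")
after = b.execute("SELECT v FROM t").fetchall()[0][0]   # now 2
print(before, after)
```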

Idea: Feign a transaction by adding a new field to the build record table, a boolean flag to indicate if the record is complete. When the build record is first entered, the flag is cleared. Only after all the test results have been entered will the flag be manually set true. Using this method, FATE will easily be able to find build records that were completed. This has the downside of leaving specious, “zombie” data in the tables and I will probably need to create a process for periodically cleaning the cruft in lieu of proper transaction/rollback support.
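Sketched concretely, with sqlite3 again standing in for MySQL — the table names follow the post, everything else is illustrative:

```python
import sqlite3

# The "feigned transaction" flag: a build record starts with complete=0,
# test results are inserted, and only then is the flag flipped. The cache
# generator filters on complete=1 and never sees a partial build.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE build_record "
           "(id INTEGER PRIMARY KEY, complete INTEGER DEFAULT 0)")
db.execute("CREATE TABLE test_result (build INTEGER, test_passed INTEGER)")

# client side: record the build with the flag cleared...
build_id = db.execute("INSERT INTO build_record (complete) VALUES (0)").lastrowid

# ...insert every test result...
for passed in (1, 1, 0):
    db.execute("INSERT INTO test_result VALUES (?, ?)", (build_id, passed))

# ...and flip the flag only once everything is in
db.execute("UPDATE build_record SET complete = 1 WHERE id = ?", (build_id,))

# cache generator side: ignore any build still in progress (or abandoned)
finished = db.execute("SELECT id FROM build_record WHERE complete = 1").fetchall()
print(finished)  # [(1,)]
```

The periodic cleanup then reduces to a DELETE against stale records whose flag is still 0.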

A perfect hack solution, I hope. We make do with what we have because MySQL is just so fast and free.