March 31st, 2008 by
Multimedia Mike
Today was supposed to be the last day that students could apply to participate in Google’s 2008 Summer of Code. That deadline, however, along with all of the program’s deadlines, has been pushed back by a week. The new submission deadline is next Monday, April 7. The new deadline for the FFmpeg mentoring crew to pick students is April 18, so be sure to have your qualification code in SVN well before then in order to be considered for a project slot.
Regarding qualification tasks, 2 more students have qualified. First, the more graphically interesting: Eli Friedman completed an RPL/ARMovie demuxer and Escape-124 video decoder:

This is the container/codec combo that was seen in the first Tomb Raider PC game. ARMovie is a format that was apparently used on Acorn RISC PCs. If anyone happens to have any ARMovie files that don’t come from Eidos games, I would be interested in seeing them.
Next is a demuxer/decoder pair for the Amiga IFF format that supports the PCM and Fibonacci encodings, prepared by Jai Menon. No real graphics of course, so here’s ffplay’s graphical interpretation of the PCM amplitude:

There is still time to both apply for GSoC 2008 and work on a qualification task. I have been scavenging the MultimediaWiki for game-related formats that have long been documented but not re-implemented in FFmpeg. Hopefully, the supply will hold up for all the students who want to try their hand at FFmpeg.
Posted in Open Source Multimedia
March 26th, 2008 by
Multimedia Mike
The front page of FATE may not look all that different, save for some new tests every few days. But I assure you I am doing a lot of work on the backend to improve the whole thing. I finally deployed the optimizations outlined in this post. That put the brakes on the unrestrained growth of the database (2.5 GB in as many months). I also thought of another --ahem-- groundbreaking optimization: If the client-side test script compresses the stdout or stderr for a build record and the compressed size is larger than 64K, there's really no reason to send it over and store it in the database-- the database won't be able to decompress the data block anyway after it is truncated to 64K. This is the case for every single icc build. Intel's C compiler insists on operating in a pedantic mode that I can't seem to disable, despite the options documentation. Fortunately, I have been logging total stderr line count for several weeks now (though I haven't made it available via the web interface yet).
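A minimal sketch of that size check, assuming the client compresses its logs with zlib and that the column tops out at 65,535 bytes (the 64K figure mentioned above). `MAX_BLOB_BYTES` and `compress_for_upload` are hypothetical names for illustration, not taken from the actual FATE script:

```python
import os
import zlib

# Assumed column limit, based on the 64K truncation described in the post.
MAX_BLOB_BYTES = 64 * 1024 - 1


def compress_for_upload(log_text):
    """Return the compressed log, or None if it would be truncated anyway."""
    compressed = zlib.compress(log_text.encode("utf-8"))
    if len(compressed) > MAX_BLOB_BYTES:
        # A truncated zlib stream cannot be decompressed, so skip the upload.
        return None
    return compressed


# A short, repetitive log compresses well and is kept.
small = compress_for_upload("warning: unused variable 'x'\n" * 100)

# Random hex text barely compresses; it stands in for an oversized build log.
huge = compress_for_upload(os.urandom(200_000).hex())

print(small is not None, huge is None)
```

Returning `None` lets the caller store a NULL blob (and perhaps just the line count) instead of data that can never be read back.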
So what to do about this enormous database? At first, I suspected that all those specious icc stderr blobs had a significant impact. But no: that's only occupying 13 MB. Still, that useless data is now gone.
The real space pigs are the cumulative stderr/stdout blobs logged prior to my space-saving epiphany:
SQL:
mysql> SELECT
         SUM(LENGTH(stdout)),
         SUM(LENGTH(stderr))
       FROM test_result;
+---------------------+---------------------+
| SUM(LENGTH(stdout)) | SUM(LENGTH(stderr)) |
+---------------------+---------------------+
|          1337031227 |          1096300800 |
+---------------------+---------------------+
Whoa. So that's where all the space went. That's right: ~1.3 GB and ~1.1 GB, respectively. Getting rid of the stderr blobs is pretty straightforward. I don't care about stderr if the test was successful:
SQL:
mysql> UPDATE test_result
       SET stderr=NULL
       WHERE test_passed=1;
The stdout blobs are a bit trickier. If the test failed, the stdout is always a keeper, just like stderr. If the test succeeded, I still want the stdout data if the corresponding test spec listed the expected stdout as being NULL-- that means the stdout is supposed to be retained for logging purposes. This is presently used for tracking the assorted binaries' filesizes over time and I don't want to drop that data. So it seems I need data from another table in order to make this query work. Sounds like a job for a JOIN operation, which didn't seem to be possible per the raw syntax spec. I'm glad I turned out to be wrong as indicated by this blog post. So I cloned a small test table and went to work prototyping it backwards and forwards, lest I lose 3 months of precious filesize data.
SQL:
mysql> UPDATE test_result result
       INNER JOIN test_spec spec
         ON result.test_spec=spec.id
       SET result.stdout=NULL
       WHERE result.test_passed=1
         AND spec.expected_stdout IS NOT NULL;
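Prototyping this kind of clean-up on a throwaway table can be sketched as follows. This uses Python's built-in sqlite3 module as a stand-in for MySQL, and because SQLite does not accept MySQL's multi-table UPDATE ... JOIN syntax, the same filter is expressed with a subquery instead. The table and column names come from the queries above; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_spec (id INTEGER PRIMARY KEY, expected_stdout TEXT);
    CREATE TABLE test_result (
        id INTEGER PRIMARY KEY,
        test_spec INTEGER,
        test_passed INTEGER,
        stdout TEXT,
        stderr TEXT
    );
""")

# Spec 1 lists an expected stdout; spec 2 has NULL, meaning "log the stdout".
conn.executemany("INSERT INTO test_spec VALUES (?, ?)",
                 [(1, "abc123"), (2, None)])
conn.executemany("INSERT INTO test_result VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, "abc123", "noise"),    # passed: stdout is redundant
    (2, 1, 0, "def456", "error"),    # failed: keep everything
    (3, 2, 1, "1048576", "noise"),   # passed: stdout is logged data
])

# stderr is only interesting when the test failed.
conn.execute("UPDATE test_result SET stderr = NULL WHERE test_passed = 1")

# Drop stdout for passing tests whose spec lists an expected stdout.
conn.execute("""
    UPDATE test_result
    SET stdout = NULL
    WHERE test_passed = 1
      AND test_spec IN (
          SELECT id FROM test_spec WHERE expected_stdout IS NOT NULL
      )
""")

rows = conn.execute(
    "SELECT id, stdout, stderr FROM test_result ORDER BY id"
).fetchall()
print(rows)
```

The third row is the one to watch: it passed, but its stdout must survive because the spec's expected stdout is NULL.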
Things are a bit more manageable now and I'm happy to report that I have a fighting chance of implementing a regular, automated backup schedule for FATE:
SQL:
+---------------------+---------------------+
| SUM(LENGTH(stdout)) | SUM(LENGTH(stderr)) |
+---------------------+---------------------+
|             2046425 |             4292739 |
+---------------------+---------------------+
I hope you have enjoyed yet another installment of The New Adventures Of The Amazingly Inept DBA.
Posted in FATE Server
March 22nd, 2008 by
Multimedia Mike
The application process for Google's Summer of Code 2008 season is not even open yet, but people interested in participating with FFmpeg are already busy on their qualification tasks.
Ramiro Polla submitted a system that can demux captured MSN video streams and play them back using a video codec known as Mimic. Here is the system in action:
![I [heart] FFmpeg](/eggs/images/i-heart-ffmpeg.jpg)
Sascha Sommer has completed a playback system for the RL2 file format, which figured heavily into a CD-ROM game called Voyeur:

There are still some qualification tasks left unclaimed on the FFmpeg GSoC page. And if we run out, we can make more, so don't despair.
Posted in Open Source Multimedia
March 21st, 2008 by
Multimedia Mike
The 2008 Google Summer of Code participating organizations have been selected. Like last year, I wanted to survey what other multimedia-type projects are doing. Fortunately, instead of clicking on each individual project in the official Google listing to figure out which ones might be vaguely multimedia-related, the GenMAPP project (also a participating organization) has organized the various projects by category.
Multimedia category projects include BBC Research, FFmpeg, GStreamer, Neuros Technology, XBMC, and Xiph.org. Tangential to multimedia are the 2 TV & Video category projects, Dirac Schroedinger (separate from BBC) and VideoLan.
Check out XBMC's SoC project page. If you have been active with FFmpeg's own SoC page, it should seem charmingly familiar. Eh, it's all GFDL, I guess.
Then there is the Audio & Music category: Atheme, Audacity, CLAM, Mixxx, and XMMS2. I hadn't heard of Atheme before. I can't quite nail down what it is they do, but they seem to have a number of ambitious software projects under their umbrella. And one of their proposed SoC endeavors is "Support for RealAudio: Implement an input plugin for the RealAudio codec, preferably with support for streaming as well as files." Seems a bit understated.
These are some other projects that caught my eye at a cursory glance:
Speaking of Neuros Technology, this is the first time I have heard of them. They produce an open platform as a digital media center. GSoC participants will receive one of the items. Tantalizing. No such perks for working on FFmpeg. But I would like to remind prospective GSoC participants that FFmpeg offers valuable real world experience in the form of working long, thankless hours for a set of abusive, anti-social, impossible-to-please bosses on a rarely acknowledged piece of backend software. This is training you don't get in school.
The application process begins bright and early on Monday morning (March 24). And don't forget your qualification task.
Posted in Open Source Multimedia
March 20th, 2008 by
Multimedia Mike
There is some collateral damage showing up on FATE due to that indexing solution deployed last week, a side effect that should also be obvious to anyone with one or two computer science courses behind them: Indexing a column of a table means that the index must be updated during each insert which makes the insert operation slower. The net effect is that it increases the opportunity that the FATE page cache generator might erroneously report too few tests corresponding to a build record. A previously yellow box (meaning that one or more tests failed) is green but closer inspection reveals only 21/21 tests succeeded.
What is happening is that, when a FATE build/test cycle is complete for a configuration, the script enters a new build record and, if the build succeeded, proceeds to enter a new test result record for each of the (currently) 111 tests. When the page cache update script kicks in on its usual 15-minute interval, a client script might still be inserting test result records, leading to a race condition. I mitigate this possibility with the following algorithm:
query the test result set corresponding to a build record
current_count = items in test result set
while (1):
    query test results again
    new_count = items in test result set
    if current_count == new_count:
        break
    else:
        current_count = new_count
        wait 4 seconds and try again (up to 10 times, then quit the script;
        it'll try from cron again anyway)
The heuristic actually works quite well. However, sometimes the server is extremely bogged down for one reason or another and the insert operations are occurring more than 4 seconds apart, or perhaps the client lost connection before it could enter all the test results.
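The stabilization loop might be written along these lines in Python. This is only a sketch: `count_results` is a hypothetical callable standing in for the real database query, the sleep function is injectable so the loop can be exercised without actually waiting, and it pauses before every re-query, which is a small deviation from the pseudocode.

```python
import time


def wait_for_stable_count(count_results, delay=4, max_tries=10,
                          sleep=time.sleep):
    """Poll until two consecutive counts agree.

    Returns the stable count, or None after max_tries re-queries so the
    caller can quit and let the next cron run try again.
    """
    current_count = count_results()
    for _ in range(max_tries):
        sleep(delay)
        new_count = count_results()
        if new_count == current_count:
            return current_count
        current_count = new_count
    return None


# Simulated client that is still inserting rows during the first polls.
observed = iter([21, 58, 111, 111])
stable = wait_for_stable_count(lambda: next(observed), sleep=lambda s: None)
print(stable)
```

Note that a stalled client still fools this check: if no rows arrive between two polls, the count looks stable even though the set is incomplete.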
The proper solution to this would be database transactions. MySQL is not renowned for its transactional support. True, version 5 is supposed to support them, and I am on version 5. But it requires special configuration that I don't know how to perform and am not sure if I even have the access privileges to set up. But I have determined empirically that transactions are not supported with my current database and configuration (method: create a new table; insert a new record; start a transaction; update the record; query the record from a different session; the record has been updated, ergo, the START TRANSACTION was not honored).
Idea: Feign a transaction by adding a new field to the build record table, a boolean flag to indicate if the record is complete. When the build record is first entered, the flag is cleared. Only after all the test results have been entered will the flag be manually set true. Using this method, FATE will easily be able to find build records that were completed. This has the downside of leaving specious, "zombie" data in the tables and I will probably need to create a process for periodically cleaning the cruft in lieu of proper transaction/rollback support.
A perfect hack solution, I hope. We make do with what we have because MySQL is just so fast and free.
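A minimal sketch of the completion-flag idea, using Python's sqlite3 module as a stand-in for the MySQL database. The `complete` column, the `log_build` helper, and the `interrupt_after` parameter are hypothetical names for illustration; the real schema has more columns than shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE build_record (
        id INTEGER PRIMARY KEY,
        complete INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE test_result (
        build_record INTEGER,
        test_spec INTEGER,
        test_passed INTEGER
    );
""")


def log_build(conn, test_outcomes, interrupt_after=None):
    """Insert a build record and its test results, then set the flag."""
    build_id = conn.execute(
        "INSERT INTO build_record DEFAULT VALUES").lastrowid
    for n, (spec, passed) in enumerate(test_outcomes):
        if interrupt_after is not None and n >= interrupt_after:
            return build_id  # simulate a client that lost its connection
        conn.execute("INSERT INTO test_result VALUES (?, ?, ?)",
                     (build_id, spec, passed))
    # Only reached once every test result has been entered.
    conn.execute("UPDATE build_record SET complete = 1 WHERE id = ?",
                 (build_id,))
    return build_id


outcomes = [(1, 1), (2, 1), (3, 0)]
good = log_build(conn, outcomes)
zombie = log_build(conn, outcomes, interrupt_after=2)

# The page cache generator would only look at builds whose flag is set.
visible = [row[0] for row in conn.execute(
    "SELECT id FROM build_record WHERE complete = 1")]

# Periodic clean-up of zombie rows. A real job would also need an age
# cutoff so it does not delete builds that are still being logged.
conn.execute("DELETE FROM test_result WHERE build_record IN "
             "(SELECT id FROM build_record WHERE complete = 0)")
conn.execute("DELETE FROM build_record WHERE complete = 0")
remaining = conn.execute("SELECT COUNT(*) FROM test_result").fetchone()[0]
print(visible, remaining)
```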
Posted in FATE Server
March 17th, 2008 by
Multimedia Mike
The pilot for an American TV show called Greek was a free download on Apple iTunes recently. I'm just as eager as the next open source software developer to brainlessly give a try to free stuff, so I checked it out. The show centers around some participants in the Greek-lettered fraternity and sorority system present on many college campuses. Hazing plays a role.

This caused me to consider FFmpeg and the Google Summer of Code in the context of fraternities. GSoC is a college activity, like the Greek system. Participation might help your career along, post-school (an alleged rationale for joining a fraternity). And if you want to be initiated into the FFmpeg brotherhood, you are required to submit to a ritual known as the qualification task.
This would be a good time to mention that FFmpeg has been accepted into the GSoC for a third year in a row. Students who have any interest in working on a summer FFmpeg project on Google's dime need to make their interest known on the ffmpeg-devel list and publicly claim a qualification hazing ritual.
Also, it seems that the x264 project wants in on some of the GSoC action, as indicated by their new adjunct Wiki page. This only creates ever more exciting opportunities. Wouldn't you like to be a part?
Posted in Open Source Multimedia
March 14th, 2008 by
Multimedia Mike
I have a tendency to regard a database engine as a black box. I just formulate my queries and count on the engine to make them fast, somehow. I think this is similar to the faith that people tend to place in language compilers-- just write the C code and the compiler will just magically optimize it. And if the code isn't fast enough, maybe you should use a higher optimization level. Of course, a computer scientist ought to be able to analyze algorithmic running efficiency and spot opportunities for theoretical improvement, rather than relying on the compiler to insert a faster machine instruction here or there.
I started reading up on MySQL optimization strategies. There are a few things to understand about how the database works under the covers, things that are quite intuitive to anyone who has a semester of data structures and algorithms coursework. The FATE database is getting slower as it grows larger. The table growing the fastest is test_result. Each build currently generates 111 new rows, one for each active test specification.
SQL:
mysql> SELECT COUNT(test_spec)
       FROM test_result
       WHERE build_record=6742;
+------------------+
| COUNT(test_spec) |
+------------------+
|              111 |
+------------------+
1 row in set (4.12 sec)
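The March 20 entry above mentions that an index was the fix eventually deployed. As a rough illustration of why an index helps this kind of lookup, here is a sketch using Python's sqlite3 module as a stand-in for MySQL. It assumes the index goes on the build_record column, which the text here does not actually state, and the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_result (build_record INTEGER, test_spec INTEGER)"
)
# Made-up data: 200 builds with 111 test results each.
conn.executemany(
    "INSERT INTO test_result VALUES (?, ?)",
    [(build, spec) for build in range(1, 201) for spec in range(1, 112)],
)

QUERY = "SELECT COUNT(test_spec) FROM test_result WHERE build_record = 150"


def plan(conn):
    """Return SQLite's description of how it would run QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    return " / ".join(row[-1] for row in rows)


before = plan(conn)  # without an index, every row has to be scanned
conn.execute("CREATE INDEX idx_build_record ON test_result (build_record)")
after = plan(conn)   # with the index, only matching rows are visited

count = conn.execute(QUERY).fetchone()[0]
print(before)
print(after)
print(count)
```

The trade-off is the one any index carries: lookups get cheaper, while each insert has to update the index as well as the table.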
Read the rest of this entry »
Posted in FATE Server
March 10th, 2008 by
Multimedia Mike
Ma.tt (his actual domain name), the father of the WordPress blogging system, snapped this photo at the SxSW event and it gave me a cold chill for some reason:

I did a little searching and realized that I had already been exposed to the idea that Blu-Ray was colluding with Java. Now it occurs to me to wonder: Has there been demand for free multimedia players to support the Java functionality necessary to play Blu-Ray discs?
Posted in Java
March 9th, 2008 by
Multimedia Mike
Current snapshot of the FATE database:
SQL:
mysql> SELECT
    -> COUNT(id) AS "Number of test results"
    -> FROM test_result;
+------------------------+
| Number of test results |
+------------------------+
|                 399177 |
+------------------------+
And we're just getting started. This might be construed as either long-term planning or silly paranoia, but I have started to wonder what it would take to overflow the id field of the test_result table. I'm not even sure how large it is. MySQL simply reports the database field as being type "int(11)". I have read various bits of literature which do not give a definitive answer on just how many bits that is. Worst case, I am assuming 32 bits, signed, with a useful positive range around 2 billion. Suppose I ramp up to around 500 unique tests in the database (hey, with all the individual regression tests yet to be imported, as well as various official conformance suites, that's actually a fairly conservative estimate) and add 6 more configurations to round out to 20. That means each build/test cycle will generate 500 * 20 = 10000 test results. If there are 10 cycles on an average day, that means 100,000 test results per day and 3 million per month. That would last the 31-bit range for about 715 months, or nearly 60 years.
Okay, I guess I will put off worrying about the implications for the time being. But I still need to revise the test_result table to be more efficient (i.e., quit storing the stdout field if it's the same as was specified in the test specification).
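The overflow estimate can be checked with a few lines of arithmetic. The sketch assumes the id column is a signed 32-bit integer (MySQL's INT type is 4 bytes; the 11 in int(11) is only a display width) and uses 30-day months and 365-day years for the conversions.

```python
# Figures from the estimate above.
tests = 500
configurations = 20
cycles_per_day = 10

results_per_cycle = tests * configurations            # 10,000
results_per_day = results_per_cycle * cycles_per_day  # 100,000

# Largest value of a signed 32-bit integer.
max_id = 2**31 - 1

days_until_overflow = max_id / results_per_day
months_until_overflow = days_until_overflow / 30
years_until_overflow = days_until_overflow / 365

print(f"{days_until_overflow:,.0f} days")
print(f"{months_until_overflow:,.0f} months")
print(f"{years_until_overflow:.1f} years")
```

At 100,000 results per day the 31-bit range lasts roughly 21,000 days, which is several decades rather than a couple of years.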
Posted in FATE Server
March 6th, 2008 by
Multimedia Mike
FATE has been public for 2 months and I have just now reached 100 tests. It's a nice round number. No slowing down now, though. I hope for that number to go exponential, at least up to the point that FATE carefully tests 98% of FFmpeg's total functionality (the last 2% will be fixing bugs that I am logging as I go).
I have also been seriously looking into turning the Mac Mini into a FATE build/test machine for Mac OS X. I'm just trying to decide if I should rush it and get the configuration onto the farm with the current infrastructure, or use it as an opportunity to revise the architecture with the various efficiency brainstorms plotted on this blog. The refactoring needs to occur before I add too many more tests. For the curious, this is what the FATE script looks like while running in a screen session; it wakes up every 15 minutes and checks for a new revision in Subversion:
[Thu Mar 6 15:08:22 2008] no change
[Thu Mar 6 15:23:26 2008] getting new revision = 12356
[Thu Mar 6 15:23:34 2008] building with gcc svn 132381, built 2008-02-17
[Thu Mar 6 15:25:04 2008] testing...
[Thu Mar 6 15:25:41 2008] logging...
[Thu Mar 6 15:26:17 2008] building with gcc 4.0.4
[Thu Mar 6 15:29:09 2008] testing...
[Thu Mar 6 15:29:42 2008] logging...
[Thu Mar 6 15:30:07 2008] building with gcc 4.1.2
...
Notice the time delta between logging... and the subsequent building... That delta seems to grow more or less linearly as the number of tests increases. That's why I'm interested in optimizing that aspect sooner rather than later.
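Those deltas can be pulled out of the log mechanically. The following is a small sketch, assuming the timestamp format shown in the excerpt above; `logging_durations` is a hypothetical helper, not part of the FATE script.

```python
from datetime import datetime

LOG = """\
[Thu Mar 6 15:23:34 2008] building with gcc svn 132381, built 2008-02-17
[Thu Mar 6 15:25:04 2008] testing...
[Thu Mar 6 15:25:41 2008] logging...
[Thu Mar 6 15:26:17 2008] building with gcc 4.0.4
[Thu Mar 6 15:29:09 2008] testing...
[Thu Mar 6 15:29:42 2008] logging...
[Thu Mar 6 15:30:07 2008] building with gcc 4.1.2
"""


def logging_durations(log_text):
    """Seconds between each 'logging...' line and the next 'building' line."""
    durations = []
    logging_started = None
    for line in log_text.splitlines():
        stamp, _, message = line.partition("] ")
        when = datetime.strptime(stamp.lstrip("["), "%a %b %d %H:%M:%S %Y")
        if message.startswith("logging"):
            logging_started = when
        elif message.startswith("building") and logging_started is not None:
            elapsed = (when - logging_started).total_seconds()
            durations.append(int(elapsed))
            logging_started = None
    return durations


print(logging_durations(LOG))
```

For the excerpt shown, the two logging phases come out at 36 and 25 seconds.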
Posted in FATE Server