Yearly Archives: 2008

FFmpeg Hazing Ritual

The pilot for an American TV show called Greek was a free download on Apple iTunes recently. I’m just as eager as the next open source software developer to brainlessly give a try to free stuff, so I checked it out. The show centers around some participants in the Greek-lettered fraternity and sorority system present on many college campuses. Hazing plays a role.


FFmpeg transliterated to Greek alphabet

This caused me to consider FFmpeg and the Google Summer of Code in the context of fraternities. GSoC is a college activity, like the Greek system. Participation might help your career along, post-school (an alleged rationale for joining a fraternity). And if you want to be initiated into the FFmpeg brotherhood, you are required to submit to a ritual known as the qualification task.

This would be a good time to mention that FFmpeg has been accepted into the GSoC for a third year in a row. Students who have any interest in working on a summer FFmpeg project on Google’s dime need to make their interest known on the ffmpeg-devel list and publicly claim a qualification hazing ritual.

Also, it seems that the x264 project wants in on some of the GSoC action, as indicated by their new adjunct Wiki page. This only creates ever more exciting opportunities. Wouldn’t you like to be a part?

An Object Lesson In Database Optimization

I have a tendency to regard a database engine as a black box. I formulate my queries and count on the engine to make them fast, somehow. I think this is similar to the faith that people tend to place in language compilers: just write the C code and the compiler will magically optimize it. And if the code isn’t fast enough, maybe you should use a higher optimization level. Of course, a computer scientist ought to be able to analyze algorithmic running efficiency and spot opportunities for theoretical improvement, rather than relying on the compiler to insert a faster machine instruction here or there.

I started reading up on MySQL optimization strategies. There are a few things to understand about how the database works under the covers, things that are quite intuitive to anyone who has a semester of data structures and algorithms coursework. The FATE database is getting slower as it grows larger. The table growing the fastest is test_result. Each build currently generates 111 new rows, one for each active test specification.

mysql> SELECT COUNT(test_spec) 
       FROM test_result 
       WHERE build_record=6742;
+------------------+
| COUNT(test_spec) |
+------------------+
|              111 |
+------------------+
1 row in set (4.12 sec)
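Taking over four seconds to count 111 rows suggests the engine is scanning the whole table. A common remedy for that pattern, and not necessarily the one this post goes on to use, is an index on the column in the WHERE clause. The sketch below illustrates the idea with Python's built-in sqlite3 module rather than MySQL, so the query plan wording is SQLite's own. The column types, the 200 synthetic builds, and the index name idx_build_record are assumptions made up for the illustration.

```python
import sqlite3

# In-memory stand-in for the test_result table (schema is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_result ("
    "id INTEGER PRIMARY KEY, build_record INTEGER, test_spec INTEGER)"
)

# Synthetic data: 200 builds, 111 test results each.
rows = [(build, spec) for build in range(1, 201) for spec in range(1, 112)]
conn.executemany(
    "INSERT INTO test_result (build_record, test_spec) VALUES (?, ?)", rows
)

query = "SELECT COUNT(test_spec) FROM test_result WHERE build_record = 150"


def plan(connection, sql):
    """Return SQLite's query plan text (the 4th column of each plan row)."""
    return " ".join(
        row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)
    )


plan_before = plan(conn, query)   # full table scan expected
conn.execute("CREATE INDEX idx_build_record ON test_result (build_record)")
plan_after = plan(conn, query)    # lookup via the new index expected

count = conn.execute(query).fetchone()[0]

print("before:", plan_before)
print("after: ", plan_after)
print("rows for build 150:", count)
```

In MySQL the equivalent checks would be EXPLAIN on the SELECT and a CREATE INDEX on test_result(build_record); whether that is the right fix for the FATE schema depends on details not shown here.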


Unholy Alliance

Ma.tt (his actual domain name), the father of the WordPress blogging system, snapped this photo at the SxSW event and it gave me a cold chill for some reason:


Blu-Ray/Java Segway

I did a little searching and realized that I had already been exposed to the idea that Blu-Ray was colluding with Java. Now it occurs to me to wonder: Has there been demand for free multimedia players to support the Java functionality necessary to play Blu-Ray discs?

How Many IDs In A Database?

Current snapshot of the FATE database:

And we’re just getting started. This might be construed as either long-term planning or silly paranoia, but I have started to wonder what it would take to overflow the id field of the test_result table. I’m not even sure how large it is. MySQL simply reports the database field as being type “int(11)”. I have read various bits of literature which do not give a definitive answer on just how many bits that is. Worst case, I am assuming 32 bits, signed, with a useful positive range around 2 billion. Suppose I ramp up to around 500 unique tests in the database (hey, with all the individual regression tests yet to be imported, as well as various official conformance suites, that’s actually a fairly conservative estimate) and add 6 more configurations to round out to 20. That means each build/test cycle will generate 500 * 20 = 10,000 test results. If there are 10 cycles on an average day, that means 100,000 test results per day and 3 million per month. That would last the 31-bit range for about 715 months, or nearly 60 years.
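For reference, MySQL's INT type is a 4-byte integer, and the 11 in “int(11)” is only a display width, so the signed 32-bit assumption holds. The estimate can be worked through as a short sketch. The 30-day month and the variable names are assumptions for the illustration. Note that dividing the 31-bit range by 3 million rows per month gives a figure of roughly 715 in months, not days.

```python
# Largest value of a signed 32-bit integer (MySQL INT).
INT_MAX = 2**31 - 1

tests = 500            # projected unique tests
configs = 20           # projected build configurations
cycles_per_day = 10    # build/test cycles on an average day
days_per_month = 30    # assumption: a 30-day month, as in the rough estimate

rows_per_cycle = tests * configs                 # 10,000
rows_per_day = rows_per_cycle * cycles_per_day   # 100,000
rows_per_month = rows_per_day * days_per_month   # 3,000,000

months = INT_MAX / rows_per_month   # how many months the id range lasts
years = months / 12

print(f"{rows_per_cycle:,} rows per cycle")
print(f"{rows_per_day:,} rows per day")
print(f"{rows_per_month:,} rows per month")
print(f"id range lasts about {months:.0f} months ({years:.1f} years)")
```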

Okay, I guess I will put off worrying about the implications for the time being. But I still need to revise the test_result table to be more efficient (i.e., quit storing the stdout field when it’s identical to the expected output given in the test specification).
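One way to picture that revision is a minimal sketch, assuming the result row keeps a NULL (None here) whenever the actual stdout matches the expected one. The helper names stdout_to_store and effective_stdout are hypothetical and not part of FATE.

```python
from typing import Optional


def stdout_to_store(actual: str, expected: str) -> Optional[str]:
    """Return the value to write to the result row: None when the test's
    stdout matches the expected text, otherwise the actual stdout."""
    return None if actual == expected else actual


def effective_stdout(stored: Optional[str], expected: str) -> str:
    """Recover the test's stdout when reading a result row back."""
    return expected if stored is None else stored


# A passing test stores nothing; a mismatching one keeps its own output.
expected = "frame 0 crc=0x1234\n"
passing = stdout_to_store("frame 0 crc=0x1234\n", expected)
failing = stdout_to_store("frame 0 crc=0xdead\n", expected)

print(passing)
print(repr(failing))
print(repr(effective_stdout(passing, expected)))
```

The trade-off is that reading a result now requires the matching test specification to reconstruct the output.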