Monthly Archives: July 2007

Playing Nicely With Containers

As of this writing, there are 25 lossless audio codecs (LACs) cataloged in the MultimediaWiki. Apparently, that's not enough, because an audiophile friend, or rather, an electrical engineer with a solid DSP background (amended per the audiophile's suggestion), just communicated the news that he is working on a new algorithm.

In particular, he was seeking advice about how to make the codec container-friendly. A pet peeve that many of us hold toward available LACs is that their designers have historically insisted on creating custom container formats for storing the compressed data.

Aside: The uninitiated might be wondering why this custom container characteristic irks us multimedia veterans so. Maybe it’s just best to solve one problem at a time: if you want to create a new codec format, work on that. Don’t bother creating a container to store it at the same time. That’s a different problem domain entirely. If you tackle the container problem, you’re likely to make a bunch of common rookie mistakes that will only earn the scorn of open source multimedia hackers who would otherwise like to support your format.

My simple advice for him is to design the codec so that each compressed chunk decompresses to a constant number of samples for a given file. Per my recollection, this is a problem with Vorbis that causes difficulty when embedding it inside general-purpose container formats: a given file can have two decoded chunk sizes, e.g., 512 and 8192 samples (I'm sure someone will correct me if I have that fact mixed up). Also, try not to have "too much" out-of-band initialization data, a.k.a. "extradata". How much is too much? I'm not sure, but I know that there are some limitations somewhere. Again, this is a problem with those saviors of open source multimedia, Vorbis audio and Theora video. Both codecs use the container extradata section to transmit all of their entropy models and data tables because the codec designers were unwilling to make hard decisions in the design phase. (Okay, maybe it would be more polite to state that Vorbis and Theora are transparent and democratic in their approach to entropy and quantization models, allowing the user the freedom to choose the most suitable model.)
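To make the constant-chunk-size advice concrete, here is a minimal sketch (the names and values are made up for illustration) of why a container layer loves this property: when every packet in a file decodes to the same number of samples, timestamps and seek targets fall out of plain arithmetic, with no codec-specific knowledge required.

```python
# Hypothetical illustration: FRAME_SAMPLES is fixed for the whole file and
# declared once in the file header, so a generic demuxer needs no codec
# knowledge to compute timing or seek positions.

FRAME_SAMPLES = 4096   # assumed constant decoded chunk size for this file
SAMPLE_RATE = 44100    # assumed sample rate from the container header

def packet_timestamp(packet_index):
    """Presentation time (in seconds) of a packet, by arithmetic alone."""
    return packet_index * FRAME_SAMPLES / SAMPLE_RATE

def packet_for_time(seconds):
    """Which packet index to seek to for a requested time."""
    return int(seconds * SAMPLE_RATE) // FRAME_SAMPLES
```

With a variable decoded chunk size, neither function could exist without an index or codec-level parsing; with a constant size, the container stays dumb and happy.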

Having no out-of-band setup extradata is ideal, of course. But what about the most basic parameters, such as sample rate, sample resolution, channel count, and decoded block size? Any half-decent general-purpose container format already encodes all that data and more in a standard audio header. This includes AVI, ASF, QuickTime, WAV, and AIFF, at the very least. Perceptual audio codecs like Windows Media Audio and the QDesign Music Codec get by with just a few bytes of extradata.
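As an illustration of how few bytes those basic parameters actually require, here is a sketch that parses the fixed portion of a WAV 'fmt ' chunk; the field layout follows the standard WAVEFORMAT structure (all fields little-endian), while the helper name and dictionary keys are my own.

```python
import struct

def parse_wav_fmt(chunk):
    """Parse the fixed 16-byte portion of a WAV 'fmt ' chunk."""
    fmt_tag, channels, sample_rate, avg_bytes, block_align, bits = \
        struct.unpack('<HHIIHH', chunk[:16])
    return {
        'format_tag': fmt_tag,            # 1 = uncompressed PCM
        'channels': channels,
        'sample_rate': sample_rate,
        'avg_bytes_per_sec': avg_bytes,
        'block_align': block_align,
        'bits_per_sample': bits,
    }

# Build a PCM fmt chunk: stereo, 44100 Hz, 16-bit
chunk = struct.pack('<HHIIHH', 1, 2, 44100, 44100 * 2 * 2, 4, 16)
info = parse_wav_fmt(chunk)
```

Sixteen bytes cover everything a well-behaved audio codec should need from the container for basic playback setup.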

Revenge Of The Autobuilds

Takis has been a busy FFmpeg hacker: he recently set up an experimental server that automatically builds the current source-controlled copy of FFmpeg and performs some rudimentary tests on the output. This is some great initiative on his part.

(Oh, and look what else Takis has been up to while no one is looking: a graph of FFmpeg code change over time.)

I have wanted to build an automated build and test infrastructure for FFmpeg for a long time now. I got my first concept up and running late last November. I just realized that I never blogged about it, although I did announce it on the ffmpeg-devel mailing list. The concept is still online, though be advised that the script that updates it went offline in late December.

Predictably, people seemed to think the autobuild system was a good idea but that my implementation needed a lot of work. And they were right. The reason that I never blogged about it is likely that I figured I was about to deploy a better concept very soon.

It is now July and I have had months to brainstorm ideas for an improved autobuild and test infrastructure. Unfortunately, as can often happen with revision 2 of an unproven idea, I fear my concept has devolved into an exercise in architecture astronomy.

Architecture Astronomy

Read Joel Spolsky's excellent essay, "Don't Let Architecture Astronauts Scare You". It's about people who theorize heavily in the abstract but rarely accomplish anything useful. Personally, I consider it a clear indicator of architecture astronomy when a program's fundamental paradigm revolves around the idea that "everything is an object (or module)!" In my opinion, declaring everything in your architecture to be an object is the abstraction endgame: everything becomes a swappable, user-configurable module, even the central engine of the program that is supposed to coordinate the other modules.

I'll explain the evolution of my autobuild idea. It started simply enough with a script that iterated through a bunch of compiler versions and ran the configure/make commands for each build. It captured stdout and stderr separately and recorded general information (success/failure, SVN revision, etc.) in a rudimentary database table that could be queried with a simple PHP script.
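A rough sketch of that first-iteration concept might look like the following; the compiler names, table layout, and helper functions are illustrative assumptions, not the actual script.

```python
# Sketch of the first-iteration autobuild concept: iterate over compilers,
# run configure/make, log stdout and stderr separately, and record the
# outcome in a small database table. All names here are assumptions.
import sqlite3
import subprocess

def run_build(compiler, source_dir, log_prefix):
    """Run configure && make with one compiler, logging stdout/stderr apart."""
    with open(log_prefix + '.out', 'w') as out, \
         open(log_prefix + '.err', 'w') as err:
        cfg = subprocess.call(['./configure', '--cc=' + compiler],
                              cwd=source_dir, stdout=out, stderr=err)
        if cfg != 0:
            return False
        return subprocess.call(['make'], cwd=source_dir,
                               stdout=out, stderr=err) == 0

def record_build(db, compiler, svn_revision, success):
    """Record one build outcome in the results table."""
    db.execute(
        "INSERT INTO builds (compiler, svn_revision, success) VALUES (?, ?, ?)",
        (compiler, svn_revision, int(success)))
    db.commit()

db = sqlite3.connect(':memory:')
db.execute("CREATE TABLE builds (compiler TEXT, svn_revision INTEGER, "
           "success INTEGER)")
# The real loop would resemble (compiler versions are examples):
# for compiler in ['gcc-3.4', 'gcc-4.2']:
#     ok = run_build(compiler, 'ffmpeg', 'log-' + compiler)
#     record_build(db, compiler, rev, ok)
record_build(db, 'gcc-4.2', 9000, True)   # illustrative entry only
```

A PHP script querying that one table is then enough for a basic status page.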

I soon realized that this was wholly inadequate for the overall goals I wished to accomplish in this endeavor (building and testing on many platforms). Security is a major issue, which I have blogged about before, and which I solved in the first iteration with the most paranoid of policies: chroot'ing the configure/make steps and prohibiting network access during the process. Another problem is the eventuality of infinite-loop bugs; any build or test step could conceivably encounter such a condition.
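For the infinite-loop problem specifically, a timeout around each step is one straightforward guard. This sketch is my own illustration, not the actual implementation, and it leaves out the chroot and network isolation mentioned above.

```python
# Hedged sketch: bound every build/test step with a timeout so an
# infinite-loop bug cannot hang the whole autobuild run.
import subprocess
import sys

def run_step(argv, timeout_seconds):
    """Run one step; return (completed, returncode)."""
    try:
        proc = subprocess.run(argv, timeout=timeout_seconds,
                              stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        return True, proc.returncode
    except subprocess.TimeoutExpired:
        return False, None   # child was killed; report the step as a hang

# A step that loops forever is reliably terminated after the timeout:
completed, rc = run_step([sys.executable, '-c', 'while True: pass'], 1)
```

`subprocess.run` kills the child process when the timeout expires, so even a pathological build step cannot stall the harness indefinitely.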

This realization led me to redesign the autobuild/test system as a series of individual executable steps, all stored in a database, of which the primary script has no hardcoded knowledge. And this is where the "everything is a module" philosophy comes into play. Unfortunately, the further I plot this out on paper, the harder it becomes: the execution-module concept is too generic, and certain specific tasks become awkward. I realize I need to back off a bit on the abstraction.
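A pared-back version of the steps-in-a-database idea, where the driver knows only how to fetch and execute ordered steps, might be sketched like this; the table layout and sample commands are my own assumptions.

```python
# Sketch of the steps-in-a-database design, pulled back from full
# "everything is a module" genericity: the driver script only fetches
# ordered steps and executes them, with no hardcoded step knowledge.
import json
import sqlite3
import subprocess
import sys

db = sqlite3.connect(':memory:')
db.execute("""CREATE TABLE steps (
    ordering INTEGER,       -- execution order
    description TEXT,       -- human-readable label for reports
    argv TEXT               -- command to run, stored as a JSON list
)""")
db.executemany("INSERT INTO steps VALUES (?, ?, ?)", [
    (1, 'step one', json.dumps([sys.executable, '-c', 'pass'])),
    (2, 'step two', json.dumps([sys.executable, '-c',
                                'import sys; sys.exit(0)'])),
])

def run_all(db):
    """Execute every step in order; return (description, returncode) pairs."""
    results = []
    for ordering, desc, argv_json in db.execute(
            "SELECT ordering, description, argv FROM steps ORDER BY ordering"):
        results.append((desc, subprocess.call(json.loads(argv_json))))
    return results
```

The catch, as noted above, is that a fully generic execution module makes the specific cases (per-step sandboxing, platform quirks, result parsing) harder, which is exactly where the abstraction needs to be reined in.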