Category Archives: Open Source Multimedia

News regarding open source multimedia projects.

Improving qt-faststart

As this seems to be the homepage of ‘qt-faststart’ according to Google, here are some other useful links:

Moving right along…

It’s weird to think that I may have already written the most popular piece of free software that I will ever program– qt-faststart. It’s easily the most ported of all my programs– at the very least, there are native versions for Mac OS X (Cocoa wrapper) and Adobe AIR.


[Image: Classic Apple QuickTime logo]

All that qt-faststart does is take an Apple QuickTime file that looks like this:

  [ 'mdat' QT data atom ]    [ 'moov' QT info atom ]

and rearranges it to look like this:

  [ 'moov' QT info atom ]    [ 'mdat' QT data atom ]

Why is this even necessary? In order to stream a QT file via HTTP, the moov atom needs to appear before the mdat atom. So why not write it that way when it is created? That’s tricky to do without 2 passes through the file. Most QT muxers only do the first pass. The qt-faststart utility does the second pass.
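For the curious, the shuffle itself is not complicated. Here is a rough Python sketch of the idea (not the real qt-faststart code): it assumes well-formed 32-bit atom sizes, an uncompressed moov atom at the end of the file, and 32-bit ‘stco’ chunk offset tables only, ignoring 64-bit ‘co64’ atoms:

  import struct

  def faststart(in_path, out_path):
      """Move a trailing 'moov' atom to the front and patch chunk offsets.

      Simplified sketch: assumes well-formed 32-bit atom sizes, an
      uncompressed 'moov' at the end of the file, and 32-bit 'stco'
      chunk offset tables only.
      """
      with open(in_path, 'rb') as f:
          data = f.read()

      # Walk the top-level atoms and remember where each one lives.
      atoms = []                      # list of (type, start, size)
      pos = 0
      while pos + 8 <= len(data):
          size, atype = struct.unpack('>I4s', data[pos:pos + 8])
          atoms.append((atype, pos, size))
          pos += size

      moov_type, moov_start, moov_size = next(a for a in atoms if a[0] == b'moov')
      moov = bytearray(data[moov_start:moov_start + moov_size])

      # Every 'stco' entry is an absolute file offset; moving 'moov' in
      # front of 'mdat' pushes the media data down by the size of 'moov'.
      # (Naive byte scan for 'stco'; the real tool walks the atom tree.)
      i = 0
      while True:
          i = moov.find(b'stco', i)
          if i == -1:
              break
          count = struct.unpack('>I', moov[i + 8:i + 12])[0]
          for n in range(count):
              off = i + 12 + n * 4
              old = struct.unpack('>I', moov[off:off + 4])[0]
              moov[off:off + 4] = struct.pack('>I', old + moov_size)
          i += 4

      with open(out_path, 'wb') as out:
          for atype, start, size in atoms:
              if atype == b'ftyp':    # any 'ftyp' atom stays first
                  out.write(data[start:start + size])
          out.write(moov)             # relocated, patched 'moov'
          for atype, start, size in atoms:
              if atype not in (b'ftyp', b'moov'):
                  out.write(data[start:start + size])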

I have long maintained that the biggest flaw with Apple’s QuickTime file format is that chunk offsets are specified as absolute file offsets. If they were specified as offsets relative to the start of their mdat chunk, the moov chunk would be easily relocatable.

Are there ways to improve qt-faststart? One thing that would probably be nifty, and perhaps considered standard in this day and age, would be to compress the moov atom. moov atom compression has been around for a long time and uses the standard zlib algorithm. It seems straightforward enough– take the moov atom from the end of the file, patch the offsets, compress the atom, and put it at the front. But wait– the chunk offsets will all be wrong, since the atom that actually lands at the front is the compressed moov, not the full-size moov whose length was used to patch the offsets. So, how about compressing the moov atom first to assess its compressed size, patching all of the moov atom’s chunk offsets to match, and then compressing again? But that would alter the size of the compressed atom since the data has changed, though probably not by much. I suspect this is why QuickTime specifies such atom types as ‘free’, ‘junk’, and ‘skip’; these are empty space atoms. Thus, the revised qt-faststart algorithm would probably look like this:

  • Find and load moov atom at end of QT file.
  • Compress moov atom to obtain rough size.
  • Declare moov atom size to be compressed size + factor n.
  • Patch moov atom’s chunk offset atoms to reflect calculated size.
  • Compress and write moov atom.
  • Write empty space atom with the difference between estimated and actual compressed sizes.
  • Copy remainder of QT file.
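
Here is a rough sketch of how that estimate-and-pad step might look in Python. To be clear, this only illustrates the padding idea, not real qt-faststart code: it glosses over the actual compressed-moov atom layout (the ‘cmov’/‘dcom’/‘cmvd’ wrappers) by compressing the payload directly, and patch_offsets() is a hypothetical helper that rewrites the chunk offset entries by a given delta:

  import struct
  import zlib

  def atom(atype, payload):
      """Build a simple QT atom: 32-bit size, 4CC type, payload."""
      return struct.pack('>I4s', 8 + len(payload), atype) + payload

  def compressed_front_moov(moov_payload, patch_offsets, slack=1024):
      """Return (compressed moov atom, 'free' padding atom) for the file's front.

      patch_offsets is a hypothetical helper that rewrites every chunk
      offset in the moov payload by a given delta; slack is 'factor n'.
      """
      # Compress once just to get a ballpark size.
      rough = len(atom(b'moov', zlib.compress(moov_payload)))

      # Declare the front-of-file region to be rough + slack bytes and
      # patch the chunk offsets as if exactly that many bytes precede mdat.
      declared = rough + slack
      patched = patch_offsets(moov_payload, delta=declared)

      # Compressing the patched data gives a slightly different size,
      # which is fine as long as it still fits within 'declared'.
      final = atom(b'moov', zlib.compress(patched))
      if len(final) + 8 > declared:
          raise ValueError("slack too small; retry with a larger factor n")

      # A 'free' atom soaks up the leftover bytes so the media data really
      # does begin 'declared' bytes later, where the offsets now point.
      padding = atom(b'free', b'\0' * (declared - len(final) - 8))
      return final, padding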

I wonder what factor n should be? I will probably have to determine this empirically. Or, rather than a fixed number or percentage, I wonder if there should be an iterative process for converging on a somewhat optimal compressed size? Not that it matters a great deal; the size of the moov atom generally pales in comparison to the size of the actual multimedia data. It makes one ponder why the moov atom can be compressed in the first place. Alex once proposed that it may provide a layer of data integrity. According to the zlib tech notes, CRCs are used above the compression.

One bit of trivia about the program: qt-faststart does not take into account compressed moov atoms to begin with. I had always considered this a TODO item. However, it has occurred to me that compressed moov atoms probably only ever occur at the beginning of a file. At the very least, I have never received a complaint about qt-faststart being unable to process compressed moov atoms at the end of a file.

Electronic Lossless Viking Arts

I’m technically off the ffmpeg-devel list right now. A few days ago, my ISP started having trouble delivering email between my mail server and the list server. I’m still trying to resolve the problem. Wouldn’t you know, some interesting stuff has been going down:

GSoC 2007 Wrap Up

Thanks to Michael for his Google Summer of Code 2007 wrap up. Technically, I was the FFmpeg mentor administrator. But Michael kept a keen eye on all 8 of our projects. Thanks to all the students who came through with the code this year. FFmpeg is building a good track record which should help our prospects in future GSoC efforts.

Special notices go out to my student, Kostya, who had to pick up my slack in figuring out the RV40 algorithm while at the same time re-implementing it; and to Marco Gerards, for writing both a Dirac decoder and the beginnings of a Dirac encoder, despite what certain GStreamer naysayers had to say about the initiative:

Considering how long we have spent on getting Schrodinger up to scratch I am amazed that anyone even manage to believe they can create both a decoder and an encoder for Dirac in 3 Months.

Go Marco!

Playing Nicely With Containers

As of this writing, there are 25 lossless audio coding (LAC) algorithms cataloged in the MultimediaWiki. Apparently, that’s not enough, because an audiophile friend (make that an electrical engineer with a solid DSP background, amended per his suggestion) just communicated the news that he is working on a new algorithm.

In particular, he was seeking advice about how to make the codec container-friendly. A longstanding pet peeve with many available LACs is that their designers have historically insisted upon creating custom container formats for storing the compressed data.

Aside: the uninitiated might be wondering why this custom-container habit irks us multimedia veterans so. Mostly, it’s just best to solve one problem at a time: if you want to create a new codec format, work on that; don’t bother creating a container to store it at the same time. That’s a different problem domain entirely. If you tackle the container problem as well, you’re likely to make a bunch of common rookie mistakes that will only earn the scorn of the open source multimedia hackers who would otherwise like to support your format.

My simple advice for him is to design the codec so that each compressed chunk decompresses to a constant number of samples for a given file. Per my recollection, this is a problem with Vorbis that causes difficulty when embedding it inside general-purpose container formats– a given file can have 2 decoded chunk sizes, e.g., 512 and 8192 samples (I’m sure someone will correct me if I have that fact mixed up). Also, try not to have “too much” out-of-band initialization data, a.k.a. “extradata”. How much is too much? I’m not sure, but I know that there are limits somewhere. Again, this is a problem with those saviors of open source multimedia, Vorbis audio and Theora video. Both codecs use the container extradata section to transmit all of their entropy models and data tables because the codec designers were unwilling to make hard decisions in the design phase. (Okay, maybe it would be more polite to state that Vorbis and Theora are transparent and democratic in their approach to entropy and quantization models, allowing the user the freedom to choose the most suitable model.)
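To illustrate why a constant decoded chunk size matters: it lets a general-purpose muxer or demuxer compute timestamps without knowing anything about the codec beyond a couple of header fields. A trivial sketch (the 4096-sample frame size and 44100 Hz rate are made-up numbers):

  # Hypothetical numbers: a fixed 4096-sample frame at 44100 Hz.
  SAMPLES_PER_FRAME = 4096
  SAMPLE_RATE = 44100

  def packet_timestamp(packet_index):
      """Presentation time, in seconds, of a packet, computed from nothing
      but two header fields and the packet's position in the stream."""
      return packet_index * SAMPLES_PER_FRAME / SAMPLE_RATE

  # With Vorbis-style variable block sizes (e.g., 512 or 8192 samples),
  # the muxer would have to peek inside every packet to know how much
  # time it represents.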

Having no OOB setup extradata at all is ideal, of course. What about the most basic parameters, such as sample rate, sample resolution, channel count, and decoded block size? Any half-decent general-purpose container format already encodes all of that data and more in a standard audio header. This includes AVI, ASF, QuickTime, WAV, and AIFF, at the very least. Perceptual audio codecs like Windows Media Audio and the QDesign Music Codec get by with just a few bytes of extradata.
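
To make that concrete, here is roughly what a WAVEFORMATEX-style audio header (the structure that AVI, ASF, and WAV all carry) looks like when serialized, with a small codec-specific extradata blob appended after the cbSize field. The field layout is written from memory, and the codec tag and 4-byte extradata are made up for illustration:

  import struct

  def waveformatex(format_tag, channels, sample_rate, avg_bytes_per_sec,
                   block_align, bits_per_sample, extradata=b''):
      """Serialize a WAVEFORMATEX-style audio header (little-endian),
      followed by the codec's out-of-band extradata."""
      header = struct.pack('<HHIIHHH',
                           format_tag,         # codec identifier
                           channels,
                           sample_rate,
                           avg_bytes_per_sec,
                           block_align,
                           bits_per_sample,
                           len(extradata))     # cbSize: how much extradata follows
      return header + extradata

  # A hypothetical lossless codec that needs only 4 bytes of setup data:
  header = waveformatex(0xF00D, 2, 44100, 0, 0, 16, extradata=b'\x01\x00\x10\x00')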