Tag Archives: xiph

The Standard, Like It Or Not

I have been studying multimedia technology since 2000. It has been a pretty chaotic technological landscape. People who wanted to publish video on the web wondered what format to use (and occasionally sought my advice). Various fiefdoms arose around Microsoft, Apple, and Real, all hoping to claim the mantle of the standard web video format. Somewhere along the line, Macromedia “accidentally” established a standard web video format via the Flash Player (now Adobe’s).

A few years ago, Adobe (my employer, BTW) upgraded the video support in Flash Player to use the same video format that happened to sit at the top of QuickTime’s codec heap: QT-MP4/H.264/AAC. A few days ago, Microsoft announced the beta of Silverlight 3, which contains support for the same formats. After absorbing that information, it took a few days for the next thought to coalesce in my mind:

We have a standard multimedia format.

All the big players support the same multimedia stack (I think even Real Player supports the same stack). I know that’s dismaying to certain elements of the free software community who insist that Xiph’s multimedia stack is the “standard” (really! there are blessed RFCs to back it up and everything); you may not like it, but that’s the way it is:

  • QT-MP4 is the standard container format, not Ogg
  • H.264 is the standard video codec, not Theora (or dirac)
  • AAC (and also MP3, for historical purposes) is the standard audio codec, not Vorbis

Sure, you may, in principal, have to send a dollar or 2 over to the Patent Illuminati (though highly unlikely). But it’s either that, or, you know, not have a standard video format. (And remember, the HTML5 video tag is not coming to save you.)

At least the free software enthusiast can take comfort in knowing that open source (L/GPL) efforts such as FFmpeg and x264 aim to create the very best tools that anyone can possibly use to create these formats.

Addendum: Now that I think about it, I don’t necessarily know if Silverlight 3 will transport H.264 and AAC inside of a QT-MP4 container or somehow pack it into an ASF file. That would be interesting to find out, though I have read (possibly uninformed) blog chatter excited about being able to stream the same file through Flash and Silverlight.


Every so often, it’s a good idea to surf over to Xiph’s site to see if they have absorbed any other well-meaning multimedia-related free software projects. I’m not sure if XSPF started out as a separate effort, but it’s underneath the Xiph umbrella now. The project is billed as “the XML format for sharing playlists.” Yippee. To continue the “those who can, do…” series: Those who can, do; those who can’t, create metadata formats. Anyway, all the buzzwords are there: XML, open, portable, simple, XML. I’m surprised that it’s not an RFC yet (that I could find). I’m sure it’s only a matter of time. Going forward, all of the free multimedia players will be morally obligated to support XSPF. Be advised.

Maybe I’m just irritated because XSPF is supposed to be pronounced “spiff” which, to me, defiles the memory of Calvin.

I think those 3 letters — XML — put me off of this idea the most. Every now and then, I have entertained the idea of using XML to store or transport data for my own programs. But then I realize that I may as well just use an arbitrary binary format that is easier to parse. After all, isn’t XML just an arbitrary textual format? Actually, no. Arbitrary textual data would be easier to parse (e.g., records of data separated by carriage returns with individual fields separated by commas or some other character guaranteed not to occur in the regular data; i.e., CSV). XML requires strict structure around the arbitrary textual data.

As my esteemed multimedia hacking colleague, Attila Kinali, once articulated, “if you really think that XML is the answer, then you definitely misunderstood the question.” (Attila: Michael and I and the rest of the gang are going to make sure that quote is what you’re best known for.)

I think that XML is intuitively antithetical in the mind of the average multimedia hacker. Such an individual instinctively attempts to encode things with as few bits as possible for the express purpose of making transport or storage of that data more efficient. XML explicitly defies that notion by representing information with way more bits than are necessary.

Suddenly, I find myself wondering about representing DCT coefficient data using an XML schema — why not express a JPEG as a human-readable XML file?

< ?xml version="1.0"?>


Don’t laugh — it would be extensible. Someone could, for example, add markup to individual macroblocks. Is it anymore outlandish than, say, specifying vector drawings as XML?