Playing Nicely With Containers

As of this writing, there are 25 lossless audio coding (LAC) algorithms cataloged in the MultimediaWiki. Apparently, that’s not enough, because an audiophile friend, or rather an electrical engineer with a solid DSP background (amended per the audiophile’s suggestion), just communicated the news that he is working on a new algorithm.

In particular, he was seeking advice about how to make the codec container-friendly. A pet peeve of mine about many available LACs is that their designers have historically insisted upon creating custom container formats for storing the compressed data.

Aside: The uninitiated might be wondering why this custom container characteristic irks us multimedia veterans so. Maybe it’s just best to solve one problem at a time: if you want to create a new codec format, work on that. Don’t bother creating a container to store it at the same time. That’s a different problem domain entirely. If you tackle the container problem, you’re likely to make a bunch of common rookie mistakes that will only earn the scorn of open source multimedia hackers who would otherwise like to support your format.

My simple advice for him is to design the codec so that each compressed chunk decompresses to a constant number of samples for a given file. Per my recollection, this is a problem with Vorbis that causes difficulty when embedding it inside of general-purpose container formats: a given file can have two decoded chunk sizes, e.g., 512 and 8192 samples (I’m sure someone will correct me if I have that fact mixed up). Also, try not to have “too much” out-of-band initialization data, a.k.a. “extradata”. How much is too much? I’m not sure, but I know that there are some limitations somewhere. Again, this is a problem with those saviors of open source multimedia, Vorbis audio and Theora video. Both codecs use the container extradata section to transmit all the entropy models and data tables because the codec designers were unwilling to make hard decisions in the design phase. (Okay, maybe it would be more polite to state that Vorbis and Theora are transparent and democratic in their approach to entropy and quantization models by allowing the user the freedom to choose the most suitable model.)
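To make the constant-chunk-size point concrete, here is a minimal Python sketch of what a container or demuxer has to do to find the chunk holding a given sample. The function names and the chunk sizes are hypothetical, chosen only for illustration; they are not taken from any real codec or container API.

```python
from bisect import bisect_right
from itertools import accumulate


def chunk_for_sample_constant(sample, samples_per_chunk):
    """Constant chunk size: a single integer division finds the chunk."""
    return sample // samples_per_chunk


def chunk_start_seconds(chunk_index, samples_per_chunk, sample_rate):
    """Constant chunk size: the timestamp follows from the index alone."""
    return chunk_index * samples_per_chunk / sample_rate


def chunk_for_sample_variable(sample, chunk_sizes):
    """Variable chunk sizes: the size of every earlier chunk must be known.

    chunk_sizes is a hypothetical per-chunk table that the container
    would have to store or rebuild by scanning the stream.
    """
    # Exclusive end position (in samples) of each chunk.
    ends = list(accumulate(chunk_sizes))
    return bisect_right(ends, sample)


if __name__ == "__main__":
    print(chunk_for_sample_constant(10000, 4096))
    print(chunk_start_seconds(2, 4096, 44100))
    print(chunk_for_sample_variable(9000, [512, 8192, 512, 512]))
```

With a constant size, the chunk index and its timestamp are pure arithmetic. With mixed sizes, the same lookup needs an extra table (or a scan), which is exactly the kind of bookkeeping most general-purpose containers were not designed to carry.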

Having no out-of-band setup extradata at all is ideal, of course. What about the most basic parameters such as sample rate, sample resolution, channel count, and decoded block size? Any half-decent general-purpose container format has all that data and more encoded in a standard audio header. This includes AVI, ASF, QuickTime, WAV, and AIFF, at the very least. Perceptual audio codecs like Windows Media Audio and QDesign Music Codec get by with just a few bytes of extradata.
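As a rough illustration of how little space those basic parameters need, the following Python sketch packs them into a fixed-size header. The field layout is made up for this example and does not match the actual audio header of AVI, ASF, QuickTime, WAV, or AIFF; the names AudioParams and HEADER_FORMAT are likewise hypothetical.

```python
import struct
from typing import NamedTuple

# Hypothetical little-endian layout: u32 sample rate, u16 bits per sample,
# u16 channel count, u32 decoded block size in samples.
HEADER_FORMAT = "<IHHI"


class AudioParams(NamedTuple):
    sample_rate: int
    bits_per_sample: int
    channels: int
    block_samples: int


def pack_params(params):
    """Serialize the basic parameters into a fixed-size header."""
    return struct.pack(HEADER_FORMAT, *params)


def unpack_params(blob):
    """Recover the parameters from a header made by pack_params."""
    return AudioParams(*struct.unpack(HEADER_FORMAT, blob))


if __name__ == "__main__":
    header = pack_params(AudioParams(44100, 16, 2, 4096))
    print(len(header), unpack_params(header))
```

Everything a decoder needs to get started fits in a dozen bytes here, which is why a codec that keeps its setup down to parameters like these rarely needs any extradata of its own.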

2 thoughts on “Playing Nicely With Containers”

  1. Alex

    Like Vorbis, AAC also has several different numbers of samples per chunk. AAC-LC only supports 1024. But LD adds support for 960, and if you mix in SBR (HE-AAC) you also have 1920 and 2048.

  2. Pengvado

    @Mike
    The problem with Vorbis extradata isn’t the amount of data, it’s that it comes in multiple packets, while most containers support only one extradata packet.
    H.264 extradata also comes in multiple packets, but it doesn’t run into the same problem because the standard defines a way for the decoder to separate the packets if they’re concatenated.

    @Alex
    No one said that an audio codec shouldn’t support multiple chunk sizes. But within any one stream, it should only use one chunk size.
    Does AAC support both SBR and non-SBR chunks in the same stream?
