Category Archives: Codec Technology

On ALAC’s Open Sourcing

Apple open sourced their lossless audio codec last week. Pretty awesome! I have a theory that, given enough time, absolutely every codec will be open source in one way or another.

I know I shouldn’t bother reading internet conversation around any news related to multimedia technology. And if I do read it, I shouldn’t waste any effort getting annoyed about them. But here are some general corrections:

  • ALAC is not in the same league as — nor is it a suitable replacement for — MP3/AAC/Vorbis or any other commonly used perceptual audio codec. It’s not a matter of better or worse; they’re just different families of codecs designed for different purposes.
  • Apple open sourced ALAC, not AAC– easy mistake, though there’s nothing to ‘open source’ about AAC (though people can, and will, argue about its absolute ‘open-ness’).
  • There’s not much technical room to argue between ALAC and FLAC, the leading open source lossless audio compressor. Both perform similarly in terms of codec speeds (screamingly fast) and compression efficiency (results vary slightly depending on source material).
  • Perhaps the most frustrating facet is the blithe ignorance about ALAC’s current open source status. While this event simply added an official “open source” status to the codec, ALAC has effectively been open source for a very long time. According to my notes, the ALAC decoding algorithm was reverse engineered in 2005 and added into FFmpeg in March of the same year. Then in 2008, Google — through their Summer of Code program — sponsored an open source ALAC encoder.

From the multimedia-savvy who are versed in these concepts, the conversation revolves around which would win in a fight, ALAC or FLAC? And who between Apple and FFmpeg/Libav has a faster ALAC decoder? The faster and more efficient ALAC encoder? I contend that these issues don’t really matter. If you have any experience working with lossless audio encoders, you know that they tend to be ridiculously fast to both encode and decode and that many different lossless codecs compress at roughly the same ratios.

As for which encoder is the fastest: use whatever encoder is handiest and most familiar, either iTunes or FFmpeg/Libav.

As for whether to use FLAC or ALAC — if you’ve already been using one or the other for years, keep on using it. Support isn’t going to vanish. If you’re deciding which to use for a new project, again, perhaps choose based on software you’re already familiar with. Also, consider hardware support– ALAC enjoys iPod support, FLAC is probably better supported in a variety of non-iPod devices, though that may change going forward due to this open sourcing event.

For my part, I’m just ecstatic that the question of moral superiority based on open source status has been removed from the equation.

Code-wise, I’m interested in studying the official ALAC code to see if it has any corner-case modes that the existing open source decoders don’t yet account for. The source makes mention of multichannel (i.e., greater than stereo) configurations, but I don’t know if that’s in FFmpeg/Libav.

More Cinepak Madness

Fellow digital archaeologist Clone2727 found a possible fifth variant of the Cinepak video codec. He asked me if I cared to investigate the sample. I assured him I wouldn’t be able to die a happy multimedia nerd unless I have cataloged all possible Cinepak variants known to exist in the wild. I’m sure there are chemistry nerds out there who are ecstatic when another element is added to the periodic table. Well, that’s me, except with weird multimedia formats.

Background
Cinepak is a video codec that saw widespread use in the early days of digital multimedia. To date, we have cataloged 4 variants of Cinepak in the wild. This distinction is useful when trying to write and maintain an all-in-one decoder. The variants are:

  1. The standard type: Most Cinepak data falls into this category. It decodes to a modified/simplified YUV 4:2:0 planar colorspace and is often seen in AVI and QuickTime/MOV files.
  2. 8-bit greyscale: Essentially the same as the standard type but with only a Y plane. This has only been identified in AVI files and is distinguished by the file header’s video bits/pixel field being set to 8 instead of 24.
  3. 8-bit paletted: Again, this is identified by the video header specifying 8 bits/pixel for a Cinepak stream. There is essentially only a Y plane in the data, however, each 8-bit value is a palette index. The palette is transported along with the video header. To date, only one known sample of this format has even been spotted in the wild, and it’s classified as NSFW. It is also a QuickTime/MOV file.
  4. Sega/FILM CPK data: Sega Saturn games often used CPK files which stored a variant of Cinepak that, while very close the standard Cinepak, couldn’t be decoded with standard decoder components.

So, a flexible Cinepak decoder has to identify if the file’s video header specified 8 bits/pixel. How does it distinguish between greyscale and paletted? If a file is paletted, a custom palette should have been included with the video header. Thus, if video bits/pixel is 8 and a palette is present, use paletted; else, use greyscale. Beyond that, the Cinepak decoder has a heuristic to determine how to handle the standard type of data, which might deviate slightly if it comes from a Sega CPK file.

The Fifth Variant?
Now, regarding this fifth variant– the reason this issue came up is because of that aforementioned heuristic. Basically, a Cinepak chunk is supposed to store the length of the entire chunk in its header. The data from a Sega CPK file plays fast and loose with this chunk size and the discrepancy makes it easy to determine if the data requires special handling. However, a handful of files discovered on a Macintosh game called “The Journeyman Project: Pegasus Prime” have chunk lengths which are sometimes in disagreement with the lengths reported in the containing QuickTime file’s stsz atom. This trips the heuristic and tries to apply the CPK rules against Cinepak data which, aside from the weird chunk length, is perfectly compliant.

Here are the first few chunk sizes, as reported by the file header (stsz atom) and the chunk:

size from stsz = 7880 (0x1EC8); from header = 3940 (0xF64)
size from stsz = 3940 (0xF64); from header = 3940 (0xF64)
size from stsz = 15792 (0x3DB0); from header = 3948 (0xF6C)
size from stsz = 11844 (0x2E44); from header = 3948 (0xF6C)

Hey, there’s a pattern here. If they don’t match, then the stsz size is an even multiple of the chunk size (2x, 3x, or 4x in my observation). I suppose I could revise the heuristic to state that if the stsz size is 2x, 3x, 4x, or equal to the chunk header, qualify it as compliant Cinepak data.

Of course it feels impure, but software engineering is rarely about programmatic purity. A decade of special cases in the FFmpeg / Libav codebases are a testament to that.

What’s A Variant?
Suddenly, I find myself contemplating what truly constitutes a variant. Maybe this was just a broken encoder program making these files? And for that, I assign it the designation of distinct variation, like some sort of special, unique showflake?

Then again, I documented Magic Carpet FLIC as being a distinct variant of the broader FLIC format (which has an enormous number of variants as well).

DCT PR

Some people think that multimedia compression is basically all discrete cosine transform (DCT) and little else.

2 years ago at LinuxTag, I gave a fairly general presentation regarding FFmpeg and open source multimedia hacking (I just noticed that the main page still uses a photo of me and my presentation). I theorized that one problem our little community has when it comes to attracting new multimedia hacking talent is that the field seems scary and mathematically unapproachable. I have this perception that this is what might happen when a curious individual wants to get into multimedia hacking:

I wonder how multimedia compression works?

Well, I’ve heard that everyone uses something called MPEG for multimedia compression.

Further, I have heard something about how MPEG is based around the discrete cosine transform (DCT).

Let’s look up what the DCT is, exactly…


Discrete cosine transform written out on a chalkboard
Clever photo cribbed from a blog actually entitled Discrete Cosine

At which point the prospective contributor screams and runs away from the possibility of ever being productive in the field.

Now, the original talk discussed how that need not be the case, because DCT is really a minor part of multimedia technology overall; how there are lots and lots of diverse problems in the field yet to solve; and how there is room for lots of different types of contributors.

The notion of DCT’s paramount importance in the grand scheme of multimedia compression persists to this day. While reading the HTML5 spec development mailing list, Sylvia Pfeiffer expressed this same line of thinking vis-à-vis Theora:

Even if there is no vendor right now who produces an ASIC for Theora, the components of the Theora codec are not fundamentally different to the components of other DCT based codecs. Therefore, AISCs [sic] that were built for other DCT based codecs may well be adaptable by the ASIC vendor to support Theora.

This prompted me to recall something I read in the MPEG FAQ a long time ago:

MPEG is a DCT based scheme?

The DCT and Huffman algorithms receive the most press coverage (e.g. “MPEG is a DCT based scheme with Huffman coding”), but are in fact less significant when compared to the variety of coding modes signaled to the decoder as context-dependent side information. The MPEG-1 and MPEG-2 IDCT has the same definition as H.261, H.263, JPEG.

A few questions later, the FAQ describes no less than 18 different considerations that help compress video data in MPEG; only the first one deals with transforms. Theora is much the same way. When I wrote the document about Theora’s foundation codec, VP3, I started by listing off all of the coding methods involved: DCT, quantization, run length encoding, zigzag reordering, predictive coding, motion compensation, Huffman entropy coding, and variable length run length Booleans. Theora adds a few more concepts (such as encoding the large amount of stream-specific configuration data).

I used to have the same idea, though: I was one of the first people to download On2’s VpVision package (the release of their VP3 code) and try to understand the algorithm. I remember focusing on the DCT and trying to find DCT-related code, assuming that it was central to the codec. I was surprised and confused to find that a vast amount of logic was devoted to simply reversing DC coefficient prediction. At the end of a huge amount of frame reconstruction code was a small, humble call to an IDCT function.

What I would like to get across here is that Theora is rather different than most video codecs, in just about every way you can name (no, wait: the base quantization matrix for golden frames is the same as the quantization matrix found in JPEG). As for the idea that most DCT-based codecs are all fundamentally the same, ironically, you can’t even count on that with Theora– its DCT is different than the one found in MPEG-1/2/4, H.263, and JPEG (which all use the same DCT). This was likely done in On2’s valiant quest to make everything about the codec just different enough from every other popular codec, which runs quite contrary to the hope that ASIC vendors should be able to simply re-use a bunch of stuff used from other codecs.

Reverse Engineering Math Formulas

Even though I have been studying and working on multimedia technology since 2000 — reverse engineering, documenting, and reimplementing a variety of audio and video codecs — I didn’t actually begin to understand why various algorithms achieved their compression until about 2003. I’m just like that — I study the practice first, and then the underlying theory eventually becomes clear to me (maybe; it has been 9 years and I still couldn’t explain everything about the discrete cosine transform if you asked).

I happened to be looking back over the ZMBV (DOSBox) video codec today. Continue reading