VQ Case Study: Cinepak

Cinepak is a true classic among video codecs. It saw considerable use in the early days of full-motion video (FMV) because it was easily encapsulated in both AVI and QuickTime files, the prevailing container formats of early PC multimedia. It was also the standard FMV format on early CD-based consoles such as the Sega Saturn and the Atari Jaguar CD.

Like many vector quantizers, Cinepak was prevalent in the days of relatively slow PCs. Although Cinepak operates natively in a YUV 4:2:0 space, the colorspace is slightly modified so that it is incredibly fast to convert to RGB. The YUV->RGB conversion formulas involve only multiplications and divisions by 2, which are quickly accomplished with bit shifts.
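To make the "only multiplications and divisions by 2" point concrete, here is a minimal sketch of a per-pixel conversion. It assumes the commonly documented Cinepak formulas R = Y + 2V, G = Y - U/2 - V, B = Y + 2U, with U and V stored as signed bytes; the function names are hypothetical.

```c
#include <stdint.h>

/* Clamp an int to the 0..255 range of an 8-bit color channel. */
static uint8_t clamp_u8(int x)
{
    return (uint8_t)(x < 0 ? 0 : (x > 255 ? 255 : x));
}

/* Convert one Cinepak-style YUV triple to RGB (hypothetical helper).
   Assumed formulas: R = Y + 2V, G = Y - U/2 - V, B = Y + 2U.
   The only arithmetic besides add/subtract is by a factor of 2, which
   compilers emit as shifts (the signed division by 2 also needs a
   small rounding fix-up). Written with * and / rather than << and >>
   because shifting negative signed values is not portable C. */
static void cinepak_yuv_to_rgb(uint8_t y, int8_t u, int8_t v,
                               uint8_t *r, uint8_t *g, uint8_t *b)
{
    *r = clamp_u8(y + 2 * v);
    *g = clamp_u8(y - u / 2 - v);
    *b = clamp_u8(y + 2 * u);
}
```

For example, Y = 100, U = 10, V = -20 yields R = 60, G = 115, B = 120. No general-purpose multiply appears anywhere, which is the whole point of the modified colorspace.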

Cinepak transports its vector codebooks within the encoded stream. Individual vectors are 2×2-pixel blocks composed of 4 Y samples, 1 U sample, and 1 V sample. Each sample is 1 byte, so each vector is 6 bytes. Subsequent frames are free to update the codebook with new vectors.
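A codebook entry can be modeled as a small struct, and loading vectors from the stream is then just a matter of unpacking 6-byte groups. This is a sketch under stated assumptions: the byte order (Y0 Y1 Y2 Y3 U V), the 256-entry codebook size (one-byte indices), and the names `cinepak_vector` and `load_vectors` are illustrative, and the real bitstream's signaling of which entries a frame updates is not shown.

```c
#include <stdint.h>
#include <stddef.h>

/* One 2x2-pixel vector in 4:2:0 form: 4 luma samples plus one shared
   U and one shared V. Assumed luma order: top-left, top-right,
   bottom-left, bottom-right. */
typedef struct {
    uint8_t y[4];
    int8_t  u, v;   /* chroma treated as signed bytes */
} cinepak_vector;

#define CODEBOOK_SIZE 256  /* assumption: entries addressed by 1-byte indices */

/* Unpack `count` consecutive 6-byte vectors from `src` into the codebook,
   starting at entry `first`, so a later frame can overwrite just part of
   the codebook. Stops at the end of the codebook. Returns the number of
   bytes consumed. */
static size_t load_vectors(cinepak_vector *book, size_t first, size_t count,
                           const uint8_t *src)
{
    size_t n = 0;
    for (size_t i = 0; i < count && first + i < CODEBOOK_SIZE; i++) {
        cinepak_vector *v = &book[first + i];
        for (int k = 0; k < 4; k++)
            v->y[k] = src[n++];
        v->u = (int8_t)src[n++];  /* reinterpret byte as signed */
        v->v = (int8_t)src[n++];
    }
    return n;
}
```

Because the codebook persists between frames, an interframe that only carries a handful of new vectors leaves every other entry intact.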

Cinepak frames carry data about how to tile the 2×2-pixel vectors onto a decoded frame. The codec iterates over 4×4 blocks. For each block, the encoded stream can specify that it be painted with a single 2×2 vector scaled up to double size (each sample covering a 2×2 area), or with 4 individual 2×2 vectors. Further, if the frame is an interframe, the stream can tell the decoder that the 4×4 block is unchanged from the previous frame and should be skipped. Frames can also specify that only certain regions (“strips”) of the frame should be updated, though this feature has rarely been used in practice.
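The three per-block choices can be sketched as a single painting routine. To keep it short, it paints only the luma plane, and it assumes vectors are placed top-left, top-right, bottom-left, bottom-right within the 4×4 block. The `block_mode` enum and the function name are hypothetical; how the real stream encodes the choice is not shown.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical labels for the three ways a 4x4 block can be coded. */
typedef enum { BLOCK_SKIP, BLOCK_ONE_VECTOR, BLOCK_FOUR_VECTORS } block_mode;

/* Luma part of a 2x2 vector: y[0] top-left, y[1] top-right,
   y[2] bottom-left, y[3] bottom-right (assumed ordering). */
typedef struct { uint8_t y[4]; } luma_vector;

/* Paint the 4x4 block whose top-left corner is (bx, by) in an 8-bit plane.
   BLOCK_ONE_VECTOR reads v[0]; BLOCK_FOUR_VECTORS reads v[0..3]. */
static void paint_block(uint8_t *plane, size_t stride, size_t bx, size_t by,
                        block_mode mode, const luma_vector *v)
{
    if (mode == BLOCK_SKIP)
        return;  /* unchanged: keep what the previous frame left here */

    for (size_t row = 0; row < 4; row++) {
        uint8_t *dst = plane + (by + row) * stride + bx;
        for (size_t col = 0; col < 4; col++) {
            size_t quadrant = (row / 2) * 2 + (col / 2);
            if (mode == BLOCK_ONE_VECTOR) {
                /* one vector at double size: each sample fills a 2x2 quadrant */
                dst[col] = v[0].y[quadrant];
            } else {
                /* four vectors, one per quadrant, each painted at 1:1 */
                size_t sample = (row % 2) * 2 + (col % 2);
                dst[col] = v[quadrant].y[sample];
            }
        }
    }
}
```

Note that the skip case does no work at all, which is also why a decoder built this way can paint directly into the buffer holding the previous frame.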

Let’s examine how these design decisions affect performance. First, there’s the colorspace trade-off and the magnificent speed with which it can be converted to RGB. This was a big factor in the mid-90s, since common graphics hardware did not accept YUV data for display and general integer multiplication was awfully slow. Further, the codebooks are stored in such a way that they can be converted once to whatever the target colorspace is (RGB15, RGB16, RGB24, RGB32), after which the decoder no longer needs to worry about color conversion. Compare that with having to decode an entire frame to YUV and then convert the whole thing to RGB. Finally, since the only special feature of interframes is that an unchanged block can be skipped, decoding requires only a single frame buffer: skipped blocks simply keep the pixels left over from the previous frame.
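The "convert the codebook once" idea might look like the following sketch, which uses RGB565 as one representative RGB16 layout. It assumes the Cinepak formulas R = Y + 2V, G = Y - U/2 - V, B = Y + 2U; the types and function names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Codebook entry as transmitted: 4 luma samples, shared signed U and V. */
typedef struct {
    uint8_t y[4];
    int8_t  u, v;
} yuv_vector;

/* The same entry pre-converted to four ready-to-blit RGB565 pixels. */
typedef struct { uint16_t px[4]; } rgb565_vector;

static uint8_t clamp_u8(int x)
{
    return (uint8_t)(x < 0 ? 0 : (x > 255 ? 255 : x));
}

/* Pack 8-bit channels into 5-6-5 bits. */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Run once whenever the codebook arrives or changes. Afterwards the
   block painter copies finished RGB565 pixels and never sees YUV again,
   so color conversion costs scale with codebook size, not frame size. */
static void convert_codebook(const yuv_vector *in, rgb565_vector *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int u = in[i].u, v = in[i].v;
        for (int k = 0; k < 4; k++) {
            int y = in[i].y[k];
            out[i].px[k] = pack_rgb565(clamp_u8(y + 2 * v),
                                       clamp_u8(y - u / 2 - v),
                                       clamp_u8(y + 2 * u));
        }
    }
}
```

With a 256-entry codebook, this converts at most 1,024 pixels per update, which is far less work than converting every pixel of a full frame.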

Cinepak also supports a mode in which each vector consists of 4 bytes instead of 6. How those 4 bytes are interpreted seems to depend on the container format. When encapsulated in an AVI file, the 4 bytes are grayscale samples (and the video, by extension, is grayscale). This is generally the case in QuickTime movies as well. However, there is one known QuickTime/Cinepak sample that carries a 256-color palette, in which the 4 bytes of each vector are to be interpreted as indices into the palette. I should mention that this one known sample is, as they say, “NSFW”.
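Handling both interpretations of a 4-byte vector can come down to a single branch on whether a palette is present. This is a minimal sketch; the decision of which mode applies (container type, presence of a palette) is assumed to have been made elsewhere, and `rgb24` and `expand_4byte_vector` are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct { uint8_t r, g, b; } rgb24;

/* Expand a 4-byte vector into four RGB pixels.
   palette == NULL: bytes are grayscale levels (R = G = B).
   Otherwise: each byte indexes a 256-entry palette. */
static void expand_4byte_vector(const uint8_t bytes[4],
                                const rgb24 *palette, rgb24 out[4])
{
    for (int i = 0; i < 4; i++) {
        if (palette == NULL) {
            out[i].r = out[i].g = out[i].b = bytes[i];
        } else {
            out[i] = palette[bytes[i]];
        }
    }
}
```

Since a byte can never exceed 255, a 256-entry palette makes every index valid without bounds checks.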