First Love: Vector Quantization | Breaking Eggs And Making Omelettes

Someone was asking me about vector quantizer codecs recently. Sure, Wikipedia has the obligatory article. To its credit, the article is actually halfway useful these days (I seem to recall that it used to be a lot more impenetrable). It doesn’t help that the concept is identified by 2 terms that, by themselves, sound somewhat intimidating: ‘vector’ and ‘quantization’.

Anyway, he asked the right person about VQ codecs because I happen to love VQ codecs and can go on for days about them. In fact, I might do just that. I’ll start with a post about the theory and then describe specific examples in separate posts.

Aside: When I first started studying and reverse engineering VQ codecs, I was confused by the word “vector”. For some reason when I think of vectors, I think of 3D graphics, which I do not understand very well. At first, I thought VQ had something to do with 3D math. If you are having similar conceptual comprehension problems, allow me to clarify:

Mathematically, a vector is nothing but an ordered set of numbers; a tuple. Another math nerd definition is that vectors are one dimensional arrays of orthogonal variables. In the context of 3D math, a vector contains (x, y, z) coordinates. In the context of video codecs, a vector represents a group of pixel values. E.g., take a 4×2 pixel vector from an image, grouped by [square brackets]:
x x [a b c d] x x
x x [e f g h] x x
For coding purposes, this vector is [a b c d e f g h].

I was asked to sum up VQ as succinctly as I could. As a multimedia hacker, I would say that vector quantization is about picture tiling: The picture is composed of a series of indices into a table of pixel blocks. Granted, this is a video codec-centric view of VQ; VQ also has application in such audio codecs as Vorbis and TwinVQ.

When decoding an image encoded with a VQ codec, the bitstream is generally comprised of a bunch of indices into a table. This table is called the codebook (sounds fancy and technical). Reconstruct the image by pulling these blocks (or vectors) out of the codebook table and tiling the screen. But where does the codebook come from? And how does the encoder match the original image’s pixels to an entry in the codebook? Those are the 2 major recurring problems in the field of vector quantization

To restate, the 2 big issues of encoding VQ:

How to generate an optimal codebook, either for a particular image, or for general image material.
How to search through the codebook in order to match blocks from an uncompressed image to the best possible vector from the codebook.

Naturally, different codecs make their mark by using different methods for solving these problems. However, no matter what the method, VQ encoding is infamously slow. In fact, a third major problem in the VQ field could be stated as:

How to encode a video in a reasonable amount of time.

If it’s so incredibly slow to encode, why do people care about VQ codecs at all? Ironically, the answer is speed. Relatively speaking, VQ codecs are blazingly fast to decode. As explained above, decoding VQ is often just a matter of copying a sequence of pixels out of a table at a particular index. VQ codecs were popular in the early days of computer multimedia when common PCs were not yet fast enough for codecs that relied of methods such as transform coding (most famously, the discrete cosine transform).

These days, since computers are so much faster than when optical storage was first widely available, VQ codecs have largely fallen out of favor compared to transform codecs which are far more efficient to encode.

Examples Of VQ Codecs:

There are likely more that I’m neglecting but those are the ones that pop out at me from the master video codecs list over on the MultimediaWiki. I would be remiss if I didn’t mention another category of VQ video codecs at this juncture:

Vector Quantizers Without Codebooks
The description of VQ codecs above assumes that the coding scheme requires a codebook. This is not always the case. There are VQ codecs such as Microsoft’s Video-1, Apple’s SMC and RPZA, and IBM’s UltiMotion which encode video without a codebook, but rather have a series of stream encodings for specifying blocks of pixels without actually encoding each pixel (though that is an option for pathological cases).

Examples Of Audio VQ Codecs:
These audio codecs are known to incorporate vector quantization somewhere in their coding process:

Case Studies:
I will be posting more case studies covering various VQ codecs, how they operate, and the trade-offs they take into account. However, here is one VQ codec I have already written up on this blog: “The World’s Simplest Vector Quantizer” (Creature Shock AVS).

Further Reading:

9 thoughts on “First Love: Vector Quantization”

Robert Swain April 23, 2007 at 10:14 am

“have a series of stream encodings for specifying blocks of pixels without actually encoding each pixel” – do you care to translate that into English please? Or at least elaborate a bit.
Multimedia Mike Post authorApril 23, 2007 at 10:31 am

@Robert: Absolutely; I’m preparing a post on that VQ category and will post it sometime in the next few days. That segment of the post was a bit of an afterthought and not fleshed out well, as you noticed.
Kostya April 23, 2007 at 10:39 am

First, how could you forget Smacker?!

Second, IIRC Dr. Tim Fergusson has MS Video-1 16-bit encoder with quite simple vector quantizer.
Multimedia Mike Post authorApril 23, 2007 at 11:08 am

I honestly couldn’t remember if Smacker was a VQ codec. All I could remember was the Huffman part. But I just remembered the assertion that it’s essentially Huffman-codec MS Video-1. Added to the list.
Benjamin Larsson April 23, 2007 at 11:59 pm

Most (a)celp speech codes use VQ also, (Speex, acelp.net, qcelp).
Robert Swain April 24, 2007 at 2:27 am

When looking at AMR the first time round, I did wonder why they had a table of codes and encoded the indices to them. I concluded that you could have few enough codes that cover a short enough period that you can get a good enough match to make the voice signal sound reasonable. I guess I figured out what was supposed to be happening without really knowing it.

It seems like an incredibly simple approach, but one that has made AMR make sense to me after Mike’s discussion. I’m looking forward to getting back to it. :)
VAG April 25, 2007 at 1:36 am

I can hardly call Smacker a pure VQ-based codec, as it’s more relies on BTC and Huffman coding, than actual vectors (2×1 vectors, hmm?). For another (well) known VQ codec I can add EA’s TGV video.
Eduardo Morras November 19, 2007 at 11:44 pm

What’s the difference between VQ codecs and Bidimensional LZ78 model with not exact matching? That’s a lossy LZ78 model where we point not to the exact match but to more similar entry, updating it.

Thanks
Multimedia Mike Post authorNovember 19, 2007 at 11:56 pm

An LZ variant can be lossy? Really? I hadn’t heard of that. I looked up LZ78 and apparently it has something in common with LZW (or perhaps is LZW), the variant used in GIF. I suppose you could say that it uses vectors (variable length strings in the dictionary). But I don’t think you could make the case that those vectors are quantized.

Comments are closed.