Custom Video Codec For 3D Hardware

Exploiting the capabilities (and limitations) of available video hardware is nothing new in multimedia programming. The old IBM VGA hardware had a 320×200 resolution mode that could display 256 unique colors. For years, that mode drove many graphics-heavy applications (notably games, but also certain video applications such as FLIC files, originally generated by Autodesk software). Back when I was hacking on the Sega Dreamcast I started to brainstorm about a vector quantizer video codec that could take advantage of the PowerVR 3D graphics hardware present in the console.


[Image: Sega Dreamcast]

This codec idea was inspired by the hierarchical multistage vector quantizer found in Sorenson Video 1. The SVQ1 encoder takes 16×16 blocks of samples, removes the block mean (average) from each sample, and attempts to find a series of vectors from pre-defined codebooks that match the vector residual as closely as possible. If the resultant encoding is not efficient enough, the codec subdivides the 16×16 block into two 16×8 blocks, then 8×8, on down to 4×2. Decoding boils down to initializing a block with the removed mean and then adding a series of codebook vectors to gain a rough approximation of the original block.
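Expressed in code, that decode step looks roughly like this. This is a minimal C sketch that assumes 8-bit samples and already-selected codebook vectors, and glosses over SVQ1's actual codebook layout and entropy coding:

```c
#include <stdint.h>

/* Reconstruct one width x height block: start every sample at the
 * transmitted mean, then add the residual from each selected codebook
 * vector in turn, clamping to the 8-bit sample range at the end. */
static void decode_block(uint8_t *dst, int stride, int width, int height,
                         int mean, const int8_t **stages, int num_stages)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int sample = mean;
            for (int s = 0; s < num_stages; s++)
                sample += stages[s][y * width + x];  /* codebook residual */
            if (sample < 0)   sample = 0;
            if (sample > 255) sample = 255;
            dst[y * stride + x] = (uint8_t)sample;
        }
    }
}
```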

Let’s move over to the 3D hardware on the Dreamcast. The hardware takes a textured view of the world. To display an image, load it into video RAM and tell the video hardware all about it: resolution, color format, where to place it on the screen, how to scale and rotate it, and so on. Then tell the video hardware to display the image (which is now called a texture). Further, the video hardware can be told to blend textures, which performs addition or subtraction of a texture’s RGB colors with those of the underlying textures.
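In software terms, an additive blend amounts to a saturating per-channel add of the texture over whatever is already in the frame buffer. Here is a rough model of the operation, my own illustration rather than the actual PowerVR interface:

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb_t;

/* Software model of an additive texture blend: each RGB channel of the
 * texture is added to the corresponding channel already in the frame
 * buffer, saturating at the channel maximum. (Subtractive blending
 * would be the same idea with a subtract and a clamp at zero.) */
static void blend_add(rgb_t *fb, const rgb_t *tex, int num_pixels)
{
    for (int i = 0; i < num_pixels; i++) {
        int r = fb[i].r + tex[i].r;
        int g = fb[i].g + tex[i].g;
        int b = fb[i].b + tex[i].b;
        fb[i].r = r > 255 ? 255 : (uint8_t)r;
        fb[i].g = g > 255 ? 255 : (uint8_t)g;
        fb[i].b = b > 255 ? 255 : (uint8_t)b;
    }
}
```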

The codec idea revolves around a large number of vector codebooks sitting in video RAM as textures, as well as a set of 256 base vectors, each holding a single repeated value in every element; these represent the mean vectors. The encoded video bitstream would contain instructions for which base vector plus which combination of codebook vectors to apply in order to reconstruct particular blocks of video.
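A sketch of how a decoder's inner loop might drive such a scheme follows. Every hw_* call, the texture arrays, and the block_instruction_t layout are invented placeholders for illustration, not a real Dreamcast API:

```c
#include <stdint.h>

/* Hypothetical hardware interface -- invented placeholders, not real
 * PowerVR/Dreamcast API calls. */
typedef struct texture texture_t;
extern texture_t *mean_textures[256];      /* flat "mean" textures         */
extern texture_t *codebook_textures[4096]; /* codebook vectors as textures */
void hw_draw_texture(texture_t *tex, int x, int y);
void hw_blend_texture_add(texture_t *tex, int x, int y);

/* One decoded bitstream instruction: which base (mean) texture to lay
 * down, then which codebook textures to blend on top of it. */
typedef struct {
    uint8_t  base;
    int      num_vectors;
    uint16_t vectors[4];
} block_instruction_t;

static void decode_block_hw(int block_x, int block_y,
                            const block_instruction_t *ins)
{
    /* Lay down the mean: every texel of this texture holds one value. */
    hw_draw_texture(mean_textures[ins->base], block_x, block_y);

    /* Refine the block with additive blends of codebook vectors. */
    for (int i = 0; i < ins->num_vectors; i++)
        hw_blend_texture_add(codebook_textures[ins->vectors[i]],
                             block_x, block_y);
}
```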

Outlandish? Perhaps. But I am convinced that similar hardware considerations were taken into account when designing the 4XM video codec. Its interframe motion compensation uses block addition in an RGB565 colorspace, and it seems reasonable that this operation could have been performed using the DC video hardware.
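Unpacked, block addition in RGB565 looks something like the following. Whether 4XM saturates the channels or simply relies on deltas that never overflow is not something I will claim, so this sketch clamps:

```c
#include <stdint.h>

/* Add two RGB565 pixels per channel: unpack the 5-6-5 fields, add,
 * clamp each channel to its maximum, and repack. The clamping policy
 * is my assumption for illustration; the point is that the operation
 * maps naturally onto hardware texture blending. */
static uint16_t rgb565_add(uint16_t a, uint16_t b)
{
    int r  = ((a >> 11) & 0x1F) + ((b >> 11) & 0x1F);
    int g  = ((a >>  5) & 0x3F) + ((b >>  5) & 0x3F);
    int bl = ( a        & 0x1F) + ( b        & 0x1F);
    if (r  > 0x1F) r  = 0x1F;
    if (g  > 0x3F) g  = 0x3F;
    if (bl > 0x1F) bl = 0x1F;
    return (uint16_t)((r << 11) | (g << 5) | bl);
}
```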

One problem that tripped me up when considering this codec for the DC is that the video hardware requires that texture dimensions be powers of 2: 2^n, where n = [3..10]. Thus, since 2^3 = 8, the smallest possible vector is 8×8. SVQ1 subdivides vectors down to 1/8 that size (a 4×2 vector holds 8 samples versus 64 in an 8×8). I am concerned that it may be difficult to find a good set of vector codebooks if 8×8 is the smallest vector size.
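Stated as a check, that constraint says a texture side is legal only if it is a power of 2 between 8 and 1024:

```c
#include <stdbool.h>

/* A texture dimension must be 2^n with n in [3..10], i.e. a power
 * of two from 8 through 1024 inclusive. */
static bool valid_texture_dim(int d)
{
    return d >= 8 && d <= 1024 && (d & (d - 1)) == 0;
}
```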

Then there is the issue of motion compensation. I do not think it is possible to ask the video hardware to cut out a piece of an existing video buffer, call it a texture, and apply it to a particular region of a new buffer.