Custom NES Video Codec

The 8-bit Nintendo Entertainment System (NES) is my favorite video game console of all time. I even used to maintain a native Linux NES emulator named TuxNES to help preserve the nostalgia.

[Castlevania screenshot: Simon vs. the undead fish monsters in Konami’s original Castlevania]

[Legend of Zelda screenshot: Link, the hope of Hyrule, takes on the Octoroks in the original Legend of Zelda]

But what I really love are videos showcasing tool-assisted “Time Attacks”. The people behind these videos use a variety of feature-rich console emulators to get through games as quickly as possible, using some very clever methods. Watching a video from start to finish usually lets you relive the experience of playing through the entire game, typically in just 10-20 minutes.

These videos are typically encoded with ISO MPEG-4 video under the DIVX fourcc. Each frame is 256×224, at 60 frames/sec for NTSC games (50 fps for PAL games). Audio is generally encoded as MP3 at 32-64 kbps. As a multimedia freak, I have to admit that it is a little frustrating to watch NES graphics pushed through a general-purpose lossy codec. So I started to wonder whether it would be possible to develop a custom codec specifically for coding this type of video, and to do it losslessly.

The NES has a master palette of 64 colors. Each 6-bit color value corresponds to a luminance/chrominance combination that is tied to certain properties of a standard television signal.
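To make the luminance/chrominance split concrete: in the 6-bit value, the upper 2 bits select one of 4 brightness levels and the lower 4 bits select one of 16 hues. The minimal sketch below just pulls those two fields apart; the function name `split_nes_color` is hypothetical, and the actual RGB appearance of each combination depends on the television and is not modeled here.

```python
def split_nes_color(value: int) -> tuple[int, int]:
    """Split a 6-bit NES color value into (luminance, hue) fields.

    Bits 5-4 select one of 4 luminance (brightness) levels;
    bits 3-0 select one of 16 hues (chroma phase).
    """
    if not 0 <= value < 64:
        raise ValueError("NES color values are 6 bits (0-63)")
    luminance = (value >> 4) & 0x3
    hue = value & 0xF
    return luminance, hue


print(split_nes_color(0x16))  # (1, 6)
print(split_nes_color(0x3F))  # (3, 15)
```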

The NES generates video frames by combining data from a series of tables accessed by its custom picture processing unit (PPU). There are tile tables containing 8×8-pixel tiles. Then there is the name table, which specifies how these tiles are laid out to form a background image. Finally, there is the sprite table, which specifies freeform (x,y) coordinates for certain tiles from the tile tables (note that sprites can be configured as either 8×8 or 8×16 pixels).
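Here is a rough sketch of how tile data and a name table combine into a background. Each 8×8 tile is stored in 16 bytes as two bit planes (bytes 0-7 hold the low bit of each pixel, bytes 8-15 the high bit), giving a 2-bit value per pixel. The helper names `decode_tile` and `render_background` are hypothetical, and the sketch deliberately ignores attribute (palette-select) data, scrolling, and sprites.

```python
def decode_tile(data: bytes) -> list[list[int]]:
    """Decode one 16-byte NES pattern-table tile into 8x8 2-bit pixel values.

    Bytes 0-7 hold the low bit plane and bytes 8-15 the high bit plane;
    the most significant bit of each byte is the leftmost pixel.
    """
    if len(data) != 16:
        raise ValueError("a tile is exactly 16 bytes")
    tile = []
    for row in range(8):
        lo, hi = data[row], data[row + 8]
        tile.append([((lo >> (7 - x)) & 1) | (((hi >> (7 - x)) & 1) << 1)
                     for x in range(8)])
    return tile


def render_background(name_table, tiles):
    """Lay out tiles according to a name table (a list of rows of tile indices).

    Returns a 2D list of 2-bit pixel values. Attribute data and scrolling
    are ignored in this sketch.
    """
    height, width = len(name_table) * 8, len(name_table[0]) * 8
    image = [[0] * width for _ in range(height)]
    for ty, row in enumerate(name_table):
        for tx, index in enumerate(row):
            tile = tiles[index]
            for y in range(8):
                # Copy one 8-pixel row of the tile into place.
                image[ty * 8 + y][tx * 8:tx * 8 + 8] = tile[y]
    return image
```

A real name table is 32×30 tiles; the tiny 2×2 layout in the usage below is just for illustration.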

There are two 16-entry palettes in use during video rendering: one for background tiles and the other for sprites. In theory, then, there can be at most 32 colors on screen at one time. Some NES demo-makers have found ways around that limit by writing to hardware registers at opportune moments during the scanline-rendering cycle. Working under the 32-color assumption, however, each pixel needs 5 bits (2^5 = 32):

(256 * 224) pixels/frame * 5 bits/pixel * (1/8) bytes/bit = 35,840 bytes/frame

If, hypothetically, a format were to store raw, uncompressed frames, it could do it with about 35 kilobytes per frame.
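The arithmetic above can be checked with a small sketch that forms each pixel's 5-bit palette index and bit-packs a full frame. It assumes each 16-entry palette is split into four 4-color sub-palettes, so an index is built from a background/sprite bit, a 2-bit sub-palette number, and the 2-bit tile pixel. It ignores details such as the transparency of color 0 and mirrored palette entries. Both function names are hypothetical. At 60 fps the raw stream works out to roughly 2.15 MB/s.

```python
def palette_index(is_sprite: bool, subpalette: int, pixel: int) -> int:
    """Combine the color-selection bits into a 5-bit index (0-31).

    pixel: 2-bit value from the tile (0-3)
    subpalette: which 4-entry sub-palette is in use (0-3)
    is_sprite: False selects the background palette, True the sprite palette
    """
    assert 0 <= pixel < 4 and 0 <= subpalette < 4
    return (int(is_sprite) << 4) | (subpalette << 2) | pixel


def pack_5bit(pixels) -> bytes:
    """Pack a sequence of 5-bit values into bytes, most significant bit first."""
    out = bytearray()
    acc = 0    # bit accumulator
    nbits = 0  # number of pending bits in acc
    for p in pixels:
        assert 0 <= p < 32
        acc = (acc << 5) | p
        nbits += 5
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # drop bits already emitted
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)  # zero-pad the final byte
    return bytes(out)


WIDTH, HEIGHT, FPS = 256, 224, 60
frame = [palette_index(False, 0, 0)] * (WIDTH * HEIGHT)
raw = pack_5bit(frame)
print(len(raw))        # 35840 bytes per frame
print(len(raw) * FPS)  # 2150400 bytes per second
```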

Since the picture is composed strictly of 8×8 tiles coming from a lookup table, this would seem to lend itself to a vector quantization (VQ) coding scheme. Ideally, an emulator could be hacked so that, after running enough CPU instructions to account for an entire frame refresh, it exports the contents of the tile table along with the background name table and the sprite indices and their coordinates. Come up with some clever coding scheme for this data, transport it in your favorite multimedia container format, and, when it is time to play the file, tile the background and the sprites according to the encoded data.
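As a rough illustration of the VQ idea, the sketch below works from rendered pixels rather than from exported PPU tables (a different, simpler approach than the emulator hack described above). It cuts a frame into grid-aligned 8×8 blocks, keeps a codebook of the unique blocks, and stores one codebook index per block position. Because blocks are matched exactly, the scheme is lossless. Sprites are not grid-aligned, so blocks overlapping a sprite will usually become their own codebook entries. The names `tile_encode` and `tile_decode` are hypothetical.

```python
def tile_encode(frame, width, height):
    """Split a flat frame of palette indices into 8x8 blocks and replace each
    block with an index into a codebook of unique blocks.

    Returns (codebook, tile_map). Exact matching keeps this lossless.
    """
    assert width % 8 == 0 and height % 8 == 0
    codebook = []  # unique blocks, each a tuple of 64 values
    lookup = {}    # block -> codebook index
    tile_map = []  # one codebook index per block position, row-major
    for ty in range(0, height, 8):
        for tx in range(0, width, 8):
            block = tuple(frame[(ty + y) * width + tx + x]
                          for y in range(8) for x in range(8))
            if block not in lookup:
                lookup[block] = len(codebook)
                codebook.append(block)
            tile_map.append(lookup[block])
    return codebook, tile_map


def tile_decode(codebook, tile_map, width, height):
    """Rebuild the flat frame from the codebook and the tile map."""
    frame = [0] * (width * height)
    blocks_per_row = width // 8
    for i, index in enumerate(tile_map):
        ty, tx = divmod(i, blocks_per_row)
        block = codebook[index]
        for y in range(8):
            for x in range(8):
                frame[(ty * 8 + y) * width + tx * 8 + x] = block[y * 8 + x]
    return frame
```

For a typical background with large areas of repeated tiles, the codebook stays far smaller than the number of block positions; the codebook and map would then be handed to whatever entropy coder the format uses.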

In practice, of course, it is not that simple. It might work for perhaps 85% of the games out there, but it would quickly fall over on the coolest 15%. Video game console programming has traditionally been about tweaking and stretching the standard hardware to get every ounce of performance and every nifty graphical effect you possibly can. After all, you cannot ask the user to simply upgrade the video hardware. Even if this special video codec had access to the table data after a frame finished rendering, or before it started, that data would not be very useful. Lots of special graphical tricks work by manipulating the tables or the graphics registers in the middle of rendering, e.g., during the horizontal blanking interval between scanlines. The upshot is that the PPU state might be very different at the end of a frame than it was at the start.
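A tiny simulation shows why an end-of-frame snapshot breaks down. The sketch below models a fixed status bar above a scrolling playfield, a common mid-frame trick, by giving each scanline its own horizontal scroll value. It assumes a hypothetical emulator hook that logs the scroll value in effect on every scanline; `render_scanlines` and the toy data are illustrative only. Rendering with only the final register value gets the status-bar line wrong.

```python
def render_scanlines(background, scroll_x, width):
    """Render each scanline with the horizontal scroll in effect on it.

    background: 2D list of pixel rows (wraps around horizontally)
    scroll_x: one scroll value per scanline, as logged during the frame
    width: number of visible pixels per scanline
    """
    bg_width = len(background[0])
    return [[row[(x + scroll_x[y]) % bg_width] for x in range(width)]
            for y, row in enumerate(background)]


background = [list(range(8)) for _ in range(4)]
per_line = [0, 3, 3, 3]  # line 0: fixed status bar; lines 1-3: scrolled playfield
snapshot = [3] * 4       # only the register value left at the end of the frame
correct = render_scanlines(background, per_line, 4)
naive = render_scanlines(background, snapshot, 4)
print(correct[0], naive[0])  # [0, 1, 2, 3] [3, 4, 5, 6]
```

The takeaway is that a codec built on table data would need state captured per scanline (or per register write), not just per frame.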

Still, it is not unreasonable to think that some VQ principles could be applied here, at least for the initial frame of a movie file. I may revisit this idea later. In the meantime, if you are interested in NES hardware, here are some references: