Category Archives: Reverse Engineering

Brainstorming and case studies relating to craft of software reverse engineering.

NES Compression

For budding computer geeks like myself, volume 20/January 1991 issue of Nintendo Power magazine (Mega Man III cover) was of particular interest. In this issue, the editors attempted to shed some light on the technical matters driving the 8-bit Nintendo Entertainment System and give insufferable know-it-alls further opportunity to truly know it all.

I still recall the puerile, simplistic explanation for compression using a metaphor every Nintendo nerd could relate to: Tetris! I still think about this article sometimes and wonder what compression method they used and for which games. In fact, I located the relevant article section and scanned it for your enjoyment/edification:

Click for larger image

I came across this document describing a compression algorithm used for a Zelda game on the Super NES, the direct successor to the NES, both market-wise and on a technical level. It seems that the scheme is largely a flexible RLE scheme which allows for decoding single- or double-byte runs (output 5 instances of byte 0x56 vs. output 5 instances of 0x56 0x42); decoding runs of an incrementing byte (e.g., output a sequence of 4 incrementing bytes starting with 0xA2: 0xA2 0xA3 0xA4 0xA5); and even using into the encoded buffer as a dictionary and copying runs from a specified index.

8088 Corruption Data Format

By now, Trixter is gaining more fame due to his 8088 Corruption video thanks to a Digg mention. Since Alex more or less dared me to create Unix playback software for this format I downloaded the package and examined the 8088_COR.DAT file. It’s pretty straightforward: There are 0x7D0 bytes before the start of what looks like unsigned, 8-bit PCM data (you can tell because it is all 0x80 values, silence on a scale of 0x00..0xFF). There are 0x2DF PCM bytes before the non-PCM bytes start up again.

Let’s look at these numbers: 0x2DF = 735. I recognize this thanks to the Wing Commander III MVE format. 735 happens to be the quotient of 22050/30. The 8088 Corruption page mentions 30 frames/second in the first place.

What about the 0x7D0 number? That’s an even 2000 in decimal. 2000 = 80 x 25, the size of the standard text mode, in cells, of the old IBM 8088 machines. However, the first frame is entirely a sequence of 0xDB 0x10 byte pairs. My guess is that the video is actually using 40×25 text mode which makes for 1000 cells. Each cell is represented by an attribute byte (defines foreground color, background color, and flashing) and the actual extended ASCII character (0..255).

This strikes me as a classic vector quantizer problem– all the brainpower/horsepower goes into the compression side; the decompression is trivial. I can understand why Trixter claims the compressor is so complex.

Anyway, you want to write a program to interpret the data file? If my guesses are correct, the video will be rendered as 40×25 8×8-pixel cells, or 320×200 pixels. Find a table of all of the ASCII characters. Find another table of all the attributes. Load 2000 bytes from the data file. Render the data based on the attribute and character tables. Load the next 735 bytes. Play them back as mono, 8-bit, unsigned PCM data. Repeat until EOF.

As a means of validation, the size of a single frame should divide evenly into the total size of the Corruption data file which is 9268915 bytes. One frame is 2000 + 735 = 2735 bytes. 9268915 bytes / 2735 bytes/frame = 3389 frames. 3389 frames / 30 frames/sec = ~113 sec.

Hey, Jim: Do I have that all correct? :)

More on this topic:

SDL Corruption (early SDL-based player for data format)
SDL Corruption Corrected (corrected version of the same SDL player)

Profiles In Multimedia Hacking

Kudos to Reimar Döffinger for a new, clean implementation of the LZO lossless coding algorithm for the FFmpeg project. This means that there is one less external package that multimedia players have to rely on since there is an equivalent capability in FFmpeg.

I wanted to highlight this because it is useful when someone undertakes to re-implement a decoder from another package as another FFmpeg module. This has happened a number of times already, in the cases of Vorbis, FLAC, and VP3, for example. Remember, reverse engineering also applies to understanding bodies of code written in higher level languages than ASM and re-implementing them based on your newfound understanding.

For eye candy, check out Reimar’s “show your work” handwritten notes for understanding the LZO decoder. The original LZO decoder seemed to have so many labels and goto statements that it might as well have been written in straight ASM.

Smacker Reverse Engineered

One of my contacts notified me that he has reverse engineered the Smacker multimedia system. Hopefully we will get a description published soon and put a fresh open source implementation out in the wild.

This is great. This is one of the oldest and most enduring multimedia systems in the field, and one of the most widespread– The official website boasts of its use in over 2,600 software titles.

Breaking Eggs And Making Omelettes

Topics On Multimedia Technology and Reverse Engineering

Category Archives: Reverse Engineering

NES Compression

8088 Corruption Data Format

Profiles In Multimedia Hacking

Smacker Reverse Engineered