Breaking Eggs And Making Omelettes

Topics On Multimedia Technology and Reverse Engineering


8088 Corruption Data Format

January 24th, 2006 by Multimedia Mike

By now, Trixter is gaining more fame for his 8088 Corruption video thanks to a Digg mention. Since Alex more or less dared me to create Unix playback software for this format, I downloaded the package and examined the 8088_COR.DAT file. It’s pretty straightforward: there are 0x7D0 bytes before the start of what looks like unsigned, 8-bit PCM data (you can tell because it is all 0x80 bytes, which is silence on a scale of 0x00..0xFF). There are 0x2DF PCM bytes before the non-PCM bytes start up again.
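As a minimal Python sketch of that layout, assuming the file is nothing but back-to-back chunks of 2000 video bytes followed by 735 audio bytes with no header, the data can be cut into frames like this. `split_frames` and `is_silent` are hypothetical helper names, not part of any released tool.

```python
FRAME_VIDEO_BYTES = 0x7D0   # 2000 bytes of text-mode screen data
FRAME_AUDIO_BYTES = 0x2DF   # 735 bytes of unsigned 8-bit PCM
FRAME_BYTES = FRAME_VIDEO_BYTES + FRAME_AUDIO_BYTES

def split_frames(data):
    """Yield (video, audio) byte strings for each complete frame in `data`.

    Assumption: no file header, and video always precedes audio in a frame.
    Trailing bytes that do not form a whole frame are ignored.
    """
    for offset in range(0, len(data) - FRAME_BYTES + 1, FRAME_BYTES):
        video = data[offset:offset + FRAME_VIDEO_BYTES]
        audio = data[offset + FRAME_VIDEO_BYTES:offset + FRAME_BYTES]
        yield video, audio

def is_silent(audio):
    """Unsigned 8-bit PCM is centered on 0x80, so all-0x80 audio is silence."""
    return all(b == 0x80 for b in audio)
```

Usage would be along the lines of `with open("8088_COR.DAT", "rb") as f: frames = list(split_frames(f.read()))`; if the guess is right, the audio of the first frame should report as silent.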

Let’s look at these numbers: 0x2DF = 735. I recognize this one thanks to the Wing Commander III MVE format: 735 is the quotient 22050/30, i.e., the number of samples in one frame’s worth of 22050 Hz audio at 30 frames/second. The 8088 Corruption page mentions 30 frames/second, so this suggests the audio runs at 22050 Hz.
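The same arithmetic as a tiny Python check. The 30 frames/second figure comes from the 8088 Corruption page; treating the audio as mono with one byte per sample is the assumption from the paragraph above, and `inferred_sample_rate` is a hypothetical helper name.

```python
from fractions import Fraction

AUDIO_BYTES_PER_FRAME = 0x2DF  # 735 bytes of PCM between video frames
FRAMES_PER_SECOND = 30         # frame rate stated on the 8088 Corruption page

def inferred_sample_rate(bytes_per_frame, fps, bytes_per_sample=1):
    """Sample rate implied by a fixed amount of audio per video frame.

    Mono 8-bit PCM uses one byte per sample, hence the default of 1.
    """
    samples_per_frame, rem = divmod(bytes_per_frame, bytes_per_sample)
    if rem:
        raise ValueError("audio chunk is not a whole number of samples")
    return samples_per_frame * fps

rate = inferred_sample_rate(AUDIO_BYTES_PER_FRAME, FRAMES_PER_SECOND)
# One frame's audio should last exactly one video frame: 735 / 22050 s == 1/30 s.
frame_duration = Fraction(AUDIO_BYTES_PER_FRAME, rate)
```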

What about the 0x7D0 number? That’s an even 2000 in decimal. 2000 = 80×25, the number of cells in the standard 80-column text mode of the old IBM 8088 machines. However, the first frame is entirely a sequence of 0xDB 0x10 byte pairs, so each cell takes 2 bytes. My guess is that the video actually uses 40×25 text mode, which makes for 1000 cells. Each cell is represented by the actual extended ASCII character (0..255) and an attribute byte (which defines foreground color, background color, and flashing).
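Here is a sketch of that cell layout in Python. It assumes the character byte comes first and the attribute byte second, as in CGA text-mode memory; with that order, the 0xDB 0x10 pairs decode as the solid block character (0xDB) in black on a blue background. The palette is a commonly used RGB approximation of the 16 CGA colors, not something read from the data file, and the function names are hypothetical.

```python
# Commonly used RGB approximation of the 16 CGA colors (index = 4-bit color number).
CGA_PALETTE = [
    (0x00, 0x00, 0x00), (0x00, 0x00, 0xAA), (0x00, 0xAA, 0x00), (0x00, 0xAA, 0xAA),
    (0xAA, 0x00, 0x00), (0xAA, 0x00, 0xAA), (0xAA, 0x55, 0x00), (0xAA, 0xAA, 0xAA),
    (0x55, 0x55, 0x55), (0x55, 0x55, 0xFF), (0x55, 0xFF, 0x55), (0x55, 0xFF, 0xFF),
    (0xFF, 0x55, 0x55), (0xFF, 0x55, 0xFF), (0xFF, 0xFF, 0x55), (0xFF, 0xFF, 0xFF),
]

def decode_attribute(attr):
    """Split a text-mode attribute byte into (foreground, background, blink)."""
    foreground = attr & 0x0F          # bits 0-3: 16 foreground colors
    background = (attr >> 4) & 0x07   # bits 4-6: 8 background colors
    blink = bool(attr & 0x80)         # bit 7: blink (bright background if blinking is disabled)
    return foreground, background, blink

def parse_cells(video):
    """Turn 2000 bytes into 1000 (character, fg, bg, blink) tuples.

    Assumption: character byte first, attribute byte second.
    """
    if len(video) != 2000:
        raise ValueError("expected 2000 bytes (40x25 cells, 2 bytes each)")
    return [(video[i],) + decode_attribute(video[i + 1])
            for i in range(0, len(video), 2)]
```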

This strikes me as a classic vector quantizer problem: all the brainpower/horsepower goes into the compression side, while the decompression is trivial. I can understand why Trixter says the compressor is so complex.
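To make the asymmetry concrete, here is a generic, textbook vector quantizer in Python under simple assumptions (squared-error distance, exhaustive search). It is an illustration of the technique, not Trixter’s actual compressor: encoding compares the input against every codeword, while decoding is a single table lookup. If a codeword here is one glyph drawn in one foreground/background pair, then 256 characters × 16 × 8 colors (ignoring the blink bit) gives 32768 candidates per cell, which hints at why encoding is slow.

```python
# Assumption: codeword = one 8x8 glyph in one fg/bg color pair, blink bit ignored.
CANDIDATES_PER_CELL = 256 * 16 * 8

def vq_encode(vector, codebook):
    """Exhaustive search: index of the codeword with the smallest squared error."""
    def error(codeword):
        return sum((a - b) ** 2 for a, b in zip(vector, codeword))
    return min(range(len(codebook)), key=lambda i: error(codebook[i]))

def vq_decode(index, codebook):
    """Decoding is a single table lookup."""
    return codebook[index]
```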

Anyway, you want to write a program to interpret the data file? If my guesses are correct, the video will be rendered as 40×25 cells of 8×8 pixels each, or 320×200 pixels. Find a table of the glyphs for all 256 characters. Find another table of the colors the attribute bytes refer to. Load 2000 bytes from the data file. Render the data based on the character and attribute tables. Load the next 735 bytes. Play them back as mono, 8-bit, unsigned PCM data at 22050 Hz. Repeat until EOF.
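A minimal renderer for one frame might look like this in Python. It assumes character-then-attribute byte order, a hypothetical `font` argument holding the 256 glyphs of an 8×8 font (8 row bytes each, most significant bit as the leftmost pixel, which is how such fonts are commonly stored), and it ignores attribute bit 7. It returns a 200×320 grid of color indices; mapping them to RGB and playing the audio are left to whatever display and sound library you prefer.

```python
WIDTH_CELLS, HEIGHT_CELLS = 40, 25
CELL = 8  # 8x8-pixel glyphs -> 320x200 pixels

def render_frame(video, font):
    """Render one 2000-byte frame to a 200x320 grid of CGA color indices.

    Assumptions: each cell is (character, attribute), character first;
    `font` is a hypothetical list of 256 glyphs, each 8 row bytes with the
    MSB as the leftmost pixel. Attribute bit 7 (blink) is ignored here.
    """
    if len(video) != 2 * WIDTH_CELLS * HEIGHT_CELLS:
        raise ValueError("expected 2000 bytes of video")
    pixels = [[0] * (WIDTH_CELLS * CELL) for _ in range(HEIGHT_CELLS * CELL)]
    for cell in range(WIDTH_CELLS * HEIGHT_CELLS):
        char, attr = video[2 * cell], video[2 * cell + 1]
        fg, bg = attr & 0x0F, (attr >> 4) & 0x07
        x0 = (cell % WIDTH_CELLS) * CELL   # cells run left to right,
        y0 = (cell // WIDTH_CELLS) * CELL  # then top to bottom
        for row, bits in enumerate(font[char]):
            for col in range(CELL):
                # A set bit in the glyph row means foreground, clear means background.
                pixels[y0 + row][x0 + col] = fg if bits & (0x80 >> col) else bg
    return pixels
```

With a real font, the first frame of 0xDB 0x10 pairs should come out as a solid black screen, since 0xDB is the full block and 0x10 sets a black foreground.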

As a means of validation, the size of a single frame should divide evenly into the total size of the Corruption data file, which is 9268915 bytes. One frame is 2000 + 735 = 2735 bytes. 9268915 bytes / 2735 bytes/frame = 3389 frames, with no remainder. 3389 frames / 30 frames/sec = ~113 sec.
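The same size check as a small Python sketch; `check_file_size` is a hypothetical helper, and the frame size and frame rate are the guesses from above.

```python
FRAME_BYTES = 2000 + 735   # video + audio per frame (guessed layout)
FPS = 30                   # frame rate from the 8088 Corruption page

def check_file_size(size):
    """Return (frames, seconds) if `size` is a whole number of frames."""
    frames, leftover = divmod(size, FRAME_BYTES)
    if leftover:
        raise ValueError(f"{leftover} stray bytes: the frame size guess is wrong")
    return frames, frames / FPS
```

Calling it as `check_file_size(os.path.getsize("8088_COR.DAT"))` should give 3389 frames and roughly 113 seconds if the guesses hold.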

Hey, Jim: Do I have that all correct? :)


Posted in Reverse Engineering | 5 Comments

5 Responses

  1. Jim Leonard Says:

    Sweet Lincoln’s Mullet!

    Yes, that is all 100% correct :-) And yes, the compressor takes several seconds per frame on my Athlon 2500+. My original idea was a 1-to-1 match, but gODjR (the other person mentioned in the docs) had the idea to do subsampling and much better dithering, so the quality props go to him.

    The IBM BIOS ROM font is essentially the codebook. It is important to use the actual IBM ROM font, as the conversions were done specifically with that font (this is what sets it apart from aalib or libcaca, which don’t take actual font rendering into account).

    Something for the wiki, yes?

  2. Multimedia Mike Says:

    “Something for the wiki, yes?”

    Be my guest. BTW, where can I find the IBM BIOS ROM codebook? Do you rip it from the ROM in software? You must have a software copy somewhere for the encoder, right?

  3. Jim Leonard Says:

    Yes, email me if you want the ROM dump from my 5150.

  4. M-ko Says:

    My linked page is my continuation of the reverse engineering to its logical conclusion: a crude player and a crude decoder. The player is slow and unable to hold sync; the decoder outputs to an unwieldy format of one BMP per frame plus a separate WAV audio file. The player needs SDL, and both need SoX in order to play or mux the audio. I have successfully made an AVI out of the decoder’s output, using VirtualDub under WINE. It came out great.
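The WAV half of that decoder output can be sketched with Python’s standard `wave` module. This is an illustration only, not M-ko’s code: 8-bit WAV samples are unsigned with 0x80 as silence, the same convention the data file appears to use, so the 735-byte chunks can be written unchanged. The 22050 Hz default is the rate inferred in the post, and `write_wav` is a hypothetical helper name.

```python
import wave

def write_wav(path_or_file, audio_chunks, rate=22050):
    """Concatenate per-frame PCM chunks into a mono, 8-bit WAV file.

    8-bit WAV samples are unsigned with 0x80 as silence, so the bytes
    from the data file can be written as-is. `rate` is the inferred rate.
    """
    with wave.open(path_or_file, "wb") as out:
        out.setnchannels(1)    # mono
        out.setsampwidth(1)    # 1 byte per sample
        out.setframerate(rate)
        for chunk in audio_chunks:
            out.writeframes(chunk)
```

Pass it a filename (or a seekable file object) and any iterable of the 735-byte audio chunks, in order.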

  5. Robert Greenstreet Says:

    You guys are awesome! It’s almost like history is being rewritten. A sort of sci-fi revisionist version of the original PC! Thank you!!