Metal Gear Solid: The Twin Snakes for the Nintendo GameCube is very heavy on the cutscenes. Most of them are animated in real time, but there are a bunch of clips (normally of a more photo-realistic nature) that the developers needed to compress using a conventional video codec. What did they decide to use for this task? On2 VP3 (forerunner of Theora) in a custom transport format. This is only the second game I have seen in the wild that uses pure On2 VP3 (the first was a horse game). Reimar and I sorted out most of the details some time ago. I sat down today and wrote an FFmpeg / Libav demuxer for the format, mostly to prove to myself that I still could.
Things went pretty smoothly. We suspected that there was an integer field that indicated the frame rate, but 18 fps is a bit strange. I kept fixating on a header field that read 0x41F00000. Where have I seen that number before? Oh, of course: it’s the number 30.0 expressed as an IEEE 32-bit float. The 4XM format pulled the same trick.
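For anyone who wants to check the float trick, here is a minimal Python sketch. The helper name `u32_as_float` is made up for illustration, and it assumes the field is handled as a big-endian 32-bit value, which is how it reads in the hex above.

```python
import struct

def u32_as_float(raw: int) -> float:
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single-precision float."""
    # Pack the integer into 4 big-endian bytes, then unpack the same bytes as a float.
    return struct.unpack(">f", struct.pack(">I", raw))[0]

fps = u32_as_float(0x41F00000)
print(fps)  # 30.0
```

The bit pattern breaks down as sign 0, exponent 0x83 (2 to the 4th) and mantissa 1.875, and 1.875 × 16 = 30.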
Hexadecimal Easter Egg
I know I finished the game years ago but I really can’t recall any of the clips present in the samples directory. The file mgs1-60.vp3 contains a computer screen granting the player access and illustrates this with a hexdump. It looks something like this:
Funny, there are only 22 bytes on a line when there should be 32 according to the offsets. But, leave it to me to try to figure out what the file type is, regardless. I squinted and copied the first 22 bytes into a file:
1F 8B 08 00 85 E2 17 38 00 03 EC 3A 0D 78 54 D5 38 00 03 EC 3A 0D
And the answer to the big question:
$ file mgsfile
mgsfile: gzip compressed data, from Unix, last modified: Wed Oct 27 22:43:33 1999
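What `file` reports can be read straight out of the first 10 bytes, which form the fixed gzip header. The following is a rough Python sketch rather than how `file` itself works; `parse_gzip_header` and `FIRST_ROW` are names invented here.

```python
import struct
from datetime import datetime, timezone

# The 16 unique bytes of the first on-screen row.
FIRST_ROW = bytes.fromhex("1F 8B 08 00 85 E2 17 38 00 03 EC 3A 0D 78 54 D5")

def parse_gzip_header(data: bytes) -> dict:
    """Decode the fixed 10-byte gzip header (all fields little-endian)."""
    magic, method, flags, mtime, xfl, os_id = struct.unpack("<2sBBIBB", data[:10])
    if magic != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    return {"method": method, "flags": flags, "mtime": mtime, "xfl": xfl, "os": os_id}

header = parse_gzip_header(FIRST_ROW)
print(header["method"])  # 8 means deflate
print(header["os"])      # 3 means Unix
print(datetime.fromtimestamp(header["mtime"], tz=timezone.utc))
```

The timestamp decodes to 1999-10-28 05:43:33 UTC, which lines up with the 22:43:33 on Oct 27 that `file` printed if the machine running it was at UTC-7 (an assumption on my part). Everything after the tenth byte is raw deflate data.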
A gzip’d file from 1999. I don’t know why I find this stuff so interesting, but I do. I guess it’s no more or less strange than writing playback systems like this.
What a shame that the display cuts off the rightmost 10 bytes… otherwise, it would be possible to uncompress the file.
No doubt, it would be tedious but ultimately possible. I think we could forget about OCR’ing the text since the compression rendered it very difficult to read. So we would have to enter it all by hand.
Not like I would have had anything better to do. :-)
Nooo why did they cut off those bytes?! I guess it’s too difficult to decompress whatever can be decompressed with what we see :-(
Stupid question, but did anyone search for those bytes in the game’s data files?
Also those 22 bytes might be enough to decompress something with some recovery program and check the file type of that (I guess there’s a good chance it might be a tar file).
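A "recovery program" for this can be very small. Here is a hedged Python sketch of the idea, using synthetic data rather than the bytes from the video; `recover_prefix` is a hypothetical helper name. It relies on `zlib.decompressobj` returning whatever output it has produced so far when the input simply stops.

```python
import gzip
import zlib

def recover_prefix(truncated: bytes) -> bytes:
    """Return whatever can be inflated from a gzip stream that ends early."""
    # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header, not a zlib one.
    inflater = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    try:
        return inflater.decompress(truncated)
    except zlib.error:
        # Corrupt (rather than merely truncated) input: give up in this sketch.
        return b""

# Synthetic demonstration: compress something, then throw half of it away.
payload = b"".join(b"line %d\n" % i for i in range(2000))
full = gzip.compress(payload)
recovered = recover_prefix(full[: len(full) // 2])
print(len(recovered), "of", len(payload), "bytes recovered")
```

How much comes back depends entirely on how many compressed bytes survive, so a handful of bytes after the 10-byte header will yield very little.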
22 bytes aren’t enough :(
The offsets in your screenshot are:
000
020
040
060
100
120
140
160
200
[…]
This is not hex (what happened to 80, A0, C0, E0?). They look like octal, but that doesn’t make sense with 22 bytes on a line; there should be 16 in that case. Maybe the artist (or programmer) just whipped up something quick and dirty that incremented by 20 (octal) for each printed line and filled the rest of the line with bytes (random? actual file dump?)?
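The octal reading is easy to sanity-check. This little Python sketch (the function name `octal_offsets` is just for illustration) prints the offsets a dump would show if each row held 16 bytes and the offset column were octal:

```python
def octal_offsets(rows: int, bytes_per_row: int = 16) -> list[str]:
    """Offsets of successive rows, formatted as 3-digit octal."""
    return [format(i * bytes_per_row, "03o") for i in range(rows)]

print(octal_offsets(9))
# ['000', '020', '040', '060', '100', '120', '140', '160', '200']
```

That matches the column in the screenshot, since 20 octal is 16 decimal.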
@saintdev: You’re absolutely right! I hadn’t caught that.
@Reimar: Good question; I hadn’t thought to do that. However, I just searched for the bytes throughout the entire ISO dumps of both game discs and came up empty.
Oh, heh. Just noticed the pattern. And the offsets _are_ octal. The first 16 bytes are unique, then the final 6 bytes are just bytes 7-12 (0 indexed). Hopefully this comes through.
0000000: 1F 8B 08 00 85 E2 17 38 00 03 EC 3A 0D 78 54 D5
38 00 03 EC 3A 0D
0000020: 95 77 32 13 18 CD 84 19 24 45 40 84 81 90 FC 9A
19 24 45 40 84 81
It might be possible to reconstruct a meaningful file….
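Assuming the pattern holds for every row, undoing it is mechanical. A small Python sketch, with the two rows above typed in by hand and a made-up helper name `strip_duplicate_tail`:

```python
def strip_duplicate_tail(row: bytes) -> bytes:
    """Drop the 6 trailing bytes of a 22-byte on-screen row.

    Raises ValueError if the tail is not a copy of bytes 7-12 (0-indexed).
    """
    if len(row) != 22:
        raise ValueError("expected 22 bytes per displayed row")
    unique, tail = row[:16], row[16:]
    if tail != unique[7:13]:
        raise ValueError("tail does not repeat bytes 7-12")
    return unique

ROWS = [
    bytes.fromhex("1F 8B 08 00 85 E2 17 38 00 03 EC 3A 0D 78 54 D5 38 00 03 EC 3A 0D"),
    bytes.fromhex("95 77 32 13 18 CD 84 19 24 45 40 84 81 90 FC 9A 19 24 45 40 84 81"),
]

data = b"".join(strip_duplicate_tail(row) for row in ROWS)
print(data.hex(" "))
```

The check inside the helper doubles as a typo detector when entering rows by hand, since a mistyped byte in positions 7-12 or in the tail will trip it.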
ASCII art failed. darn :(
Still shows up correctly if you view the page source, however ;)
@saintdev: Yep, came through clearly! Nice find.
That said, reconstructing the file would not be easy. It goes on for a long time. Actually, I just played the file again and noticed that the offsets start back at 0 a few times and the files are different lengths.
Again, I think we can rule out OCR because of the high compression artifacts.
Come on, writing that down has to be nothing compared to copying down BASIC loaders for C64 ASM code. I know I wrote down many pages of numbers back then.
Admittedly being a child with too much free time helps immensely.
Concerning the OCR approach: if the scroll speed is constant it might be possible to average out the artefacts somewhat over multiple frames.
Next step would be an OCR program that instead of grammar/spelling rules understands gzip :-)
I have results for the visible page:
$ xxd new.dat
0000000: 1f8b 0800 85e2 1738 0003 ec3a 0d78 54d5  .......8...:.xT.
0000010: 9577 3213 18cd 8419 2445 4084 8180 fc9a  .w2.....$E@.....
0000020: 0c49 20ba 829d 50de aabb 6679 55ac 5ad1  .I ...P...fyU.Z.
0000030: 8afc f888 8218 a0b2 2db1 a433 8384 6128  ........-..3..a(
0000040: b8b4 1bb6 7c42 9676 eb56 6b69 d7d5 e82a  ....|B.v.Vki...*
0000050: 06b5 066d ad7c 2ebb b2bb a94e 6d76 7daf  ...m.|.....Nmv}.
0000060: 9dba a3c4 38e8 c0ec f9b9 f7cd 9b49 8275  ....8........I.u
0000070: bfea f7ed b74e 32f3 eecf b9e7 9c7b ceb9  .....N2......{..
0000080: e79e 7bee 5b73 fbfa d5d5 e293 fd04 eb42  ..{.[s.........B
0000090: f5f5 7383 2258 33af b6a6 169f b575 3521  ..s."X3......u5!
00000a0: 78da 9f50 3038 af7e dedc 79f5 a1fa dab9  x..P08.~..y.....
00000b0: c1e0 9cd0 dc50 ad08 cefd 84f9 a2cf c6f5  .....P..........
00000c0: 1b96 3507 83f0 6c0e cda9 ab1f 92e6 c6f5  ..5...l.........
00000d0: 2b9b 3f0d 7e3e e5cf 1ad4 ff17 565c dfbc  +.?.~>......V\..
00000e0: 7ac3 ca4f ca0e 3e96 feeb a03c a726 5417  z..O..>....<.&T.
0000110: 35b7 6d5c 7de7 8aaa 959b 56fe d168 cc09  5.m\}.....V..h..
0000120: 85e6 cdab 1b42 ff35 a179 a110 ea7f 6e7d  .....B.5.y....n}
0000130: 5d2d                                     ]-
$ zcat new.dat > tmp
gzip: ../new.dat: unexpected end of file
$ file tmp
tmp: POSIX tar archive (GNU)
$ strings tmp
mgsi/
40775
26323
23420
0 6765670735 10503
ustar
usr01475
user
mgsi/CdWrite/
40775
26323
23420
0 6765670740 12040
ustar
usr01475
user
mgsi/CdWrite/bin/
40775
26323
23420
0 6765670741 12611
ustar
usr01475
user
mgsi/CdWrite/bin/mbuild.exe
100664
26323
23420
206000 65743
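The numbers in that strings output are tar header fields, which tar stores as octal ASCII text. Here is a short Python sketch decoding a few of them; `tar_octal` is a hypothetical helper, and reading the line "0 6765670735 10503" as size, mtime and checksum is my assumption based on the usual tar header layout.

```python
import stat
from datetime import datetime, timezone

def tar_octal(field: str) -> int:
    """Parse a tar numeric field: octal digits padded with spaces or NULs."""
    return int(field.strip(" \0"), 8)

dir_mode = tar_octal("40775")      # mode of mgsi/
file_mode = tar_octal("100664")    # mode of mgsi/CdWrite/bin/mbuild.exe
mtime = tar_octal("6765670735")    # assumed to be the mtime field of mgsi/

print(stat.filemode(dir_mode))     # drwxrwxr-x
print(stat.filemode(file_mode))    # -rw-rw-r--
print(datetime.fromtimestamp(mtime, tz=timezone.utc))
```

If that field really is the mtime, the mgsi/ directory was last touched on 1999-09-09 (UTC), a few weeks before the gzip timestamp.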
Hm, does distributing a video of a hexdump of a tar file count as copyright violation of the tar file? :-)
I couldn’t quite figure out what kind of program that is, a bin path inside a tar file containing a .exe? Very strange.
One could approximate the length of the gzip file by measuring how fast the hex dump scrolls.
According to gnafu (and the interwebz) mbuild.exe is a compiler (or toolchain? or IDE?) from Green Hills Software.
i’m glad i’m not the only one who pays attention to whats on the screen in the screen in movies, video games and tv. :)
fascinating stuffs.