Breaking Eggs And Making Omelettes

Topics On Multimedia Technology and Reverse Engineering


Archives:

Reverse Engineering Clue Chronicles Compression

January 15th, 2019 by Multimedia Mike

My last post described my exploration into the 1999 computer game Clue Chronicles: Fatal Illusion. Some readers expressed interest in the details so I thought I would post a bit more about how I have investigated and what I have learned.

It’s frustrating to need to reverse engineer a compression algorithm that is only applied to a total of 8 files (out of a total set of ~140), but here we are. Still, I’m glad some others expressed interest in this challenge as it motivated me to author this post, which in turn prompted me to test and challenge some of my assumptions.

Spoiler: Commenter ‘m’ gave me the clue I needed: PKWare Data Compression Library used the implode algorithm rather than deflate. I was able to run this .ini data through an open source explode algorithm found in libmpq and got the correct data out.

Files To Study
I uploaded a selection of files for others to study, should they feel so inclined. These include the main game binary (if anyone has ideas about how to isolate the decompression algorithm from the deadlisting); compressed and uncompressed examples from 2 files (newspaper.ini and Drink.ini); and the compressed version of Clue.ini, which I suspect is the root of the game’s script.

The Story So Far
This ad-hoc scripting language found in the Clue Chronicles game is driven by a series of .ini files that are available in both compressed and uncompressed forms, save for a handful of them which only come in compressed flavor. I have figured out a few obvious details of the compressed file format:

bytes 0-3 "COMP"
bytes 4-11 unknown
bytes 12-15 size of uncompressed data
bytes 16-19 size of compressed data (filesize - 20 bytes)
bytes 20- compressed payload

The average compression ratio is on the same order as what could be achieved by running ‘gzip’ against the uncompressed files and using one of the lower number settings (i.e., favor speed vs. compression size, e.g., ‘gzip -2’ or ‘gzip -3’). Since the zlib/DEFLATE algorithm is quite widespread on every known computing platform, I thought that this would be a good candidate to test.

Exploration
My thinking was that I could load the bytes in the compressed ini file and feed it into Python’s zlib library, sliding through the first 100 bytes to see if any of them “catch” on the zlib decompression algorithm.

Here is the exploration script:
Read the rest of this entry »

Posted in Game Hacking | 3 Comments »

Parsing The Clue Chronicles

December 29th, 2018 by Multimedia Mike

A long time ago, I procured a 1999 game called Clue Chronicles: Fatal Illusion, based on the classic board game Clue, a.k.a. Cluedo. At the time, I was big into collecting old, unloved PC games so that I could research obscure multimedia formats.



Surveying the 3 CD-ROMs contained in the box packaging revealed only Smacker (SMK) videos for full motion video which was nothing new to me or the multimedia hacking community at the time. Studying the mix of data formats present on the discs, I found a selection of straightforward formats such as WAV for audio and BMP for still images. I generally find myself more fascinated by how computer games are constructed rather than by playing them, and this mix of files has always triggered a strong “I could implement a new engine for this!” feeling in me, perhaps as part of the ScummVM project which already provides the core infrastructure for reimplementing engines for 2D adventure games.

Tying all of the assets together is a custom high-level programming language. I have touched on this before in a blog post over a decade ago. The scripts are in a series of files bearing the extension .ini (usually reserved for configuration scripts, but we’ll let that slide). A representative sample of such a script can be found here:

clue-chronicles-scarlet-1.txt

What Is This Language?
At the time I first analyzed this language, I was still primarily a C/C++-minded programmer, with a decent amount of Perl experience as a high level language, and had just started to explore Python. I assessed this language to be “mildly object oriented with C++-type comments (‘//’) and reliant upon a number of implicit library functions”. Other people saw other properties. When I look at it nowadays, it reminds me a bit more of JavaScript than C++. I think it’s sort of a Rorschach test for programming languages.

Strangely, I sort of had this fear that I would put a lot of effort into figuring out how to parse out the language only for someone to come along and point out that it’s a well-known yet academic language that already has a great deal of supporting code and libraries available as open source. Google for “spanish dolphins far side comic” for an illustration of the feeling this would leave me with.

It doesn’t matter in the end. Even if such libraries exist, how easy would they be to integrate into something like ScummVM? Time to focus on a workable approach to understanding and processing the format.

Problem Scope
So I set about to see if I can write a program to parse the language seen in these INI files. Some questions:

  1. How large is the corpus of data that I need to be sure to support?
  2. What parsing approach should I take?
  3. What is the exact language format?
  4. Other hidden challenges?

To figure out how large the data corpus is, I counted all of the INI files on all of the discs. There are 138 unique INI files between the 3 discs. However, there are 146 unique INI files after installation. This leads to a hidden challenge described a bit later.

What parsing approach should I take? I worried a bit too much that I might not be doing this the “right” way. I’m trying to ignore doubts like this, like how “SQL Shame” blocked me on a task for a little while a few years ago as I concerned myself that I might not be using the purest, most elegant approach to the problem. I know I covered language parsing a lot time ago in university computer science education and there is a lot of academic literature to the matter. But sometimes, you just have to charge in and experiment and prototype and see what falls out. In doing so, I expect to have a better understanding of the problems that need to solved and the right questions to ask, not unlike that time that I wrote a continuous integration system from scratch because I didn’t actually know that “continuous integration” was the keyword I needed.

Next, what is the exact language format? I realized that parsing the language isn’t the first and foremost problem here– I need to know exactly what the language is. I need to know what the grammar are keywords are. In essence, I need to reverse engineer the language before I write a proper parser for it. I guess that fits in nicely with the historical aim of this blog (reverse engineering).

Now, about the hidden challenges– I mentioned that there are 8 more INI files after the game installs itself. Okay, so what’s the big deal? For some reason, all of the INI files are in plaintext on the CD-ROM but get compressed (apparently, according to file size ratios) when installed to the hard drive. This includes those 8 extra INI files. I thought to look inside the CAB installation archive file on the CD-ROM and the files were there… but all in compressed form. I suspect that one of the files forms the “root” of the program and is the launching point for the game.

Parsing Approach
I took a stab at parsing an INI file. My approach was to first perform lexical analysis on the file and create a list of 4 types: symbols, numbers, strings, and language elements ([]{}()=.,:). Apparently, this is the kind of thing that Lex/Flex are good at. This prototyping tool is written in Python, but when I port this to ScummVM, it might be useful to call upon the services of Lex/Flex, or another lexical analyzer, for there are many. I have a feeling it will be easier to use better tools when I understand the full structure of the language based on the data available.
Read the rest of this entry »

Posted in Game Hacking | 3 Comments »

Dreamcast Serial Extractor

December 31st, 2017 by Multimedia Mike

It has not been a very productive year for blogging. But I started the year by describing an unfinished project that I developed for the Sega Dreamcast, so I may as well end the year the same way. The previous project was a media player. That initiative actually met with some amount of success and could have developed into something interesting if I had kept at it.

By contrast, this post describes an effort that was ultimately a fool’s errand that I spent way too much time trying to make work.

Problem Statement
In my neverending quest to analyze the structure of video games while also hoarding a massive collection of them (though I’m proud to report that I did play at least a few of them this past year), I wanted to be able to extract the data from my many Dreamcast titles, both games and demo discs. I had a tool called the DC Coder’s Cable, a serial cable that enables communication between a Dreamcast and a PC. With the right software, you could dump an entire Dreamcast GD-ROM, which contained a gigabyte worth of sectors.

Problem: The dumping software (named ‘dreamrip’ and written by noted game hacker BERO) operated in a very basic mode, methodically dumping sector after sector and sending it down the serial cable. This meant that it took about 28 hours to extract all the data on a single disc by running at the maximum speed of 115,200 bits/second, or about 11 kilobytes/second. I wanted to create a faster method.

The Pitch
I formed a mental model of dreamrip’s operation that looked like this:



As an improvement, I envisioned this beautiful architecture:
Read the rest of this entry »

Posted in Sega Dreamcast | 2 Comments »

Writing A Dreamcast Media Player

January 5th, 2017 by Multimedia Mike

I know I’m not the only person to have the idea to port a media player to the Sega Dreamcast video game console. But I did make significant progress on an implementation. I’m a little surprised to realize that I haven’t written anything about it on this blog yet, given my propensity for publishing my programming misadventures.


3 Dreamcast consoles in a row

This old effort had been on my mind lately due to its architectural similarities to something else I was recently brainstorming.

Early Days
Porting a multimedia player was one of the earliest endeavors that I embarked upon in the multimedia domain. It’s a bit fuzzy for me now, but I’m pretty sure that my first exposure to the MPlayer project in 2001 arose from looking for a multimedia player to port. I fed it through the Dreamcast development toolchain but encountered roadblocks pretty quickly. However, this got me looking at the MPlayer source code and made me wonder how I could contribute, which is how I finally broke into practical open source multimedia hacking after studying the concepts and technology for more than a year at that point.

Eventually, I jumped over to the xine project. After hacking on that for awhile, I remembered my DC media player efforts and endeavored to compile xine to the console. The first attempt was to simply compile the codebase using the Dreamcast hobbyist community’s toolchain. This is when I came to fear the multithreaded snake pit in xine’s core. Again, my memories are hazy on the specifics, but I remember the engine having a bunch of threading hacks with comments along the lines of “this code deadlocks sometimes, so on shutdown, monitor this lock and deliberately break it if it has been more than 3 seconds”.

Something Workable
Eventually, I settled on a combination of FFmpeg’s libavcodec library for audio and video decoders, xine’s demuxer library, and xine’s input API, combined with my own engine code to tie it all together along with video and output drivers provided by the KallistiOS hobbyist OS for Dreamcast. Here is a simple diagram of the data movement through this player:


Architecture diagram for a Sega Dreamcast media player

Details and Challenges
This is a rare occasion when I actually got to write the core of a media player engine. I made some mistakes.

xine’s internal clock ran at 90000 Hz. At least, its internal timestamps were all in reference to a 90 kHz clock. I got this brilliant idea to trigger timer interrupts at 6000 Hz to drive the engine. Whatever the timer facilities on the Dreamcast, I found that 6 kHz was the greatest common divisor with 90 kHz. This means that if I could have found an even higher GCD frequency, I would have used that instead.

So the idea was that, for a 30 fps video, the engine would know to render a frame on every 200th timer interrupt. I eventually realized that servicing 6000 timer interrupts every second would incur a ridiculous amount of overhead. After that, my engine’s philosophy was to set a timer to fire for the next frame while beginning to process the current frame. I.e., when rendering a frame, set a timer to call back in 1/30th of a second. That worked a lot better.

As I was still keen on 8-bit paletted image codecs at the time (especially since they were simple and small for bootstrapping this project), I got to use output palette images directly thanks to the Dreamcast’s paletted textures. So that was exciting. The engine didn’t need to convert the paletted images to a different colorspace before rendering. However, I seem to recall that the Dreamcast’s PowerVR graphics hardware required that 8-bit textures be twiddled/swizzled. Thus, it was still required to manipulate the 8-bit image before rendering.

I made good progress on this player concept. However, a huge blocker for me was that I didn’t know how to make a proper user interface for the media player. Obviously, programming the Dreamcast occurred at a very low level (at least with the approach I was using), so there were no UI widgets easily available.

This was circa 2003. I assumed there must have been some embedded UI widget libraries with amenable open source licenses that I could leverage. I remember searching and checking out a library named libSTK. I think STK stood for “set-top toolkit” and was positioned specifically for doing things like media player UIs on low-spec embedded computing devices. The domain hosting the project is no longer useful but this appears to be a backup of the core code.

It sounded promising, but the libSTK developers had a different definition of “low-spec embedded” device than I did. I seem to recall that they were targeting something along with likes of a Pentium III clocked at 800 MHz with 128 MB RAM. The Dreamcast, by contrast, has a 200 MHz SH-4 CPU and 16 MB RAM. LibSTK was also authored in C++ and leveraged the Boost library (my first exposure to that code), and this all had the effect of making binaries quite large while I was trying to keep the player in lean C.

Regrettably, I never made any serious progress on a proper user interface. I think that’s when the player effort ran out of steam.

The Code
So, that’s another project that I never got around to finishing or publishing. I was able to find the source code so I decided to toss it up on github, along with 2 old architecture outlines that I was able to dig up. It looks like I was starting small, just porting over a few of the demuxers and decoders that I knew well.

I’m wondering if it would still be as straightforward to separate out such components now, more than 13 years later?

Posted in Sega Dreamcast | Comments Off on Writing A Dreamcast Media Player

« Previous Entries