{"id":4551,"date":"2019-01-15T08:56:17","date_gmt":"2019-01-15T16:56:17","guid":{"rendered":"https:\/\/multimedia.cx\/eggs\/?p=4551"},"modified":"2019-01-15T14:06:26","modified_gmt":"2019-01-15T22:06:26","slug":"re-clue-chronicles-compression","status":"publish","type":"post","link":"https:\/\/multimedia.cx\/eggs\/re-clue-chronicles-compression\/","title":{"rendered":"Reverse Engineering Clue Chronicles Compression"},"content":{"rendered":"<p><a href=\"https:\/\/multimedia.cx\/eggs\/parsing-the-clue-chronicles\/\">My last post<\/a> described my exploration into the 1999 computer game <a href=\"https:\/\/www.mobygames.com\/game\/windows\/clue-chronicles-fatal-illusion\">Clue Chronicles: Fatal Illusion<\/a>. Some readers expressed interest in the details so I thought I would post a bit more about how I have investigated and what I have learned.<\/p>\n<p>It&#8217;s frustrating to need to reverse engineer a compression algorithm that is only applied to a total of 8 files (out of a total set of ~140), but here we are. Still, I&#8217;m glad some others expressed interest in this challenge as it motivated me to author this post, which in turn prompted me to test and challenge some of my assumptions.<\/p>\n<blockquote><p>\n<strong>Spoiler:<\/strong> Commenter &#8216;m&#8217; gave me the clue I needed: PKWare Data Compression Library used the implode algorithm rather than deflate. I was able to run this .ini data through an open source explode algorithm found in <a href=\"https:\/\/github.com\/ge0rg\/libmpq\">libmpq<\/a> and got the correct data out.\n<\/p><\/blockquote>\n<p><strong>Files To Study<\/strong><br \/>\n<a href=\"https:\/\/multimedia.cx\/clue-chronicles-files\/\">I uploaded a selection of files for others to study, should they feel so inclined<\/a>. These include the main game binary (if anyone has ideas about how to isolate the decompression algorithm from the deadlisting); compressed and uncompressed examples from 2 files (newspaper.ini and Drink.ini); and the compressed version of Clue.ini, which I suspect is the root of the game&#8217;s script.<\/p>\n<p><strong>The Story So Far<\/strong><br \/>\nThis ad-hoc scripting language found in the Clue Chronicles game is driven by a series of .ini files that are available in both compressed and uncompressed forms, save for a handful of them which only come in compressed flavor. I have figured out a few obvious details of the compressed file format:<\/p>\n<pre>\r\nbytes 0-3 \"COMP\"\r\nbytes 4-11 unknown\r\nbytes 12-15 size of uncompressed data\r\nbytes 16-19 size of compressed data (filesize - 20 bytes)\r\nbytes 20- compressed payload\r\n<\/pre>\n<p>The average compression ratio is on the same order as what could be achieved by running &#8216;gzip&#8217; against the uncompressed files and using one of the lower number settings (i.e., favor speed vs. compression size, e.g., &#8216;gzip -2&#8217; or &#8216;gzip -3&#8217;). Since <a href=\"https:\/\/en.wikipedia.org\/wiki\/DEFLATE\">the zlib\/DEFLATE algorithm<\/a> is quite widespread on every known computing platform, I thought that this would be a good candidate to test. <\/p>\n<p><strong>Exploration<\/strong><br \/>\nMy thinking was that I could load the bytes in the compressed ini file and feed it into <a href=\"https:\/\/docs.python.org\/2\/library\/zlib.html\">Python&#8217;s zlib library<\/a>, sliding through the first 100 bytes to see if any of them &#8220;catch&#8221; on the zlib decompression algorithm.<\/p>\n<p>Here is the exploration script:<br \/>\n<!--more--><\/p>\n<p><script src=\"https:\/\/gist.github.com\/multimediamike\/c95f1a9cc58b959f4d8b2a299927d35e.js\"><\/script><br \/>\n<noscript><br \/>\nSource code here: https:\/\/gist.github.com\/multimediamike\/c95f1a9cc58b959f4d8b2a299927d35e<br \/>\n<\/noscript><\/p>\n<p>It didn&#8217;t work, i.e., the script did not find any valid zlib data. A commentor on my last post suggested trying bzip2, so I tried the same script but with <a href=\"https:\/\/docs.python.org\/2\/library\/bz2.html\">the bzip2 decompressor library<\/a>. Still no luck.<\/p>\n<p><strong>Wrong Approach<\/strong><br \/>\nI realized I had not tested to make sure that this exploratory script would work on known zlib data. So I ran it on a .gz file and it failed to find zlib data. So it looks like my assumptions were wrong. Meanwhile, I can instruct Python to compress data with zlib and dump the data to a file, and then run the script against that raw zlib output and the script recognizes the data.<\/p>\n<p>I spent some time examining how zlib and gzip interact at the format level. It looks like the zlib data doesn&#8217;t actually begin on byte boundaries within a gzip container. So this approach was doomed to failure.<\/p>\n<p><strong>A Closer Look At The Executable<\/strong><br \/>\nInstallation of Clue Chronicles results in a main Windows executable named Fatal_Illusion.exe. It occurred to me to examine this again, specifically for references to something like zlib.dll. Nothing like that. However, a search for &#8216;compr&#8217; shows various error messages which imply that there is PNG-related code inside (referencing IHDR and zTXt data types), even though PNG files are not present in the game&#8217;s asset mix.<\/p>\n<p>But there are also strings like &#8220;PKWARE Data Compression Library for Win32&#8221;. So I have started going down the rabbit hole of determining whether the compression is part of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Zip_(file_format)\">a ZIP format file<\/a>. After all, a ZIP local file header data structure has 4-byte compressed and uncompressed sizes, as seen in this format.<\/p>\n<p><strong>Binary Reverse Engineering<\/strong><br \/>\nAt one point, I took the approach of attempting to reverse engineer the binary. When studying a deadlisting of the code, it&#8217;s easy to search for the string &#8220;COMP&#8221; and find some code that cares about these compressed files. Unfortunately, the code quickly follows an indirect jump instruction which makes it intractable to track the algorithm from a simple deadlisting.<\/p>\n<p>I also tried installing some old Microsoft dev tools on <a href=\"https:\/\/multimedia.cx\/eggs\/running-windows-xp-in-2016\/\">my old Windows XP box<\/a> and setting some breakpoints while the game was running and do some old-fashioned step debugging. That was a total non-starter. According to my notes:<\/p>\n<blockquote><p>\nAddress 0x004A3C32 is the setup to the strncmp(&#8220;COMP&#8221;, ini_data, 4) function call. Start there.<\/p>\n<p>Problem: The game forces 640x480x256 mode and that makes debugging very difficult.\n<\/p><\/blockquote>\n<p><strong>Just For One Game?<\/strong><br \/>\nI keep wondering if this engine was used for any other games. Clue Chronicles was created by <a href=\"https:\/\/www.mobygames.com\/company\/eai-interactive\">EAI Interactive<\/a>. As I review the list of games they are known to have created (ranging between 1997 and 2000), a few of them jump out at me as possibly being able to leverage the same engine. I have a few of them, so I checked those&#8230; nothing. Then I scrubbed some YouTube videos showing gameplay of other suspects. None of those strike me as having similar engine characteristics to Clue Chronicles. So this remains a mystery: did they really craft this engine with its own scripting language just for one game?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Trying to reverse engineer a compression format used for just 8 files in 1 old game&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[29],"tags":[],"class_list":["post-4551","post","type-post","status-publish","format-standard","hentry","category-game-hacking"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/4551","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/comments?post=4551"}],"version-history":[{"count":5,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/4551\/revisions"}],"predecessor-version":[{"id":4558,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/4551\/revisions\/4558"}],"wp:attachment":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/media?parent=4551"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/categories?post=4551"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/tags?post=4551"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}