Yearly Archives: 2006

Black Box Reverse Engineering

Reverse engineering an algorithm from binary code is tough enough, but validating the result is a larger issue still. One idea I have been thinking about for a while is some method of hooking into an RE target at runtime and trapping data as it goes in and out of a particular function. The collected data would later serve as test vectors for the new implementation. It also occurred to me that this method could be an RE tool in its own right. For example, if you are pretty sure that a particular piece of binary code operates as an inverse DCT, use the same method to observe the data going in and coming out. This can save you the trouble of tracing through a tedious stretch of code to confirm that it actually is an IDCT. Plus, you can figure out whether it is identical to, e.g., the standard MPEG/JPEG IDCT.

Colleague Benjamin Larsson noted that this would be referred to as black box reverse engineering.
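
As a rough illustration of the validation half of this idea, here is a minimal Python sketch. It assumes the trapped data has already been dumped as pairs of 8x8 blocks (coefficients in, samples out); the capture mechanism itself is not shown. The names reference_idct_8x8 and max_abs_error are made up for this example. The reference is the textbook floating-point 8x8 IDCT with JPEG/MPEG-style scaling, so an integer implementation in the target would be expected to differ from it by small rounding errors.

```python
import math

N = 8  # block size assumed for this sketch


def reference_idct_8x8(coeffs):
    """Textbook floating-point 8x8 inverse DCT with JPEG/MPEG-style scaling."""
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (c(u) * c(v) * coeffs[u][v]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[x][y] = total / 4.0
    return out


def max_abs_error(captured_vectors, candidate=reference_idct_8x8):
    """Largest per-sample difference between the candidate transform and
    the outputs captured from the target.

    captured_vectors is a list of (input_block, output_block) pairs, each
    block being an 8x8 list of lists (hypothetical dump format).
    """
    worst = 0.0
    for block_in, block_out in captured_vectors:
        predicted = candidate(block_in)
        for x in range(N):
            for y in range(N):
                worst = max(worst, abs(predicted[x][y] - block_out[x][y]))
    return worst
```

If the error stays within rounding distance across a large set of captured blocks, the function under observation is very likely an IDCT. The same captured pairs can then be replayed against the new implementation by passing it as the candidate argument.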



Another basic application of this technique would be to monitor the bitstream parsing function for a given input bitstream. Many multimedia decoders delegate all of their bitstream parsing duties to a small number of functions, and this would be a great way to validate that a new decoder is chopping up a bitstream in the correct manner.
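
A minimal sketch of that bitstream check, in Python: it assumes the calls to the target's bit-reading function have been logged as (number of bits, returned value) pairs, and replays that log against a new bit reader over the same input. The BitReader class, the MSB-first bit order, and the first_mismatch helper are all assumptions made for illustration; a real decoder may read bits in a different order or expose several parsing functions.

```python
class BitReader:
    """MSB-first bit reader over a bytes object (bit order is an assumption)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, in bits

    def get_bits(self, nbits):
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def first_mismatch(data, trace):
    """Replay a captured trace of (nbits, value) pairs against BitReader.

    Returns the index of the first call whose value differs from the
    captured one, or None if the whole trace agrees.
    """
    reader = BitReader(data)
    for index, (nbits, expected) in enumerate(trace):
        if reader.get_bits(nbits) != expected:
            return index
    return None
```

The index of the first disagreement points straight at the spot where the new parser starts to diverge from the original.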

How to accomplish this? I recently sat down and actually read through the entire GNU Debugger manual to see what interesting features I might have been missing all these years. I discovered tracepoints. These apparently let you gather data about a running program without having to stop it at each point of interest. Unfortunately, I don’t think the facility is flexible enough to do what I outline above.

Are there tools that can already do what is described here? Or will it take some custom tools? If it takes custom tools, I already have a head start with some of my experiments.

Evaluating Alternate Build Systems

Even though I am on record as expressing devotion to the Autotools suite, I am not averse to evaluating alternatives. Mostly, I’m interested in a competent build system that can take care of the difficult and tedious stuff pertaining to a build such as dependencies and configuration. I acknowledge that Autotools embody a fair amount of complexity and arcana. The two top contenders to plausibly compete for Autotools’ title appear to be SCons and CMake.



A good baseline for evaluating the capabilities of an alternative is to find a limitation of your current solution and then figure out if the alternative can do that AND everything that the current solution can do. For example, on one of my software projects, I really appreciate that the current Autotools-based solution can:

  • automatically keep track of dependencies
  • manage multiple build targets
  • create multiple build configurations in separate directories, working from the same source tree

But now I need some very fine tweaking of certain build settings, such as being able to statically link a particular version of libstdc++ into a binary. I don’t know if any of the common build systems support this without some very serious hacking.

Here is a blog post from someone who has struggled with the very same issues and was able to solve the problem with a hand-crafted Makefile: G Plus Plus Minus One. I have managed to achieve the correct results from the command line. But trying to hack Makefile.am to do the same always ends up with a roundabout veto by the Autotools (i.e., the tools fall back on their preferred method of linking).

Of course, it would be really sweet if I could modify my existing Autotools setup to do what I need. I am still diligently researching this possibility. I certainly do not wish to re-tool the whole build system into a hand-crafted, manually maintained Makefile.

gcfuse, With Executable Support

I upgraded my gcfuse utility tonight. The main change was to expose the primary game executable file when browsing a GameCube filesystem. The primary executable is stored as an implicit part of the filesystem, separate from the directory structure. Being able to easily read this file is a useful feature if, for example, someone wishes to get at these executables for the purpose of disassembly.

For example, when mounting the first disc image of one of the few GC games that I have actually completed, Metal Gear Solid: The Twin Snakes:

$ ls -al mount/
total 1
dr-xr-xr-x 4 melanson users         0 Jul 15  2005 .
drwxr-xr-x 7 melanson users       760 Aug 26 21:48 ..
-r--r--r-- 1 melanson users        95 Jul 15  2005 .metadata
dr-xr-xr-x 4 melanson users         0 Jul 15  2005 audio
-r--r--r-- 1 melanson users 426387456 Jul 15  2005 demo.dat
-r--r--r-- 1 melanson users   1988128 Jul 15  2005 metal-gear-solid-the-twin-snakes-exe.dol
-r--r--r-- 1 melanson users      6496 Jul 15  2005 opening.bnr
dr-xr-xr-x 3 melanson users         0 Jul 15  2005 shared
-r--r--r-- 1 melanson users 198715392 Jul 15  2005 stage.dat

The executable file is metal-gear-solid-the-twin-snakes-exe.dol. The filename is a little long because it is derived from the game title in the disc metadata, which can be nearly 1000 characters long. The GC executable format is known as DOL, probably short for Dolphin, which was the codename of the GameCube during development.
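
For anyone who pulls the .dol file out for disassembly, the following Python sketch reads the header and lists the sections to load. The layout used here is an assumption based on the format as commonly documented by the GameCube homebrew community: a 0x100-byte big-endian header with 18 section file offsets (7 text, then 11 data), 18 load addresses, 18 sizes, then the BSS address, BSS size, and entry point. It is not taken from gcfuse itself, and parse_dol_header is a hypothetical helper name.

```python
import struct

DOL_HEADER_SIZE = 0x100   # assumed header size
NUM_TEXT_SECTIONS = 7     # assumed: sections 0-6 are text, 7-17 are data


def parse_dol_header(header):
    """Parse a DOL header (assumed layout) into a dictionary."""
    if len(header) < DOL_HEADER_SIZE:
        raise ValueError("need at least 0x100 bytes of DOL header")

    offsets = struct.unpack_from(">18I", header, 0x00)
    addresses = struct.unpack_from(">18I", header, 0x48)
    sizes = struct.unpack_from(">18I", header, 0x90)
    bss_address, bss_size, entry_point = struct.unpack_from(">3I", header, 0xD8)

    # Unused section slots have a size of zero; skip them.
    sections = [
        {
            "kind": "text" if i < NUM_TEXT_SECTIONS else "data",
            "offset": offset,
            "address": address,
            "size": size,
        }
        for i, (offset, address, size) in enumerate(zip(offsets, addresses, sizes))
        if size != 0
    ]

    # The header carries no total length, so the file size is taken as the
    # end of the furthest section.
    file_size = max([DOL_HEADER_SIZE] + [s["offset"] + s["size"] for s in sections])

    return {
        "sections": sections,
        "bss_address": bss_address,
        "bss_size": bss_size,
        "entry_point": entry_point,
        "file_size": file_size,
    }
```

Reading the first 0x100 bytes of the mounted .dol file and passing them to parse_dol_header gives the load addresses and entry point that a disassembler needs.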

I recognize that I’m likely the only person on the planet who cares about this utility but, hey, it’s my blog and what are blogs for if not to tell the world about the tedious minutiae of an individual’s life?
