Breaking Eggs And Making Omelettes

Topics On Multimedia Technology and Reverse Engineering


Variable Declaration Guidelines

August 31st, 2006 by Multimedia Mike

Back in 2000, I came across this Advogato article about proper coding guidelines for the coming wave of 64-bit machines. The most interesting part, I thought, was comment #2 ("C is portable, if you let it be"), which offers some very sane guidelines for declaring variable types so that the compiler can simply do its job effectively. This is why I usually just declare plain ints for numbers rather than using uint32_t everywhere. There is often no reason to force particular types.

Don't think that you're saving space by declaring a uint8_t instead of an int; chances are that you aren't. I've disassembled enough C code compiled into 32-bit x86 machine code to know that a compiler will usually allocate 32 bits for that 8-bit variable. Here is a small piece of code to drive the point home:

stack.c:

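  #include <stdio.h>
  #include <stdlib.h>

  int main()
  {
    char a;   /* strictly needs 1 byte */
    short b;  /* strictly needs 2 bytes */

    a = random();  /* random() returns a long; it is truncated to fit a char */
    b = a + 1;     /* a is promoted to int for the addition */

    printf("a = %d, b = %d\n", a, b);

    return 0;
  }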

Compile with: gcc -Wall stack.c -o stack
Disassemble with: objdump -d -Mintel stack
Key parts:

080483a0 <main>:
 80483a0:   55                  push   ebp
 80483a1:   89 e5               mov    ebp,esp
 80483a3:   83 ec 08            sub    esp,0x8
 80483a6:   83 e4 f0            and    esp,0xfffffff0
 80483a9:   b8 00 00 00 00      mov    eax,0x0
 80483ae:   29 c4               sub    esp,eax
 80483b0:   e8 07 ff ff ff      call   80482bc <random@plt>
 80483b5:   88 45 ff            mov    BYTE PTR [ebp-1],al
 80483b8:   66 0f be 45 ff      movsx  ax,BYTE PTR [ebp-1]
 80483bd:   40                  inc    eax
 80483be:   66 89 45 fc         mov    WORD PTR [ebp-4],ax

Notice that, although the two locals strictly need only 3 bytes of storage (the char at [ebp-1] and the short at [ebp-4]), the sub esp,0x8 instruction reserves 8 bytes of stack. 32-bit machines like the i386 really, really like dealing with 32-bit quantities.
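
For a view of the same preference from inside C itself, here is a small sketch that puts the two variables from stack.c into a struct and checks the sizes with sizeof. The struct locals type and the helper names are made up for illustration. On common ABIs such as i386 and x86-64 the struct should come out as 4 bytes, with a byte of padding after the char, and a + 1 has the size of an int because of C's integer promotions; exact sizes and padding are implementation-defined, so treat these as typical values rather than guarantees.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical struct mirroring the two locals in stack.c, so the
 * padding the compiler inserts becomes visible via sizeof/offsetof. */
struct locals {
    char a;   /* 1 byte */
    short b;  /* 2 bytes, normally aligned to a 2-byte boundary */
};

/* Integer promotion: 'a + 1' has type int, not char.
 * sizeof does not evaluate its operand; it only inspects the type. */
static size_t promoted_size(char a)
{
    return sizeof(a + 1);
}

static void show_layout(void)
{
    printf("sizeof(struct locals) = %zu, offsetof(struct locals, b) = %zu\n",
           sizeof(struct locals), offsetof(struct locals, b));
    printf("sizeof(char) = %zu, sizeof(a + 1) = %zu\n",
           sizeof(char), promoted_size(0));
}
```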

Posted in Programming | 2 Comments »

GameCube CVS

August 30th, 2006 by Multimedia Mike

When I started mounting GameCube disc images with my gcfuse utility, perhaps the strangest thing I found (apart from 11,500+ Ogg Vorbis files on one title) was intact CVS directory structures on a number of discs. Of course, CVS directories don't give away much interesting detail; it's not like there's leaked source code living inside. Perhaps the most interesting thing is comparing the CVSROOT strings to information found in the MobyGames database. So, Yasunari Soejima, Hiroki Sotoike, and Fumihisa Sato: I just wanted to helpfully point out that you neglected to delete the CVS directories before creating the final disc images for certain GameCube games.
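
A CVS working directory records its repository location in a small text file, CVS/Root, which typically holds a string such as :pserver:user@host:/path. Pulling the user name out of that string is the first step in matching it against credits. The sketch below assumes that common :method:user@host:/path layout; cvsroot_user and the sample host name are made up, and rarer CVSROOT forms (embedded passwords, connection options) are not handled.

```c
#include <stddef.h>
#include <string.h>

/* Extracts the user name from a CVSROOT string of the common form
 * ":method:user@host:/path" (or "user@host:/path"), as stored in a
 * CVS/Root file. Returns 1 and fills 'user' on success, 0 otherwise. */
static int cvsroot_user(const char *root, char *user, size_t user_size)
{
    const char *start = root;
    const char *at;
    size_t len;

    /* Skip a leading ":method:" prefix such as ":pserver:" or ":ext:". */
    if (*start == ':') {
        const char *end = strchr(start + 1, ':');
        if (end == NULL)
            return 0;
        start = end + 1;
    }

    at = strchr(start, '@');
    if (at == NULL)  /* local repository, e.g. "/home/cvs": no user */
        return 0;

    len = (size_t)(at - start);
    if (len == 0 || len >= user_size)
        return 0;
    memcpy(user, start, len);
    user[len] = '\0';
    return 1;
}
```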


Posted in Game Hacking, Nintendo | 1 Comment »

OpenRCE

August 29th, 2006 by Multimedia Mike

Pursuant to my last post on black box reverse engineering, a piece of silverware (hi spoon!) emailed me and tipped me off to various programs hosted at a site called OpenRCE. The RCE stands for reverse code engineering. Nice logo, too:


OpenRCE logo

I do appreciate it when people clue me in to other resources out there dedicated to the fine art of reverse engineering. The articles and utilities hosted at OpenRCE appear to be a little more focused on understanding malware, which is a domain with somewhat different characteristics than multimedia, but certainly no less challenging.

In particular, my attention was directed to a Python-based RE framework called Pai Mei as well as another tool named Process Stalker.

Posted in Reverse Engineering | No Comments »

Black Box Reverse Engineering

August 28th, 2006 by Multimedia Mike

Reverse engineering an algorithm from binary code is tough enough. However, there is a larger issue of validation. One idea I have been thinking about for a while is some method of hooking into an RE target during runtime and trapping data as it goes in and out of a particular function. The collected data would later be used as test vectors for the new implementation. However, it also occurred to me that this method could be the RE tool itself. For example, if you are pretty sure that a particular piece of binary code operates as an inverse DCT, use the previously described method to observe data coming in and out. This can save you the trouble of tracing through a tedious stretch of code to determine that it actually is an IDCT. Plus, you can figure out if it is identical to, e.g., the standard MPEG/JPEG IDCT.

Colleague Benjamin Larsson noted that this would be referred to as black box reverse engineering.


Another basic application of this technique would be to monitor the bitstream parsing function for a given input bitstream. Many multimedia decoders delegate all of their bitstream parsing duties to a small number of functions and this would be a great way to validate that a new decoder is chopping up a bitstream in the correct manner.
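
One way to make that comparison concrete is to route every read in the new decoder through a single bit-reading function that can log each call, then diff that log against the reads captured from the original binary. In the sketch below, BitReader and get_bits are hypothetical names, and MSB-first bit order is an assumption that varies from codec to codec; each log line records the bit position, the number of bits requested and the value returned.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical MSB-first bit reader; every read can be logged so that the
 * log can be diffed against reads captured from the original binary. */
typedef struct {
    const uint8_t *buf;
    size_t size_bits;  /* total number of valid bits in buf */
    size_t pos;        /* current bit position */
    FILE *trace;       /* if non-NULL, each read is logged here */
} BitReader;

/* Reads n bits (1..32), most significant bit first. Bits past the end of
 * the buffer read as 0. */
static uint32_t get_bits(BitReader *br, int n)
{
    uint32_t value = 0;
    size_t start = br->pos;

    for (int i = 0; i < n; i++) {
        uint32_t bit = 0;
        if (br->pos < br->size_bits)
            bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
        value = (value << 1) | bit;
        br->pos++;
    }
    if (br->trace)
        fprintf(br->trace, "bit %zu: read %d -> 0x%lX\n", start, n,
                (unsigned long)value);
    return value;
}
```

Feeding the same bitstream to both decoders and comparing the two logs line by line points straight at the first read where they disagree.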

How to accomplish this? I recently sat down and actually read through the entire GNU Debugger manual to see what interesting features I might have been missing all these years. I discovered tracepoints, which apparently let you collect data from a running program without having to stop it. Unfortunately, I don't think the facility is flexible enough to do what I outline above.

Are there tools that can already do what is described here? Or will it take some custom tools? If it takes custom tools, I already have a head start with some of my experiments.

Posted in Reverse Engineering | 3 Comments »

Evaluating Alternate Build Systems

August 27th, 2006 by Multimedia Mike

Even though I am on record as expressing devotion to the Autotools suite, I am not averse to evaluating alternatives. Mostly, I'm interested in a competent build system that can take care of the difficult and tedious parts of a build, such as dependencies and configuration. I acknowledge that the Autotools embody a fair amount of complexity and arcana. The two most plausible contenders for the Autotools' title appear to be SCons and CMake.


A good baseline for evaluating the capabilities of an alternative is to find a limitation of your current solution and then figure out if the alternative can do that AND everything that the current solution can do. For example, on one of my software projects, I really appreciate that the current Autotools-based solution can:

  • automatically keep track of dependencies
  • manage multiple build targets
  • create multiple build configurations in separate directories, working from the same source tree

But now I need some very fine control over certain build settings, such as being able to statically link a particular version of libstdc++ into a binary. I don't know if any of the common build systems support this without some very serious hacking.

Here is a blog post from someone who has struggled with the very same issues and was able to solve the problem with a hand-crafted Makefile: G Plus Plus Minus One. I have managed to achieve the correct results from the command line. But trying to hack Makefile.am to do the same always ends up with a roundabout veto by the Autotools (i.e., the tools fall back on their preferred method of linking).

Of course, it would be really sweet if I could modify my existing autotools setup to do what I need. I am still diligently researching this possibility. I certainly do not wish to re-tool the whole build system into a hand-crafted, manually maintained Makefile.

Posted in Programming | No Comments »

gcfuse, With Executable Support

August 26th, 2006 by Multimedia Mike

I upgraded my gcfuse utility tonight. The main change was to expose the primary game executable file when browsing a GameCube filesystem. The primary executable is stored as an implicit part of the filesystem, separate from the directory structure. Being able to easily read this file is a useful feature if, for example, someone wishes to get at these executables for the purpose of disassembly.
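
Since the executable has no directory entry, a filesystem like gcfuse has to locate it and work out its size on its own. As a sketch, assuming the commonly documented layouts (the disc header stores the DOL's offset as a big-endian 32-bit value at 0x420, and the 0x100-byte DOL header holds 7 text and 11 data section offsets starting at 0x00, with the matching sizes starting at 0x90), the file size can be taken as the end of the furthest section. This is not gcfuse's actual code, and the function names are made up.

```c
#include <stddef.h>
#include <stdint.h>

#define DOL_TEXT_SECTIONS 7
#define DOL_DATA_SECTIONS 11

/* GameCube structures are big-endian. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Offset of the main executable, read from 0x420 in the disc header. */
static uint32_t dol_offset_from_disc_header(const uint8_t *header)
{
    return be32(header + 0x420);
}

/* Size of a DOL file: the end of its furthest section. The 7 text and
 * 11 data offsets are contiguous from 0x00, and so are the 18 sizes
 * from 0x90, so a single loop covers both kinds of section. */
static uint32_t dol_file_size(const uint8_t *dol)
{
    uint32_t end = 0x100;  /* the header itself */
    for (int i = 0; i < DOL_TEXT_SECTIONS + DOL_DATA_SECTIONS; i++) {
        uint32_t offset = be32(dol + 0x00 + 4 * i);
        uint32_t size = be32(dol + 0x90 + 4 * i);
        if (size != 0 && offset + size > end)
            end = offset + size;
    }
    return end;
}
```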

For example, when mounting the first disc image of one of my few GC games that I have actually completed, Metal Gear Solid:

$ ls -al mount/
total 1
dr-xr-xr-x 4 melanson users         0 Jul 15  2005 .
drwxr-xr-x 7 melanson users       760 Aug 26 21:48 ..
-r--r--r-- 1 melanson users        95 Jul 15  2005 .metadata
dr-xr-xr-x 4 melanson users         0 Jul 15  2005 audio
-r--r--r-- 1 melanson users 426387456 Jul 15  2005 demo.dat
-r--r--r-- 1 melanson users   1988128 Jul 15  2005 metal-gear-solid-the-twin-snakes-exe.dol
-r--r--r-- 1 melanson users      6496 Jul 15  2005 opening.bnr
dr-xr-xr-x 3 melanson users         0 Jul 15  2005 shared
-r--r--r-- 1 melanson users 198715392 Jul 15  2005 stage.dat

The executable file is metal-gear-solid-the-twin-snakes-exe.dol. The filename is a little long, which can happen since it is derived from the game title in the disc metadata, which can be nearly 1000 characters long. The GC executable format is known as DOL, probably short for Dolphin, which was the codename of the GameCube during development.
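
The exact naming rule isn't spelled out here, but one plausible rule that reproduces the example name is to lowercase the title, collapse every run of non-alphanumeric characters into a single dash, and append -exe.dol. The function below is a sketch under that assumption (title_to_dol_name is a hypothetical name, and the title string in the usage is assumed); it also truncates the title so the result fits the caller's buffer.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical naming rule: lowercase alphanumerics are kept, each run of
 * other characters becomes one '-', and "-exe.dol" is appended. Returns 0
 * if the buffer cannot even hold the suffix. */
static int title_to_dol_name(const char *title, char *out, size_t out_size)
{
    static const char suffix[] = "-exe.dol";
    size_t limit, n = 0;
    int pending_dash = 0;

    if (out_size < sizeof(suffix))
        return 0;
    limit = out_size - sizeof(suffix);  /* room left for the title part */

    for (const char *p = title; *p && n < limit; p++) {
        unsigned char c = (unsigned char)*p;
        if (!isalnum(c)) {
            pending_dash = (n > 0);  /* never start with a dash */
            continue;
        }
        if (pending_dash) {
            if (n + 1 >= limit)  /* no room for '-' plus a character */
                break;
            out[n++] = '-';
            pending_dash = 0;
        }
        out[n++] = (char)tolower(c);
    }
    memcpy(out + n, suffix, sizeof(suffix));  /* includes the NUL */
    return 1;
}
```

For a title of "Metal Gear Solid: The Twin Snakes", this produces metal-gear-solid-the-twin-snakes-exe.dol.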

I recognize that I'm likely the only person on the planet who cares about this utility but, hey, it's my blog and what are blogs for if not to tell the world about the tedious minutiae of an individual's life?

Posted in Game Hacking, Nintendo, Open Source Multimedia | 2 Comments »

8088 Corruption Redux

August 25th, 2006 by Multimedia Mike

Trixter notified me that people just won't leave his 8088 Corruption exercise alone. One coder has even made a player for the Game Boy Advance. I finally made a MultimediaWiki page for the data format in case anyone wants to carry this further.

Posted in Open Source Multimedia | No Comments »

CD Detection Experiment

August 24th, 2006 by Multimedia Mike

Some years ago, I wrote a program named the CD Detection Experiment to toy around with compact discs. I reference this tool often in my Multimedia Exploration Journal. The end goal was a tool that could analyze a variety of different CD types, including mode 2 discs, Philips CD-I discs, and 3DO discs. All it does is open a raw CD device under Unix, check the first few sectors, and decide what kind of CD is in the drive. I'm trying to remember if this tool preceded my CD-related work for xine or if I used code from my xine work in this program. There is code for Solaris and FreeBSD reading, so I am guessing that I based this on the xine work.
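
Part of checking the first few sectors is the ISO 9660 test that shows up as "iso9660 fs signature found" in the output. A sketch of that check, assuming the 2048 bytes of user data from sector 16 of the data track are already in memory (the function name is made up): an ISO 9660 primary volume descriptor starts with type byte 1 followed by the identifier CD001, and the space-padded 32-byte system and volume identifiers sit at offsets 8 and 40. Only the volume identifier is extracted here; the system identifier would be read the same way.

```c
#include <string.h>

/* Checks 2048 bytes of user data read from sector 16 of a data track for an
 * ISO 9660 primary volume descriptor: type byte 1, then "CD001". On success,
 * copies the space-padded 32-byte volume identifier at offset 40 into
 * volume_id (trailing spaces removed) and returns 1; otherwise returns 0. */
static int iso9660_volume_id(const unsigned char *sector, char volume_id[33])
{
    size_t len = 32;

    if (sector[0] != 1 || memcmp(sector + 1, "CD001", 5) != 0)
        return 0;
    memcpy(volume_id, sector + 40, 32);
    while (len > 0 && volume_id[len - 1] == ' ')
        len--;
    volume_id[len] = '\0';
    return 1;
}
```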


Anyway, if you care, here is the code. Here is some sample output, from that Deathtrap Dungeon game discussed in this journal entry, the one with a weird mixed mode:

$ ./cdexp /dev/cdrom
CD Detection Experiment
toc:
  first track = 1
   last track = 18

toc entries:
leadout track: MSF: 76:16:18, first frame = 343218
track  1,  data, MSF: 00:02:00, first frame = 150
 mode 1 data
 iso9660 fs signature found
 system id =
 volume id = Deathtrap
track  2, audio, MSF: 28:09:14, first frame = 126689
track  3, audio, MSF: 30:37:58, first frame = 137833
track  4, audio, MSF: 32:37:28, first frame = 146803
track  5, audio, MSF: 35:32:30, first frame = 159930
track  6, audio, MSF: 39:57:58, first frame = 179833
track  7, audio, MSF: 43:07:24, first frame = 194049
track  8, audio, MSF: 45:45:35, first frame = 205910
track  9, audio, MSF: 49:33:19, first frame = 222994
track 10, audio, MSF: 52:38:22, first frame = 236872
track 11, audio, MSF: 55:17:04, first frame = 248779
track 12, audio, MSF: 58:13:66, first frame = 262041
track 13, audio, MSF: 60:28:28, first frame = 272128
track 14, audio, MSF: 62:43:43, first frame = 282268
track 15, audio, MSF: 64:59:33, first frame = 292458
track 16, audio, MSF: 67:14:00, first frame = 302550
track 17, audio, MSF: 75:36:00, first frame = 340200
track 18,  data, MSF: 76:00:00, first frame = 342000
 mode 1 data
 iso9660 fs signature found
 system id =
 volume id = Deathtrap
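
The "first frame" numbers in this output are the MSF (minute:second:frame) times converted to absolute frame counts at 75 frames per second, which is why 00:02:00, the standard two-second lead-in, maps to frame 150. A minimal conversion sketch (the function names are illustrative):

```c
/* CD addressing: 75 frames (sectors) per second, 60 seconds per minute.
 * Frame numbers are absolute, counted from 00:00:00. */
static long msf_to_frame(int minute, int second, int frame)
{
    return ((long)minute * 60 + second) * 75 + frame;
}

static void frame_to_msf(long abs_frame, int *minute, int *second, int *frame)
{
    *frame = (int)(abs_frame % 75);
    *second = (int)((abs_frame / 75) % 60);
    *minute = (int)(abs_frame / (75 * 60));
}
```

For example, track 2's 28:09:14 gives (28*60 + 9)*75 + 14 = 126689, matching the listing.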

Posted in Open Source Multimedia | No Comments »

Mounting An Executable

August 23rd, 2006 by Multimedia Mike

I was studying the Executable and Linkable Format (ELF) recently and realized how hierarchically it is organized. Nowadays, whenever I think of something hierarchical, for some reason, I think of cramming it into a filesystem structure via FUSE. Imagine mounting an executable file as a filesystem. One directory could have a list of exported function names. Reading one of those files would automatically disassemble the corresponding code.


I'm working off of the 'readelf -a' command here. There would be a top-level directory called sections/ that would contain

  .interp/
  .hash/
  .dynsym/
  .dynstr/

And so on. This might be a little tricky because those names begin with dots, which normally makes them hidden entries. Another directory could list the shared libraries, with symbolic links to the correct library files. Another directory would list the exported, public symbols; opening these files would disassemble the functions for display in whatever text editor you want. Of course, not all of the interesting stuff is found at the public entry points, so it will be necessary to employ heuristics to locate other, private function entry points.
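
The sections/ listing could come straight from the ELF section header table, the same table readelf uses for its section names. A sketch, assuming a 64-bit ELF file in the host's byte order that has already been read into memory, plus the <elf.h> header found on glibc systems; for_each_elf64_section and print_section_name are made-up names, and a real filesystem would call something like the visitor from its directory-listing callback.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Each name would become one entry under sections/. */
static void print_section_name(const char *name, void *ctx)
{
    (void)ctx;
    printf("%s/\n", name);
}

/* Calls visit() for each section name in a 64-bit ELF image held in
 * memory. Assumes the file uses the host's byte order. Returns the
 * section count, or -1 if the image is not a usable ELF64 file. */
static int for_each_elf64_section(const unsigned char *image, size_t size,
                                  void (*visit)(const char *, void *),
                                  void *ctx)
{
    Elf64_Ehdr eh;
    Elf64_Shdr names;  /* the section that holds the section names */

    if (size < sizeof(eh) || memcmp(image, ELFMAG, SELFMAG) != 0 ||
        image[EI_CLASS] != ELFCLASS64)
        return -1;
    memcpy(&eh, image, sizeof(eh));  /* copy to avoid unaligned reads */

    if (eh.e_shoff == 0 || eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shstrndx == SHN_UNDEF || eh.e_shstrndx >= eh.e_shnum ||
        eh.e_shoff > size ||
        (size - eh.e_shoff) / sizeof(Elf64_Shdr) < eh.e_shnum)
        return -1;

    memcpy(&names, image + eh.e_shoff + eh.e_shstrndx * sizeof(Elf64_Shdr),
           sizeof(names));
    if (names.sh_offset > size || names.sh_size > size - names.sh_offset)
        return -1;

    for (int i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        memcpy(&sh, image + eh.e_shoff + i * sizeof(Elf64_Shdr), sizeof(sh));
        if (sh.sh_name < names.sh_size)
            visit((const char *)image + names.sh_offset + sh.sh_name, ctx);
    }
    return eh.e_shnum;
}
```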

For bonus points, make the filesystem writable. This would allow annotations in the disassembled source, and would probably require storing a "work" copy of the binary in the user's home directory.

Posted in Outlandish Brainstorms | No Comments »

Anti-Spam Upgrade

August 17th, 2006 by Multimedia Mike

I just upgraded my principal blog anti-spam measure, WP-HashCash, to the latest version. I know that some readers have been blocked by it when trying to comment; in fact, I was even blocked recently when I tried to post a comment myself. Please let me know if WP-HashCash gives you any trouble.

I think that WP-HashCash uses a great idea to stop spambots by issuing a programmatic challenge to the client before accepting a client's comment. This sort of thing has been proposed as a solution for email spam but would not be tractable without modifying the fundamental email protocols. I have never seen this blog nailed by spambots so I can only assume that the plugin is doing its job, which I realize may not be very sound reasoning.
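
The general idea can be sketched as a hashcash-style proof of work, the kind of scheme that has been proposed for email: the server hands out a challenge string, the client searches for a counter that makes a hash of challenge:counter start with a given number of zero bits, and the server checks the answer with a single hash. This is not meant to describe WP-HashCash's actual mechanism; the 32-bit FNV-1a hash is a toy stand-in for the cryptographic hash (such as SHA-1) a real scheme would use, and the function names are made up.

```c
#include <stdint.h>
#include <stdio.h>

/* 32-bit FNV-1a: a toy stand-in for a cryptographic hash such as SHA-1. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Server side: one hash to check that "challenge:counter" hashes to a
 * value whose top 'bits' bits (0..32) are all zero. */
static int verify_stamp(const char *challenge, unsigned long counter, int bits)
{
    char buf[256];
    snprintf(buf, sizeof buf, "%s:%lu", challenge, counter);
    return bits == 0 || (fnv1a(buf) >> (32 - bits)) == 0;
}

/* Client side: brute-force search for a counter that passes the check.
 * Each extra bit roughly doubles the expected amount of work. */
static unsigned long solve_stamp(const char *challenge, int bits)
{
    unsigned long counter = 0;
    while (!verify_stamp(challenge, counter, bits))
        counter++;
    return counter;
}
```

The asymmetry is the point of the design: the client pays for many hashes, while the server pays for one.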

Posted in General | 2 Comments »
