Category Archives: Reverse Engineering

Brainstorming and case studies relating to the craft of software reverse engineering.

Unnamed RE Project

“Unnamed RE Project” is the impromptu name I gave to a program that I wanted to start hastily, and for which I couldn’t be bothered to come up with even a quasi-clever name. Moreover, I actually got it to do something. I can’t believe I actually made a go of this, perhaps one of the most useless reverse engineering exercises.

Aside: Does this still qualify for my “outlandish brainstorms” blog category if I actually made it work?

The basic idea is one that a lot of reverse engineers surely kick around at some point: A set of CPU registers can be abstracted as a set of global C program variables and individual assembly language instructions map quite neatly onto C program statements. Thus, what about an automatic conversion utility that can take an ASM disassembly and convert it into a C program that can be portably compiled? Not optimal, but it might be a start for other RE projects.
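As a rough illustration of that register-to-global mapping (not the actual Unnamed RE Project code), here is a minimal Python sketch that turns a few two-operand instructions into C statements. The register list, the supported mnemonics, the Intel-style `op dst, src` syntax, and the function names are all assumptions made for the example; flags, memory operands, and control flow are ignored entirely.

```python
import re

# Assumed register set; each one becomes a global C variable.
REGISTERS = ("eax", "ebx", "ecx", "edx")

# Two-operand mnemonics and the C operator each is assumed to map onto.
BINARY_OPS = {
    "mov": "=",
    "add": "+=",
    "sub": "-=",
    "and": "&=",
    "or": "|=",
    "xor": "^=",
}


def operand_to_c(operand):
    """Render a register name or a numeric immediate as a C expression."""
    operand = operand.strip().lower()
    if operand in REGISTERS:
        return operand
    # Accepts decimal or 0x-prefixed hex; raises ValueError on anything else.
    return str(int(operand, 0))


def translate_line(line):
    """Translate one 'op dst, src' instruction into one C statement."""
    match = re.fullmatch(r"\s*(\w+)\s+([^,]+),\s*(.+?)\s*", line)
    if match is None:
        raise ValueError(f"unsupported instruction syntax: {line!r}")
    mnemonic, dst, src = match.groups()
    mnemonic = mnemonic.lower()
    dst = dst.strip().lower()
    if mnemonic not in BINARY_OPS:
        raise ValueError(f"unsupported mnemonic: {mnemonic!r}")
    if dst not in REGISTERS:
        raise ValueError(f"destination must be a register: {dst!r}")
    return f"{dst} {BINARY_OPS[mnemonic]} {operand_to_c(src)};"


def translate(lines):
    """Wrap the translated statements in a compilable C function."""
    body = "\n".join("    " + translate_line(line) for line in lines)
    return (
        "#include <stdint.h>\n\n"
        f"uint32_t {', '.join(REGISTERS)};\n\n"
        "void translated(void)\n{\n"
        f"{body}\n"
        "}\n"
    )


if __name__ == "__main__":
    print(translate(["mov eax, 5", "add eax, ebx", "xor ecx, 0x10"]))
```

Even this toy version hints at where the trouble starts: anything that depends on CPU flags, memory layout, or computed jumps has no one-line C equivalent.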

Traditionally, I objected to this approach on the basis of its inherent impurity: one of my objectives in this RE journey is to understand the algorithms being recovered. Technically, while it sounded like a simple enough concept, all kinds of problems crop up when one actually sits down to think about it. One of the most immediate is how case statements (jumps using dynamic tables) would be handled.

Putting aside all uncertainty, I decided to go for it and see what could happen. Believe it or not, I met with some success while also discovering a number of problems I hadn’t yet realized (for example, the dream of portability goes right out the window). I hope to write up some more about this shortly. But for tonight, I will just show the results of the first experiment.


Tarantula

Robert and Reynaldo are working on a multimedia investigation utility called Tarantula. The goal, as I understand it, is to be able to analyze frames from a video stream to determine more about their properties, e.g., RGB15 vs. RGB16, upside-down orientation, and other parameters. Sounds like the start of something useful.
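To make the RGB15 vs. RGB16 distinction concrete, here is a small Python sketch that unpacks the same 16-bit value under both interpretations. It assumes the common layouts (5-5-5 with the top bit unused for RGB15, 5-6-5 for RGB16) and says nothing about how Tarantula itself approaches the problem; the function names are made up for the example.

```python
def unpack_rgb555(pixel):
    """Split a 16-bit value into (r, g, b), assuming a 5-5-5 layout
    with the top bit unused."""
    r = (pixel >> 10) & 0x1F
    g = (pixel >> 5) & 0x1F
    b = pixel & 0x1F
    return (r, g, b)


def unpack_rgb565(pixel):
    """Split a 16-bit value into (r, g, b), assuming a 5-6-5 layout."""
    r = (pixel >> 11) & 0x1F
    g = (pixel >> 5) & 0x3F
    b = pixel & 0x1F
    return (r, g, b)


if __name__ == "__main__":
    pixel = 0x7FFF
    print(unpack_rgb555(pixel))  # (31, 31, 31): full white as RGB15
    print(unpack_rgb565(pixel))  # (15, 63, 31): a different color as RGB16
```

The same bits yield different colors under each layout, which is presumably why guessing the wrong format produces visibly wrong frames.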

The project made me realize that I would like a simple tool that could load a file and allow me to treat it as various different types of PCM. Actually, I would be surprised if there weren’t already a number of waveform editors out there, even for Linux, that can do just that. It might make some of my multimedia investigations simpler.
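A minimal sketch of the idea, assuming nothing more than Python's standard `struct` module: read raw bytes and reinterpret them under several PCM sample formats. The format labels and the `decode_pcm` helper are names invented for this example, not part of any existing tool.

```python
import struct

# Hypothetical format labels mapped to struct format codes.
PCM_FORMATS = {
    "u8": "B",
    "s8": "b",
    "s16le": "<h",
    "s16be": ">h",
    "u16le": "<H",
    "u16be": ">H",
}


def decode_pcm(data, fmt):
    """Interpret raw bytes as a list of integer samples in the given format.

    Trailing bytes that do not fill a whole sample are ignored.
    """
    code = PCM_FORMATS[fmt]
    size = struct.calcsize(code)
    usable = len(data) - (len(data) % size)
    return [sample[0] for sample in struct.iter_unpack(code, data[:usable])]


if __name__ == "__main__":
    raw = b"\x00\x80\xff\x7f"
    for name in PCM_FORMATS:
        print(name, decode_pcm(raw, name))
```

Plotting or playing back each interpretation would be the natural next step; the wrong guess usually sounds or looks like noise.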

Reverse Engineering Artwork

VAG directed me to some curious visualizations of console game disassemblies filed under distellamap. This particular experiment targets Atari games and helps to highlight some of the graphics contained inside. Nearer and dearer to my heart is some of the research upon which this was based, called dismap, which maps the flow of selected Nintendo Entertainment System games.

Investigating Hachoir

In response to yesterday’s brainstorm, Mjules tipped me off regarding another tool that falls squarely into the “I wish I had thought of that” category: Hachoir (wish I knew how to pronounce it). It’s a Python-based framework for writing file parsers.


Hachoir mascot appliance

Finally! I have a compelling reason to learn Python.*** Python has long been on my list of languages to figure out, along with Prolog. Tonight, I wrote a very basic extension to Hachoir to parse the BIN FMV format discovered in my most recent exploration journal entry. And look: this WordPress plugin for code syntax highlighting also does Python:

Right now, this produces the output:

root (The Amazing Spider-Man vs. The Kingpin (Sega CD) FMV)
0) chunk type= "CONF": FourCC (size 4 bytes)
4) chunk length= 0x00000028: 4 bytes (size 4 bytes)
8) raw[]= "\0\0\0\0\0\0\0\0\0\0\0\0\0\0(...)" (size 3.3 MB)
[ q to quit - move with arrows, page up/down, home/end ]
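As a rough, framework-free illustration of the two fields shown in that output, the sketch below reads a FourCC and a 32-bit length using plain `struct` rather than Hachoir's own API. The big-endian byte order is an assumption made for the example, and `read_chunk_header` is a hypothetical helper name.

```python
import struct


def read_chunk_header(data, offset=0):
    """Return (fourcc, length) from the 8 bytes starting at offset.

    Assumes a 4-byte ASCII chunk type followed by a 32-bit big-endian
    length; the byte order is a guess, not a confirmed property of the format.
    """
    if len(data) - offset < 8:
        raise ValueError("not enough data for a chunk header")
    fourcc, length = struct.unpack_from(">4sI", data, offset)
    return fourcc.decode("ascii", errors="replace"), length


if __name__ == "__main__":
    sample = b"CONF" + b"\x00\x00\x00\x28" + b"\x00" * 16
    fourcc, length = read_chunk_header(sample)
    print(f"chunk type= {fourcc!r}, chunk length= 0x{length:08x}")
```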

I still have a lot to learn about both Python and the existing framework facilities provided by Hachoir for parsing chunked file formats. The program already includes parsers for an impressive array of file format types. One that is of particular interest to me is a QuickTime file parser that the authors concede is rather incomplete. I see real promise for this parser as a research and troubleshooting tool for one of the most involved multimedia formats available.

*** (Proviso: No disrespect meant to anyone’s favorite language. I’m as fascinated with new programming languages as the next hardcore Linux geek. But it always helps me to learn a new language when I have a clear goal outlined for doing so.)