Experiments In Software Reverse Engineering

by Mike Melanson (mike at multimedia.cx)
Updated June 29, 2004

Disassembling A Binary

There are many disassemblers that take binaries in various formats and transform them into a series of representative assembly language instructions. My personal favorite is Sang Cho's Disassembler, henceforth referred to as 'scd'. It only operates on Win32 PE files (modern Windows executables and DLLs), but most interesting RE targets are in that format anyway. The tool compiles fine under Linux using gcc. Download and unpack the source package and execute:

gcc *.c -o scd

Note that this works "out of the box" with v0.23 of the program; later versions may differ. Depending on which version of gcc you are using, you may also have to convert the DOS carriage returns in the source files to Unix line endings using dos2unix. Install scd in a convenient location, such as /usr/local/bin, and disassemble a Win32 binary using:

scd binary.exe > binary.exe.txt
scd binary.dll > binary.dll.txt
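If dos2unix is not installed, the CR/LF conversion can be approximated with a few lines of Python. This is a minimal sketch, not part of scd: it assumes the unpacked source consists of .c and .h files sitting directly in one directory, and the function names and the script name crlf_to_lf.py are hypothetical. It only rewrites CR/LF pairs and leaves files that are already in Unix format untouched.

```python
import sys
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    """Turn DOS CR/LF pairs into Unix LF line endings."""
    return data.replace(b"\r\n", b"\n")


def convert_tree(directory: str) -> int:
    """Rewrite .c/.h files directly inside `directory` in place; return how many changed."""
    changed = 0
    for path in sorted(Path(directory).iterdir()):
        if path.suffix.lower() not in (".c", ".h"):
            continue  # assumption: only C sources and headers need converting
        original = path.read_bytes()
        converted = crlf_to_lf(original)
        if converted != original:
            path.write_bytes(converted)
            changed += 1
    return changed


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 crlf_to_lf.py path/to/scd/source
    for directory in sys.argv[1:]:
        print(f"{directory}: converted {convert_tree(directory)} file(s)")
```

Run it on the unpacked source directory before the gcc step.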

Experiment: Finding Functions

A single binary typically contains a lot of code, and much of it may do things you don't care about. It is useful to break the disassembly down into smaller pieces for analysis. The original source code from which the binary was compiled was probably a series of functions (if the original programmers had any idea what they were doing), and when the code was compiled, each function's code stayed grouped together. scd does a pretty good job of working out where functions begin and end, based on the binary's exported function list as well as the call instructions in the code.

scd-addresses.pl is a Perl script that reads an scd listing and automatically determines (or makes its best guess at) where each function begins and ends. Another script, objdump-addresses.pl, finds the same address boundaries from the output of the standard GNU objdump utility.
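The call-target heuristic can be illustrated with a minimal Python sketch. This is not a port of either Perl script. It assumes a GNU objdump -d style listing, where each instruction line starts with a hex address followed by a colon. Every direct call target, plus any caller-supplied entry points such as exported functions, is treated as a function start, and each function is assumed to end at the last instruction before the next start. The names find_functions, ADDR_RE and CALL_RE are hypothetical.

```python
import bisect
import re
from typing import Iterable

# Assumed objdump -d style instruction line, e.g.
#   "  401001:\te8 0a 00 00 00       \tcall   401010 <_helper>"
ADDR_RE = re.compile(r"^\s*([0-9a-f]+):\s")
# Direct calls only: "call 401010" or "call 0x401010"; "call *%eax" is skipped.
CALL_RE = re.compile(r"\bcall\s+(?:0x)?([0-9a-f]+)\b")


def find_functions(lines: Iterable[str],
                   entry_points: Iterable[int] = ()) -> list[tuple[int, int]]:
    """Guess (start, end) pairs; `end` is the address of a function's last instruction."""
    addresses: set[int] = set()
    starts: set[int] = set(entry_points)  # e.g. exported function addresses
    for line in lines:
        m = ADDR_RE.match(line)
        if not m:
            continue  # section headers, symbol labels, blank lines
        addresses.add(int(m.group(1), 16))
        call = CALL_RE.search(line)
        if call:
            starts.add(int(call.group(1), 16))  # direct call target = function entry

    ordered = sorted(addresses)
    # Ignore targets that don't land on an instruction in this listing.
    starts_sorted = sorted(a for a in starts if a in addresses)
    ranges = []
    for i, start in enumerate(starts_sorted):
        if i + 1 < len(starts_sorted):
            # Last instruction strictly before the next function's first one.
            end = ordered[bisect.bisect_left(ordered, starts_sorted[i + 1]) - 1]
        else:
            end = ordered[-1]
        ranges.append((start, end))
    return ranges
```

For example, save a listing with objdump -d binary.exe > binary.objdump.txt and pass the open file to find_functions. Code that precedes the first known start is not assigned to any function, which is one place where a heuristic like this (and, as noted below, scd itself) can miss boundaries.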

Experiment: Execution Profiling

In software development, profiling is used to determine where a program is actually spending its time. When RE'ing a binary module, it is also useful to determine where execution is going. This allows you to focus your RE efforts on the code that is actually used to accomplish a certain task.

Profiling tools are plentiful if you have access to a program's source code or debug builds. It's harder with a binary that has no debugging information; gathering execution data then takes some special hacking.

If execution data can be obtained, it can be correlated with function boundary information gathered in the previous experiment in order to describe where a binary module goes when called upon to perform a particular task, and how long it stays there.
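One way to do that correlation is sketched below in Python, under stated assumptions. It assumes the profiler produces periodic samples of the instruction pointer (how those samples are collected is the "special hacking" above and is not shown) and that function boundaries are already available as non-overlapping (start, end) address pairs. Counting the samples that land in each range gives a rough picture of where the module spends its time. The names attribute_samples and report are hypothetical.

```python
import bisect
from collections import Counter
from typing import Iterable


def attribute_samples(ranges: Iterable[tuple[int, int]],
                      samples: Iterable[int]) -> Counter:
    """Count sampled instruction addresses per function.

    ranges:  non-overlapping (start, end) pairs; end = last instruction's address.
    samples: instruction-pointer values, e.g. taken at a fixed interval.
    Keys are function start addresses; None collects samples outside every range.
    """
    ordered = sorted(ranges)
    starts = [start for start, _ in ordered]
    counts: Counter = Counter()
    for addr in samples:
        i = bisect.bisect_right(starts, addr) - 1  # last range starting at or below addr
        if i >= 0 and addr <= ordered[i][1]:
            counts[ordered[i][0]] += 1
        else:
            counts[None] += 1
    return counts


def report(counts: Counter) -> None:
    """Print functions from most to least sampled with their share of all samples."""
    total = sum(counts.values())
    for start, hits in counts.most_common():
        label = "unknown " if start is None else f"{start:08x}"
        print(f"{label}  {hits:6d}  {100.0 * hits / total:5.1f}%")
```

Binary search over the sorted start addresses keeps the lookup cheap even with many samples. With a fixed sampling interval, a function's sample count multiplied by that interval roughly approximates how long execution stayed there, and a large "unknown" share hints that the boundary list is missing some functions.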

Future Experiments

Some ideas for future experiments:

Generating a call tree for a binary; I have written a Perl script that generates a call tree based on an scd listing, but it is not quite complete.
Apply some automated RE concepts from the doctoral thesis written by Cristina Cifuentes, Patron Saint of Software Reverse Engineering.
Track called addresses during profiling to gain better function boundary granularity (scd isn't always 100% accurate at finding the boundaries).
Extend the profiler to monitor dynamic jumps (e.g., "jmp edx") in order to gain an idea of where execution goes in those cases.
Extract data from key locations during execution; this will aid in algorithm re-implementation.
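The call-tree idea can be sketched in Python as follows. This is not the Perl script mentioned above. It assumes function ranges and direct (call site, target) address pairs have already been extracted from a listing. Calls whose site lies outside every known range are dropped, and indirect transfers such as "jmp edx" are out of scope because they have no static target. The names build_call_graph and tree_lines are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Iterable


def build_call_graph(ranges: Iterable[tuple[int, int]],
                     calls: Iterable[tuple[int, int]]) -> dict[int, set[int]]:
    """Map each function start to the set of addresses it calls directly.

    ranges: non-overlapping (start, end) address pairs, one per function.
    calls:  (call_site_address, target_address) pairs for direct calls.
    """
    ordered = sorted(ranges)
    starts = [s for s, _ in ordered]
    graph: dict[int, set[int]] = defaultdict(set)
    for site, target in calls:
        i = bisect.bisect_right(starts, site) - 1  # function containing the call site
        if i >= 0 and site <= ordered[i][1]:
            graph[ordered[i][0]].add(target)
    return graph


def tree_lines(graph: dict[int, set[int]], root: int,
               depth: int = 0, path: tuple[int, ...] = ()) -> list[str]:
    """Render the call tree under `root`, marking recursion instead of looping."""
    recursive = root in path
    lines = [f"{'  ' * depth}{root:08x}" + ("  (recursive)" if recursive else "")]
    if not recursive:
        for callee in sorted(graph.get(root, ())):
            lines.extend(tree_lines(graph, callee, depth + 1, path + (root,)))
    return lines
```

Printing "\n".join(tree_lines(graph, entry)) for an exported entry point gives an indented tree. Because shared callees are expanded again under each caller, the output can grow quickly for large binaries; collapsing repeated subtrees would be a natural refinement.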
