Experiments In Software Reverse Engineering
by Mike Melanson (mike at multimedia.cx)
Updated June 29, 2004
Disassembling A Binary
There are many disassemblers out there that take binaries of various formats
and transform them into a series of representative assembly language
instructions. My personal favorite disassembler is Sang Cho's Disassembler,
available here and henceforth referred to as 'scd'. It operates only on
Win32 PE files (modern Windows executables and DLLs), but most interesting
RE targets are in that format anyway.
The tool compiles fine under Linux using gcc. Download and unpack the
source package and execute:
gcc *.c -o scd
Note that this works "out of the box" with v0.23 of the program. Later versions
may differ.
Depending on which version of gcc you are using, you may also have to convert
the DOS-style line endings (carriage returns) in the source files using
dos2unix. Install scd in a convenient location, like /usr/local/bin, and
disassemble a Win32 binary using:
scd binary.exe > binary.exe.txt
scd binary.dll > binary.dll.txt
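If dos2unix is not installed, the line-ending conversion mentioned above can be approximated with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and it assumes the source package contains only .c and .h files that need converting.

```python
from pathlib import Path


def strip_carriage_returns(data: bytes) -> bytes:
    """Convert DOS (CRLF) line endings to Unix (LF) line endings."""
    return data.replace(b"\r\n", b"\n")


def convert_tree(source_dir: str) -> int:
    """Rewrite every .c and .h file under source_dir in place.

    Returns the number of files whose contents actually changed.
    """
    changed = 0
    for path in sorted(Path(source_dir).rglob("*")):
        if path.is_file() and path.suffix in {".c", ".h"}:
            original = path.read_bytes()
            converted = strip_carriage_returns(original)
            if converted != original:
                path.write_bytes(converted)
                changed += 1
    return changed
```

Calling convert_tree(".") from inside the unpacked source directory should leave the files ready for the gcc command above. Files are rewritten in place, so keep the original archive around.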
Experiment: Finding Functions
A single binary typically has a lot of code. A vast chunk of the code may
do stuff that you don't necessarily care about. It is useful to break the
disassembly down into smaller pieces for analysis. Consider that the original
source code from which the binary was compiled was probably a series of
functions (if the original programmers had any idea what they were doing).
When the code was compiled, the functions were still grouped together. scd
actually does a pretty good job of sorting out where functions begin and
end based on the binary's exported function list as well as the call
instructions in the code.
scd-addresses.pl is a Perl script that
looks at a scd listing and automatically determines (or makes its best
guess) where all of the functions begin and end. Another script,
objdump-addresses.pl, finds the
address boundaries from the output of the standard GNU objdump utility.
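To give a feel for the idea, here is a minimal Python sketch of boundary guessing from an "objdump -d" listing. It is not a port of either Perl script. It assumes the usual GNU objdump line layout (address, tab, hex bytes, tab, instruction), treats every direct call target inside the listing as a function start, and assumes each function runs up to the next start. The names INSN_RE, CALL_RE and find_functions are invented for this example.

```python
import re

# One disassembled instruction: "  401000:<TAB>55   <TAB>push   %ebp"
INSN_RE = re.compile(r"^\s*([0-9a-f]+):\t([0-9a-f ]+)\t(.*)$")
# A direct call to an absolute address, e.g. "call   0x401020".
# Indirect calls such as "call   *%eax" deliberately do not match.
CALL_RE = re.compile(r"^call\w*\s+(?:0x)?([0-9a-f]+)\b")


def find_functions(listing: str) -> list[tuple[int, int]]:
    """Guess function ranges as half-open (start, end) address pairs."""
    addresses = []   # address of every decoded instruction
    targets = set()  # every direct call target seen
    end_of_listing = 0
    for line in listing.splitlines():
        m = INSN_RE.match(line)
        if not m:
            continue
        addr = int(m.group(1), 16)
        addresses.append(addr)
        # Approximate end of code: last address plus its byte count.
        end_of_listing = addr + len(m.group(2).split())
        call = CALL_RE.match(m.group(3))
        if call:
            targets.add(int(call.group(1), 16))

    # Keep only targets that land on an instruction in this listing.
    known = set(addresses)
    starts = sorted(t for t in targets if t in known)

    bounds = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else end_of_listing
        bounds.append((start, end))
    return bounds
```

A function that is never the target of a direct call (an exported entry point, or one reached only through a pointer) is missed by this sketch, which is why the export list is also worth consulting.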
Experiment: Execution Profiling
In software development, profiling is used to determine where a program is
actually spending its time. When RE'ing a binary module, it is also useful
to determine where execution is going. This allows you to focus your RE
efforts on the code that is actually used to accomplish a certain task.
Profiling tools are plentiful if you have access to a program's source code
or debug builds. It's a little harder if you have a binary with no debugging
information; it takes some special hacking to gather execution data.
If execution data can be obtained, it can be correlated with function
boundary information gathered in the previous experiment in order to describe
where a binary module goes when called upon to perform a particular task,
and how long it stays there.
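The correlation step itself is simple once both data sets exist. The following Python sketch assumes the execution data is a plain list of sampled instruction addresses and that function boundaries are half-open (start, end) address pairs sorted by start. Both formats, and the name attribute_samples, are assumptions made for illustration rather than the output of any particular tool.

```python
from bisect import bisect_right
from collections import Counter


def attribute_samples(samples, functions):
    """Count how many sampled addresses fall inside each function.

    samples:   iterable of instruction addresses observed during execution
    functions: list of half-open (start, end) ranges, sorted by start
    Returns a Counter keyed by function start address.
    """
    starts = [start for start, _ in functions]
    hits = Counter()
    for addr in samples:
        # Find the last function whose start is <= addr.
        i = bisect_right(starts, addr) - 1
        if i >= 0 and addr < functions[i][1]:
            hits[functions[i][0]] += 1
    return hits
```

Dividing each count by the total number of samples gives a rough share of execution time per function. Samples that land outside every known range are simply dropped here, though a large number of them would suggest that the boundary list is incomplete.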
Future Experiments
Some ideas for future experiments:
- Generate a call tree for a binary; I have written a Perl script that
generates a call tree from a scd listing, but it is not quite complete.
- Apply some automated RE concepts from the doctoral thesis written by
Cristina Cifuentes,
Patron Saint of Software Reverse Engineering.
- Track called addresses during profiling to gain better function
boundary granularity (scd isn't always 100% accurate at finding the
boundaries).
- Extend the profiler to monitor dynamic jumps (e.g., "jmp edx") in order
to gain an idea of where execution goes in those cases.
- Extract data from key locations during execution; this will aid in
algorithm re-implementation.
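For the call tree idea, the printing side might look like the Python sketch below. It is unrelated to the unfinished Perl script. It assumes the caller/callee relationships have already been extracted into a dictionary mapping each function's start address to the start addresses it calls; that input format and the name format_call_tree are hypothetical.

```python
def format_call_tree(edges, root):
    """Return the call tree under root as a list of indented text lines.

    edges: dict mapping a function start address to an iterable of the
           function start addresses it calls directly
    root:  start address of the function to begin from
    """
    lines = []

    def visit(func, depth, path):
        # A function already on the current path means recursion; mark it
        # and stop descending so the walk always terminates.
        recursive = func in path
        marker = " (recursive)" if recursive else ""
        lines.append("  " * depth + f"{func:#x}{marker}")
        if recursive:
            return
        for callee in sorted(set(edges.get(func, ()))):
            visit(callee, depth + 1, path | {func})

    visit(root, 0, frozenset())
    return lines
```

Joining the returned lines with newlines gives a listing in which each level of indentation is one call deeper. Indirect calls never show up in such a tree unless their targets are recovered some other way, for example through the dynamic jump monitoring mentioned above.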