Reverse engineering an algorithm from binary code is tough enough, but validation is a larger issue. One idea I have been thinking about for a while is some method of hooking into an RE target at runtime and trapping data as it goes in and out of a particular function. The collected data would later be used as test vectors for the new implementation. It also occurred to me that this method could be the RE tool itself. For example, if you are pretty sure that a particular piece of binary code operates as an inverse DCT, use the method described above to observe the data going in and out. This can save you the trouble of tracing through a tedious stretch of code to confirm that it actually is an IDCT. Plus, you can figure out whether it is identical to, e.g., the standard MPEG/JPEG IDCT.
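To make the IDCT idea concrete, here is a minimal sketch of the comparison step. It assumes the trapped data has already been collected somehow as (coefficient block, output block) pairs of 8x8 values; the names `idct_2d` and `max_error` are made up for illustration, and the reference is the plain floating-point 8x8 inverse DCT formula rather than any particular decoder's code.

```python
import math

N = 8  # block size of the MPEG/JPEG 8x8 transform


def idct_1d(coeffs):
    """Floating-point 1-D inverse DCT over N coefficients."""
    out = []
    for x in range(N):
        total = 0.0
        for u in range(N):
            scale = math.sqrt(0.5) if u == 0 else 1.0
            total += scale * coeffs[u] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        out.append(total / 2.0)
    return out


def idct_2d(block):
    """Separable 2-D inverse DCT: transform the rows, then the columns."""
    rows = [idct_1d(row) for row in block]
    cols = [idct_1d([rows[r][c] for r in range(N)]) for c in range(N)]
    # cols[c][r] is the sample at row r, column c, so transpose back.
    return [[cols[c][r] for c in range(N)] for r in range(N)]


def max_error(captured_pairs):
    """Largest absolute difference between trapped outputs and the reference."""
    worst = 0.0
    for coeffs, observed in captured_pairs:
        expected = idct_2d(coeffs)
        for r in range(N):
            for c in range(N):
                worst = max(worst, abs(expected[r][c] - observed[r][c]))
    return worst


# Hypothetical captured pair: a DC-only block whose output is flat.
dc_only = [[0.0] * N for _ in range(N)]
dc_only[0][0] = 8.0
flat = [[1.0] * N for _ in range(N)]
print(max_error([(dc_only, flat)]))
```

A small maximum error across many trapped blocks would suggest the function really is an IDCT, probably with its own rounding; a large error would suggest a different transform, a different scaling, or a wrong guess about the data layout.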
Colleague Benjamin Larsson noted that this would be referred to as black box reverse engineering.
Another basic application of this technique would be to monitor the bitstream parsing function for a given input bitstream. Many multimedia decoders delegate all of their bitstream parsing duties to a small number of functions and this would be a great way to validate that a new decoder is chopping up a bitstream in the correct manner.
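As a rough sketch of the bitstream check, suppose the new decoder funnels all of its reads through one bit reader that logs every request, and that a matching log of (bit count, value) pairs has been trapped from the original binary. Everything here is an assumption for illustration: the `TracingBitReader` class, the MSB-first bit order, and the log format.

```python
class TracingBitReader:
    """MSB-first bit reader that logs every read as (bit_count, value)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0      # current position, in bits
        self.trace = []   # one entry per get_bits() call

    def get_bits(self, count):
        value = 0
        for _ in range(count):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        self.trace.append((count, value))
        return value


def first_divergence(expected, actual):
    """Index of the first read that differs, or None if the traces match."""
    for i, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            return i
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None


reader = TracingBitReader(bytes([0xB2, 0xFF]))
reader.get_bits(3)
reader.get_bits(5)
reader.get_bits(4)

# Hypothetical trace trapped from the original decoder for the same input.
trapped = [(3, 5), (5, 18), (4, 15)]
print(first_divergence(trapped, reader.trace))
```

The index of the first mismatch points fairly directly at the syntax element where the new decoder starts chopping up the bitstream differently.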
How to accomplish this? I recently sat down and actually read through the entire GNU Debugger manual to see what interesting features I might have been missing all these years. I discovered tracepoints. These apparently let you gather data about a program without stopping the program periodically. Unfortunately, I don’t think the facility is flexible enough to do what I outline above.
Are there tools that can already do what is described here? Or will it take some custom tools? If it takes custom tools, I already have a head start with some of my experiments.
There is also something called “relative debugging”. An ex-PhD
did some research on this, and an academic here still maintains the work…
http://www.csse.monash.edu.au/~davida/guard/
I have not looked at this too much, but it might be interesting.
Ex-Ph.D.? :) Was he stripped of his degree for some reason?
Thanks for the link; will investigate.
Well, it all depends on the algorithm.
When I reverse unknown algorithms, I often translate them
line by line to C/C++,
usually starting with the innermost functions.
To do validation, I use either debugging or DLL injection.
DLL injection is really powerful if you are on the Win32 (NT)
platform.
There you can dynamically add a hook to a function and trap data as it goes through it.
This works because the DLL gets injected into the same
address space as the program, so you can directly edit
the binary code of the program.
Most of the time I set up a new thread from the DLL
so I can call isolated functions without disturbing the program.
This is great for doing test cases.
/Legion
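The hook-and-trap pattern described in this comment can be illustrated without any Win32 machinery. The sketch below is only an analogy, not DLL injection: it swaps a Python function for a wrapper that records what goes in and what comes out, then replays the recording against a rewrite. The `target` object, its `dequantize` function, and `install_hook` are all hypothetical stand-ins.

```python
import copy
import functools
import types


def install_hook(owner, name, log):
    """Replace owner.<name> with a wrapper that records inputs and results."""
    original = getattr(owner, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        # Copy the inputs first, in case the function modifies them in place.
        saved_args, saved_kwargs = copy.deepcopy((args, kwargs))
        result = original(*args, **kwargs)
        log.append((saved_args, saved_kwargs, copy.deepcopy(result)))
        return result

    setattr(owner, name, wrapper)
    return original  # keep this so the hook can be removed again


# Hypothetical stand-in for the program being reverse engineered.
target = types.SimpleNamespace(dequantize=lambda level, qscale: level * qscale * 2)

log = []
original = install_hook(target, "dequantize", log)
target.dequantize(3, 4)   # the "program" runs normally while data is trapped
target.dequantize(-1, 8)
setattr(target, "dequantize", original)  # unhook


def my_dequantize(level, qscale):
    """The new implementation being validated."""
    return 2 * level * qscale


# Replay the trapped calls as test vectors against the rewrite.
for args, kwargs, expected in log:
    assert my_dequantize(*args, **kwargs) == expected
```

In a real injected DLL the wrapper would be native code patched over the function's entry point, but the record-then-replay flow would presumably look much the same.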