More Crazy RE Experiments – call/ret | Breaking Eggs And Making Omelettes

I have been at it again, concocting more highly specialized reverse engineering experiments. If you have been reading my blog for awhile and are familiar with my methods, or lack thereof, you know I like to try random stuff in the hope that I may accidently come across a good idea.

So we have these WinCE binary modules with debug symbols that implement various advanced Microsoft media codecs. Then we also have the Microsoft media modules that Linspire licensed (provided you know where to find them) that also have copious debug symbols. I wanted to put this intelligence to good use.

But which set of codecs to investigate? Honestly, they both are likely based on the same source code. However, what can actually be done to understand what goes on in either binary set? I consistently came to a dead-end with the WinCE set because I did not have a good starting point– I did not know how the external interface to a specific module operated. On the other hand, I knew exactly how the external API works for the Linspire modules and I documented it here.

Drawing on some older experiments I have performed, I set up an experiment where I used x86 trap facilities to monitor each instruction being executed and log every call or ret(urn) instruction encountered, along with its address. These Linspire modules are shared objects which means that the binary code is relocatable. I.e., you can disassemble the .so file with objdump and obtain a list of functions and their addresses, but those addresses will not match up with the monitored executed instructions which were relocated by the operating system. This problem is solved easily enough by knowing that all the binary code is loaded into memory in one contiguous block and that there is a constant offset that can be subtracted from the monitored instructions. All of a sudden, the function addresses from the monitored instructions match up with the function addresses from the static disassembly.

Why would this be such a useful experiment vs., e.g., static disassembly analysis? This approach examines code paths that are actually executed for a given set of data. There is a lot that can not be ascertained from static analysis. For example, this codec has an inordinate number of “call *%ebx” instructions (likely due to underlying C++ code). Good luck guessing the target of instructions like that without actually tracing into the code, either manually (with a debugger) or automatically (with a custom tool such as the one I developed for this task).

I ran the experiment using Linspire’s libwmv2.so module and met with impressive success. First, I modified the FFmpeg/libavcodec API code to trap all the calls and rets that occur from the time particular functions are called to the time they return. I collected data for the wmv2init function as well as the wmv2packet for a particular frame of WMV2 data for which we do not understand the complete coding method. Then, I wrote another one of my ridiculously specialized Perl scripts for processing data from my nutty RE experiments. This script analyzed the call/ret address data and matched it up against the original static disassembly to output a hierarchical/tree view in text format.

Since I know you are interested to see some results from this experiment, check this out:

Some problems at this point:

no external library calls are identified so you see a lot of (no symbol) entries; fortunately, these libraries do not rely very heavily on external functions beyond such common fare as malloc and free
the graph needs to be collapsed in some manner; one function is called 4099 times during codec initialization. However, I cannot, for example, only follow a function once and then count its subsequent recurrences. Subsequent calls to a function can follow different paths which can lead to other functions. Still, it would be useful to be able to analyze when functions follow the same call path (or are leaf functions) and not note every occurrence in the final hierarchy.
the output is not that pretty; I have had several recommendations for Graphviz to create pretty graphs. I am not sure which type of graph I should choose. The tree could certainly use some redundancy removal before graphing. Another option is to output some kind of C-looking code file. Then, certain editors like KDE’s Kate can collapse and expand parts of the tree based on user input.

If anyone is interested, I will post the code and precise methodology for the experiment. I will probably do so eventually, anyway, but right now, the whole experiment is spread up into so many different parts that it would be difficult to follow.