Automated Memory Excavation | Breaking Eggs And Making Omelettes

Sometimes, multimedia programs are modular, which facilitates reverse engineering. But what if they are not modular, or are just standalone programs that do one thing well?

Many multimedia program architectures are highly modular in order to allow easy expandability for new codecs and container formats. Since the modules conform to well-defined interfaces, usually exporting one or more functions from a standard API, such modular platforms provide an excellent jumping-off point for reverse engineering. But what happens when you encounter a standalone player where all of the decoding code is hidden somewhere amidst megabytes of unrelated code? Or when, as in the case of Apple’s QuickTime Player, many of the core codecs are rolled up into one mega-DLL?

In situations like this, I like to use a chase-the-data-through-the-buffers strategy. I will need to come up with a better name for that. It operates on the following simple logical premise:

Somewhere in this big, binary module, there is the entry point to a decoding function, a decoding function that I want.
One of the parameters to that decoding function will need to be a pointer to encoded data.
Question: Where does that data come from? Answer: On disk.
Solution: Intercept the data when it is read from the disk.

I once had access to a debugger that could be made to stop when a particular area of memory was read from. I would put a breakpoint on the standard Windows ReadFile() function and wait until I recognized the buffer of data being read from the file (I had the sample file open in a hex editor in a different window). Then I would put a read breakpoint over the entire data buffer, which would lead me straight to the core decoding function.

I do not have access to that debugger anymore and it seems to be a fairly rare feature in modern debuggers. Besides, using debuggers just encourages one to step through the code. I want cool software tools that do tedious work for me.

So, let’s brainstorm a new program. Operating on a similar logical premise as above, feed it a buffer pattern you wish to match. The pattern would be, for example, the first 16 bytes of the first keyframe of a file that contains data in a format that is still only decodable by a binary codec. The program would then set the x86 CPU trap flag, so that the CPU raises a single-step exception after every instruction, and a custom exception handler would search memory for that pattern.
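To make the pattern idea concrete, here is a minimal sketch in Python. It only covers the matching half: it pulls a 16-byte pattern out of a sample file's bytes and finds every occurrence in a memory snapshot. The function names `pattern_at` and `find_all` are made up for illustration, and the keyframe offset is assumed to be something you already worked out by hand in a hex editor.

```python
def pattern_at(sample: bytes, offset: int, length: int = 16) -> bytes:
    """Return `length` bytes of the sample starting at `offset`.

    `offset` is assumed to be the hand-located start of the first keyframe.
    """
    pattern = sample[offset:offset + length]
    if len(pattern) != length:
        raise ValueError("sample is too short for the requested pattern")
    return pattern


def find_all(memory: bytes, pattern: bytes) -> list[int]:
    """Return the offset of every occurrence of `pattern` in `memory`.

    Overlapping occurrences are reported too, since the search resumes
    one byte after each hit.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    hits = []
    pos = memory.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = memory.find(pattern, pos + 1)
    return hits
```

Expect more than one hit in practice, since the player is likely to copy the encoded data between buffers before it reaches the decoder.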

Seems like a big task. I think I would break it down like this:

wait for data to be moved into a register
if new data in register appears to be a reasonable data memory location (need some heuristics on this one), start doing a pattern match starting at that address
log to a file every match (likely more than one occurrence), and the address of the mov instruction that placed the data buffer into a register
log to a separate file the last memory location that was about to be checked; use this intelligence on subsequent runs after the profiling program has crashed by accessing illegal memory
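The steps above can be sketched as a simulation. This is not a real single-step handler: the instruction trace is a plain list of (instruction address, register value) pairs, memory is modeled as a few known regions, and the two log files are ordinary Python values. The `Region` class, `scan_trace`, and the `known_bad` set are all hypothetical names, and the "reasonable data memory location" heuristic is reduced to "the whole pattern would fit inside a known region."

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A readable chunk of simulated memory starting at address `base`."""
    base: int
    data: bytes

    def contains(self, addr: int, length: int) -> bool:
        return self.base <= addr and addr + length <= self.base + len(self.data)

    def read(self, addr: int, length: int) -> bytes:
        off = addr - self.base
        return self.data[off:off + length]


def scan_trace(trace, regions, pattern, known_bad=frozenset()):
    """Walk a trace of (instruction address, register value) pairs.

    Returns (matches, last_checked): `matches` lists every
    (instruction address, buffer address) whose register pointed at the
    pattern, and `last_checked` is the last address the scan was about to
    read. `known_bad` holds addresses that crashed an earlier run.
    """
    matches = []
    last_checked = None
    for insn_addr, value in trace:
        if value in known_bad:
            continue  # learned from a previous run's crash log
        # Stand-in heuristic: the value must point inside a known region.
        region = next((r for r in regions if r.contains(value, len(pattern))), None)
        if region is None:
            continue
        last_checked = value  # record before touching memory
        if region.read(value, len(pattern)) == pattern:
            matches.append((insn_addr, value))
    return matches, last_checked
```

In a real tool the read itself could fault, which is why the address is recorded before the comparison. After a crash, the recorded address would be added to `known_bad` for the next run.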

Nutty, and not something I am likely to get started on soon.