Category Archives: Reverse Engineering

Brainstorming and case studies relating to the craft of software reverse engineering.

Why So Many?

A multimedia colleague posed the quandary: “why are there 13 different lossless formats out there?” My best answer: Because there were at least 13 organizations or individuals that wanted their own flavor. We will probably discover that the underlying algorithms for all 13 are nearly indistinguishable, just with slightly tweaked parameters. Indeed, the individual who reverse engineered Apple’s ALAC figured out portions by first understanding similar portions of FLAC.

I once reverse engineered an audio decoder from binary code only to find that it was a stock IMA ADPCM decoder. I didn’t see how it offered any advantage whatsoever over another available, free solution. I later had the opportunity to talk to someone involved with this variant’s creation. I asked why they chose to create their own format since it brought nothing new to the table; did they just want to have their own format for the sake of it?

The response: “Doesn’t everybody?”

OpenRCE

Pursuant to my last post on black box reverse engineering, a piece of silverware (hi spoon!) emailed me and tipped me off to various programs hosted at a site called OpenRCE. The RCE stands for reverse code engineering. Nice logo, too:


OpenRCE logo

I do appreciate it when people clue me in to other resources out there dedicated to the fine art of reverse engineering. The articles and utilities hosted at OpenRCE appear to be a little more focused on understanding malware, which is a domain with somewhat different characteristics than multimedia, but certainly no less challenging.

In particular, my attention was directed to a Python-based RE framework called Pai Mei as well as another tool named Process Stalker.

Related post:

Black Box Reverse Engineering

Reverse engineering an algorithm from binary code is tough enough. However, there is a larger issue of validation. One idea I have been thinking about for a while is some method of hooking into an RE target at runtime and trapping data as it goes in and out of a particular function. The collected data would later be used as test vectors for the new implementation. However, it occurred to me that this method could also be the RE tool itself. For example, if you are pretty sure that a particular piece of binary code operates as an inverse DCT, use the previously described method to observe data coming in and out. This can save you the trouble of tracing through a tedious stretch of code to determine that it actually is an IDCT. Plus, you can figure out if it is identical to, e.g., the standard MPEG/JPEG IDCT.

Colleague Benjamin Larsson noted that this would be referred to as black box reverse engineering.
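
To make the idea concrete, here is a minimal sketch in Python. It assumes the target function can somehow be wrapped or hooked so that its inputs and outputs are visible; here an ordinary Python function (`mystery_transform`, a hypothetical stand-in) plays the role of the opaque binary code. The names `record_io`, `matches_reference`, and the tolerance of 1.0 are illustrative assumptions, and the reference is a textbook floating-point 8-point 1-D IDCT rather than any particular codec's integer version.

```python
import math

def record_io(log):
    """Wrap a function so that every call appends (inputs, output) to `log`."""
    def decorator(func):
        def wrapper(*args):
            result = func(*args)
            log.append((args, result))
            return result
        return wrapper
    return decorator

def reference_idct_1d(coeffs):
    """Textbook orthonormal 1-D inverse DCT in floating point."""
    n = len(coeffs)
    out = []
    for x in range(n):
        total = 0.0
        for u, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos((2 * x + 1) * u * math.pi / (2 * n))
        out.append(total)
    return out

def matches_reference(vectors, reference, tolerance=1.0):
    """Return True if every captured output agrees with `reference`."""
    for args, observed in vectors:
        expected = reference(*args)
        if len(expected) != len(observed):
            return False
        if any(abs(e - o) > tolerance for e, o in zip(expected, observed)):
            return False
    return True

captured = []  # doubles as a set of test vectors for a new implementation

@record_io(captured)
def mystery_transform(coeffs):
    # Hypothetical stand-in for the opaque target: an IDCT that rounds
    # its output to integers, as a real decoder might.
    return [round(v) for v in reference_idct_1d(coeffs)]

mystery_transform([64, 0, 0, 0, 0, 0, 0, 0])
mystery_transform([0, 32, 0, 0, 0, 0, 0, 0])
print(matches_reference(captured, reference_idct_1d))
```

The same `captured` list serves both purposes described above: it confirms (or refutes) the guess about what the function does, and it can be replayed later against a reimplementation.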


Black box

Another basic application of this technique would be to monitor the bitstream parsing function for a given input bitstream. Many multimedia decoders delegate all of their bitstream parsing duties to a small number of functions and this would be a great way to validate that a new decoder is chopping up a bitstream in the correct manner.
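
As a rough illustration of the bitstream case, the sketch below logs every read made through a bit reader and compares the resulting trace against a second trace. It assumes MSB-first bit order and assumes that a comparable trace of (bit count, value) pairs has already been captured from the original decoder by some hooking mechanism; `LoggingBitReader`, `first_divergence`, and `reference_trace` are hypothetical names invented for this example.

```python
class LoggingBitReader:
    """MSB-first bit reader that records every read as (count, value)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0      # current position, in bits
        self.trace = []   # log of (bit count, value) pairs

    def get_bits(self, count):
        value = 0
        for _ in range(count):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        self.trace.append((count, value))
        return value

def first_divergence(trace_a, trace_b):
    """Return the index of the first differing read, or None if identical."""
    for index, (a, b) in enumerate(zip(trace_a, trace_b)):
        if a != b:
            return index
    if len(trace_a) != len(trace_b):
        return min(len(trace_a), len(trace_b))
    return None

# The new decoder parses a header: two 4-bit fields, then an 8-bit field.
reader = LoggingBitReader(bytes([0xA5, 0x3C]))
reader.get_bits(4)
reader.get_bits(4)
reader.get_bits(8)

# Hypothetical trace captured from the original decoder on the same input.
reference_trace = [(4, 0xA), (4, 0x5), (8, 0x3C)]
print(first_divergence(reader.trace, reference_trace))
```

Reporting the index of the first mismatched read, rather than a plain pass/fail, points directly at the field where the new parser starts to disagree with the original.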

How to accomplish this? I recently sat down and actually read through the entire GNU Debugger manual to see what interesting features I might have been missing all these years. I discovered tracepoints. These apparently let you gather data about a program as it runs, without having to stop it periodically. Unfortunately, I don’t think the facility is flexible enough to do what I outline above.

Are there tools that can already do what is described here? Or will it take some custom tools? If it takes custom tools, I already have a head start with some of my experiments.