Category Archives: Reverse Engineering

Brainstorming and case studies relating to craft of software reverse engineering.

What RE Looks Like

People ask me what binary reverse engineering is like. It’s tedious, that’s what it is. When it gets right down to it, you just have to concentrate on little (in the sense that they do not do much individually) instructions and carefully trace the bigger picture from huge blocks of these baby steps.

If you want to see what RE really looks like, come inside my notebook. Here are 3 choice pages, illustrating the process I used to figure out one of the inverse transforms that RealVideo 4 (and 3, for that matter; both H.264 prototypes) employ:


RV40 transform RE notes
page 1, page 2, page 3

In this example, I used a lot of back substitution in order to figure out a series of math formulas. This case was greatly simplified by the fact that there were not very many mystery parameters to deal with. However, the property that complicated matters here is that there were few if any straightforward imul (integer multiplication) instructions. Even though multiplication figures heavily into the transform, most multiplications were performed by sequences of additions, bit shifts, and compound add/shift instructions (lea).

Chocolate Milk Puzzle

So I had a little carton of chocolate milk with my lunch today (longtime readers have guessed that I must still be in elementary school). On one side of the carton was a word search puzzle that challenged me to find 6 words hidden in the jumble of letters. It occurred to me that this represented an allegory for reverse engineering.


Grain group word search puzzle

I generally liken RE to an extraordinarily complex, or just tedious, puzzle. This is a more straightforward puzzle.

Task: Find 6 strings of characters. What strings? I wasn’t sure exactly. Lesser word searches explicitly tell you which words to find. Those puzzles can be solved via a brute force algorithm that most of us figure out by the age of 5. I knew that these had to be coherent English words. Eventually, I re-read the instructions and noticed that the words were supposed to pertain to “foods made from grain”. Lesson: Gather all the intelligence you can before delving into the meat of the puzzle.

Next, there is domain knowledge. I have done a few word search puzzles before and so I know that I am supposed to look for strings of characters running left -> right, right -> left, top -> bottom, bottom -> top, and also diagonally. And I have some clue about foods made from grain. So I am off.

Then, I started theorizing about how the original puzzler (or whatever you call a person who develops puzzles) must have put this puzzle together. Since there are 6 words, I hypothesized that 2 would be horizontal (one for each direction), 2 would be vertical (again, one for each direction), and 2 would be diagonal. This would make good use of the parameters of a stock word search. I refined my initial search using these assumptions.

I was correct about the 2 horizontal and 2 vertical. I also found one more horizontal word. I methodically scanned every diagonal possibility, much to the disdain of my overworked eyes. I could not find any word on that orientation. So I revised my original assumption and assumed the remaining word must also be horizontal.

Finally, there’s the matter of validation: determining if one’s RE results are correct. Unlike binary RE, this puzzle makes it easy. So I check and… I don’t believe it! I only got 4 of them correct. I got another partially correct. And yeah, I didn’t think ‘bean’ was a valid grain.

Compiled Duck

I actually got the original Duck TM1 decoder compiled and hooked up to a small test bench app. It even runs without crashing. However, it produces unequivocally incorrect output:


Bad Duck TrueMotion 1 Decode

I was expecting something more in a solid white for this particular frame. This doesn’t exhibit the normal Duck TM1 bidirectional artifacts that I’m used to seeing. I also wired up a number of different blitting calls with both 16- and 24-bit data, to no avail.

I’ll give the API calls another look to see if I’m missing anything obvious in that department (this approach is unlikely to bear fruit, considering the dearth of documentation). Failing that, I might find myself in the ironic position of using libavcodec’s native Duck TM1 decoder to verify the operation of the original decoder to see what is going wrong with files that are known to work correctly in lavc, so that the original decoder will work completely and yield clues about the large body of problem TM1 files.

Addendum: I decided to enlist Valgrind’s assistance against the adversary. This is the first time I have touched the utility in awhile. Let’s see what it has to say…

  valgrind: the 'impossible' happened:
     Killed by fatal signal

I’m not sure what to think about that. Should I feel proud? Fortunately, leading up to the impossible condition are plenty of other memory errors, all ripe for investigation.

Duck Hieroglyphics

It’s like digital archaeology– understanding the ancient (by internet time) workings of less advanced engineering efforts. Pop culture depictions of archaeology would have us believe that those who toil in the field are searching for that one artifact that would neatly solve the entire puzzle at hand. Historically, perhaps the artifact that came closest to fitting this archetype was the Rosetta Stone, and even that was incomplete (if I’m observing the pictures correctly).

So it is with my current investigation. Much like Sean Connery playing Indiana Jones’ dad searching for the Holy Grail in The Last Crusade, I run the risk of making the Duck TrueMotion 1 algorithm my lifelong obsession. I got that Duck TM1 source code into a state where I could compile it against a standalone program. It only required 38 C source files from the original vpvision source tree and a mess of attendant header files to boot.

Continue reading