“Unnamed RE Project” is the impromptu name I gave to a program that I wanted to start in a hurry, without bothering to come up with even a quasi-clever name. Moreover, I actually got it to do something. I can’t believe I made a go of this, perhaps one of the most useless reverse engineering exercises.
Aside: Does this still qualify for my “outlandish brainstorms” blog category if I actually made it work?
The basic idea is one that a lot of reverse engineers surely kick around at some point: A set of CPU registers can be abstracted as a set of global C program variables and individual assembly language instructions map quite neatly onto C program statements. Thus, what about an automatic conversion utility that can take an ASM disassembly and convert it into a C program that can be portably compiled? Not optimal, but it might be a start for other RE projects.
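As a minimal sketch of the idea (hand-written for illustration, not the output of the converter), here are three instructions mapped onto C statements, one statement per instruction. The names `split_bit_position` and `check_split` are made up for this example, and the registers are assumed to be plain 32-bit unsigned globals.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed register abstraction: each 32-bit register is a global variable. */
uint32_t ecx, edx;

/* Hand translation of:
 *     mov edx, ecx
 *     and ecx, 00000007
 *     shr edx, 03
 * Splits a bit position in ecx into a byte offset (edx) and a bit offset (ecx).
 */
void split_bit_position(void)
{
    edx = ecx;              /* mov edx, ecx      */
    ecx &= 0x00000007u;     /* and ecx, 00000007 */
    edx >>= 3;              /* shr edx, 03       */
}

/* Hypothetical usage: bit position 29 is byte 3, bit 5. */
void check_split(void)
{
    ecx = 29;
    split_bit_position();
    assert(edx == 3);
    assert(ecx == 5);
}
```

Instructions that only touch registers translate this directly. Memory operands, the stack, and flags are where the extra infrastructure comes in.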
Traditionally, I objected to this approach on the basis of its inherent impurity: one of my objectives in this RE journey is to understand the algorithms being recovered. Technically, while it sounded like a simple enough concept, when one actually sits down to think about it, all kinds of problems crop up. One of the most immediate is how case statements (jumps using dynamic tables) would be handled.
Putting aside all uncertainty, I decided to go for it and see what could happen. Believe it or not, I met with some success while also discovering a number of problems I hadn’t yet realized (for example, the dream of portability goes right out the window). I hope to write up some more about this shortly. But for tonight, I will just show the results of the first experiment.
This is the static disassembly of one of my favorite little RE puzzles, a simple bitstream reader:
DEF0  55          push ebp
DEF1  8BEC        mov ebp, esp
DEF3  53          push ebx
DEF4  56          push esi
DEF5  57          push edi
DEF6  8B7D08      mov edi, dword[ebp+08]
DEF9  8B750C      mov esi, dword[ebp+0C]
DEFC  8B4F04      mov ecx, dword[edi+04]
DEFF  8B5F10      mov ebx, dword[edi+10]
DF02  8BD1        mov edx, ecx
DF04  83E107      and ecx, 00000007
DF07  C1EA03      shr edx, 03
DF0A  8B0413      mov eax, dword[ebx+edx]
DF0D  0FC8        bswap eax
DF0F  D3E0        shl eax, cl
DF11  B920000000  mov ecx, 00000020
DF16  2BCE        sub ecx, esi
DF18  D3E8        shr eax, cl
DF1A  017704      add dword[edi+04], esi
DF1D  5F          pop edi
DF1E  5E          pop esi
DF1F  5B          pop ebx
DF20  5D          pop ebp
DF21  C3          ret
Note that it doesn’t involve any branching logic (the ‘ret’ notwithstanding), which limited the scope of the experiment for the time being. This is the automatically translated C output (which uses some register and stack abstraction infrastructure not shown):
And I’ll have you know that this simple experiment worked! I wrote a test program that uses both the original opcode stream and the translated C function and they both produced the same output.
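For a rough idea of the shape of such a translation and comparison, here is a hand-written sketch. It is not the tool's actual output. It assumes a small flat memory array addressed by 32-bit offsets in place of real pointers, registers as globals, and a cdecl-style call. Every helper name (`mem`, `load32`, `store32`, `push`, `pop`, `bswap32`, `sub_DEF0`, `call_sub_DEF0`, `read_bits`) is hypothetical. The last function, `read_bits`, is a plain C rendering of what the routine appears to do, included only as a reference to compare against.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed infrastructure: flat emulated memory plus global registers. */
#define MEM_SIZE 256u
uint8_t  mem[MEM_SIZE];
uint32_t eax, ebx, ecx, edx, esi, edi, ebp, esp;

uint32_t load32(uint32_t addr)              /* little-endian dword load */
{
    assert(addr <= MEM_SIZE - 4);
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8) |
           ((uint32_t)mem[addr + 2] << 16) | ((uint32_t)mem[addr + 3] << 24);
}

void store32(uint32_t addr, uint32_t v)     /* little-endian dword store */
{
    assert(addr <= MEM_SIZE - 4);
    mem[addr]     = (uint8_t)v;
    mem[addr + 1] = (uint8_t)(v >> 8);
    mem[addr + 2] = (uint8_t)(v >> 16);
    mem[addr + 3] = (uint8_t)(v >> 24);
}

void     push(uint32_t v) { esp -= 4; store32(esp, v); }
uint32_t pop(void)        { uint32_t v = load32(esp); esp += 4; return v; }

uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* One C statement per instruction of the routine at DEF0. */
void sub_DEF0(void)
{
    push(ebp);                                      /* push ebp                */
    ebp = esp;                                      /* mov ebp, esp            */
    push(ebx);                                      /* push ebx                */
    push(esi);                                      /* push esi                */
    push(edi);                                      /* push edi                */
    edi = load32(ebp + 0x08);                       /* mov edi, dword[ebp+08]  */
    esi = load32(ebp + 0x0C);                       /* mov esi, dword[ebp+0C]  */
    ecx = load32(edi + 0x04);                       /* mov ecx, dword[edi+04]  */
    ebx = load32(edi + 0x10);                       /* mov ebx, dword[edi+10]  */
    edx = ecx;                                      /* mov edx, ecx            */
    ecx &= 0x00000007u;                             /* and ecx, 00000007       */
    edx >>= 3;                                      /* shr edx, 03             */
    eax = load32(ebx + edx);                        /* mov eax, dword[ebx+edx] */
    eax = bswap32(eax);                             /* bswap eax               */
    eax <<= (ecx & 31);                             /* shl eax, cl (x86 masks the count to 5 bits) */
    ecx = 0x00000020u;                              /* mov ecx, 00000020       */
    ecx -= esi;                                     /* sub ecx, esi            */
    eax >>= (ecx & 31);                             /* shr eax, cl             */
    store32(edi + 0x04, load32(edi + 0x04) + esi);  /* add dword[edi+04], esi  */
    edi = pop();                                    /* pop edi                 */
    esi = pop();                                    /* pop esi                 */
    ebx = pop();                                    /* pop ebx                 */
    ebp = pop();                                    /* pop ebp                 */
    esp += 4;                                       /* ret: drop return address */
}

/* Hypothetical cdecl-style caller: args pushed right to left, caller cleans up. */
uint32_t call_sub_DEF0(uint32_t ctx, uint32_t nbits)
{
    push(nbits);
    push(ctx);
    push(0);            /* placeholder return address */
    sub_DEF0();
    esp += 8;
    return eax;
}

/* Plain C reference: read nbits from a big-endian bitstream at *bitpos. */
uint32_t read_bits(const uint8_t *buf, uint32_t *bitpos, uint32_t nbits)
{
    const uint8_t *p = buf + (*bitpos >> 3);
    uint32_t v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    v <<= (*bitpos & 7);
    v >>= (32 - nbits) & 31;
    *bitpos += nbits;
    return v;
}
```

To compare the two, set `esp` to `MEM_SIZE`, place a context block and a byte buffer inside `mem`, store the buffer's offset at context+0x10 and the starting bit position at context+0x04, then call `call_sub_DEF0` and `read_bits` with the same bit counts. The emulated memory is exactly where portability suffers: the real routine treats pointers as 32-bit values, so the translation cannot simply dereference native pointers.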