assembly language | Breaking Eggs And Making Omelettes

Here are the complete, do-it-yourself instructions and code for that re-targeting experiment. First, the files:

unnamed-re-project.py is the re-targeter; it outputs C code that relies on the support files
asm2c.c and
asm2c.h; these are compiled along with
testbench.c to demonstrate the program
function.txt contains the disassembly that the re-targeter is hardcoded to process

The re-targeter wants to process code like that found in function.txt. This is the disassembly format output by my favorite Win32 PE disassembler, Sang Cho’s Disassembler. I knew in advance that the function expects 4 parameters, and that fact is hardcoded in the re-targeter along with the file name. The testbench.c file contains the opcodes for the original function and allows the programmer to switch between the original opcodes and the re-targeted code for verification.

To run this experiment:

download all 5 files into one (Unix, x86) directory
./unnamed-re-project.py > bitreader.c
gcc *.c -o testbench

The testbench.c program simulates the data structure that the re-targeted bitreading function expects, along with a bitstream that looks like 0xA5, 0x5A repeated. Running the program should result in:

 12 bits = A55
  4 bits = A

If compiling on x86_32, you can switch the “#if 0” to “#if 1” in order to test the original opcodes.
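
For readers who want to sanity-check those expected values without building anything, here is a minimal Python sketch of a bit reader. It is not the code in testbench.c or the re-targeted function; the class and method names are made up for illustration, and it assumes bits are consumed most-significant-bit first, which is the ordering consistent with the values shown above.

```python
# Hypothetical MSB-first bit reader for illustration only.
# BitReader and read_bits are invented names, not taken from testbench.c.
class BitReader:
    def __init__(self, data):
        self.data = data
        self.bit_pos = 0  # absolute bit offset from the start of data

    def read_bits(self, count):
        """Return the next `count` bits as an unsigned integer, MSB first."""
        value = 0
        for _ in range(count):
            byte = self.data[self.bit_pos >> 3]           # byte holding the bit
            bit = (byte >> (7 - (self.bit_pos & 7))) & 1  # MSB of each byte first
            value = (value << 1) | bit
            self.bit_pos += 1
        return value


# Same pattern the testbench uses: 0xA5, 0x5A repeated.
reader = BitReader(bytes([0xA5, 0x5A] * 4))
first = reader.read_bits(12)
second = reader.read_bits(4)
print("12 bits = %X" % first)
print("4 bits = %X" % second)
```

The first 12 bits of 0xA5, 0x5A are 1010 0101 0101, and the following 4 bits are 1010, which is where A55 and A come from.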

I took a stab at making the re-targeter portable to a big endian platform, as brainstormed last night. However, I soon realized what my hastily scrawled, year-old note about that task’s difficulty must have been warning about: the outlined approach works for source arguments, but is not as straightforward for destination arguments.

I don’t have immediate access to an x86_64/Linux environment, but I would like to know if the re-targeted code compiles on that platform. The entire point of a re-targeter is to run code on a different platform, though I suppose running on an alternate operating system on the same CPU architecture is another interpretation.

So, of course the re-targeter is super-naive in its current form. It only implements enough instructions to handle that one function in function.txt. It does not handle branching at all. In order to do so, each of the instruction emitters would also need to output the code that adjusts the appropriate flags after each arithmetic instruction. Then, a branch would map to a simple goto with the correct address label.
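
To make the branching idea concrete, here is a rough Python sketch of what flag-aware emitters might look like. None of this is in unnamed-re-project.py; the function names, the flag variables (zf, sf, cf), the label naming scheme, and the assumption that registers are 32-bit unsigned C variables are all hypothetical.

```python
from typing import List

# Hypothetical emitters; every name here is illustrative, not from the
# actual re-targeter. Registers are assumed to be uint32_t C variables.

def emit_sub(dst: str, src: str) -> List[str]:
    """Emit C for `sub dst, src` plus the flag updates a later branch may read."""
    return [
        f"cf = ({dst} < {src});",      # unsigned borrow, computed before dst changes
        f"{dst} -= {src};",
        f"zf = ({dst} == 0);",         # zero flag
        f"sf = (({dst} >> 31) & 1);",  # sign flag = top bit of the 32-bit result
    ]

def emit_label(address: int) -> str:
    """Emit a C label for a branch target; the empty statement keeps it valid C."""
    return f"label_{address:08X}: ;"

# Map a few conditional jumps to C expressions over the flag variables.
CONDITIONS = {"jz": "zf", "jnz": "!zf", "js": "sf", "jb": "cf"}

def emit_branch(mnemonic: str, target: int) -> List[str]:
    """Emit a goto (conditional or not) to the label for `target`."""
    if mnemonic == "jmp":
        return [f"goto label_{target:08X};"]
    return [f"if ({CONDITIONS[mnemonic]}) goto label_{target:08X};"]


lines = emit_sub("ecx", "edx") + emit_branch("jz", 0x401020) + [emit_label(0x401020)]
print("\n".join(lines))
```

A real implementation would also need the overflow and parity flags, and would probably want to skip flag updates that no later branch reads, but the shape of the output would be similar.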

It was nearly a year ago that I tried my hand at writing a re-targeter — a program that can take machine opcodes and automatically translate them into a portable C program, which certainly sounds simple and intuitive enough. I was really quite busy last year about this time and I don’t remember how I found time for the re-targeter experiment in the first place. But it looks like I had time to write up some notes that I never fleshed out and published. It was hard enough just to locate the old source code. I was completely surprised to find that I had actually managed to write the re-targeter in Python; I had no idea I knew so much of that language (which, granted, isn’t much).

Here are some of the problems I encountered when I took a stab at writing a re-targeter; let’s see if I can remember the specifics a year later:

Topics On Multimedia Technology and Reverse Engineering

Naive x86 Re-targeter

Implementing The Re-targeter