Here are the complete, do-it-yourself instructions and code for that re-targeting experiment. First, the files:
- unnamed-re-project.py, which is the re-targeter, and outputs C code that relies on the support files:
- asm2c.c and
- asm2c.h; these are compiled along with
- testbench.c to demonstrate the program
- function.txt contains the disassembly that the re-targeter is hardcoded to process
The re-targeter wants to process code like that found in function.txt. This is the disassembly format output by my favorite Win32 PE disassembler, Sang Cho’s Disassembler. I knew in advance that the function expects 4 parameters, and that fact is hardcoded in the re-targeter along with the file name. The testbench.c file contains the opcodes for the original function and allows the programmer to switch between the original opcodes and the re-targeted code for verification.
To run this experiment:
- download all 5 files into one (Unix, x86) directory
- ./unnamed-re-project.py > bitreader.c
- gcc *.c -o testbench
The testbench.c program simulates the data structure that the re-targeted bitreading function expects, along with a bitstream that looks like 0xA5, 0x5A repeated. Running the program should result in:
12 bits = A55 4 bits = A
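As a sanity check on that expected output, here is a minimal Python sketch of a bit reader run over the same 0xA5, 0x5A pattern. It is not the re-targeted function: the `BitReader` class and its `read` method are hypothetical names, and the most-significant-bit-first ordering is an assumption, chosen because it reproduces the values shown above.

```python
class BitReader:
    """Hypothetical MSB-first bit reader (an assumption, not the original function)."""

    def __init__(self, data):
        self.data = data
        self.bit_pos = 0  # absolute bit index into data

    def read(self, count):
        """Return the next `count` bits as an integer, most significant bit first."""
        value = 0
        for _ in range(count):
            byte = self.data[self.bit_pos // 8]
            bit = (byte >> (7 - self.bit_pos % 8)) & 1
            value = (value << 1) | bit
            self.bit_pos += 1
        return value


# Same repeating pattern the testbench is described as using.
stream = bytes([0xA5, 0x5A] * 4)
reader = BitReader(stream)
first = reader.read(12)
second = reader.read(4)
print("12 bits = %X" % first)
print("4 bits = %X" % second)
```

Reading 12 bits consumes all of 0xA5 plus the high nibble of 0x5A, and the following 4 bits are the low nibble of 0x5A, which is where the two values come from.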
If compiling on x86_32, you can switch the “#if 0” to “#if 1” in order to test the original opcodes.
I took a stab at making the re-targeter portable to a big endian platform, as brainstormed last night. However, I soon realized what my hastily scrawled, year-old note about that task’s difficulty must have warned about: the outlined approach works for source arguments, but is not as straightforward for destination arguments.
I don’t have immediate access to an x86_64/Linux environment, but I would like to know if the re-targeted code compiles on that platform. The entire point of a re-targeter is to run code on a different platform, though I suppose running on an alternate operating system on the same CPU architecture is another interpretation.
So, of course the re-targeter is super-naive in its current form. It only implements enough instructions to handle that one function in function.txt. It does not handle branching at all. In order to do so, each of the instruction emitters would also need to output the code that adjusts the appropriate flags after each arithmetic instruction. Then, a branch would map to a simple goto with the correct address label.
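To make the branching idea concrete, here is a rough Python sketch of what flag-aware emitters could look like. None of it is taken from unnamed-re-project.py: the function names (`emit_sub`, `emit_jz`, `emit_label`), the `L_` label prefix, and the `zf`/`sf`/`cf` variables are all made up for illustration. It also assumes the generated C declares the registers as `uint32_t` variables.

```python
def emit_sub(dest, src):
    """Emit C for `sub dest, src`, followed by the flag updates (hypothetical)."""
    return [
        "cf = (%s < %s);" % (dest, src),    # borrow, computed before the subtract
        "%s -= %s;" % (dest, src),
        "zf = (%s == 0);" % dest,           # zero flag
        "sf = ((%s >> 31) & 1);" % dest,    # sign flag, assuming 32-bit registers
    ]


def emit_label(address):
    """Emit a C label for a branch target; the empty statement keeps it legal C."""
    return ["L_%s: ;" % address]


def emit_jz(address):
    """Emit `jz address` as a goto guarded by the zero flag."""
    return ["if (zf) goto L_%s;" % address]


# A made-up three-instruction fragment, not taken from function.txt.
lines = []
lines += emit_sub("eax", "ecx")
lines += emit_jz("00401020")
lines += emit_label("00401020")
print("\n".join(lines))
```

Only the zero, sign, and carry flags are modeled here. A fuller version would also need the overflow flag, per-instruction rules for which flags each opcode touches, and a label at every address that some branch targets.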