Category Archives: Reverse Engineering

Brainstorming and case studies relating to the craft of software reverse engineering.

Hachoir And RealMedia

I finally got something semi-useful accomplished on the Hachoir project: A RealMedia (.rm) file parser. Here is a screenshot of the hachoir-urwid frontend (one of several frontends available for the project):


Hachoir parsing a RealMedia file

I have long held an interest in thoroughly and usefully documenting the rm format, which has always struck me as one of the most ad-hoc multimedia formats available, at least in terms of the support available in open source programs. I was eager to write this parser to help me study the format and write down all of the things that are present in various open source demuxers but are currently missing from any public documentation (that I am aware of). The parser I have written so far covers the easy stuff; I want to move on to documenting that type-specific data field highlighted in the screenshot, which is the really interesting and useful part of the format.

But hey, if you would like to help, the code is now in the Hachoir Subversion repository.

Real Linkage Part II

Pursuant to yesterday’s Real Linkage experiment, I decided to repeat the same experiment, only using the regular inverse transform as opposed to the one for handling the optimized case of only a non-zero DC coefficient. Thankfully, when the general I-transform is fed a DC-only matrix, the results are exactly the same as those of the DC-only I-transform. A little guru told me that the 169 constant (a.k.a. 13²) is also a characteristic of the SVQ3 I-transform. I would like to run some sample vectors through both transforms to see if they arrive at the same output. But I am not sure how to instrument the SVQ3 4×4 I-transform to print before and after data sets.
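To illustrate why a DC-only shortcut and a general transform should agree, and where a 13² factor can come from, here is a minimal Python sketch. It is not taken from the Real binaries or from any SVQ3 decoder: the 13/17/7 butterfly coefficients, the `+ 0x200` rounding and the `>> 10` shift are assumptions chosen for illustration, and the function names are hypothetical.

```python
def butterfly(a, b, c, d):
    """One 4-point pass with assumed 13/17/7 coefficients."""
    z0 = 13 * (a + c)
    z1 = 13 * (a - c)
    z2 = 7 * b - 17 * d
    z3 = 17 * b + 7 * d
    return [z0 + z3, z1 + z2, z1 - z2, z0 - z3]


def itransform_general(block):
    """Separable 4x4 inverse transform: rows first, then columns."""
    rows = [butterfly(*row) for row in block]
    out = [[0] * 4 for _ in range(4)]
    for j in range(4):
        col = butterfly(*(rows[i][j] for i in range(4)))
        for i in range(4):
            # Rounding and shift are assumptions, not a documented spec.
            out[i][j] = (col[i] + 0x200) >> 10
    return out


def itransform_dc_only(dc):
    """Shortcut for a block whose only non-zero coefficient is the DC."""
    value = (13 * 13 * dc + 0x200) >> 10  # 13 * 13 = 169
    return [[value] * 4 for _ in range(4)]


def dc_block(dc):
    """Build a 4x4 coefficient block with only the DC term set."""
    block = [[0] * 4 for _ in range(4)]
    block[0][0] = dc
    return block


if __name__ == "__main__":
    for dc in (-40, -1, 0, 1, 7, 100):
        assert itransform_general(dc_block(dc)) == itransform_dc_only(dc)
    print("general and DC-only transforms agree on DC-only input")
```

With only the DC coefficient set, each pass multiplies it by 13 and spreads it across the row or column, so every output sample ends up as 169 times the DC value before rounding.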

So, still working on that. Then deciding where else to take this project afterwards.

Real Linkage

I was giddy when I recently learned that there were x86_64 builds of the Real codecs available that had function names inside, if for no other reason than that it might finally provide a good reason to learn x86_64 ASM. But then Benjamin helpfully pointed out that there are .a libraries available for their codecs as well (look for .a files in the current source packages). These are far more interesting, particularly in the context of black box reverse engineering. So I established a little proof of concept experiment.


RISC RE

I sometimes hypothesize about reverse engineering code compiled for alternate (i.e. non-x86) CPU architectures. It makes one question why so much effort is focused on x86 RE (to which the simple and immediate answer is: because all the interesting code is compiled for the x86 architecture). Maybe I’m just enamored of how neat RISC code tends to be, with typical architectures featuring 32-bit instruction words. Writing a disassembler for such an architecture obviously embodies not even a fraction of the complexity of a decent x86 disassembler. Fortunately, the GNU binutils already take care of the disassembly details (I recently posted a Wiki page on using objdump, even cross-compiling for non-native architectures). Here is some representative disassembly from a PowerPC ELF binary, for those who have never been exposed:

   16b40:       80 e1 01 14     lwz     r7,276(r1)
   16b44:       7c 09 3a 14     add     r0,r9,r7
   16b48:       7d 3e 00 ae     lbzx    r9,r30,r0
   16b4c:       55 20 e1 3e     rlwinm  r0,r9,28,4,31
   16b50:       48 00 00 08     b       16b58
   16b54:       38 00 00 0f     li      r0,15
   16b58:       2c 0b 00 0f     cmpwi   r11,15
   16b5c:       7c 04 03 78     mr      r4,r0
   16b60:       40 82 00 54     bne-    16bb4
   16b64:       88 19 00 02     lbz     r0,2(r25)

Quite a change from the typical x86 slop. Though I sometimes wonder what the ‘reduced’ in reduced instruction set computer (RISC) is really supposed to mean. It definitely doesn’t indicate reduced functionality for individual instructions. I looked up that rlwinm instruction: Rotate Left Word Immediate Then AND with Mask. I started to wonder if it would be simpler to compose an assembly re-targeter for a RISC CPU until I started reading up on this instruction.
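The rlwinm semantics can be sketched in a few lines of Python. This is a minimal sketch based on the instruction's documented behavior: rotate the 32-bit source left by SH, then AND the result with a mask covering bits MB through ME, where bit 0 is the most significant bit. The function names `rotl32`, `ppc_mask` and `rlwinm` are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value, shift):
    """Rotate a 32-bit value left by shift bits."""
    value &= MASK32
    shift &= 31
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def ppc_mask(mb, me):
    """Mask with bits mb..me set, using PowerPC numbering (bit 0 = MSB)."""
    from_mb = MASK32 >> mb                   # bits mb..31
    up_to_me = (MASK32 << (31 - me)) & MASK32  # bits 0..me
    if mb <= me:
        return from_mb & up_to_me
    # When mb > me the mask wraps around the ends of the word.
    return from_mb | up_to_me


def rlwinm(rs, sh, mb, me):
    """Rotate Left Word Immediate Then AND with Mask."""
    return rotl32(rs, sh) & ppc_mask(mb, me)


if __name__ == "__main__":
    # The operands from the listing: rlwinm r0,r9,28,4,31
    print(hex(rlwinm(0xAB, 28, 4, 31)))
```

With the operands from the listing (28, 4, 31), the instruction amounts to a logical right shift by 4. Coming right after the lbzx byte load, that would extract the high nibble of the loaded byte.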

And here’s some MIPS RISC code:

  20157c:       84820002        lh      v0,2(a0)
  201580:       2484000a        addiu   a0,a0,10
  201584:       44820000        mtc1    v0,$f0
  201588:       46800020        cvt.s.w $f0,$f0
  20158c:       46010002        mul.s   $f0,$f0,$f1
  201590:       e4600000        swc1    $f0,0(v1)
  201594:       0501fff2        bgez    t0,0x201560
  201598:       24630008        addiu   v1,v1,8
  20159c:       1000000a        b       0x2015c8
  2015a0:       3c020000        lui     v0,0x0
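To give a feel for how mechanical decoding becomes with 32-bit instruction words, here is a minimal Python sketch that decodes two of the I-type instructions from the listing above. It only handles the addiu and lh opcodes and is nowhere near a real disassembler; the function name `decode_itype` is hypothetical, and the register names follow the usual MIPS o32 convention as printed by objdump.

```python
REGS = [
    "zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
    "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7",
    "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
    "t8", "t9", "k0", "k1", "gp", "sp", "s8", "ra",
]

OP_ADDIU = 0x09
OP_LH = 0x21


def decode_itype(word):
    """Decode a MIPS I-type word: opcode(6) rs(5) rt(5) immediate(16)."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:          # sign-extend the 16-bit immediate
        imm -= 0x10000
    if opcode == OP_ADDIU:
        return f"addiu {REGS[rt]},{REGS[rs]},{imm}"
    if opcode == OP_LH:
        return f"lh {REGS[rt]},{imm}({REGS[rs]})"
    return None               # opcode not covered by this sketch


if __name__ == "__main__":
    for word in (0x84820002, 0x2484000A, 0x24630008):
        print(f"{word:08x}  {decode_itype(word)}")
```

Every field sits at a fixed bit position, so the decoder is a handful of shifts and masks. There are no prefixes or variable-length encodings to untangle.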

As memory serves, with MIPS CPUs, you get the added fun of manually tracking in your brain the CPU pipelining. I.e., an arithmetic operation from one instruction may not be completed by the next instruction, which happens to operate on the same register, and the compiler was specifically counting on that, and you need to count on it as well during your RE efforts.