Barrett’s Basic Blocks Are Back | Breaking Eggs And Making Omelettes

Thanks to Sean Barrett for helping me compile his bb86 app. I let it rip on this code snippet from the Unnamed RE Project. It’s interesting stuff. I omitted the push, pop, and ret instructions since basic blocks pertain to linear sequences of load, store, and arithmetic instructions:

$ ./bb86 < ~/basic-block.asm
Reading stdin
Warning: unknown opcode 'bswap' in line 9

Memory locations:
        mem1 EQU dword+(ebp_0)+08
        mem5 EQU dword+(mem4)+((mem3 >> 03))
        mem4 EQU dword+(mem1)+10
        mem3 EQU dword+(mem1)+04
        mem2 EQU dword+(ebp_0)+0c

Integer registers:
  eax = ((mem5 << cl_0) >> cl_0)
  ebx = mem4
  ecx = (00000020 - mem2)
  edx = (mem3 >> 03)
  esi = mem2
  edi = mem1

Floating point stack:
  st(0) = fp3
  st(1) = fp2
  st(2) = fp1
  st(3) = fp0

Memory locations:
  [dword+(mem1)+04] <= (mem3 + mem2)

I am pretty sure that all of those register states are true at the end of the block, though they are listed in the traditional register sequence (eax, ebx, ecx, ...) rather than in order of dependency. I.e., cl needs to be set before eax can be correct.

I tried out bb86 on a basic block of floating point instructions (using a computation I understand, like the distance between 2 points, rather than a Fourier transform), and it was less than successful: it crashed. But I cannot fault the program since I am feeding it data disassembled by objdump (-Mintel) rather than Microsoft's official format. Again, bb86 is an interesting effort, and I was impressed when I examined the output of test.asm that was packaged with the code (seen in Sean's original comment).

4 thoughts on “Barrett’s Basic Blocks Are Back”

sean barrett November 1, 2007 at 8:04 am

The cl bug is a partial register thing; the code treats ecx/cx/cl as totally separate registers, so if you do something like ‘xor ecx,ecx;mov cl;…ecx…’ it’ll get it wrong… or in this case loading ecx and using cl. It probably wouldn’t be hard to fix either of those cases in a naive way (e.g. by just treating them as total aliases of each other).
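The effect described above can be reproduced with a toy symbolic tracker. This is only a sketch: it is not bb86's code, the `run` function and the `ALIASES` table are hypothetical, and the four-instruction block is an assumed stand-in for the relevant part of the original snippet (the real instructions are not shown in the post).

```python
# Toy symbolic register tracker. Hypothetical illustration, not bb86's code.
REGS = {"eax", "ecx", "cx", "cl", "ch"}
# Naive fix: treat every sub-register as a total alias of ecx.
ALIASES = {"cx": "ecx", "cl": "ecx", "ch": "ecx"}


def run(instructions, alias_partials):
    """Return the symbolic register state after a list of (op, dst, src)."""
    state = {}

    def slot(reg):
        # Pick the storage slot for a register name.
        return ALIASES.get(reg, reg) if alias_partials else reg

    def value(operand):
        # A register that was never written reads as "<name>_0".
        if operand in REGS:
            return state.get(slot(operand), operand + "_0")
        return operand  # anything else is already a symbolic expression

    for op, dst, src in instructions:
        if op == "mov":
            state[slot(dst)] = value(src)
        elif op in ("shl", "shr"):
            sym = "<<" if op == "shl" else ">>"
            state[slot(dst)] = f"({value(dst)} {sym} {value(src)})"
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return state


# Assumed block: load ecx, then shift eax left and right by cl.
block = [
    ("mov", "ecx", "(00000020 - mem2)"),
    ("mov", "eax", "mem5"),
    ("shl", "eax", "cl"),
    ("shr", "eax", "cl"),
]

separate = run(block, alias_partials=False)
aliased = run(block, alias_partials=True)
print(separate["eax"])  # ((mem5 << cl_0) >> cl_0)
print(aliased["eax"])   # ((mem5 << (00000020 - mem2)) >> (00000020 - mem2))
```

With separate registers, the write to ecx never reaches cl, which matches the `cl_0` seen in the output above. With aliasing, the shift count picks up the value loaded into ecx.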

Doing it symbolically would be kind of a pain; every time you wrote FOO to ecx, you’d have to write ‘(FOO & 0xffff)’ to cx, ‘(FOO & 0xff)’ to cl, and ‘((FOO >> 8) & 0xff)’ to ch. A “real” decompiler would definitely want to get this stuff right, but for something quick and dirty, treating them as aliases of each other probably catches most of the cases. (It would certainly handle this one.)
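A minimal sketch of the symbolic variant, assuming expressions are kept as plain strings: a write to ecx also records the masked views for cx, cl, and ch. The helper name `write_ecx` is hypothetical, and the concrete value used to check the masks is arbitrary.

```python
def write_ecx(state, expr):
    """Record a write to ecx plus the derived sub-register expressions."""
    state["ecx"] = expr
    state["cx"] = f"({expr} & 0xffff)"       # low 16 bits
    state["cl"] = f"({expr} & 0xff)"         # low 8 bits
    state["ch"] = f"(({expr} >> 8) & 0xff)"  # bits 8..15
    return state


views = write_ecx({}, "FOO")
for reg in ("ecx", "cx", "cl", "ch"):
    print(reg, "=", views[reg])

# Sanity check with an arbitrary concrete value substituted for FOO.
env = {"FOO": 0x12345678}
concrete = {reg: eval(expr, dict(env)) for reg, expr in views.items()}
print({reg: hex(val) for reg, val in concrete.items()})
```

The opposite direction (a write to cl updating the expressions for cx and ecx) is not covered here and would need merge expressions of its own.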

As to the other, I’m not going to support bb86 in the long run, but if it’ll make a difference between useful and not-useful, I’ll be happy to put in support for another asm format… I just need a sample.

sean barrett November 1, 2007 at 8:06 am

To clarify…

“you’d have to write ‘(FOO & 0xffff)’ to cx, ‘(FOO & 0xff)’ to cl, and ‘((FOO >> 8) & 0xff)’ to ch”

and then when you used these expressions, they’d propagate, and since the system is just tossing strings around and not simplifying them, you might end up with huge giant chains of redundant ((((FOO & 0xff) & 0xff) & 0xff) & 0xff) as the values. Which would be lame.
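One way to avoid those chains, sketched under the assumption that expressions are stored as small tuples instead of strings (which is not how bb86 works, per the comment above): a constructor that folds a mask of a mask using (x & a) & b == x & (a & b). The names `mask` and `show` are hypothetical.

```python
def mask(expr, bits):
    """Build expr & bits, folding nested masks into a single one."""
    if isinstance(expr, tuple) and expr[0] == "&":
        _, inner, prev = expr
        return mask(inner, prev & bits)
    return ("&", expr, bits)


def show(expr):
    """Render a tuple expression back into the string form used above."""
    if isinstance(expr, tuple):
        op, inner, bits = expr
        return f"({show(inner)} {op} {bits:#x})"
    return expr


# String tossing without simplification: the chain keeps growing.
naive = "FOO"
for _ in range(4):
    naive = f"({naive} & 0xff)"
print(naive)  # ((((FOO & 0xff) & 0xff) & 0xff) & 0xff)

# Folding constructor: the same four masks collapse to one.
value = "FOO"
for _ in range(4):
    value = mask(value, 0xff)
print(show(value))  # (FOO & 0xff)
```

This only folds directly nested masks. Masks separated by shifts or arithmetic would need further rewrite rules.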

sean barrett November 1, 2007 at 8:13 am

Oh yeah (“just one more thing”)…

The easy thing you can do (although this is stupid in the long term) is just edit the asm so the shifts are by ecx instead of cl, and see what bb86 pops out.

Multimedia Mike (post author) November 1, 2007 at 2:48 pm

Indeed, those segmented registers (cl/ch/cx/ecx) are a huge headache when doing RE-type stuff on x86. It makes me wonder if the same experiments on more refined RISC architectures would be simpler. It’s hard to say exactly since most of the interesting stuff to RE comes in x86 form.
