Category Archives: Reverse Engineering

Brainstorming and case studies relating to the craft of software reverse engineering.

4 Additions Are Faster Than 1?

Today’s wacky ASM construct shows us four sequential additions to the same register:

  add ecx, 00000004
  add ecx, 00000004
  add ecx, 00000004
  add ecx, 00000004

Does this have some great advantage over, say (the disassembler prints immediates in hex, so this adds 16):

  add ecx, 00000010

Who knows? But I dare you to claim that this is some compiler optimization technique. You can’t claim that it’s minimizing branch prediction error this time since there are no branches to be seen for miles (kilometers, if you must) on either side. In fact, here is a little ASM context on either side of the block:

  xor ebx, ebx
  mov bl, byte[esp+26]
  add ecx, 00000004
  add ecx, 00000004
  add ecx, 00000004
  add ecx, 00000004
  mov ebx, dword[esi+4*ebx]

Hmm, now that I look at it after an indignant rant, I realize that maybe the compiler wanted to put some distance between loading a byte from memory into bl and using ebx to index the next memory access.
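For the record, the two forms really do leave the same value in ecx. Here is a minimal Python sketch of 32-bit wraparound addition that checks it; the helper names (add32, four_adds, one_add) are mine, and the model covers only the register value, not flags or timing:

```python
MASK32 = 0xFFFFFFFF  # ecx is a 32-bit register

def add32(reg, imm):
    """Model the value produced by `add reg, imm` (flags are ignored)."""
    return (reg + imm) & MASK32

def four_adds(ecx):
    """Four sequential `add ecx, 00000004` instructions."""
    for _ in range(4):
        ecx = add32(ecx, 0x00000004)
    return ecx

def one_add(ecx):
    """A single `add ecx, 00000010` (hex 10, i.e. 16)."""
    return add32(ecx, 0x00000010)

# The results agree, including when the register wraps around.
for start in (0, 0x1234, 0xFFFFFFFC):
    assert four_adds(start) == one_add(start)
```

So whatever the compiler was after, it was not a different answer.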

Humongous CUPs

I will never, ever run out of multimedia formats to study. The ScummVM folks helped drive home that point by providing me with samples of another FMV format named CUP. These files start with the signature ‘BEAN’. They are to be played with the program coffee.exe. Get it? Good. Moving along, CUP files were made as demo movies for games by Humongous Entertainment, which apparently started out as a splinter faction from LucasArts but is now a subsidiary of Atari Kids games.


Humongous Entertainment Logo

There are only 6 samples known to exist (and you can get them from the usual place). Yet, the whole format strikes me as an unbelievable mess. Maybe I’m just frustrated because I can’t seem to make a really simple standalone parser for the format to make sure I have caught every bizarre FourCC tag that they saw fit to stuff into the format. This kind of file format is, of course, nothing new to the seasoned, or even amateur, multimedia hacker: just a bunch of data chunks that start with FourCCs. The chunks can be nested, but that’s nothing new either. I think the most frustrating feature is that the DATA chunk in these files can either be a leaf chunk or encapsulate other chunks. This is a departure from the typical wisdom that specific chunk types shall either define their own data or shall encapsulate other chunks, but not both. And don’t even get me started on the format’s reckless mixture of big endian and little endian numbers.
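To illustrate why a chunk that may be either a leaf or a container is annoying, here is a rough Python sketch of a chunk walker that has to guess. The layout is an assumption for illustration only and is not the actual CUP layout: every chunk is taken to be a 4-byte FourCC followed by a 32-bit big-endian size that includes the 8-byte header. The names split_chunks, walk, and make_chunk, the tag XXXX, and the idea of BEAN as an outer chunk are all hypothetical.

```python
import struct

HEADER = 8  # assumed: 4-byte FourCC + 4-byte size field

def is_fourcc(tag):
    """True if tag is 4 printable ASCII bytes."""
    return len(tag) == 4 and all(0x20 <= b < 0x7F for b in tag)

def split_chunks(buf):
    """Split buf into (tag, payload) pairs. Return None unless buf
    tiles exactly into well-formed chunks."""
    chunks, pos = [], 0
    while pos < len(buf):
        if len(buf) - pos < HEADER:
            return None
        tag = buf[pos:pos + 4]
        # Assumption: big-endian size that counts the header itself.
        (size,) = struct.unpack(">I", buf[pos + 4:pos + 8])
        if not is_fourcc(tag) or size < HEADER or pos + size > len(buf):
            return None
        chunks.append((tag, buf[pos + HEADER:pos + size]))
        pos += size
    return chunks

def walk(buf, depth=0, out=None):
    """Return (depth, tag, kind) tuples; kind is 'container' if the
    payload itself parses cleanly as chunks, otherwise 'leaf'."""
    if out is None:
        out = []
    for tag, payload in split_chunks(buf) or []:
        if split_chunks(payload):
            out.append((depth, tag, "container"))
            walk(payload, depth + 1, out)
        else:
            out.append((depth, tag, "leaf"))
    return out

def make_chunk(tag, payload):
    """Build a chunk in the assumed layout (for the demo below)."""
    return tag + struct.pack(">I", HEADER + len(payload)) + payload

leaf = make_chunk(b"DATA", b"\x01\x02\x03")
nested = make_chunk(b"DATA", make_chunk(b"XXXX", b"\x00" * 4))
for depth, tag, kind in walk(make_chunk(b"BEAN", leaf + nested)):
    print("  " * depth + tag.decode("ascii"), kind)
```

The weakness is plain: a leaf payload that happens to look like a run of valid chunks would be misread as a container, which is exactly why a chunk type that can be both makes a naive recursive parser untrustworthy.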

Palettized video is stored in 1 of 2 formats: either an RLE format or a custom LZ-derived format that I am calling “tri-lz” because of the way the encoded stream is stored in 3 pieces, and because the original program seems to refer to the format by a similar name. Audio is obviously uncompressed, 8-bit, unsigned PCM. But it seems that the audio data is all stored at the front of the file, before any of the video data.
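The audio side is the easy part. Unsigned 8-bit PCM puts silence at 128, so turning it into the signed 16-bit samples most playback code expects is a subtract and a shift. A minimal sketch, where the function name u8_to_s16 is my own and nothing here is specific to CUP:

```python
def u8_to_s16(samples):
    """Convert unsigned 8-bit PCM bytes to signed 16-bit sample values.

    Subtracting 128 recenters the sample around zero; shifting left by 8
    scales it up to the 16-bit range.
    """
    return [(b - 128) << 8 for b in samples]

# Quietest, silent, and loudest input bytes.
print(u8_to_s16(bytes([0, 128, 255])))
```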

You can follow my progress on the MultimediaWiki page for CUP, or add your own data if you can figure out anything from the samples and binaries in the archive. Having completed the cursory Wiki description, I can see that it might be possible to implement a reasonable demuxer for the format, just not an incredibly naive recursive demuxer.

Bizarre ASM Construct Of The Day

Check out this piece of x86 ASM arcana:

  lea   edx, [edx+1]

What on earth? This appears to be functionally equivalent to:

  inc   edx

So, what, was the compiler/assembler or possibly the original coder just trying to show off with a single overachieving x86 instruction like lea? Actually, a closer analysis of the surrounding ASM instructions may reveal what is happening here:

  cmp   ebx, value
  mov   al, [edx]
  lea   edx, [edx+1]
  mov   [edi], al
  lea   edi, [edi+1]
  jz    address

The conditional branch at the end of the block depends on the flags set by the comparison at the start. Neither mov nor lea modifies flags, but inc does: it updates every arithmetic flag except the carry flag, including the zero flag that jz tests (I can never find a good x86 reference, one that includes flag data, when I need one). Why not perform the comparison just before the conditional branch? Mine is not to question why. But I imagine that someone will comment that this is an obscure optimization trick for original Pentium machines or some such.
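The flag argument can be checked with a toy model. The Python sketch below tracks only the zero flag and assumes the usual x86 behavior: cmp sets ZF from the subtraction, lea leaves flags alone, and inc sets ZF from its own result. The class and function names (Cpu, zf_seen_by_jz) are hypothetical, and mov is left out because it does not touch flags.

```python
MASK32 = 0xFFFFFFFF

class Cpu:
    """Toy CPU that models nothing but the zero flag."""

    def __init__(self):
        self.zf = False

    def cmp(self, a, b):
        # cmp computes a - b, throws the result away, and sets flags.
        self.zf = ((a - b) & MASK32) == 0

    def lea_plus1(self, reg):
        # lea only computes an address; ZF is untouched.
        return (reg + 1) & MASK32

    def inc(self, reg):
        # inc sets ZF from its own result, clobbering cmp's answer.
        result = (reg + 1) & MASK32
        self.zf = result == 0
        return result

def zf_seen_by_jz(ebx, value, edx, use_inc):
    """Run cmp, then one pointer bump, and report the ZF that jz tests."""
    cpu = Cpu()
    cpu.cmp(ebx, value)
    edx = cpu.inc(edx) if use_inc else cpu.lea_plus1(edx)
    return cpu.zf

# ebx == value, so the branch should be taken.
assert zf_seen_by_jz(5, 5, 0x1000, use_inc=False) is True   # lea keeps ZF
assert zf_seen_by_jz(5, 5, 0x1000, use_inc=True) is False   # inc loses it
```

With inc in place of lea, the jz would end up testing whether the pointer wrapped to zero rather than whether ebx matched the value.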