Today’s wacky ASM construct shows us four sequential additions to the same register:
add ecx, 00000004
add ecx, 00000004
add ecx, 00000004
add ecx, 00000004
Does this have some great advantage over, say:
add ecx, 00000010
Who knows? But I dare you to claim that this is some compiler optimization technique. You can’t claim that it’s minimizing branch prediction error this time since there are no branches to be seen for miles (kilometers, if you must) on either side. In fact, here is a little ASM context on either side of the block:
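(For the record, the two forms really are interchangeable: the immediates are hexadecimal, so 00000010 is 16 decimal, i.e. four 4s. A throwaway sanity check in Python, with a plain integer standing in for ecx:)

```python
# Four increments of 4 versus one increment of 0x10 (16 decimal);
# a plain integer stands in for the ecx register.
ecx = 0
for _ in range(4):
    ecx += 0x4          # add ecx, 00000004, four times
assert ecx == 0x10      # same result as a single add ecx, 00000010
print(ecx)
```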
xor ebx, ebx
mov bl, byte[esp+26]
add ecx, 00000004
add ecx, 00000004
add ecx, 00000004
add ecx, 00000004
mov ebx, dword[esi+4*ebx]
Hmm, now that I look at it after an indignant rant, I realize that maybe the compiler decided to insert a delay between the byte load from memory and the next memory access.
Is the ‘mov ebx, dword[esi+4*ebx]’ the beginning of a loop? If so, the adds could have been inserted to align that instruction to a 16-byte address boundary, although most compilers would probably pick no-op instructions to get this effect. Or it could be to clear the U and V pipelines before that loop starts.
In x86 assembly, the assembler will emit extra padding instructions if you write something like this:
…
align 16
loop:
mov eax, [esi]
…
jnz loop
Notice the ‘align 16’, which can cause the assembler to insert no-ops (usually something like ‘lea eax, [eax+0]’ or ‘mov eax, eax’). But, as I said, it would probably not pick four add instructions.
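To make the padding concrete: an ‘align 16’ directive just advances the location counter to the next multiple of 16 and fills the gap with harmless filler bytes. The amount of filler is simple modular arithmetic; here is a hypothetical helper in Python (a sketch, not any particular assembler's implementation):

```python
def padding_to_align(offset, alignment=16):
    """Bytes of filler an assembler must emit so the next
    instruction starts on an `alignment`-byte boundary."""
    return (-offset) % alignment

# An instruction stream ending at offset 0x2D needs 3 filler bytes
# (e.g. no-ops) so the loop body starts at 0x30:
print(padding_to_align(0x2D))  # 3
print(padding_to_align(0x30))  # 0 (already aligned)
```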