Today’s wacky ASM construct shows us four sequential additions to the same register:
add ecx, 00000004
add ecx, 00000004
add ecx, 00000004
add ecx, 00000004
Does this have some great advantage over, say:
add ecx, 00000010
Who knows? But I dare you to claim that this is some compiler optimization technique. You can’t claim that it’s minimizing branch prediction error this time since there are no branches to be seen for miles (kilometers, if you must) on either side. In fact, here is a little ASM context on either side of the block:
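(For the record, the two forms really are interchangeable: the immediates are hexadecimal, so 00000010 is 16 decimal, i.e. four 4s. A throwaway sanity check in Python, with a plain integer standing in for ecx:)

```python
# Four increments of 4 versus one increment of 0x10 (16 decimal);
# a plain integer stands in for the ecx register.
ecx = 0
for _ in range(4):
    ecx += 0x4          # add ecx, 00000004, four times
assert ecx == 0x10      # same result as a single add ecx, 00000010
print(ecx)
```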
xor ebx, ebx
mov bl, byte[esp+26]
add ecx, 00000004
add ecx, 00000004
add ecx, 00000004
add ecx, 00000004
mov ebx, dword[esi+4*ebx]
Hmm, now that I look at it after an indignant rant, I realize that maybe the compiler decided to insert a delay between the byte load from memory and the next memory access.
Is the ‘mov ebx, dword[esi+4*ebx]’ the beginning of a loop? If so, the adds could have been inserted to align that instruction to a 16-byte address boundary, although most compilers would probably pick no-op instructions to get this effect. Or it could be to clear the U and V pipelines before that loop starts.
In x86 assembly, the assembler will emit extra padding instructions if you write something like this:
…
align 16
loop:
mov eax, [esi]
…
jnz loop
Notice the ‘align 16’, which can cause the assembler to insert no-ops (usually something like ‘lea eax, [eax+0]’ or ‘mov eax, eax’). But, as I said, it would probably not pick four add instructions.
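To make the padding concrete: an ‘align 16’ directive just advances the location counter to the next multiple of 16 and fills the gap with harmless filler bytes. The amount of filler is simple modular arithmetic; here is a hypothetical helper in Python (a sketch, not any particular assembler's implementation):

```python
def padding_to_align(offset, alignment=16):
    """Bytes of filler an assembler must emit so the next
    instruction starts on an `alignment`-byte boundary."""
    return (-offset) % alignment

# An instruction stream ending at offset 0x2D needs 3 filler bytes
# (e.g. no-ops) so the loop body starts at 0x30:
print(padding_to_align(0x2D))  # 3
print(padding_to_align(0x30))  # 0 (already aligned)
```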