Check out this sequence:
- copy 32-bit register A to register B
- shift B
leftright by 0x1F (thanks to Reimar for spotting the mistake) - subtract 1 from B
- logically AND B against A
Eventually, it dawned on me that the sequence saturates an integer to a minimum value of 0, only without using any of the more traditional branching logic for such an operation.
Speed demons, these programmers. They used some neat tricks. It looks like gcc didn’t, though. Either that or I don’t understand the deeper meaning of the instruction “lea esi, [esi+0]”. It strikes me as a NOP. But that’s not quite as bad as some code observed in another gcc-compiled module recently that saw fit to execute “mov eax, eax” after a function call before moving eax (the function’s return value) to its final destination.
I’m not complaining since, when the time comes (hopefully soon) to reverse engineer that module, the naive compilation will make the task more straightforward.
Well, NOP is only one byte and sometimes compiler inserts other meaningless instructions (like lea eax,[eax+0],xchg eax,eax or mov eax, eax) to fill the space and keep loops aligned. Also consider total execution time per byte (e.g. one two-byte instruction versus two one-byte NOPs).
It’s parallelizing that gives me headache sometimes (i.e. when one calculation is interleaved with another to be more parallelizable and to negate the execution latency).
You can find very similar tricks in these pages:
http://graphics.stanford.edu/~seander/bithacks.html
http://www.inwap.com/pdp10/hbaker/hakmem/hakmem.html
Hm… Did you mean “shift B _right_ by 0×1F”, because otherwise it makes no sense to me…
Oops, yes I did mean shift right… I’ll fix.
Why copy, shift, subtract, and and? Why not just xor A, A or something similar?
Where is the traditional branching in setting a register to 0?
@Anonymous: The traditional branching logic is:
if (register < 0)
register = 0
The less orthodox sequence works because if the top/sign bit is set, it will decrement to 0 when shifted to the LSB and the AND operation will clear the entire register. Hence:
if (sign bit is set)
clear register
Whereas, if the sign bit is clear to begin with, shifting to the LSB will result in a 0 register, and decrementing will flip the register to all ones. ANDing will result in the original value.
The book at the fxt site also has some really interesting stuff on the topic of optimization.
http://www.jjj.de/fxt/#fxtbook
In case you are interested, i tested this code with a C implementation.
int break_eggs(const int input)
{
int branch = input;
/* branch with signed int */
if ( branch >= 0x1f;
branchless -= 1;
branchless &= (unsigned int)input;
assert( (unsigned int)branch == branchless );
return branchless;
}
void main() {
assert (break_eggs(-15) == 0);
assert (break_eggs(15) == 15);
}
The code was mangled by the html :(
In case you are interested, i tested this code with a C implementation.
int break_eggs(const int input)
{
int branch = input;
/* branch with signed int */
if ( branch >= 0x1f;
branchless -= 1;
branchless &= (unsigned int)input;
assert( (unsigned int)branch == branchless );
return branchless;
}
void main() {
assert (break_eggs(-15) == 0);
assert (break_eggs(15) == 15);
}
Am I missing something ?
negative :
b = b>>31 = 0xFFFFFFFF = -1
b = b-1 = 0xFFFFFFFE = -2 ????
positive case :
b = b>>31 = 0
b = b-1 = -1 = 0xFFFFFFFF OK
I think that this could work better :
b>>=31
b= ~b
b&=a
@Anonymous: This assumes a logical shift vs. an arithmetic shift. Thus, zeros are shifted in from the left, even for a negative number.