In lieu of recent Pickover puzzles (I have an impressive backlog of those to work on), VAG sent in this curious x86 ASM nugget:
xchg bh, bl
mov di, bx
shr di, 1
shr di, 1
add di, bx
xchg bh, bl
add di, dx
The hint that VAG gives is that it has something to do with a common multimedia task. Try to figure it out.
It calculates the offset in a 320*200 buffer of the given x,y coordinate.
di = bx * 320 + dx
Since bx can only be in the range of 0..199 (i.e. only bl) the code uses this to its advantage.
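For anyone who wants to check the claim without an assembler, here is a rough C sketch that walks through the snippet one instruction at a time. It assumes bx holds y (0..199) and dx holds x (0..319), as described above. The names swap_bytes and emulate_offset are made up for this illustration and are not part of the original code.

```c
#include <stdint.h>

/* Hypothetical helper: swaps the high and low bytes of a 16-bit value,
   which is what "xchg bh, bl" does to bx. */
static uint16_t swap_bytes(uint16_t v) {
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Hypothetical emulation of the snippet.
   Assumption: bx = y in 0..199 (so bh == 0), dx = x in 0..319. */
static uint16_t emulate_offset(uint16_t bx, uint16_t dx) {
    uint16_t di;
    bx = swap_bytes(bx);        /* xchg bh, bl : bx = y * 256        */
    di = bx;                    /* mov di, bx  : di = y * 256        */
    di >>= 1;                   /* shr di, 1   : di = y * 128        */
    di >>= 1;                   /* shr di, 1   : di = y * 64         */
    di = (uint16_t)(di + bx);   /* add di, bx  : di = y * 320        */
    bx = swap_bytes(bx);        /* xchg bh, bl : bx restored to y    */
    di = (uint16_t)(di + dx);   /* add di, dx  : di = y * 320 + x    */
    return di;
}
```

The largest result, for y = 199 and x = 319, is 63999, so everything still fits in a 16-bit register.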
Though I must admit it was a bit unfair not to tell us if it was AT&T or Intel syntax; that always costs me an extra head-scratching moment :-)
I wonder what CPU that was written for, though. 286? Because IIRC “shr di, 2” has been encodable since the 186 and would be faster from the 386 on…
Reimar: You should usually be able to sort out the syntax ordering by looking for an immediate value, as in the “shr” instruction, since it would be pretty useless to shift an immediate value.
I’m confused by those sequential shifts as well. I can only assume it was some pipeline optimization measure. Except that this code may have predated such considerations.
Reimar: This should be pretty obvious. Besides, AT&T syntax uses % and $ symbols to mark registers and constants.
Anyway. Yes, it’s legacy 286 code. Here is an explanation of how it works.
Generally, we can express 320 as a sum of powers of two: 256 + 64. Multiplying by 256 is a left shift by 8 bits, and multiplying by 64 is a left shift by 6 bits. However, instead of multiplying by 64 directly, we can reuse the multiplication by 256 and divide the result by 4, i.e. shift it right twice. xchg basically acts as shl bx, 8 in this sample (as bh is always zero). That’s how it works. No idea if it’s really faster than a direct table lookup. :)
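To make the 256 + 64 decomposition concrete, here is a small C sketch that puts the shift-based calculation next to the table lookup alternative mentioned above. All names in it (offset_by_shifts, row_start, init_row_table, offset_by_table) are invented for illustration. It assumes y < 200 and x < 320, and it says nothing about which variant is faster on a real 286.

```c
#include <stdint.h>

enum { WIDTH = 320, HEIGHT = 200 };

/* 320 = 256 + 64 and 64 = 256 / 4, so
   y * 320 = (y << 8) + ((y << 8) >> 2).
   Assumption: y < 256, so y << 8 fits in 16 bits. The right shift is
   exact because the low 8 bits of y << 8 are zero. */
static uint16_t offset_by_shifts(uint16_t x, uint16_t y) {
    uint16_t y256 = (uint16_t)(y << 8);     /* y * 256 */
    uint16_t y64  = (uint16_t)(y256 >> 2);  /* y * 64  */
    return (uint16_t)(y256 + y64 + x);
}

/* The alternative: precompute the start offset of every row once. */
static uint16_t row_start[HEIGHT];

static void init_row_table(void) {
    for (uint16_t y = 0; y < HEIGHT; y++)
        row_start[y] = (uint16_t)(y * WIDTH);
}

/* Assumption: init_row_table() has been called and y < HEIGHT. */
static uint16_t offset_by_table(uint16_t x, uint16_t y) {
    return (uint16_t)(row_start[y] + x);
}
```

The table costs 400 bytes and one memory read per lookup, while the shift version needs only registers. Which one wins depends on the CPU and memory speed.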