Even though I have been studying and working on multimedia technology since 2000 — reverse engineering, documenting, and reimplementing a variety of audio and video codecs — I didn’t actually begin to understand why various algorithms achieved their compression until about 2003. I’m just like that — I study the practice first, and then the underlying theory eventually becomes clear to me (maybe; it has been 9 years and I still couldn’t explain everything about the discrete cosine transform if you asked).
I happened to be looking back over the ZMBV (DOSBox) video codec today. The last time I looked at ZMBV, I thought of a slightly better method for computing error between 2 blocks. My method involved summing the number of pixels that differed between 2 blocks. Michael promptly did me one better by implementing a “0th-order entropy approximator”. When I took a closer look at the method today, I thought it was functionally identical to my algorithm. It is XOR’ing 2 vectors of pixels, maintaining a histogram of the results (i.e., the number of pixel pairs whose bitwise XOR was 1, 2, etc.), then summing the numbers in the histogram array.
Except not quite. A closer examination of the summation revealed “sum+= score_tab[histogram[i]]”, and it occurs on elements 1..255. What is score_tab? Why, it’s:
/* for i from 1..255 */ score_tab[i]= -i * log(i/(double)(ZMBV_BLOCK*ZMBV_BLOCK)) * (256/M_LN2);
(ZMBV_BLOCK = 16, M_LN2 = ln(2)).
I want pictures. Pictures help me a lot. I don’t do a lot of heavy duty math lifting and am not up to date on modern tools. But I figured that in this day and age, there must be plenty of graphing tools on the web. This is the top Google hit for “graph a function” and it offers the following graph:
First of all, Michael’s code explained: he used table to estimate entropy (well, base term from the Shannon’s theory of information) of the XOR’ed block. In other words, the higher count of the same symbols, the lower entropy and the better compression of the block. 0-th order means he didn’t use context to estimate entropy.
I think I might be starting to understand. My method was predicated on trying to generate an error vector that had as many 0s as possible grouped into strings of 8 bits. I.e., I was trying to maximize 0 bytes without regard to what the non-zero bytes contained. Michael’s method is also concerned with maximizing 0 bytes but also cares about what is happening in those non-zero bytes. That graph indicates that the method prefers values at the top and bottom of the unsigned 8-bit range. Why would that be? I suspect that it may have something to do with the fact that the higher and lower values have longer strings of repeating symbols (e.g., 1 = 00000001; 255 = 11111111).
Or I might be completely wrong.
Sorry if this is kindergarten-level stuff to some of you readers. But you know I like to use this space as a personal technical journal for working out these lessons.