Tour of Part of the VP8 Process

My toy VP8 encoder outputs a lot of textual data to illustrate exactly what it’s doing. For those who may not be exactly clear on how this or related algorithms operate, this may prove illuminating.

Let’s look at subblock 0 of macroblock 0 of a luma plane:

 subblock 0 (original)
  92  91  89  86
  91  90  88  86
  89  89  89  88
  89  87  88  93

Since it’s in the top-left corner of the image to be encoded, the phantom samples above and to the left are implicitly 128 for the purpose of intra prediction (in the VP8 algorithm).

 subblock 0 (original, with phantom samples)
     128 128 128 128
 128  92  91  89  86
 128  91  90  88  86
 128  89  89  89  88
 128  89  87  88  93


Using the 4×4 DC prediction mode means averaging the 4 top predictors and 4 left predictors. So, the predictor is 128. Subtract this from each element of the subblock:

 subblock 0, predictor removed
 -36 -37 -39 -42
 -37 -38 -40 -42
 -39 -39 -39 -40
 -39 -41 -40 -35
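
In code, the prediction-and-subtraction step is tiny. Here’s a minimal C sketch of it; the function names are mine, invented for this post (they don’t come from libvpx or from my encoder), blocks of residuals and coefficients are flat 16-element arrays in row-major order, and out-of-frame neighbors are assumed to have been filled in with 128 as described above:

 /* Average the 4 samples above and the 4 samples to the left, with
  * rounding, to produce the DC predictor for a 4x4 subblock. */
 static int dc_predict_4x4(const unsigned char *above, const unsigned char *left)
 {
     int i, sum = 4;  /* the +4 rounds the sum before the divide by 8 */

     for (i = 0; i < 4; i++)
         sum += above[i] + left[i];
     return sum >> 3;
 }

 /* Subtract the DC predictor from every sample to form the residual. */
 static void subtract_predictor_4x4(const unsigned char block[4][4],
                                    int predictor, short residual[16])
 {
     int r, c;

     for (r = 0; r < 4; r++)
         for (c = 0; c < 4; c++)
             residual[r * 4 + c] = (short)(block[r][c] - predictor);
 }

For subblock 0, all 8 neighbors are the phantom value 128, so dc_predict_4x4() returns (8 * 128 + 4) >> 3 = 128 and the residual is the table above.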

Next, run the subblock through the forward transform:

 subblock 0, transformed
 -312   7   1   0
    1  12  -5   2
    2  -3   3  -1
    1   0  -2   1
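
For the curious, here’s roughly what the forward transform looks like in C. It’s adapted from my reading of the reference forward transform (vp8_short_fdct4x4_c in libvpx), reworked to operate on the flat residual array from the sketch above, so treat it as an illustration rather than a drop-in replacement:

 /* 4x4 forward transform: a row pass followed by a column pass, using
  * the integer approximation from the VP8 reference code. */
 static void fdct4x4(const short residual[16], short coeff[16])
 {
     int i, a1, b1, c1, d1;
     short tmp[16];

     /* first pass: transform the rows */
     for (i = 0; i < 4; i++)
     {
         const short *r = residual + i * 4;

         a1 = (r[0] + r[3]) * 8;
         b1 = (r[1] + r[2]) * 8;
         c1 = (r[1] - r[2]) * 8;
         d1 = (r[0] - r[3]) * 8;

         tmp[i * 4 + 0] = (short)(a1 + b1);
         tmp[i * 4 + 2] = (short)(a1 - b1);
         tmp[i * 4 + 1] = (short)((c1 * 2217 + d1 * 5352 + 14500) >> 12);
         tmp[i * 4 + 3] = (short)((d1 * 2217 - c1 * 5352 + 7500) >> 12);
     }

     /* second pass: transform the columns */
     for (i = 0; i < 4; i++)
     {
         a1 = tmp[i]     + tmp[12 + i];
         b1 = tmp[4 + i] + tmp[8 + i];
         c1 = tmp[4 + i] - tmp[8 + i];
         d1 = tmp[i]     - tmp[12 + i];

         coeff[i]      = (short)((a1 + b1 + 7) >> 4);
         coeff[8 + i]  = (short)((a1 - b1 + 7) >> 4);
         coeff[4 + i]  = (short)(((c1 * 2217 + d1 * 5352 + 12000) >> 16) + (d1 != 0));
         coeff[12 + i] = (short)((d1 * 2217 - c1 * 5352 + 51000) >> 16);
     }
 }

Feeding the residual above through this reproduces the coefficients shown. Notice that the DC term works out to roughly the sum of the 16 residual samples divided by 2, which is why it dwarfs everything else.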

Quantize (integer divide) each element; the DC (first element) and AC (rest of the elements) quantizers are both 4:

 subblock 0, quantized
 -78   1   0   0
   0   3  -1   0
   0   0   0   0
   0   0   0   0

The above block contains the coefficients that are actually transmitted (zigzagged and entropy-encoded) through the bitstream and decoded on the other end.
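
Two small sketches to make that concrete. Quantization here is nothing fancier than truncating integer division (a production encoder layers rounding and zero-bin logic on top, but plain division is what the numbers above reflect), and the scan order is the conventional 4×4 zigzag which, as far as I recall, is also VP8’s default coefficient scan:

 /* Quantize by integer (truncating) division: the DC coefficient gets the
  * DC quantizer and the other 15 get the AC quantizer (both 4 here). */
 static void quantize4x4(const short coeff[16], short qcoeff[16],
                         int dc_quant, int ac_quant)
 {
     int i;

     qcoeff[0] = coeff[0] / dc_quant;
     for (i = 1; i < 16; i++)
         qcoeff[i] = coeff[i] / ac_quant;
 }

 /* Conventional 4x4 zigzag scan order: serialize the block so that the
  * coefficients most likely to be zero come last. */
 static const int zigzag[16] = {
     0, 1,  4,  8,
     5, 2,  3,  6,
     9, 12, 13, 10,
     7, 11, 14, 15
 };

 static void zigzag_scan_4x4(const short qcoeff[16], short scanned[16])
 {
     int i;

     for (i = 0; i < 16; i++)
         scanned[i] = qcoeff[zigzag[i]];
 }

Scanned this way, the quantized block above becomes -78, 1, 0, 0, 3, 0, 0, -1 followed by eight zeros, which is exactly the kind of sequence an entropy coder likes.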

The decoding process looks something like this. After the same coefficients are decoded and rearranged, they are dequantized, i.e., multiplied by the original quantizers:

 subblock 0, dequantized
 -312   4   0   0
    0  12  -4   0
    0   0   0   0
    0   0   0   0

Note that these coefficients are not exactly the same as the original, pre-quantized coefficients. This is a large part of where the “lossy” in “lossy video compression” comes from.

Next, the decoder generates a base predictor subblock. In this case, it’s all 128 (DC prediction for top-left subblock):

 subblock 0, predictor
  128 128 128 128
  128 128 128 128
  128 128 128 128
  128 128 128 128

Finally, the dequantized coefficients are shoved through the inverse transform and added to the base predictor block:

 subblock 0, reconstructed
  91  91  89  85
  90  90  89  87
  89  88  89  90
  88  88  89  92

Again, not exactly the same as the original block, but an incredible facsimile thereof.
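
For completeness, here’s the decoder-side arithmetic as one C sketch, continuing with the made-up helper names from earlier. The inverse transform follows the shape of the reference code (vp8_short_idct4x4llm_c in libvpx) and, like that code, leans on arithmetic right shifts of negative intermediates:

 /* Dequantize: multiply each coefficient back up by its quantizer. */
 static void dequantize4x4(const short qcoeff[16], short coeff[16],
                           int dc_quant, int ac_quant)
 {
     int i;

     coeff[0] = qcoeff[0] * dc_quant;
     for (i = 1; i < 16; i++)
         coeff[i] = qcoeff[i] * ac_quant;
 }

 /* 4x4 inverse transform (column pass, then row pass), followed by adding
  * the predictor block and clamping the result to the [0, 255] range. */
 static void idct4x4_and_reconstruct(const short coeff[16],
                                     unsigned char pred[4][4],
                                     unsigned char recon[4][4])
 {
     static const int cospi8sqrt2minus1 = 20091;
     static const int sinpi8sqrt2       = 35468;
     int i, c, a1, b1, c1, d1, t1, t2;
     int tmp[16];

     /* first pass: inverse transform the columns */
     for (i = 0; i < 4; i++)
     {
         a1 = coeff[i] + coeff[8 + i];
         b1 = coeff[i] - coeff[8 + i];

         t1 = (coeff[4 + i] * sinpi8sqrt2) >> 16;
         t2 = coeff[12 + i] + ((coeff[12 + i] * cospi8sqrt2minus1) >> 16);
         c1 = t1 - t2;

         t1 = coeff[4 + i] + ((coeff[4 + i] * cospi8sqrt2minus1) >> 16);
         t2 = (coeff[12 + i] * sinpi8sqrt2) >> 16;
         d1 = t1 + t2;

         tmp[i]      = a1 + d1;
         tmp[12 + i] = a1 - d1;
         tmp[4 + i]  = b1 + c1;
         tmp[8 + i]  = b1 - c1;
     }

     /* second pass: inverse transform the rows, add the predictor, clamp */
     for (i = 0; i < 4; i++)
     {
         int row[4], v;

         a1 = tmp[i * 4 + 0] + tmp[i * 4 + 2];
         b1 = tmp[i * 4 + 0] - tmp[i * 4 + 2];

         t1 = (tmp[i * 4 + 1] * sinpi8sqrt2) >> 16;
         t2 = tmp[i * 4 + 3] + ((tmp[i * 4 + 3] * cospi8sqrt2minus1) >> 16);
         c1 = t1 - t2;

         t1 = tmp[i * 4 + 1] + ((tmp[i * 4 + 1] * cospi8sqrt2minus1) >> 16);
         t2 = (tmp[i * 4 + 3] * sinpi8sqrt2) >> 16;
         d1 = t1 + t2;

         row[0] = (a1 + d1 + 4) >> 3;
         row[3] = (a1 - d1 + 4) >> 3;
         row[1] = (b1 + c1 + 4) >> 3;
         row[2] = (b1 - c1 + 4) >> 3;

         for (c = 0; c < 4; c++)
         {
             v = row[c] + pred[i][c];
             recon[i][c] = (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
         }
     }
 }

Running the dequantized coefficients above through this with the all-128 predictor block gives back the reconstructed subblock shown.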

Note that this decoding-after-encoding demonstration is not merely pedagogical– the encoder has to decode the subblock because the encoding of successive subblocks may depend on this subblock. The encoder can’t rely on the original representation of the subblock because the decoder won’t have that– it will have the reconstructed block.
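
Structurally, that requirement is why the encode path for a subblock ends by playing decoder against its own output. Here’s a sketch of how the pieces above might fit together (hypothetical function and parameter names again, not my encoder’s actual interface):

 /* Encode one intra 4x4 subblock with DC prediction, reusing the sketch
  * functions from earlier. The reconstructed pixels are handed back
  * because they, not the original pixels, serve as the prediction
  * neighbors for subsequent subblocks. */
 static void encode_subblock_dc(const unsigned char orig[4][4],
                                const unsigned char *above,
                                const unsigned char *left,
                                int dc_quant, int ac_quant,
                                short qcoeff[16],           /* out: coefficients for the bitstream */
                                unsigned char recon[4][4])  /* out: pixels for later prediction */
 {
     short residual[16], coeff[16], dq[16];
     unsigned char pred[4][4];
     int r, c, predictor;

     predictor = dc_predict_4x4(above, left);
     for (r = 0; r < 4; r++)
         for (c = 0; c < 4; c++)
             pred[r][c] = (unsigned char)predictor;

     subtract_predictor_4x4(orig, predictor, residual);
     fdct4x4(residual, coeff);
     quantize4x4(coeff, qcoeff, dc_quant, ac_quant);

     /* the encoder immediately decodes its own output */
     dequantize4x4(qcoeff, dq, dc_quant, ac_quant);
     idct4x4_and_reconstruct(dq, pred, recon);
 }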

For example, here’s the next subblock:

 subblock 1 (original)
  84  84  87  90
  85  85  86  93
  86  83  83  89
  91  85  84  87

Let’s assume DC prediction once more. The 4 top predictors are still all 128 since this subblock lies along the top row. However, the 4 left predictors are the right edge of the subblock reconstructed in the previous example:

 subblock 1 (original, with predictor samples)
    128 128 128 128
 85  84  84  87  90
 87  85  85  86  93
 90  86  83  83  89
 92  91  85  84  87

The DC predictor is computed with integer division as (128 + 128 + 128 + 128 + 85 + 87 + 90 + 92 + 4) / 8 = 108 (the extra +4 is for rounding). (Note that in this case, using the original subblock’s right edge would also have resulted in 108, but that’s beside the point.)
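
In terms of the earlier sketch code, that’s just dc_predict_4x4() fed with the phantom row above and the reconstructed column to the left:

 /* subblock 1's neighbors (values taken from the tables above) */
 static const unsigned char above1[4] = { 128, 128, 128, 128 };  /* phantom row */
 static const unsigned char left1[4]  = {  85,  87,  90,  92 };  /* reconstructed edge */

 /* dc_predict_4x4(above1, left1) == (512 + 354 + 4) >> 3 == 108 */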

Continuing through the same process as in subblock 0:

 subblock 1, predictor removed
 -24 -24 -21 -18
 -23 -23 -22 -15
 -22 -25 -25 -19
 -17 -23 -24 -21

 subblock 1, transformed
 -173  -9  14  -1
    2 -11  -4   0
    1   6  -2   3
   -5   1   0   1

 subblock 1, quantized
 -43  -2   3   0
   0  -2  -1   0
   0   1   0   0
  -1   0   0   0

 subblock 1, dequantized
 -172  -8  12   0
    0  -8  -4   0
    0   4   0   0
   -4   0   0   0

 subblock 1, predictor
  108 108 108 108
  108 108 108 108
  108 108 108 108
  108 108 108 108

 subblock 1, reconstructed
  84  84  87  89
  86  85  87  91
  86  83  84  89
  90  85  84  88

I hope this concrete example (straight from a working codec) clarifies this part of the VP8 process.

5 thoughts on “Tour of Part of the VP8 Process”

  1. Reimar

    Does VP8 really use the average of both blocks for DC prediction? Other codecs choose one specific block and take the DC prediction of that…
    I mean, the way you do it, DC prediction would work horribly badly if e.g. the video started with an all-black first 16 rows.

  2. Tim

    Nice post. Reimar: There are several prediction modes, including ones that ignore either the top or left ‘phantom samples’ (is that the official name?).

  3. Multimedia Mike (post author)

    @Tim: Phantom samples is my name for it. :-) My encoder maintains a macroblock data structure named phantom_mb that manages the out of frame stuff.

    @Reimar: Yes, VP8 always averages in the phantom samples, as opposed to H.264 which will omit the phantom samples.

  4. Reimar

    Not completely sure about H.264 but for MPEG-4 it’s not really that it omits it but that it runs what you could call an extremely basic edge-detection filter on the 3 surrounding DC values and then picks the one that is most likely on the same size as the current MB.
    In the case of the border pixels this indeed ends up discarding those outside the frame.

  5. Reimar

    s/size/side/
    And the algorithm is to pick the one out of left and top that differs more from the top-left one.
    Which for any sharp mostly horizontal or mostly vertical lines certainly gives a much better predictor than averaging (of course there are also cases where it does much worse).
