# Tour of Part of the VP8 Process

November 17th, 2010 by Multimedia Mike

My toy VP8 encoder outputs a lot of textual data to illustrate exactly what it’s doing. For those who may not be exactly clear on how this or related algorithms operate, this may prove illuminating.

Let’s look at subblock 0 of macroblock 0 of a luma plane:

``` subblock 0 (original)
92  91  89  86
91  90  88  86
89  89  89  88
89  87  88  93
```

Since it’s in the top-left corner of the image to be encoded, the phantom samples above and to the left are implicitly 128 for the purpose of intra prediction (in the VP8 algorithm).

``` subblock 0 (original)
    128 128 128 128
128  92  91  89  86
128  91  90  88  86
128  89  89  89  88
128  89  87  88  93
```

In 4×4 DC prediction mode, the predictor is the rounded average of the 4 top and 4 left neighboring samples. All 8 are 128 here, so the predictor is 128. Subtract it from each element of the subblock:

``` subblock 0, predictor removed
-36 -37 -39 -42
-37 -38 -40 -42
-39 -39 -39 -40
-39 -41 -40 -35
```

Next, run the subblock through the forward transform:

``` subblock 0, transformed
-312   7   1   0
   1  12  -5   2
   2  -3   3  -1
   1   0  -2   1
```
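The transform is VP8's integer approximation of a 4×4 DCT. Below is my Python port of the forward transform, based on my reading of libvpx's `vp8_short_fdct4x4_c`. Treat it as a sketch rather than a reference implementation, although it does reproduce the numbers above. Python's `>>` floors negative values, which matches the arithmetic right shift the C code relies on.

```python
# Sketch: integer 4x4 forward transform, ported from my reading of libvpx's
# vp8_short_fdct4x4_c. Pass 1 runs along rows, pass 2 down columns.


def fdct4x4(block):
    tmp = []
    for ip in block:  # pass 1: each row, inputs scaled up by 8
        a1 = (ip[0] + ip[3]) * 8
        b1 = (ip[1] + ip[2]) * 8
        c1 = (ip[1] - ip[2]) * 8
        d1 = (ip[0] - ip[3]) * 8
        tmp.append([
            a1 + b1,
            (c1 * 2217 + d1 * 5352 + 14500) >> 12,
            a1 - b1,
            (d1 * 2217 - c1 * 5352 + 7500) >> 12,
        ])
    out = [[0] * 4 for _ in range(4)]
    for j in range(4):  # pass 2: each column, scaled back down with rounding
        a1 = tmp[0][j] + tmp[3][j]
        b1 = tmp[1][j] + tmp[2][j]
        c1 = tmp[1][j] - tmp[2][j]
        d1 = tmp[0][j] - tmp[3][j]
        out[0][j] = (a1 + b1 + 7) >> 4
        out[1][j] = ((c1 * 2217 + d1 * 5352 + 12000) >> 16) + int(d1 != 0)
        out[2][j] = (a1 - b1 + 7) >> 4
        out[3][j] = (d1 * 2217 - c1 * 5352 + 51000) >> 16
    return out


residual = [
    [-36, -37, -39, -42],
    [-37, -38, -40, -42],
    [-39, -39, -39, -40],
    [-39, -41, -40, -35],
]
coeffs = fdct4x4(residual)
for row in coeffs:
    print(row)
```

For the DC term, the 16 residuals sum to -623, so the output is (8 × -623 + 7) >> 4 = -312, roughly half the sum.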

Quantize each element by integer division, truncating toward zero (so -5 becomes -1). The DC quantizer (for the first element) and the AC quantizer (for the remaining elements) are both 4:

``` subblock 0, quantized
 -78   1   0   0
   0   3  -1   0
   0   0   0   0
   0   0   0   0
```
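Here's a sketch of this quantization step. It assumes plain integer division that truncates toward zero, as C's `/` does, because that is what the numbers above show. Python's `//` floors instead, so the helper divides the magnitude and then restores the sign. I'm not claiming libvpx's quantizer works this way; a production encoder may round differently.

```python
def trunc_div(value, q):
    """Integer division truncating toward zero, like C's '/' on ints."""
    return -(-value // q) if value < 0 else value // q


def quantize(coeffs, dc_q, ac_q):
    """Divide the DC coefficient by dc_q and every AC coefficient by ac_q."""
    return [
        [trunc_div(v, dc_q if (r, c) == (0, 0) else ac_q) for c, v in enumerate(row)]
        for r, row in enumerate(coeffs)
    ]


coeffs = [[-312, 7, 1, 0], [1, 12, -5, 2], [2, -3, 3, -1], [1, 0, -2, 1]]
levels = quantize(coeffs, dc_q=4, ac_q=4)
for row in levels:
    print(row)
```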

The above block contains the coefficients that are actually transmitted (zigzagged and entropy-encoded) through the bitstream and decoded on the other end.
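The zigzag step reorders the 16 coefficients so that the low-frequency ones come first and the trailing zeros collect at the end. Below is a small sketch using the 4×4 scan order VP8 uses, as listed in RFC 6386. The entropy coding that follows is not shown, and the function names are only illustrative.

```python
# VP8 4x4 zigzag scan: position k in the transmitted order holds the
# coefficient at raster index ZIGZAG[k] (raster index = row * 4 + col).
ZIGZAG = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def zigzag(block):
    """Encoder side: flatten a 4x4 block into scan order."""
    flat = [v for row in block for v in row]
    return [flat[i] for i in ZIGZAG]


def unzigzag(scan):
    """Decoder side: put coefficients back into raster order."""
    flat = [0] * 16
    for k, i in enumerate(ZIGZAG):
        flat[i] = scan[k]
    return [flat[r * 4:r * 4 + 4] for r in range(4)]


levels = [[-78, 1, 0, 0], [0, 3, -1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
scan = zigzag(levels)
print(scan)
```

For subblock 0, everything after the eighth scan position is zero. That long tail of zeros is what makes the block cheap to code.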

The decoding process looks something like this: after the coefficients are entropy-decoded and un-zigzagged, they are dequantized (multiplied) by the original quantizers:

``` subblock 0, dequantized
-312   4   0   0
   0  12  -4   0
   0   0   0   0
   0   0   0   0
```

Note that these coefficients are not exactly the same as the original, pre-quantized coefficients. This is a large part of where the “lossy” in “lossy video compression” comes from.
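Dequantization is just the reverse multiply. This sketch also subtracts the dequantized result from the original transformed coefficients to show exactly where the loss happened. The `dequantize` name is illustrative.

```python
def dequantize(levels, dc_q, ac_q):
    """Multiply the DC level by dc_q and every AC level by ac_q."""
    return [
        [v * (dc_q if (r, c) == (0, 0) else ac_q) for c, v in enumerate(row)]
        for r, row in enumerate(levels)
    ]


coeffs = [[-312, 7, 1, 0], [1, 12, -5, 2], [2, -3, 3, -1], [1, 0, -2, 1]]
levels = [[-78, 1, 0, 0], [0, 3, -1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

dequant = dequantize(levels, dc_q=4, ac_q=4)
# What quantization threw away: original coefficient minus dequantized value.
loss = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(coeffs, dequant)]
for row in dequant:
    print(row)
print(loss)
```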

Next, the decoder generates a base predictor subblock. In this case, it’s all 128 (DC prediction for top-left subblock):

``` subblock 0, predictor
128 128 128 128
128 128 128 128
128 128 128 128
128 128 128 128
```

Finally, the dequantized coefficients are shoved through the inverse transform and added to the base predictor block:

``` subblock 0, reconstructed
91  91  89  85
90  90  89  87
89  88  89  90
88  88  89  92
```

Again, not exactly the same as the original block, but an incredible facsimile thereof.
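Here's the decoder side as a Python sketch. The inverse transform is my port of libvpx's `vp8_short_idct4x4llm_c`. Its constants 20091 and 35468 are sqrt(2)*cos(pi/8) - 1 and sqrt(2)*sin(pi/8) in 16-bit fixed point. After the inverse transform, the code adds the predictor and clamps to the 0-255 pixel range. It also measures how far the reconstruction drifts from the original.

```python
COS_TERM = 20091  # (sqrt(2) * cos(pi/8) - 1) * 65536, rounded
SIN_TERM = 35468  # sqrt(2) * sin(pi/8) * 65536, rounded


def idct_1d(i0, i1, i2, i3):
    """One 4-point inverse transform butterfly; returns outputs 0..3."""
    a1 = i0 + i2
    b1 = i0 - i2
    c1 = ((i1 * SIN_TERM) >> 16) - (i3 + ((i3 * COS_TERM) >> 16))
    d1 = (i1 + ((i1 * COS_TERM) >> 16)) + ((i3 * SIN_TERM) >> 16)
    return [a1 + d1, b1 + c1, b1 - c1, a1 - d1]


def idct4x4(coeffs):
    # Pass 1 runs down each column; pass 2 along each row, then (x + 4) >> 3.
    cols = [idct_1d(*(coeffs[r][j] for r in range(4))) for j in range(4)]
    rows = [[cols[j][r] for j in range(4)] for r in range(4)]
    return [[(v + 4) >> 3 for v in idct_1d(*row)] for row in rows]


def reconstruct(dequant, pred):
    """Add the inverse-transformed residual to the predictor, clamp to 0..255."""
    return [[min(255, max(0, pred + v)) for v in row] for row in idct4x4(dequant)]


original = [[92, 91, 89, 86], [91, 90, 88, 86], [89, 89, 89, 88], [89, 87, 88, 93]]
dequant = [[-312, 4, 0, 0], [0, 12, -4, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

recon = reconstruct(dequant, pred=128)
max_error = max(abs(o - r) for ro, rr in zip(original, recon) for o, r in zip(ro, rr))
for row in recon:
    print(row)
print("max abs error:", max_error)
```

No sample is off by more than 2. The worst case is the last sample of the third row, where the original 88 came back as 90.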

Note that this decoding-after-encoding demonstration is not merely pedagogical. The encoder has to decode the subblock because the encoding of successive subblocks may depend on it. The encoder can't rely on the original subblock because the decoder won't have it; the decoder will only have the reconstructed block.

For example, here’s the next subblock:

``` subblock 1 (original)
84  84  87  90
85  85  86  93
86  83  83  89
91  85  84  87
```

Let’s assume DC prediction once more. The 4 top predictors are still all 128 since this subblock lies along the top row. However, the 4 left predictors are the right edge of the subblock reconstructed in the previous example:

``` subblock 1 (original)
    128 128 128 128
 85  84  84  87  90
 87  85  85  86  93
 90  86  83  83  89
 92  91  85  84  87
```

The DC predictor is computed with integer division as `(128 + 128 + 128 + 128 + 85 + 87 + 90 + 92 + 4) / 8 = 108`. The extra +4 rounds the average to the nearest integer instead of truncating it. (Note that in this case, using the original subblock's right edge would also have resulted in 108, but that's beside the point.)
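To make the point concrete, here's a sketch that builds subblock 1's DC predictor from the right column of subblock 0. It does this once with the reconstructed version of subblock 0 and once with the original. `right_column` is an illustrative helper name.

```python
def dc_predict(top, left):
    """4x4 DC predictor: (sum of 4 top + 4 left neighbors + 4) >> 3."""
    return (sum(top) + sum(left) + 4) >> 3


def right_column(block):
    """A subblock's left predictors are its left neighbor's right column."""
    return [row[3] for row in block]


reconstructed0 = [[91, 91, 89, 85], [90, 90, 89, 87], [89, 88, 89, 90], [88, 88, 89, 92]]
original0 = [[92, 91, 89, 86], [91, 90, 88, 86], [89, 89, 89, 88], [89, 87, 88, 93]]
top = [128] * 4  # subblock 1 is on the top row, so its top neighbors are phantom samples

from_recon = dc_predict(top, right_column(reconstructed0))
from_orig = dc_predict(top, right_column(original0))
print(from_recon, from_orig)
```

Both come out to 108 here. Only the reconstructed edge, however, is guaranteed to match what the decoder computes.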

Continuing through the same process as in subblock 0:

``` subblock 1, predictor removed
-24 -24 -21 -18
-23 -23 -22 -15
-22 -25 -25 -19
-17 -23 -24 -21
```

``` subblock 1, transformed
-173  -9  14  -1
   2 -11  -4   0
   1   6  -2   3
  -5   1   0   1
```

``` subblock 1, quantized
 -43  -2   3   0
   0  -2  -1   0
   0   1   0   0
  -1   0   0   0
```

``` subblock 1, dequantized
-172  -8  12   0
   0  -8  -4   0
   0   4   0   0
  -4   0   0   0
```

``` subblock 1, predictor
108 108 108 108
108 108 108 108
108 108 108 108
108 108 108 108
```

``` subblock 1, reconstructed
84  84  87  89
86  85  87  91
86  83  84  89
90  85  84  88
```
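Finally, here's the whole round trip for one subblock in a single self-contained sketch. It combines DC prediction, the forward transform (ported from my reading of libvpx's `vp8_short_fdct4x4_c`), truncating quantization with both quantizers at 4, dequantization, and the inverse transform (from `vp8_short_idct4x4llm_c`). Entropy coding is left out, and `encode_decode` is a hypothetical name, not part of any real API. Running it on subblock 1 with the reconstructed left edge should reproduce the tables above.

```python
def fdct4x4(b):
    """Integer forward transform: rows first, then columns."""
    t = []
    for ip in b:
        a1, b1 = (ip[0] + ip[3]) * 8, (ip[1] + ip[2]) * 8
        c1, d1 = (ip[1] - ip[2]) * 8, (ip[0] - ip[3]) * 8
        t.append([a1 + b1, (c1 * 2217 + d1 * 5352 + 14500) >> 12,
                  a1 - b1, (d1 * 2217 - c1 * 5352 + 7500) >> 12])
    out = [[0] * 4 for _ in range(4)]
    for j in range(4):
        a1, b1 = t[0][j] + t[3][j], t[1][j] + t[2][j]
        c1, d1 = t[1][j] - t[2][j], t[0][j] - t[3][j]
        out[0][j] = (a1 + b1 + 7) >> 4
        out[1][j] = ((c1 * 2217 + d1 * 5352 + 12000) >> 16) + int(d1 != 0)
        out[2][j] = (a1 - b1 + 7) >> 4
        out[3][j] = (d1 * 2217 - c1 * 5352 + 51000) >> 16
    return out


def idct_1d(i0, i1, i2, i3):
    """4-point inverse butterfly with 16-bit fixed-point constants."""
    a1, b1 = i0 + i2, i0 - i2
    c1 = ((i1 * 35468) >> 16) - (i3 + ((i3 * 20091) >> 16))
    d1 = (i1 + ((i1 * 20091) >> 16)) + ((i3 * 35468) >> 16)
    return [a1 + d1, b1 + c1, b1 - c1, a1 - d1]


def idct4x4(c):
    """Integer inverse transform: columns first, then rows with (x + 4) >> 3."""
    cols = [idct_1d(*(c[r][j] for r in range(4))) for j in range(4)]
    rows = [[cols[j][r] for j in range(4)] for r in range(4)]
    return [[(v + 4) >> 3 for v in idct_1d(*row)] for row in rows]


def trunc_div(v, q):
    """C-style integer division (truncates toward zero)."""
    return -(-v // q) if v < 0 else v // q


def encode_decode(block, top, left, dc_q=4, ac_q=4):
    """Returns (predictor, transmitted levels, reconstructed block)."""
    def q_for(r, c):
        return dc_q if (r, c) == (0, 0) else ac_q

    pred = (sum(top) + sum(left) + 4) >> 3
    residual = [[s - pred for s in row] for row in block]
    coeffs = fdct4x4(residual)
    levels = [[trunc_div(v, q_for(r, c)) for c, v in enumerate(row)]
              for r, row in enumerate(coeffs)]
    dequant = [[v * q_for(r, c) for c, v in enumerate(row)]
               for r, row in enumerate(levels)]
    recon = [[min(255, max(0, pred + v)) for v in row] for row in idct4x4(dequant)]
    return pred, levels, recon


subblock1 = [[84, 84, 87, 90], [85, 85, 86, 93], [86, 83, 83, 89], [91, 85, 84, 87]]
# Left predictors: right column of the *reconstructed* subblock 0.
pred, levels, recon = encode_decode(subblock1, top=[128] * 4, left=[85, 87, 90, 92])
print(pred)
for row in levels:
    print(row)
for row in recon:
    print(row)
```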

I hope this concrete example (straight from a working codec) clarifies this part of the VP8 process.

Posted in VP8 | 5 Comments »

## 5 Responses

1. Reimar Says:

Does VP8 really use the average of both blocks for DC prediction? Other codecs chose one specific block and take the DC prediction of that…
I mean, the way you do it, DC prediction would work horribly badly if e.g. the first 16 rows of the video were all black.

2. Tim Says:

Nice post. Reimar: There are several prediction modes, including ones that ignore either the top or left ‘phantom samples’ (is that the official name?).

3. Multimedia Mike Says:

@Tim: Phantom samples is my name for it. :-) My encoder maintains a macroblock data structure named phantom_mb that manages the out-of-frame stuff.

@Reimar: Yes, VP8 always averages in the phantom samples, as opposed to H.264 which will omit the phantom samples.

4. Reimar Says:

Not completely sure about H.264, but for MPEG-4 it's not really that it omits it, but that it runs what you could call an extremely basic edge-detection filter on the 3 surrounding DC values and then picks the one that is most likely on the same size as the current MB.
In the case of the border pixels this indeed ends up discarding those outside the frame.

5. Reimar Says:

s/size/side/
And the algorithm is to pick the one out of left and top that differs more from the top-left one.
Which for any sharp mostly horizontal or mostly vertical lines certainly gives a much better predictor than averaging (of course there are also cases where it does much worse).