I hope my previous walkthrough of the VP8 4×4 intra coding process was educational. Today, I’ll be walking through an example of what happens when my toy VP8 encoder encodes an intra 16×16 block. This may prove educational to those who have never been exposed to the deep details of this or related algorithms. Also, I wanted to illustrate where I think my VP8 encoder process is going bad and generating such grotesque results.

Before I start, let me give a shout-out to Google Docs’ Drawing tool which I used to generate these diagrams. It works quite well.

**Results**

*(Always cut to the chase in a blog post; results first.)* I’m glad I composed this post. In the course of doing so, **I found the problem, fixed it**, and am now able to present this image that was decoded from the bitstream encoded by my ~~toy~~ *working* VP8 encoder:

Yeah, I know that image doesn’t look like anything you haven’t seen before. The difference is that it has made a successful trip through my VP8 encoder.

Follow along through the encoding process and learn of the mistake…

**Original Block and Subblocks**

Here is the 16×16 block to be encoded:

The block is broken down into 16 4×4 subblocks for further encoding:

**Prediction**

The first step is to pick a prediction mode, generate a prediction block, and subtract the predictors from the macroblock. In this case, we will use DC prediction which means the predictor will be the same for each element.

In 4×4 VP8 DC intra prediction, samples outside of the frame are assumed to be 128. It’s a little different in 16×16 DC intra prediction: samples above the top row are assumed to be 127 while samples left of the leftmost column are assumed to be 129. For the top left macroblock, this still works out to 128.

Subtract 128 from each of the samples:
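For anyone who prefers code to pictures, here is a minimal Python sketch of the prediction step. The function names are mine, the `(+16) >> 5` rounding is an assumption made for illustration, and the 4×4 sample values are borrowed from the `input` row that appears later in this post rather than from the macroblock in the diagram.

```python
def dc_predictor_16x16(above, left):
    """Rounded average of the 16 samples above and the 16 samples to the left.

    The (+16) >> 5 rounding is an assumption made for this sketch.
    """
    return (sum(above) + sum(left) + 16) >> 5


def subtract_predictor(block, predictor):
    """Subtract one constant predictor from every sample in a block."""
    return [[sample - predictor for sample in row] for row in block]


# Top left macroblock: nothing real above or to the left, so use 127 and 129.
predictor = dc_predictor_16x16([127] * 16, [129] * 16)

# One 4x4 subblock of samples (taken from the "input" row later in the post).
samples = [
    [92, 91, 89, 86],
    [91, 90, 88, 86],
    [89, 89, 89, 88],
    [89, 87, 88, 93],
]
residual = subtract_predictor(samples, predictor)
```

With 127 above and 129 to the left, the average lands on 128, so every residual is just the sample minus 128.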

**Forward Transform**

Run each of the 16 prediction-removed subblocks through the forward transform. This example uses the forward transform from libvpx 0.9.5:

I have highlighted the DC coefficients in each subblock. That’s because those receive special consideration in 16×16 intra coding.

**Quantization**

The Y plane AC quantizer is 4 in this example, the minimum allowed. (The Y plane DC quantizer is also 4 but doesn’t come into play for intra 16×16 coding since the DC coefficients follow a different process.) Thus, quantize (integer divide) each AC element in each subblock (we’ll ignore the DC coefficient for this part):
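As a rough sketch of this step in Python: the helper below divides the 15 AC coefficients by the quantizer and leaves the DC slot alone. Truncating toward zero (as C integer division does) is my assumption here, a real encoder may round differently, and the coefficient values are borrowed from the `short 0.9.5` row later in this post.

```python
def quantize_ac(coeffs, ac_q=4):
    """Quantize entries 1..15 of a 16-entry subblock; leave entry 0 (DC) untouched."""
    def div_trunc(value, q):
        # Truncate toward zero like C integer division; Python's // would floor.
        return int(value / q)

    return [coeffs[0]] + [div_trunc(c, ac_q) for c in coeffs[1:]]


# Transformed subblock, DC first (values from the "short 0.9.5" row in this post).
coeffs = [-312, 7, 1, 0, 1, 12, -5, 2, 2, -3, 3, -1, 1, 0, -2, 1]
quantized = quantize_ac(coeffs)
```

Most of the small AC values collapse to zero, which is the whole point of quantization.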

**The Y2 Round Trip**

Those highlighted DC coefficients from each of the 16 subblocks comprise the Y2 block. This block is transformed with a slightly different algorithm called the Walsh-Hadamard Transform (WHT). The results of this transform are then quantized (using 8 for both Y2 DC and AC in this example, as those are the smallest Y2 quantizers that VP8 allows), then zigzagged and entropy-coded along with the rest of the macroblock coefficients.

On the decoder side, the Y2 coefficients are decoded, de-zigzagged, dequantized and run through the inverse WHT.
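To get a feel for what a healthy round trip should look like, here is a Python sketch using a textbook unnormalized 4×4 Hadamard matrix. This is not the exact VP8 WHT (the scaling and rounding in the spec differ), the Y2 values are made up, and truncating quantization is an assumption. The point is only that the error after a round trip should stay below the quantizer step.

```python
# Textbook 4x4 Hadamard matrix (symmetric, H*H = 4*I). Not the exact VP8 WHT.
H = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def forward_wht(block):
    """Unnormalized 2-D transform: H * block * H."""
    return matmul(matmul(H, block), H)


def inverse_wht(coeffs):
    """Apply H on both sides again, then divide by 16 with rounding."""
    return [[(v + 8) >> 4 for v in row] for row in matmul(matmul(H, coeffs), H)]


def y2_round_trip(block, q):
    """Transform, quantize (truncating), dequantize, and inverse transform."""
    levels = [[int(c / q) for c in row] for row in forward_wht(block)]
    dequantized = [[level * q for level in row] for row in levels]
    return inverse_wht(dequantized)


# Hypothetical Y2 block: one DC coefficient from each of the 16 subblocks.
y2 = [
    [-312, -300, -280, -250],
    [-310, -295, -270, -240],
    [-305, -290, -260, -230],
    [-300, -280, -250, -220],
]
reconstructed = y2_round_trip(y2, 8)
max_error = max(abs(a - b)
                for row_a, row_b in zip(y2, reconstructed)
                for a, b in zip(row_a, row_b))
```

Without quantization this toy transform reverses exactly, and with a quantizer of 8 the per-element error stays in the single digits.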

**And this is where I suspect that most of the error is creeping into my VP8 encoder.** Observe the round-trip through the Y2 process:

As intimated, this part causes me consternation due to the wide discrepancy between the original and the reconstructed Y2 blocks. Observe the absolute difference between the 2 vectors:

That’s really significant and leads me to believe that this is where the big problem is.

**What’s Wrong?**

My first suspicion was that the quantization was throwing off the process. I was disabused of this idea when I removed quantization from the equation and immediately reversed the transform:

So perhaps there is a problem with the forward WHT. Just like with the usual subblock transform, the VP8 spec doesn’t define how to perform the forward WHT, only the inverse WHT. Do I need to audition different forward WHTs from various versions of libvpx, similar to what I did with the other transform? That doesn’t make a lot of sense, since libvpx doesn’t seem to have so much trouble with basic encoding.

**The Punchline**

I reviewed the forward WHT code, the stuff that I plagiarized from libvpx 0.9.0. The function takes, among other parameters, a pitch value. There are 2 loops in the code. The first iterates through the rows of the input matrix, which I assumed was a 4×4 matrix. I was puzzled that during each iteration of the row loop, the input pointer was only being advanced by `(pitch/2)`. I removed the division by 2 and the problem went away. *I.e., the encoded image looks correct.*
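Here is a toy Python model of that kind of bug. It is not the libvpx code: the function name is hypothetical, and counting the stride in array elements is an assumption for this sketch (the real function walks C pointers). It just shows what happens when a row loop advances by half of what the data layout calls for.

```python
def read_4x4(flat, stride):
    """Collect four rows of four values, advancing `stride` entries per row."""
    return [flat[r * stride : r * stride + 4] for r in range(4)]


flat = list(range(16))  # tightly packed 4x4 block: row r starts at index 4*r
pitch = 4               # caller's idea of the row stride, in elements

good = read_4x4(flat, pitch)       # rows start at 0, 4, 8, 12
bad = read_4x4(flat, pitch // 2)   # rows start at 0, 2, 4, 6
```

With the halved stride, the rows overlap and the bottom half of the block is never read, so the transform is fed the wrong matrix even though the math itself is fine.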

What’s up with the `(pitch/2)`, anyway? It seems that the encoder likes to pack 2 4×4 subblocks into an 8×4 block data structure. In fact, the forward DCTs in the libvpx encoder have the same artifact. Remember how I surveyed several variations of forward DCT from different versions of libvpx? The one that proved most accurate in that test was the one I had already modified to advance the input pointer properly. Fixing the other 2 candidates yields similar results:

    input:        92  91  89  86  91  90  88  86  89  89  89  88  89  87  88  93

    short 0.9.0: -311  6   2   0   0  11  -6   1   2  -3   3   0   0   0  -2   1
    inverse:      92  91  89  86  91  90  88  87  90  89  89  88  89  87  88  93

    fast 0.9.0:  -313  5   1   0   1  11  -6   1   3  -3   4   0   0   0  -2   1
    inverse:      91  91  89  86  90  90  88  86  89  89  89  88  89  87  88  93

    short 0.9.5: -312  7   1   0   1  12  -5   2   2  -3   3  -1   1   0  -2   1
    inverse:      92  91  89  86  91  90  88  86  89  89  89  88  89  87  88  93

*Code cribber beware!*

**Corrected Y2 Round Trip**

Let’s look at that Y2 round trip one more time:

And another look at the error between the original and the reconstruction:

*Better.*

**Dequantization, Prediction, Inverse Transforms, and Reconstruction**

To be honest, now that I solved the major problem, I’m getting a little tired of making these pictures. Long story short, all elements of the original 16 subblocks are dequantized and their DC coefficients are filled in with the appropriate item from the reconstructed Y2 block. A base predictor block is generated (all 128 values in this case). And each Y block is run through the inverse transform and added to the predictor block. The following is the reconstruction:
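In lieu of more pictures, here is a small Python sketch of the bookkeeping around the inverse transform. The function names are hypothetical, the inverse transform itself is left out, and clamping to 0..255 is my assumption about how out-of-range sums are handled.

```python
def rebuild_coeffs(quantized, y2_dc, ac_q=4):
    """Dequantize AC entries 1..15 and install the DC recovered from the Y2 block.

    The DC slot of `quantized` is ignored; it always comes from Y2 in this mode.
    """
    return [y2_dc] + [level * ac_q for level in quantized[1:]]


def add_predictor(residual, predictor=128):
    """Add the predictor back and clamp to the 8-bit sample range (assumed)."""
    return [max(0, min(255, r + predictor)) for r in residual]


# Quantized subblock (DC slot unused) plus a hypothetical reconstructed Y2 value.
quantized = [0, 1, 0, 0, 0, 3, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
coeffs = rebuild_coeffs(quantized, y2_dc=-310)

# After the (omitted) inverse transform, the residual gets its predictor back.
samples = add_predictor([-36, 200, -200])
```

The rebuilt coefficient list is what would be handed to the inverse transform for each of the 16 subblocks.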

And if you compare that against the original luma macroblock (I don’t feel like doing it right now), you’ll find that it’s pretty close.

I can’t believe how close I was all this time, and how long that pitch bug held me up.

compn: did you submit some libvpx code comments to google?

or are you going to write another spec? hehe

Robert Swain: Looking much better now. Good job! Next up – implementing all compression features and optimising the encoder so that it beats libvpx and is multi-threaded? ;o

Multimedia Mike (post author): @compn: I just might have to submit revised (or any) comments to the libvpx codebase.

I don’t plan to write a new spec since the current spec authors are amenable to spec revisions (if I would ever get around to submitting any).

@Robert: Next, I would like to get 4×4 intra mode working completely (part of the way on the infrastructure right now). After that, I could work on intraframe compression quality. Or, if I wanted to delay hard work, I was thinking about encoding the coefficients in multiple segments so that a decoder could leverage multiple cores during entropy decoding.

Jim Leonard: Excellent post! While I’m glad you have a working encoder now, I will miss the humorous results from the prior attempts. Every time I saw a new result I would laugh out loud (and recognize a part of what was wrong).

Multimedia Mike (post author): @Jim: Oh, I’m sure there’s more comedy to come given the extraordinary complexity of the motion compensation. And just wait and see what happens when I try my hand at rate control!

It’s still a great deal of fun. I’m learning a lot for the first time in a long time.

gabriel: Hi Mike!

I’ve been following your blog for a few months now and it’s getting better bit by bit.

I’m planning to start learning about codecs and this is going to be a very valuable resource for sure.

Many thanks :)

charan kumar: Hi Mike,

I need your help for my project. Can you clearly explain how quantization works in VP8, and which file in the source code contains the quantization ……

please, i need urgently ……

Multimedia Mike (post author): Quantization is basically integer division. Dequantization is basically multiplication. I’m not sure which source file this logic lives in over in the official libvpx codebase.

Remember, the bitstream stores indices into quantizer tables. There are normally 6 quantizers: Y.DC, Y.AC, C.DC, C.AC, Y2.DC, and Y2.AC. Actually, there might be segmentation enabled, in which case there will be 4 sets of 6 quantizers.

Your specific questions might be better suited for the webm development list.

charan kumar: Please explain how to select the quantization parameter for any one of the above 6 quantizers….??

Keyur: Hi Mike, your post was extremely helpful. I am doing a graduate-level project comparing H.264 and VP8, and your explanation proved helpful in that context. I want to ask you a question: can you guide me in using ffmpeg for VP8? I want to know the basic command line to encode a raw video sequence to .webm format while varying quantization.