Category Archives: VP3/Theora

Theora Superblock Traversal

Hands down, one of the hardest parts about the VP3/Theora coding method is the bizarre fragment traversal pattern. Most everything else that VP3 does is seen in other codecs. But the Hilbert pattern used for encoding and decoding coefficients creates a monster. Vector drawing program to the rescue!

Imagine a video frame with a resolution of 96×48, highly contrived for this example. The Y plane would consist of 12×6 fragments (8×8 block of samples). Meanwhile, the U and V planes would each consist of 6×3 fragments. Recall that VP3 employs a notion of a superblock, which encompasses a group of 4×4 fragments. If the fragment resolution of a plane is not divisible by 4, the plane must be conceptually padded out to the nearest superblock boundary for the traversal pattern.


click on the image for the full sized image
Theora superblock decoding pattern

The above drawing illustrates the order that the fragments are traversed in our hypothetical 96×48 frame when decoding coefficients. Note, in particular, the strange order in which fragments 50..53 are traversed.

I hope that’s clear enough. The challenge at hand is to establish data structures at the start of the encode or decode process. Generally, you will allocate an array of fragment data structures: Fragment indices 0, 1, 2, 3, etc. will proceed left -> right, then top to bottom. The second row of the Y plane begins at index 12, third row at index 24, and so on. But when it comes to decoding the DCT coefficients, it is necessary to traverse through the fragment data structure array at indices 0, 1, 13, 12, 24, 36, 37, 25, and so on. Thus, it is customary to build an array that maps one set of indices to the other. I’m having flashbacks just thinking about it. I remember developing a completely different algorithm than the official VP3 decoder when I reimplemented the decoder for FFmpeg.

Oh yeah, and remember that all VP3/Theora decoding is performed upside-down. I choose not to illustrate that fact in these drawings since this stuff is complicated enough.

How Many Multiplications?

While many are still dogged by the quandary of how many licks it takes to get to the chewy center of a certain popular confection, I found myself wondering how many multiplication operations it takes to decode a frame of certain video codecs. Multiplication tends to be one of the more intensive parts of video decoding and much effort is spent minimizing the number of multiplications that are absolutely necessary. If my numbers and assumptions are correct, a 320×240 VP3/Theora intraframe requires 747,680 multiplications to decode. It goes something like this:

Each 8×8 block of the image needs to be transformed and dequantized. How many blocks are in the image?

 Y plane: (320 * 240) pixels / (8 * 8) pixels/block = 1200 blocks
 U plane: (160 * 120) pixels / (8 * 8) pixels/block =  300 blocks
 V plane: (160 * 120) pixels / (8 * 8) pixels/block =  300 blocks

So there are 1800 blocks in the frame. Each block needs to be run through an inverse discrete cosine transform (IDCT). Each of the 64 samples requires 64 mults. That means each block requires 642 mults. However, optimizations (usually by factoring out redundant mults) result in a transform that can be performed with N*log2(N) mults.

  64 * log2(64) = 384 mults/block

This assumes that all of the blocks in the intraframe have enough non-zero coefficients that more optimal IDCTs (like 1-, 3-, and 10-element) are not warranted. Then there is the matter of dequantization. Assuming an average of 16 non-zero coefficients in need of dequantization, the number of mults per block rises to 400.

 400 mults/block * 1800 blocks/frame = 720000 mults/frame

The same number would hold true for JPEG images or MPEG-1 intraframes, given the same (possibly fraudulent) assumptions. I am not sure if 16 non-zero coefficients per block is too generous and too conservative an estimate. However, the vast majority of the blocks in a VP3/Theora intraframe will also require 3 multiplications in order to reverse the DC prediction. So I’ll just wave my hand and factor DC prediction in with the dequantization budget of 16 mults/block.

Now, if we want to deploy the recommended post processing method, that requires 8 more multiplications per edge, where ‘edge’ is a boundary between 2 blocks that have been rendered in this frame (and all blocks are rendered for an intraframe). How many edges are in the frame?

  • vertical edges = (width_in_blocks – 1) * (height_in_blocks)
  • horizontal edges = (width_in_blocks) * (height_in_blocks – 1)
 Y plane: 40x30 blocks, (40 - 1) * 30 + 40 * (30 - 1) = 1170 + 1160 = 2330 edges
 U plane: 20x15 blocks, (20 - 1) * 15 + 20 * (15 - 1) =  285 +  280 =  565 edges
 V plane: 20x15 blocks, (20 - 1) * 15 + 20 * (15 - 1) =  285 +  280 =  565 edges
 
 3460 edges/frame * 8 PP mults/edge = 27680 PP mults/frame

Thus:

 720000 mults/frame + 27680 mults/frame = 747680 mults/frame

It should be noted that VP3/Theora’s method for post-processing involves a 4-tap filter with coefficients of (1, -1, 3, -3). Trivially, the multiplication by 3 can be factored out (which is how the assumption of one mult per sample, or 8 per edge, is derived). Further, it can easily be broken down to a bit shift and addition (3*n = (n << 1) + n) so that the post processing phase does not strictly require any multiplications. Naturally, the parameters for a nominal VP3/Theora interframe vary wildly and it's a little harder to produce an average number of multiplications for that case. I'm sure someone will be along in the comments if I messed up the math or, more likely, the assumptions.

Superblock Corner Cases

Look who has been playing around some more with the vector drawing program. Here’s an illustration of somewhat limited utility but that still demonstrates an important point of the VP3/Theora coding scheme:


VP3/Theora superblock traversal corner cases

The image above depicts a hypothetical frame in the VP3/Theora coding scheme that has sample dimensions of 88×48. The valid 8×8 fragments are depicted in green. Since these do not line up nicely on 32-sample superblock boundaries, round up to the nearest superblock in either dimension. The green fragments inside the turquoise zone are the visible fragments. The grey fragments are phantoms that still must be accounted for in the overall superblock traversal pattern when coding/decoding the transform coefficients.

There is also the matter of what happens when the width and height of the frame do not line up on fragment boundaries (i.e., are not divisible by 8). The image is rounded up to the nearest fragment size for the purpose of transform coding.

The Legend Of Hilbert

I’ve been wanting to learn how to use a basic vector drawing program for some time now for the purpose of illustrating certain codec concepts more concretely. Sure, this will be for the benefit of others who are curious about the craft. But mostly, I do it for me because, well… me like pictures.

Behold, my first vector drawing, constructed using OpenOffice’s Draw program:


VP3/Theora Superblock Traversal Pattern

When I was first reverse engineering an English language description of the VP3 format and implementing a new decoder for FFmpeg, I figured out the curious pattern that the codec uses to traverse 4×4 fragments (blocks of 8×8 samples) within a VP3 superblock. I posted to the theora-dev mailing list asking if the pattern struck anyone as familiar. Personally, the pattern reminded me of playing the original NES The Legend of Zelda title, sort of like a pattern for traversing rooms in a dungeon. In fact, early iterations of my decoder used the identifier zelda[].

However, someone on the list identified it as resembling a Hilbert curve, discovered by some famous math dude. One of the codec’s designers chimed in on the list and stated that he had never even heard of Hilbert and that the traversal pattern was chosen to meet certain criteria. Any resemblance to the Hilbert curve was to be considered strictly coincidental.

Looking back on that old mailing list traffic, and taking a good look at the actual Hilbert curve from the link above, I may have made a mistake in using the term “Hilbert pattern” to describe the traversal sequence pictured above. It’s a little late now to change it back to “Zelda pattern”– Google demonstrates that the first term sort of caught on for VP3/Theora-related matters.