Actually, I need 80 optimal Huffman trees, but let’s take it one step at a time.

The VP3 video codec — the basis of Theora — employs 80 different Huffman trees. There are 16 for DC coefficients and 16 each for 4 different AC coefficient groups. An individual VP3/Theora frame gets to select 4 different Huffman trees: one for Y-plane DC, one for C-plane DC, one for Y-plane AC, and one for C-plane AC. VP3 hardcodes these tables. Theora allows more flexibility and an encoder is free to either use the default VP3 trees or create its own set and encode them into the header of the container (typical an Ogg file).

Generating an optimal Huffman tree for a particular set of input is rather well established; any introduction to Huffman codes covers that much. What I’m curious about is how one would go about creating a set of, e.g., 16 optimal Huffman trees for a given input. The first solution that comes to mind is to treat this as a vector quantization (VQ) problem. I have no idea if this idea holds water, or if it even has any sane basis in mathematics, but when has that ever stopped me from running with a brainstorm?

Here’s the pitch:

- Modify FFmpeg’s VP3/Theora decoder to print after each frame decode the count of each type of token that was decoded from the stream (for each of the 5 coefficient groups, and for each of the plane types), as well as the number of bits that token was encoded with. This will allow tallying of the actual number of bits used for encoding tokens in each frame.
- Create a separate tool to process the data by applying a basic VQ codebook training algorithm. It will be necessary to treat all of the Y-plane AC tokens as single vectors and do the same with the C-plane AC tokens, even though each AC token vector needs to be comprised of 4 separate AC group vectors. Re-use some existing E/LGB code for this step.
- Generate Huffman trees from the resulting vectors and count the number of bits per token for each.
- Iterate through the frequency vectors captured from the first step and match them to the codebooks using a standard distance algorithm.
- Tally the bits from using the new vectors and see if there is any improvement versus the default vectors (Huffman tables).

I don’t know if I’ll have time to get around to trying this experiment in the near future but I wanted to throw it out there anyway. With all of the improvments that the new Theora encoder brings to the tables, it seems that the custom Huffman trees feature is one that is left un-exercised per my reading of the documentation and source code. From studying the Big Buck Bunny Theora encodes (my standard Theora test vectors these days), I see that they use the default VP3 tables. The 1080p variant occupied 866 MB. Could there be any notable space savings from generating custom Huffman tables? Or was this a pointless feature to add to the Theora spec?

HellfredI have posted a link to your block entry to #theora irc chanel.

The response i got was:

SpeedsterF2: http://svn.xiph.org/trunk/theora-exp/examples/rehuff.c

Maybe it will help you getting started.

Hellfred

Timothy B. TerriberryThis was done over 2 years ago. What you describe is standard K-means clustering with the Kullback-Leibler divergence as the distance “metric”. Metric is in quotes since the KL divergence is not actually a metric, not being symmetric. However, you can prove that this is equivalent to measuring the total number of wasted bits due to the whole-bit restrictions of Huffman codes, and that the process converges (the number of wasted bits always goes down with every iteration, or at worst stays the same).

The code is in

http://svn.xiph.org/trunk/theora-exp/examples/rehuff.c

I’ve received reports that current svn has bitrotted, but r15592 should work. It generally gives a few % improvement (I just tested it with a fresh checkout and got 3.9% on a random file). However, the last time I tried to generate a new set of tables (prior to Thusnelda), they did not generalize well: the new tables performed slightly worse on sequences that were not included in the training set. We have more test sequences now to train on, and Thusnelda uses tokens in significantly different ways than the encoders back then did, so this may be worth revisiting.

Multimedia MikePost authorThanks for the data. I didn’t think libtheora would let this feature slide.

NB> From studying the Big Buck Bunny Theora encodes (my standard Theora test vectors these days), I see that they use the default VP3 tables.

These encodes were done back in early 2008, so they surely aren’t representative of modern Theora (one thing they sure don’t use is AQ; I’m not sure if they use custom tables).

Which brings me to a question – how do multiple selectable VLC tables interact with RD optimization? If you RD-optimize with certain bit costs, I’d expect the result to be biased enough to turn that VLC table into the optimal one, making multiple tables more or less useless.

Either you encode the whole frame a lot of times or you use some sort of heuristic to try to determine beforehand what table to use…

I know Thusnelda uses a greedy algorithm for a very small subset of this problem (macroclock mode encoding, where re-computing all the costs will all the tables every macroblock is feasible), but I don’t know if it tries to do something with DC/AC tables.

Multimedia MikePost authorI’m certain BBB-1080p uses the VP3 tables– I checked them manually.

Good questions, NB, about the RD optimization. Terriberry indicated in his comment that “Thusnelda uses tokens in significantly different ways” so perhaps you’re onto something about optimizing with certain bit cost assumptions.

For my part, I’m a bit monkey who mostly works on the decode/demux side of things. One day I would like to understand more about encoding.