Giving Thanks For VP8

It’s the Thanksgiving holiday here in the United States. I guess that’s as good a reason as any to release a first cut of my VP8 encoder. In order to remind people that they shouldn’t expect phenomenal quality from it — and to discourage inexperienced people from trying to create useful videos with it — I have hardcoded the quantizers to their maximum settings. For those not skilled in the art, this is the setting that yields maximum compression and worst quality. When compressing the Big Buck Bunny logo image, the resulting file is only 2839 bytes, but observe the reconstructed quality:



It really just looks like a particularly stormy day in the forest.

First VP8 File From An Independent Encoder
I found a happy medium on the quantizer scale and encoded the first 30 seconds of Big Buck Bunny for your inspection. I guess this makes it the first VP8/WebM file from an independent encoder (using FFmpeg’s Matroska muxer as well).

Download: bbb-360p-30sec-q40.webm (~13 MBytes)

I think the quality makes it look like it was digitized from an old VHS tape.

For fun, here’s the version with the quantizer cranked to the max: bbb-360p-30sec-q127.webm (~1.3 MBytes)

Aside: I was going to embed the video in this post using a bare HTML5 <video> tag for the benefit of the small browsing population that could view it (indeed, it works fine in Chrome). But that would be insane because supporting browsers preload the video, with no easy method (read: none that doesn’t require JavaScript) for overriding this unacceptable default.

The Code
I’m still trying to get over my fear of git. To that end, I have posted the code on GitHub:

https://github.com/multimediamike/ffvp8enc

I still don’t like you, git. But I’m sure we’ll find some way to make this work.

Other required code changes in the basic FFmpeg tree:

  • Of course, copy vp8enc.c into libavcodec/
  • In libavcodec/allcodecs.c, 'REGISTER_DECODER (VP8, vp8);' turns into 'REGISTER_ENCDEC (VP8, vp8);'
  • Add 'OBJS-$(CONFIG_VP8_ENCODER) += vp8enc.o' to libavcodec/Makefile
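
For those wondering why those two edits are all it takes: the registration macro simply expects libavcodec to contain an AVCodec definition for the encoder. Here is a rough sketch of what vp8enc.c would export in a circa-2010 tree; the callback names and the VP8EncContext type are placeholders of mine, and the exact AVCodec fields (and even the expected symbol name) vary between libavcodec versions.

#include "avcodec.h"

/* Sketch only: the encoder definition that REGISTER_ENCDEC (VP8, vp8)
 * picks up. vp8_encode_init/frame/end and VP8EncContext are placeholder
 * names, assumed to be defined earlier in vp8enc.c. Depending on the
 * libavcodec version, the symbol may need to be named ff_vp8_encoder. */
AVCodec vp8_encoder = {
    .name           = "vp8",
    .type           = AVMEDIA_TYPE_VIDEO,
    .id             = CODEC_ID_VP8,
    .priv_data_size = sizeof(VP8EncContext),
    .init           = vp8_encode_init,
    .encode         = vp8_encode_frame,
    .close          = vp8_encode_end,
    .pix_fmts       = (const enum PixelFormat[]){ PIX_FMT_YUV420P, PIX_FMT_NONE },
    .long_name      = NULL_IF_CONFIG_SMALL("On2 VP8"),
};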

Further Work
About the limitations and work yet to do:

  • it’s still intra-only, no interframes (which is where a lot of compression occurs)
  • no rate control or distortion optimization, obviously
  • no intra 4×4 coding (that’s close to working but didn’t make my little T-day deadline)
  • no quantization control; this should really be hooked up to the FFmpeg command line but I’m not sure how (one possible hookup is sketched after this list)
  • encoder writes into a static-sized, 1/2 MB memory buffer; this can overflow
  • code is a mess (what did you expect at this stage of the game?)
  • lots and lots of other things, surely
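
On the quantization-control item: here is a minimal sketch of one way it could be hooked up, assuming the generic -qscale option (which FFmpeg delivers to encoders as avctx->global_quality, scaled by FF_QP2LAMBDA) is an acceptable interim control. VP8EncContext, quant_index, and the default of 40 are my placeholders, not anything the current code does; the snippet assumes vp8enc.c’s existing includes.

/* Hypothetical init-time hookup: derive the VP8 quantizer index from
 * the generic -qscale setting instead of a hardcoded value. */
static int vp8_encode_init(AVCodecContext *avctx)
{
    VP8EncContext *s = avctx->priv_data;
    int q = 40;                                    /* arbitrary fallback default */

    if (avctx->global_quality > 0)
        q = avctx->global_quality / FF_QP2LAMBDA;  /* "-qscale N" arrives here as N * FF_QP2LAMBDA */

    s->quant_index = av_clip(q, 0, 127);           /* VP8 quantizer indices run 0..127 */
    return 0;
}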

Greed is Good; Greed Works

Greed, for lack of a better word, is good. Greed works. Well, most of the time. Maybe.

Picking Prediction Modes
VP8 uses one of 4 prediction modes (DC, vertical, horizontal, or TrueMotion) to predict a 16×16 luma block or 8×8 chroma block before processing it (for luma, a block can also be broken into 16 4×4 subblocks, each predicted individually using an even larger set of modes).
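
For reference, here is roughly what those four 16×16 predictors compute. This is my own illustrative sketch, not code from the encoder, and it assumes the above row, left column, and above-left corner pixels are all available; the real bitstream rules have special cases at the frame edges.

#include <stdint.h>

static uint8_t clamp255(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : v;
}

/* mode: 0 = DC, 1 = vertical, 2 = horizontal, 3 = TrueMotion */
static void predict_16x16(uint8_t pred[16][16], int mode,
                          const uint8_t above[16], const uint8_t left[16],
                          uint8_t above_left)
{
    int x, y, dc = 0;

    switch (mode) {
    case 0: /* DC: average of the 16 above and 16 left neighbor pixels */
        for (x = 0; x < 16; x++)
            dc += above[x] + left[x];
        dc = (dc + 16) >> 5;
        for (y = 0; y < 16; y++)
            for (x = 0; x < 16; x++)
                pred[y][x] = dc;
        break;
    case 1: /* vertical: repeat the row above down through the block */
        for (y = 0; y < 16; y++)
            for (x = 0; x < 16; x++)
                pred[y][x] = above[x];
        break;
    case 2: /* horizontal: repeat the left column across the block */
        for (y = 0; y < 16; y++)
            for (x = 0; x < 16; x++)
                pred[y][x] = left[y];
        break;
    case 3: /* TrueMotion: left + above - above_left, clamped to 8 bits */
        for (y = 0; y < 16; y++)
            for (x = 0; x < 16; x++)
                pred[y][x] = clamp255(left[y] + above[x] - above_left);
        break;
    }
}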

So, how to pick the best predictor mode? I had no idea when I started writing my VP8 encoder. I did not read any literature on the matter; I just sat down and thought of a brute-force approach. According to the comments in my code:

// naive, greedy algorithm:
//   residual = source - predictor
//   mean = mean(residual)
//   residual -= mean
//   find the max diff between the mean and the residual
// the thinking is that, post-prediction, the best block will
// be comprised of similar samples

After removing the predictor from the macroblock, individual 4×4 subblocks are put through a forward DCT and quantized. Optimal compression in this scenario results when all samples are the same, since only the DC coefficient will be non-zero. Failing that, when the input samples are at least similar to each other, few of the AC coefficients will be non-zero, which helps compression. When the samples are all over the scale, there aren’t a whole lot of zero coefficients unless you crank up the quantizer, which results in poor quality in the reconstructed subblocks.

Thus, my goal was to pick a prediction mode that, when applied to the input block, resulted in a residual in which each element would feature the least deviation from the mean of the residual (relative to other prediction choices).
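
In code, that metric might look something like the following sketch (function and parameter names are mine, not the encoder’s, and it assumes source and predictor share a stride): compute the residual, remove its mean, and score the candidate mode by the worst remaining deviation. The encoder then keeps the mode with the lowest score.

#include <stdint.h>
#include <stdlib.h>

/* Score a candidate prediction: lower = flatter residual = presumably
 * fewer non-zero AC coefficients after the transform and quantizer.
 * size is 16 for a luma block, 8 for a chroma block. */
static int prediction_score(const uint8_t *src, const uint8_t *pred,
                            int stride, int size)
{
    int residual[16 * 16];
    int x, y, i, sum = 0, mean, max_dev = 0;

    for (y = 0; y < size; y++)
        for (x = 0; x < size; x++) {
            int d = src[y * stride + x] - pred[y * stride + x];
            residual[y * size + x] = d;
            sum += d;
        }
    mean = sum / (size * size);

    for (i = 0; i < size * size; i++) {
        int dev = abs(residual[i] - mean);
        if (dev > max_dev)
            max_dev = dev;
    }
    return max_dev;
}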

Greedy Approach
I realized that this algorithm falls into the broad category of “greedy” algorithms: ones that make locally optimal decisions at each stage. There are most likely smarter algorithms. But this one was good enough for making an encoder that just barely works.

Compression Results
I checked the total compressed file size for my usual 640×360 Big Buck Bunny logo image while forcing each prediction mode vs. using my greedy prediction-picking algorithm. In this very simple test, DC-only actually resulted in slightly better compression than the greedy algorithm (which says nothing about overall quality).

prediction mode    quantizer index = 0 (minimum)    quantizer index = 10
greedy                        286260                        98028
DC                            280593                        95378
vertical                      297206                       105316
horizontal                    295357                       104185
TrueMotion                    311660                       113480

As another data point, in both quantizer cases, my greedy algorithm selected a healthy mix of prediction modes:

  • quantizer index 0: DC = 521, VERT = 151, HORIZ = 183, TM = 65
  • quantizer index 10: DC = 486, VERT = 167, HORIZ = 190, TM = 77

Size vs. Quality
Again, note that this ad-hoc test only measures one property (a highly objective one): compression size. It did not account for quality, which is a far more controversial topic that I have yet to wade into.

Studying A Game Wave Disc

I picked up a used copy of a game called Gemz — a rather flagrant Bejeweled clone — for a game console called the Game Wave Family Entertainment System. Heard of it? Neither had I. But the game media is optical, so I had to get it and study it.



When mounted in Linux (as UDF), the disc is reported to contain 2.8 GB of data, so it has to be a DVD. 810 MB of that is dedicated to the movies/ directory. Multimedia format? Just plain, boring MPEG files (very YouTube-friendly; here’s the opening animation). Deeper digging reveals some more subdirectories called movies/ that, combined, occupy the lion’s share of the disc space. Additionally, there are several single-frame .m2v files in a directory called iframes/ which are used to encode things like load screens.



There are more interesting data files including .zbm files for images and fonts, and .zwf files for audio. I suspect that these stand for zipped bitmap and zipped wave file, respectively. They can’t be directly unzipped with ‘gunzip’. Some of the numbers at the start of some files lead me to believe they can be easily decompressed with standard zlib facilities.
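
To test the zlib theory, a quick probe along these lines could be run against a .zbm or .zwf file. This is just my guesswork about the format: it tries to inflate the data as a zlib stream starting at a caller-supplied byte offset, since nothing about the real header layout is known here.

#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

/* Probe a suspected zlib stream inside a file: try to inflate starting
 * at a given byte offset and report how far we got. */
int main(int argc, char *argv[])
{
    FILE *f;
    long file_size, offset;
    unsigned char *in_buf, out_buf[65536];
    unsigned long total_out = 0;
    z_stream zs = { 0 };
    int ret;

    if (argc < 2) {
        fprintf(stderr, "usage: zprobe <file> [offset]\n");
        return 1;
    }
    offset = argc > 2 ? atol(argv[2]) : 0;

    f = fopen(argv[1], "rb");
    if (!f)
        return 1;
    fseek(f, 0, SEEK_END);
    file_size = ftell(f);
    fseek(f, offset, SEEK_SET);
    in_buf = malloc(file_size - offset);
    fread(in_buf, 1, file_size - offset, f);
    fclose(f);

    if (inflateInit(&zs) != Z_OK)
        return 1;
    zs.next_in  = in_buf;
    zs.avail_in = file_size - offset;
    do {
        zs.next_out  = out_buf;
        zs.avail_out = sizeof(out_buf);
        ret = inflate(&zs, Z_NO_FLUSH);
        total_out += sizeof(out_buf) - zs.avail_out;
    } while (ret == Z_OK);
    inflateEnd(&zs);
    free(in_buf);

    printf("inflate() stopped with code %d after %lu output bytes\n", ret, total_out);
    return 0;
}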

Examining the binary files on the Gemz disc, I couldn’t find any indication of what CPU this system might use. A little Googling led me to this page at the Video Game Console Library, which pegs the brain as a Mediamatics 6811. Searching for that led me to a long-discontinued line of hardware from National Semiconductor.

The Console Library page also mentions that the games were developed using the Lua programming language. Indeed, there are many Lua-related strings in the game’s binaries (‘zlib’ also makes an appearance).

The Big VP8 Debug

I hope my previous walkthrough of the VP8 4×4 intra coding process was educational. Today, I’ll be walking through an example of what happens when my toy VP8 encoder encodes an intra 16×16 block. This may prove educational to those who have never been exposed to the deep details of this or related algorithms. Also, I wanted to illustrate where I think my VP8 encoding process was going wrong and generating such grotesque results.

Before I start, let me give a shout-out to Google Docs’ Drawing tool which I used to generate these diagrams. It works quite well.

Results

(Always cut to the chase in a blog post; results first.) I’m glad I composed this post. In the course of doing so, I found the problem, fixed it, and am now able to present this image that was decoded from the bitstream produced by my now-working toy VP8 encoder:



Yeah, I know that image doesn’t look like anything you haven’t seen before. The difference is that it has made a successful trip through my VP8 encoder.

Follow along through the encoding process and learn of the mistake…

Original Block and Subblocks

Here is the 16×16 block to be encoded:



The block is broken down into 16 4×4 subblocks for further encoding:
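
Here is a sketch of that partitioning step, assuming the 16 subblocks are simply visited in raster order within the macroblock; the function name and layout are mine, for illustration only.

#include <stdint.h>

/* Copy a 16x16 luma block out of the source frame into 16 separate
 * 4x4 subblocks, visiting them in raster order. */
static void split_into_4x4(const uint8_t *block, int stride,
                           uint8_t subblocks[16][16])
{
    int sb, x, y;

    for (sb = 0; sb < 16; sb++) {
        const uint8_t *src = block + (sb / 4) * 4 * stride + (sb % 4) * 4;
        for (y = 0; y < 4; y++)
            for (x = 0; x < 4; x++)
                subblocks[sb][y * 4 + x] = src[y * stride + x];
    }
}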


