Author Archives: Multimedia Mike

A Better Process Runner

I was recently processing a huge corpus of data. It went like this: For each file in a large set, run 'cmdline-tool <file>', capture the output and log results to a database, including whether the tool crashed. I wrote it in Python. I have done this exact type of thing enough times in Python that I’m starting to notice a pattern.

Every time I start writing such a program, I always begin with using Python’s commands module because it’s the easiest thing to do. Then I always have to abandon the module when I remember the hard way that whatever ‘cmdline-tool’ is, it might run errant and try to execute forever. That’s when I import (rather, copy over) my process runner from FATE, the one that is able to kill a process after it has been running too long. I have used this module enough times that I wonder if I should spin it off into a new Python module.
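For illustration, here is a minimal sketch of such a runner in modern Python 3. It is not the FATE process runner; it leans on the standard library's subprocess.run() and its timeout parameter, and the function name, the shape of the returned dict, and the stand-in child command are all assumptions made for this example.

```python
import subprocess
import sys


def run_with_timeout(args, timeout_s):
    """Run args (a list of strings) and report output, exit status and timeout."""
    try:
        proc = subprocess.run(
            args,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into the captured output
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run() kills the child and waits for it before re-raising.
        return {
            "output": exc.output or b"",
            "returncode": None,
            "timed_out": True,
            "crashed": False,
        }
    return {
        "output": proc.stdout,
        "returncode": proc.returncode,
        "timed_out": False,
        # On POSIX, a negative return code means the child died from a signal.
        "crashed": proc.returncode < 0,
    }


if __name__ == "__main__":
    # Stand-in for 'cmdline-tool <file>': a child Python that prints one line.
    result = run_with_timeout([sys.executable, "-c", "print('ok')"], timeout_s=10)
    print(result)
```

The "crashed" flag only makes sense on POSIX systems, where death by signal shows up as a negative return code; the timeout itself is measured in wall-clock seconds.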

Or maybe I’m going about this the wrong way. Perhaps when the data set reaches a certain size, I’m really supposed to throw it on some kind of distributed cluster rather than task it to a Python script (a multithreaded one, to be sure, but one that runs on a single machine). Running the job on a distributed architecture wouldn’t obviate the need for such early termination. But hopefully, such architectures already have that functionality built in. It’s something to research in the new year.

I guess there are also process limits, enforced by the shell. I don’t think I have ever gotten those to work correctly, though.
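The same kind of limit that the shell's ulimit sets can also be applied from Python itself. The sketch below is POSIX-only and uses the standard resource module with subprocess's preexec_fn hook; the function name and the busy-loop child are assumptions for illustration. One possible reason such limits seem not to work: RLIMIT_CPU counts CPU time, not wall-clock time, so a child that hangs while sleeping or waiting on I/O never hits it.

```python
import resource
import subprocess
import sys


def run_with_cpu_limit(args, cpu_seconds):
    """Run args with a CPU-time limit applied inside the child (POSIX only)."""

    def set_limits():
        # Runs in the child between fork() and exec().
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    return subprocess.run(
        args,
        preexec_fn=set_limits,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )


if __name__ == "__main__":
    # A child that spins forever; the kernel should end it after about 1 CPU second.
    proc = run_with_cpu_limit([sys.executable, "-c", "while True: pass"], 1)
    print("return code:", proc.returncode)
```

A child stopped this way is killed by a signal, so its return code comes back negative. A wall-clock timeout is still needed for children that hang without burning CPU.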

More Weird VP8 Encodings

When I announced that I had transitioned my VP8 encoder’s status from “toy” to “working”, Jim L. lamented the loss of humorous posts about oddly encoded images output from my encoder. Not so! There are still plenty of features that I have yet to implement, each of which carries the possibility of bizarre images.

For example, I dusted off my work-in-progress intra 4×4 encoding, fixed a few of the more obvious bugs, and told the encoder to encode the first block in 4×4 mode and the rest in the usual, working, debugged 16×16 mode. The results of the first pass surprised me:



The reason this surprised me was that I intuitively expected one of 2 outcomes:

  • Perfect image right away since everything is correct (very unlikely but not outside the realm of possibility)
  • Total garbage with, at most, the first macroblock looking somewhat legible; this would be due to having some of the first macroblock correct but completely desynchronizing the bitstream for the purpose of decoding the rest of the coefficients.

I absolutely did not expect the first macroblock to look messed up but for the rest of the picture to look fine. For fun, I reversed the logic and encoded the first block as 16×16 and the rest with the experimental 4×4 mode:



If you examine it carefully, you will see that the color planes are correct (though faint). There just isn’t much going on in the luma plane. This made sense when I noticed the encoder was encoding a blank (undefined, actually) set of luma coefficients for 4×4 mode macroblocks due to a bug. This helps to rationalize the first image as well: the encoder was encoding nonsense for the first macroblock, which messed up the macroblocks immediately surrounding it. Eventually, macroblock decoding got back on track when the prediction modes weren’t relying on the errantly decoded macroblocks.

After I fixed that bug, I let the 4×4 mode rip through the whole image. That’s when I got what I am terming the “dark and gritty reboot of Big Buck Bunny”:



Fortunately, this also turned out to be traceable to a pretty obvious code bug.

One day, this VP8 encoder might do the right thing while implementing all of the algorithm’s features. In the meantime, it’s at least entertaining to watch it make mistakes.

Giving Thanks For VP8

It’s the Thanksgiving holiday here in the United States. I guess that’s as good a reason as any to release a first cut of my VP8 encoder. In order to remind people that they shouldn’t expect phenomenal quality from it — and to discourage inexperienced people from trying to create useful videos with it — I have hardcoded the quantizers to their maximum settings. For those not skilled in the art, this is the setting that yields maximum compression and worst quality. When compressing the Big Buck Bunny logo image, the resulting file is only 2839 bytes but observe the reconstructed quality:



It really just looks like a particularly stormy day in the forest.

First VP8 File From An Independent Encoder
I found a happy medium on the quantizer scale and encoded the first 30 seconds of Big Buck Bunny for your inspection. I guess this makes it the first VP8/WebM file from an independent encoder (using FFmpeg’s Matroska muxer as well).

Download: bbb-360p-30sec-q40.webm (~13 MBytes)

I think the quality makes it look like it was digitized from an old VHS tape.

For fun, here’s the version with the quantizer cranked to the max: bbb-360p-30sec-q127.webm (~1.3 MBytes)

Aside: I was going to encapsulate the video in this post using a bare HTML5 <video> tag for the benefit of the small browsing population who could view that (indeed, it works fine in Chrome). But that would be insane due to the fact that supporting browsers preload the video with no easy (read: without the help of JavaScript) method for overriding this unacceptable default.

The Code
I’m still trying to get over my fear of git. To that end, I have posted the code on Github:

https://github.com/multimediamike/ffvp8enc

I still don’t like you, git. But I’m sure we’ll find some way to make this work.

Other required code changes in the basic FFmpeg tree:

  • Of course, copy vp8enc.c into libavcodec/
  • In libavcodec/allcodecs.c, ‘REGISTER_DECODER (VP8, vp8);’ turns into ‘REGISTER_ENCDEC (VP8, vp8);’
  • Add ‘OBJS-$(CONFIG_VP8_ENCODER) += vp8enc.o’ to libavcodec/Makefile

Further Work
About the limitations and work yet to do:

  • it’s still intra-only, no interframes (which is where a lot of compression occurs)
  • no rate control or distortion optimization, obviously
  • no intra 4×4 coding (that’s close to working but didn’t make my little T-day deadline)
  • no quantization control; this should really be hooked up to the FFmpeg command line but I’m not sure how
  • encoder writes into a static-sized, 1/2 MB memory buffer; this can overflow
  • code is a mess (what did you expect at this stage of the game?)
  • lots and lots of other things, surely

Greed is Good; Greed Works

Greed, for lack of a better word, is good; Greed works. Well, most of the time. Maybe.

Picking Prediction Modes
VP8 uses one of 4 prediction modes to predict a 16×16 luma block or 8×8 chroma block before processing it (for luma, a block can also be broken into 16 4×4 blocks for individual prediction using even more modes).

So, how to pick the best predictor mode? I had no idea when I started writing my VP8 encoder. I did not read any literature on the matter; I just sat down and thought of a brute-force approach. According to the comments in my code:

// naive, greedy algorithm:
//   residual = source - predictor
//   mean = mean(residual)
//   residual -= mean
//   find the max diff between the mean and the residual
// the thinking is that, post-prediction, the best block will
// be comprised of similar samples

After removing the predictor from the macroblock, individual 4×4 subblocks are put through a forward DCT and quantized. Optimal compression in this scenario results when all samples are the same since only the DC coefficient will be non-zero. Failing that, when the input samples are at least similar to each other, few of the AC coefficients will be non-zero, which helps compression. When the samples are all over the scale, there aren’t a whole lot of zero coefficients unless you crank up the quantizer, which results in poor quality in the reconstructed subblocks.
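To make the link between sample similarity and coefficient count concrete, here is a small sketch using a textbook floating-point 4-point DCT-II and a plain divide-and-round quantizer. This is an assumption-laden illustration: VP8 itself uses an integer approximation on 4×4 blocks with its own quantizer tables, and the function names and sample rows below are made up for the example.

```python
import math


def dct4(samples):
    """Textbook orthonormal DCT-II of a short 1-D row of samples."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(
            x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(samples)
        )
        out.append(scale * total)
    return out


def quantize(coeffs, step):
    """Uniform quantizer: divide by the step size and round to an integer."""
    return [int(round(c / step)) for c in coeffs]


def count_nonzero(samples, step):
    """Number of coefficients that survive quantization as non-zero."""
    return sum(1 for q in quantize(dct4(samples), step) if q != 0)


flat = [5, 5, 5, 5]            # identical samples: only DC should survive
scattered = [0, 40, -30, 25]   # samples all over the scale

print(count_nonzero(flat, 1))
print(count_nonzero(scattered, 1))
print(count_nonzero(scattered, 200))  # a very coarse quantizer zeroes everything
```

The flat row leaves a single non-zero coefficient, the scattered row keeps all of them at a fine step, and only a very coarse step forces them to zero, at the cost of reconstruction quality.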

Thus, my goal was to pick a prediction mode that, when applied to the input block, resulted in a residual in which each element would feature the least deviation from the mean of the residual (relative to other prediction choices).
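As an illustration only, the heuristic described in the code comments can be sketched in Python. This is not the encoder's actual C code: the function names are hypothetical, blocks are flattened to plain lists, and the candidate predictor blocks are assumed to have been computed already.

```python
def residual_spread(source, predictor):
    """Largest deviation of any residual sample from the residual's mean."""
    residual = [s - p for s, p in zip(source, predictor)]
    mean = sum(residual) / len(residual)
    return max(abs(r - mean) for r in residual)


def pick_mode(source, predictors):
    """Pick the mode whose residual has the smallest spread.

    predictors maps a mode name to its predicted block, given as a flat
    list the same length as source.
    """
    return min(predictors, key=lambda mode: residual_spread(source, predictors[mode]))


# Toy 4-sample "block"; real luma blocks are 16x16.
source = [10, 12, 14, 16]
predictors = {
    "DC": [13, 13, 13, 13],   # residual -3, -1, 1, 3: spread 3
    "VERT": [9, 11, 13, 15],  # residual 1, 1, 1, 1: spread 0
}
print(pick_mode(source, predictors))
```

In this toy case the VERT predictor wins because its residual is perfectly flat, even though the residual's mean is not zero; the constant offset ends up in the DC coefficient.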

Greedy Approach
I realized that this algorithm falls into the broad general category of “greedy” algorithms: ones that make locally optimal decisions at each stage. There are most likely smarter algorithms. But this one was good enough for making an encoder that just barely works.

Compression Results
I checked the total compressed file size on my usual 640×360 Big Buck Bunny logo image while forcing prediction modes vs. using my greedy prediction picking algorithm. In this very simple test, DC-only actually resulted in slightly better compression than the greedy algorithm (which says nothing about overall quality).

prediction mode   quantizer index = 0 (minimum)   quantizer index = 10
greedy            286260                          98028
DC                280593                          95378
vertical          297206                          105316
horizontal        295357                          104185
TrueMotion        311660                          113480

As another data point, in both quantizer cases, my greedy algorithm selected a healthy mix of prediction modes:

  • quantizer index 0: DC = 521, VERT = 151, HORIZ = 183, TM = 65
  • quantizer index 10: DC = 486, VERT = 167, HORIZ = 190, TM = 77
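As a quick sanity check on these tallies, the four counts on each line add up to 920, which is what a 640×360 frame gives if it is split into 16×16 macroblocks with the height rounded up. The sketch below assumes that rounding and assumes the tallies count one luma prediction mode per macroblock; the function names are made up for the example.

```python
def macroblock_count(width, height, mb_size=16):
    """Macroblocks in a frame, rounding each dimension up (assumed padding)."""
    cols = (width + mb_size - 1) // mb_size
    rows = (height + mb_size - 1) // mb_size
    return cols * rows


# Mode tallies copied from the two bullet points above, keyed by quantizer index.
tallies = {
    0: {"DC": 521, "VERT": 151, "HORIZ": 183, "TM": 65},
    10: {"DC": 486, "VERT": 167, "HORIZ": 190, "TM": 77},
}


def mode_shares(counts):
    """Fraction of macroblocks that picked each mode."""
    total = sum(counts.values())
    return {mode: n / total for mode, n in counts.items()}


for qindex, counts in tallies.items():
    total = sum(counts.values())
    shares = mode_shares(counts)
    summary = ", ".join(f"{mode} {share:.1%}" for mode, share in shares.items())
    print(f"q={qindex}: {total} macroblocks ({summary})")
```

DC is the most common choice in both cases, at a little over half of the macroblocks.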

Size vs. Quality
Again, note that this ad-hoc test only measures one property (a highly objective one): compressed size. It did not account for quality, which is a far more controversial topic that I have yet to wade into.