Author Archives: Multimedia Mike

Learn Multimedia Programming By Writing A JPEG Decoder

For those of you who hack on multimedia tech, how did you get started? Did you begin by studying the mathematical underpinnings of multimedia codec algorithms? Or did you find a practical problem and jump right in by writing code? (Personally, I was always more of a nuts & bolts hacker than a math guy.) I ask because I occasionally get emails from aspiring multimedia hackers who want to know where to begin. Invariably, they want to go the math-first route. I heavily discourage this approach.

I have a crazy idea for anyone who wants a crash course on multimedia hacking: write a JPEG decoder. In doing so, you will be exposed to a lot of key domain concepts such as bitstream parsing, Huffman decoding, dequantization, zigzagging, the dreaded (inverse) discrete cosine transform, YUV vs. RGB colorspaces, macroblock organization, delta coding, and run length coding.

Sure, JPEG decoding is a solved problem. But that’s hardly the point. Why would you enter an unfamiliar field and hope to come up to speed on the basics by leaping straight into the domain’s unsolved problems? If you are successful in this exercise, no one will ever use the fruits of your labor, but that doesn’t really matter.

So, do you want to learn multimedia hacking quickly? Then grab a JPEG file (maybe create a few contrived ones that are small, have friendly dimensions, and feature predictable patterns), grab a good JPEG reference, and implement the decoding algorithm in the language and platform of your choice.

On the matter of the reference, my personal favorite reference has always been A note about the JPEG decoding algorithm by Cristi Cuturicu. The English grammar is a bit dodgy but overall, it might be the best reference you’ll find on the matter– as simple as it needs to be, but no simpler.

Good luck!

A Better Process Runner

I was recently processing a huge corpus of data. It went like this: For each file in a large set, run 'cmdline-tool <file>', capture the output and log results to a database, including whether the tool crashed. I wrote it in Python. I have done this exact type of the thing enough times in Python that I’m starting to notice a pattern.

Every time I start writing such a program, I always begin with using Python’s commands module because it’s the easiest thing to do. Then I always have to abandon the module when I remember the hard way that whatever ‘cmdline-tool’ is, it might run errant and try to execute forever. That’s when I import (rather, copy over) my process runner from FATE, the one that is able to kill a process after it has been running too long. I have used this module enough times that I wonder if I should spin it off into a new Python module.

Or maybe I’m going about this the wrong way. Perhaps when the data set reaches a certain size, I’m really supposed to throw it on some kind of distributed cluster rather than task it to a Python script (a multithreaded one, to be sure, but one that runs on a single machine). Running the job on a distributed architecture wouldn’t obviate the need for such early termination. But hopefully, such architectures already have that functionality built in. It’s something to research in the new year.

I guess there are also process limits, enforced by the shell. I don’t think I have ever gotten those to work correctly, though.

More Weird VP8 Encodings

When I announced that I had transitioned my VP8 encoder’s status from “toy” to “working”, Jim L. lamented the loss of humorous posts about oddly encoded images output from my encoder. Not so! There are still plenty of features that I have yet to implement, each of which carries the possibility of bizarre images.

For example, I dusted off my work-in-progress intra 4×4 encoding, fixed a few of the more obvious bugs, and told the encoder to encode the first block in 4×4 mode and the rest in the usual, working, debugged 16×16 mode. The results of the first pass surprised me:



The reason this surprised me was that I intuitively expected one of 2 outcomes:

  • Perfect image right away since everything is correct (very unlikely but not outside the realm of possibility)
  • Total garbage with, at most, the first macroblock looking somewhat legible; this would be due to having some of the first macroblock correct but completely desynchronizing the bitstream for the purpose of decoding the rest of the coefficients.

I absolutely did not expect the first macroblock to look messed up but for the rest of the picture to look fine. For fun, I reversed the logic and encoded the first block as 16×16 and the rest with the experimental 4×4 mode:



If you examine carefully, you will see that the color planes are correct (though faint). There just isn’t much going on in the luma plane. This made sense when I noticed the encoder was encoding a blank (undefined, actually) set of luma coefficients for 4×4 mode macroblocks due to a bug. This helps to rationalize the first image as well– the first macroblock was encoding nonsense for the first macroblock which messed up the macroblocks which immediately surrounded it. Eventually, macroblock decoding got back on track when the prediction modes weren’t relying on the errantly decoded macroblocks.

After I fixed that bug, I let the 4×4 mode rip through the whole image. That’s when I got what I am terming the “dark and gritty reboot of Big Buck Bunny”:



Fortunately, this also turned out to be traceable to a pretty obvious code bug.

One day, this VP8 encoder might do the right thing while implementing all of the algorithm’s features. In the meantime, it’s at least entertaining to watch it make mistakes.

Giving Thanks For VP8

It’s the Thanksgiving holiday here in the United States. I guess that’s as good a reason as any to release a first cut of my VP8 encoder. In order to remind people that they shouldn’t expect phenomenal quality from it — and to discourage inexperienced people from trying to create useful videos with it — I have hardcoded the quantizers to their maximum settings. For those not skilled in the art, this is the setting that yields maximum compression and worst quality. When compressing the Big Buck Bunny logo image, the resulting file is only 2839 bytes but observe the reconstructed quality:



It really just looks like a particularly stormy day in the forest.

First VP8 File From An Independent Encoder
I found a happy medium on the quantizer scale and encoded the first 30 seconds of Big Buck Bunny for your inspection. I guess this makes it the first VP8/WebM file from an independent encoder (using FFmpeg’s Matroska muxer as well).

Download: bbb-360p-30sec-q40.webm (~13 MBytes)

I think the quality makes it look like it was digitized from an old VHS tape.

For fun, here’s the version with the quantizer cranked to the max: bbb-360p-30sec-q127.webm (~1.3 MBytes)

Aside: I was going to encapsulate the video in this post using a bare HTML5 <video> tag for the benefit of the small browsing population who could view that (indeed, it works fine in Chrome). But that would be insane due to the fact that supporting browsers preload the video with no easy (read: without the help of JavaScript) method for overriding this unacceptable default.

The Code
I’m still trying to get over my fear of git. To that end, I have posted the code on Github:

https://github.com/multimediamike/ffvp8enc

I still don’t like you, git. But I’m sure we’ll find some way to make this work.

Other required code changes in the basic FFmpeg tree:

  • Of course, copy vp8enc.c into libavcodec/
  • In libavcodec/allcodecs.c, ‘REGISTER_DECODER (VP8, vp8);‘ turns into ‘REGISTER_ENCDEC (VP8, vp8);
  • Add ‘OBJS-$(CONFIG_VP8_ENCODER) += vp8enc.o‘ to libavcodec/Makefile

Further Work
About the limitations and work yet to do:

  • it’s still intra-only, no interframes (which is where a lot of compression occurs)
  • no rate control or distortion optimization, obviously
  • no intra 4×4 coding (that’s close to working but didn’t my little T-day deadline)
  • no quantization control; this should really be hooked up to the FFmpeg command line but I’m not sure how
  • encoder writes into a static-sized, 1/2 MB memory buffer; this can overflow
  • code is a mess (what did you expect at this stage of the game?)
  • lots and lots of other things, surely