As stated in a previous blog post on the matter, FFmpeg’s policy is to reimplement codecs rather than adopt other codebases wholesale. And so it is with Google’s recently open sourced VP8 codec, the video portion of their Webm initiative. I happen to know that the new FFmpeg implementation is in the capable hands of several of my co-developers so I’m not even worrying about that angle.
Instead, I thought of another of my characteristically useless exercises: Create an independent VP8 decoder implementation entirely in pure Python. Silly? Perhaps. But it has one very practical application: By attempting to write a new decoder based on the official bitstream documentation, this could serve as a mechanism for validating said spec, something near and dear to my heart.
What is the current state of the spec? Let me reiterate that I’m glad it exists. As I stated during the initial open sourcing event, everything that Google produced for the initial event went well beyond my wildest expectations. Having said that, the documentation does fall short in a number of places. Fortunately, I am on the Webm mailing lists and am sending in corrections and ideas for general improvement. For the most part, I have been able to understand the general ideas behind the decoding flow based on the spec and am even able to implement certain pieces correctly. Then I usually instrument the libvpx source code with output statements in order to validate that I’m doing everything right.
Unfortunately, I’m quite blocked right now on the chapter regarding token/DCT coefficient decoding (chapter 13 in the current document iteration). In his seminal critique of the codec, Dark Shikari complained that large segments of the spec are just C code fragments copy and pasted from the official production decoder. As annoying as that is, the biggest insult comes at the end of section 13.3:
While we have in fact completely described the coefficient decoding procedure, the reader will probably find it helpful to consult the reference implementation, which can be found in the file detokenize.c.
The reader most certainly will not find it helpful to consult the file detokenize.c. The file in question implements the coefficient residual decoding with an unholy sequence of C macros that contain goto statements. Honestly, I thought I did understand the coefficient decoding procedure based on the spec’s description. But my numbers don’t match up with the official decoder. Instrumenting or tracing macro’d code is obviously painful and studying the same code is making me think I don’t understand the procedure after all. To be fair, entropy decoding often occupies a lot of CPU time for many video decoders and I have little doubt that the macro/goto approach is much faster than clearer, more readable methods. It’s just highly inappropriate to refer to it for pedagogical purposes.
Aside: For comparison, check out the reference implementation for the VC-1 codec. It was written so clearly and naively that the implementors used an O(n) Huffman decoder. That’s commitment to clarity.
I wonder if my FFmpeg cohorts are having better luck with the DCT residue decoding in their new libavcodec implementation? Maybe if I can get this Python decoder working, it can serve as a more appropriate reference decoder.
Update: Almost immediately after I posted this entry, I figured out a big problem that was holding me back, and then several more small ones, and finally decoded by first correct DCT coefficient from the stream (I’ve never been so happy to see the number -448). I might be back on track now. Even better was realizing that my original understanding of the spec was correct.
It’s probably unfair and inaccurate but you have to admit it’s funny. Luckily, quality nitpickings aren’t my department. I’m just interested in getting codecs working, tested, and documented so that more people can use them reliably.