Category Archives: Open Source Multimedia

News regarding open source multimedia projects.

To Encode A VP3 Golden Frame

I found some notes I wrote nearly 4 years ago about how to encode a VP3 (and, by extension, Theora) intraframe. Where does the time go? This was back when I was hard at work on the native VP3 decoder for FFmpeg. An intraframe is referred to as a golden frame in the VP3 coding scheme. The notion of a golden frame differs slightly from that of a traditional intraframe in that the decoder retains one golden frame until the next golden frame and any interframe between golden frames A and B may refer back to golden frame A in addition to the interframe just prior.

In case anyone were to want to build a VP3 encoder from scratch, it would be prudent to start with just a golden frame encoder. This is the rough process that I perceive for encoding the golden frame:

  • Pick a number, any number, between 0..63. Call it the quality_index and initialize the golden frame Y- and C-plane quantization matrices with them.
  • Break up the Y, U, and V planes (oh yeah, convert the image to YUV 4:2:0 if it’s not already; also important: encode the image from bottom to top, not top to bottom) into 8×8 blocks of samples, but call them fragments, not blocks.
  • Shove each fragment through the On2 discrete cosine transform. Do not use the DCT commonly used in JPEG/MPEG/H.26x codecs as that is not precisely the same as the one that On2 likes to use.
  • Quantize the coefficients.
  • Reduce the DC coefficients via the VP3 prediction method.
  • Zig the coefficients. Zag them, too. Use the inverse zigzag order that you would for MPEG et al. (i.e., MPEG de-zigzag order = VP3 zigzag order).
  • Write the bitstream header (3 bytes for a keyframe).
  • Encode all of the coefficients into the bitstream.

Sounds easy enough, I know. However, that last step about encoding coefficients is grotesquely understated. VP3 doesn’t encode stuff fragment by fragment. First, it encodes all of the DC coefficients, followed by all of the 1st AC coefficients for all fragments, then the 2nd AC coefficients for all fragments, on up to the 63rd AC coefficients. It doesn’t follow a nice, neat, left->right fragment pattern, oh goodness, no. It follows a Hilbert pattern along individual superblocks in each plane. A superblock in the VP3 vernacular is 32×32 samples, or 4×4 fragments.

Encoding the coefficient bitstream is a matter of encoding the VLC representing one of 32 codes, followed by optional data depending on the code. The 32 codes specify whether a coefficient is coded by itself in the bitstream after the VLC, or if the coefficient ought to be one of the more common small coefficients like +/- 1 or 2, or if the next AC coefficients for the next n blocks should be set to zero, or if the current block being visited is EOB (no more non-zero coefficients)… etc. The possibilities are many– 32, in fact, as stated before.

However, there are 32 VLC tables hardcoded in the VP3 coding scheme, 16 each for DC and AC coefficients. Theora makes these 32 tables user-definable. Which table to use? It seems to me that the encoder needs to make one pass through the code stream in order to gather statistics about which table provides the optimal encoding for each. 4 bits specifying which DC VLC table and 4 more bits specifying AC VLC table are encoded in front of the main coefficient bitstream.

Why is the VP3 coding method this complicated? I have long suspected that the designers inserted all of these algorithmic oddities in a courageous effort to side-step patent claims. Just wait until you see what is involved in encoding an interframe.

See Also:

Classical Audio

Trixter, digital P.I. and archaeologist, has been tracking down authors who long ago reverse engineered custom audio formats. He stalked down one Adrienne Cousins, the author of a program called Sputter, a multi-format audio file encoder/decoder… all in 16-bit x86 ASM! The source includes codecs for the Covox ADPCM formats mentioned recently. Thanks to Kostya, though, for independently reverse engineering and documenting the format.

Still, there might be some other gems in the Sputter source code. The author has granted permission to redistribute the source, and it is now in this directory: http://samples.mplayerhq.hu/A-codecs/CreativeADPCM8bit/. Also in that directory is a package called VEDIT.rar which contains an old EGA-based program that is able to encode VOC files to that arcane 2.6-bit ADPCM format that the Sound Blaster could encode and decode in hardware. If reversed engineered, it might describe how the 2.6-bit algorithm works.

GSoC Evaluation Tasks

If a student wanted to make the FFmpeg cut for the 2007 Google Summer of Code, they were required to complete a qualification task. They could select these tasks from a preset list on our GSoC Wiki page, or they could submit their own idea for approval. In completing a task, the student demonstrated that they:

  • were willing to commit some time over the 3-week app review period to prove themselves
  • could code C satisfactorily and with the same general style expected of the FFmpeg codebase
  • could wrap their head around the code base
  • could interact with the FFmpeg development mailing list
  • could accept feedback and revise the code until it was suitable for inclusion in the main code base

We got some good labor out of the qualifying students. Here are some of the things that FFmpeg can now do, thanks to the GSoC qualification tasks:

  • THP playback subsystem
  • Bethsoft VID playback subsystem
  • C93 playback subsystem
  • TIFF encoder, with LZW support
  • support for Theora in Matroska
  • SGI image subsystem ported to latest API
  • Targa encoder (thanks to Ben for the reminder)

SoC Students Announced

Great news of the Google Summer of Code front– FFmpeg put their project allotment of 5 to such good use last year that the overseers entrusted FFmpeg with a total of 8 project slots this year. These are the projects. The — ahem — lucky students include our returning SoC champ from the 2006 season, Kostya. New contenders are David Robert Conrad, Robert Bingham, Bartlomiej Wolowiec, Kamil Nowosad, Marco Gerards, Xiaohui Sun, and Reynaldo H. Verdejo Pinochet.