Category Archives: Open Source Multimedia

News regarding open source multimedia projects.

Superblock Corner Cases

Look who has been playing around some more with the vector drawing program. Here’s an illustration of somewhat limited utility that still demonstrates an important point of the VP3/Theora coding scheme:


VP3/Theora superblock traversal corner cases

The image above depicts a hypothetical frame in the VP3/Theora coding scheme that has sample dimensions of 88×48. The valid 8×8 fragments are depicted in green. Since these do not line up nicely on 32-sample superblock boundaries, the dimensions are rounded up to the nearest superblock in each direction. The green fragments inside the turquoise zone are the visible fragments. The grey fragments are phantoms that must still be accounted for in the overall superblock traversal pattern when coding/decoding the transform coefficients.

There is also the matter of what happens when the width and height of the frame do not line up on fragment boundaries (i.e., are not divisible by 8). The image is rounded up to the nearest fragment size for the purpose of transform coding.
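The rounding described above boils down to two ceiling divisions. Here is a minimal sketch in C; the helper names are mine, not from any codec source:

```c
/* Round sample dimensions up to fragment (8x8 sample) granularity,
 * then round the fragment counts up to superblock (4x4 fragment,
 * i.e., 32x32 sample) granularity. */
static int fragments_across(int width)   { return (width  + 7) / 8; }
static int fragments_down(int height)    { return (height + 7) / 8; }
static int superblocks_across(int width) { return (fragments_across(width) + 3) / 4; }
static int superblocks_down(int height)  { return (fragments_down(height)  + 3) / 4; }
```

For the 88×48 example frame above, this gives 11×6 = 66 visible fragments covered by 3×2 = 6 superblocks, whose 96 fragment slots leave 30 phantoms.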

The Legend Of Hilbert

I’ve been wanting to learn how to use a basic vector drawing program for some time now for the purpose of illustrating certain codec concepts more concretely. Sure, this will be for the benefit of others who are curious about the craft. But mostly, I do it for me because, well… me like pictures.

Behold, my first vector drawing, constructed using OpenOffice’s Draw program:


VP3/Theora Superblock Traversal Pattern

When I was first reverse engineering an English language description of the VP3 format and implementing a new decoder for FFmpeg, I figured out the curious pattern that the codec uses to traverse the 4×4 grid of fragments (each fragment being an 8×8 block of samples) within a VP3 superblock. I posted to the theora-dev mailing list asking if the pattern struck anyone as familiar. Personally, the pattern reminded me of playing the original NES The Legend of Zelda title, sort of like a pattern for traversing rooms in a dungeon. In fact, early iterations of my decoder used the identifier zelda[].
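One plausible way to encode the traversal is a simple lookup table: for traversal step i, the table gives the (x, y) fragment position within the superblock. The coordinates below trace a Hilbert-style walk of the 4×4 grid; the table and function names here are illustrative, not quoted from any decoder:

```c
/* Fragment (x, y) coordinates within a superblock, in traversal order. */
static const int sb_walk[16][2] = {
    { 0, 0 }, { 1, 0 }, { 1, 1 }, { 0, 1 },
    { 0, 2 }, { 0, 3 }, { 1, 3 }, { 1, 2 },
    { 2, 2 }, { 2, 3 }, { 3, 3 }, { 3, 2 },
    { 3, 1 }, { 2, 1 }, { 2, 0 }, { 3, 0 }
};

/* Verify each of the 16 fragment positions is visited exactly once. */
static int sb_walk_is_permutation(void)
{
    int seen[16] = { 0 };
    for (int i = 0; i < 16; i++)
        seen[sb_walk[i][1] * 4 + sb_walk[i][0]]++;
    for (int i = 0; i < 16; i++)
        if (seen[i] != 1)
            return 0;
    return 1;
}

/* Every step moves to a horizontally or vertically adjacent fragment,
 * a property this pattern shares with the Hilbert curve. */
static int sb_walk_steps_are_adjacent(void)
{
    for (int i = 1; i < 16; i++) {
        int dx = sb_walk[i][0] - sb_walk[i - 1][0];
        int dy = sb_walk[i][1] - sb_walk[i - 1][1];
        if ((dx < 0 ? -dx : dx) + (dy < 0 ? -dy : dy) != 1)
            return 0;
    }
    return 1;
}
```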

However, someone on the list identified it as resembling a Hilbert curve, discovered by some famous math dude. One of the codec’s designers chimed in on the list and stated that he had never even heard of Hilbert and that the traversal pattern was chosen to meet certain criteria. Any resemblance to the Hilbert curve was to be considered strictly coincidental.

Looking back on that old mailing list traffic, and taking a good look at the actual Hilbert curve from the link above, I may have made a mistake in using the term “Hilbert pattern” to describe the traversal sequence pictured above. It’s a little late now to change it back to “Zelda pattern”– Google demonstrates that the first term sort of caught on for VP3/Theora-related matters.

ATRAC3 Decoder

Ever so quietly, a new open source ATRAC3 decoder implementation has been slipped into FFmpeg. This decoder handles atrc data inside of RealMedia files or in WAV files.

Thanks to Benjamin Larsson and Maxim Poliakovski for their diligent work on this, as well as the guru for his tireless reviewing efforts and uncompromising code quality standards.

RealAudio samples here and WAV samples here.

About The VP3 Interframe Encoding

I should do a followup to the VP3 golden frame encoding brainstorm while this stuff is still fresh on the brain. Let’s talk about a possible approach for encoding VP3 (and again, by extension, Theora) interframes. Along the way, I’ll discuss the parts that (I hope) can be handled by FFmpeg’s internal facilities.

A VP3 golden frame only encodes a header followed by a coefficient bitstream. An interframe contains a header, several segments describing which superblocks, macroblocks, and fragments in the frame are coded and how, a segment for motion vector data, and finally, the coefficient bitstream. Note that the interframe is concerned with the notion of a macroblock — 2×2 Y fragments + 1 U fragment + 1 V fragment, the same as the traditional JPEG/MPEG concept — whereas the golden frame does not care about macroblocks. This is because motion vectors operate on a per-macroblock basis.

Rough outline: for each macroblock in the interframe, hand the macroblock over to FFmpeg’s libavcodec facility to work its motion estimation magic. I may be making a huge assumption here, but I’m hoping that I can pass lavc a macroblock along with 1 or 2 reference frames (the previous frame and the golden frame) and ask it to use its selected ME algorithm to search on a half-pel grid and find the best coding mode. The options are:

- The macroblock is unchanged from the previous frame, or from the golden frame.
- It is motion compensated with a new motion vector based on either the previous frame or the golden frame.
- It references the previous frame using one of the last 2 motion vectors.
- In the most complex case, it uses 4 separate motion vectors, one for each Y fragment, while the average of all 4 is used for the 2 C planes.
- If nothing else will do, it is declared that the macroblock needs to be intracoded, just like in a golden frame.

One more thing, though: not all of the fragments in the macroblock have to be coded. The encoder can decide that a fragment is similar enough to the same position in the previous frame to warrant leaving it alone. But if a fragment is coded, it must use the same coding mode as the other coded fragments in the same macroblock.
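The mode set above can be captured as a small enum. The names below follow FFmpeg’s vp3 decoder, though the numeric ordering here is illustrative; the chroma-averaging helper is a sketch that glosses over the exact rounding rule:

```c
/* The eight interframe coding modes. */
enum {
    MODE_INTER_NO_MV = 0,   /* copy from previous frame, no motion */
    MODE_INTRA,             /* code fragment as in a golden frame */
    MODE_INTER_PLUS_MV,     /* previous frame + new motion vector */
    MODE_INTER_LAST_MV,     /* reuse the last motion vector */
    MODE_INTER_PRIOR_LAST,  /* reuse the second-to-last motion vector */
    MODE_USING_GOLDEN,      /* copy from golden frame, no motion */
    MODE_GOLDEN_MV,         /* golden frame + new motion vector */
    MODE_INTER_FOURMV       /* 4 separate Y vectors; C planes use the average */
};

/* For MODE_INTER_FOURMV, each chroma vector component is the average
 * of the four luma components (rounding details glossed over here). */
static int average_chroma_component(const int v[4])
{
    return (v[0] + v[1] + v[2] + v[3]) / 4;
}
```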

Of course, VP3, like many other codecs, does not require exact matches for motion estimation. Instead, the encoder finds the best possible block and codes the residual difference. Throughout this process, the encoder tracks motion vectors and coding modes for each macroblock. For the 6 constituent fragments of the macroblock, if coded, perform the transform on either the raw samples or the residual, then apply the zigzagging and DC reduction processes outlined in the golden frame method. Then…
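Computing the residual for one fragment is the easy part; a minimal sketch, with illustrative names (the prediction buffer is assumed to already be half-pel interpolated as needed):

```c
/* Residual for one 8x8 fragment: current samples minus the
 * motion-compensated prediction. */
static void fragment_residual(const unsigned char *cur,  int cur_stride,
                              const unsigned char *pred, int pred_stride,
                              short residual[64])
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            residual[y * 8 + x] = cur[y * cur_stride + x] - pred[y * pred_stride + x];
}
```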

it’s time to pack it all up into a bitstream.

First, write out the frame header (it’s only a single byte this time). Then, pack information about the coding status of each superblock in the frame. A superblock can be fully coded (every fragment changed), partially coded (some fragments changed), or not coded at all (the entire superblock is copied from the previous frame). Pack all of the partially-coded superblocks first. Any remaining superblocks must, by process of elimination, be either fully coded or not coded at all; pack information about which. Then, if any of the superblocks are partially coded, pack information about which fragments inside each such superblock are coded (remember the Hilbert pattern for superblock traversal).
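Before any of that packing can happen, each superblock has to be classified from its per-fragment coded flags. A minimal sketch, with illustrative names:

```c
enum { SB_NOT_CODED, SB_PARTIALLY_CODED, SB_FULLY_CODED };

/* Classify one superblock from its fragments' coded flags. n is the
 * number of fragments the superblock actually contains (phantom
 * fragments outside the frame are excluded). */
static int classify_superblock(const int *coded, int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        count += !!coded[i];
    if (count == 0) return SB_NOT_CODED;
    if (count == n) return SB_FULLY_CODED;
    return SB_PARTIALLY_CODED;
}
```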

Next up is the macroblock coding mode information. Similar to the process for finding the optimal Huffman tables for VLC coding, some statistics must be gathered for macroblock coding modes because there are a number of different “alphabets” (as the VP3 scheme calls them) which arrange the coding modes in different orders within a list. The modes at the front of the list take fewer bits to code than the modes at the end of the list. Alternatively, if there is a more or less even distribution, the encoder can specify that each coding mode should be encoded with 3 bits (8 possible modes).
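The decision between an alphabet and the 3-bit fallback is just a cost comparison. The sketch below assumes a unary-style code where the mode’s rank in the alphabet determines its length; the specific code lengths and all names here are my guesses for illustration, not quoted from the format:

```c
/* Assumed code length for each rank within an alphabet. */
static const int mode_code_len[8] = { 1, 2, 3, 4, 5, 6, 7, 7 };

/* Total bits to code n macroblock modes with a given alphabet,
 * where rank_of_mode[m] is mode m's position in that alphabet. */
static int alphabet_cost(const int *modes, int n, const int rank_of_mode[8])
{
    int bits = 0;
    for (int i = 0; i < n; i++)
        bits += mode_code_len[rank_of_mode[modes[i]]];
    return bits;
}

/* Total bits with the fallback: 3 bits per mode (8 possible modes). */
static int fixed_cost(int n)
{
    return 3 * n;
}
```

The encoder would evaluate every alphabet this way and keep whichever scheme (or the fallback) yields the fewest bits.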

Then there are the motion vectors. Nothing too fancy here; this is probably the most straightforward segment of the bitstream encoding. Just march along the macroblocks and if the coding modes demand any motion vectors (new motion vectors, not referring to the motion vectors used for previous blocks), encode those with the variable bit scheme that VP3 uses for motion vectors.
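As a rough illustration of the fixed-length variant of that scheme, each vector component could be packed as a 5-bit magnitude plus a sign bit (range -31..31). The exact bit ordering, and the VLC variant entirely, are glossed over; treat this as an assumption rather than a description of the actual bitstream:

```c
/* Pack one motion-vector component into 6 bits: 5-bit magnitude
 * followed by a sign bit. Illustrative only. */
static unsigned pack_mv_component(int v)
{
    unsigned mag  = v < 0 ? (unsigned)-v : (unsigned)v;
    unsigned sign = v < 0;
    return (mag << 1) | sign;
}
```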

Finally, there is the coefficient data. Pack it up the same as would be done for a golden frame (stated with the same deceptive simplicity as in the previous post on the matter).
