A little over a year ago, Sun was making rumblings about a brand new video codec that they were hoping to design from the ground up using known-good (read: patent-unencumbered) coding algorithms. This was to be called the Open Media Stack (OMS). I wrote a post about it, made an obligatory MultimediaWiki page for it, and then promptly forgot all about it.
Today, by way of a blog post by Opera’s Bruce Lawson describing why HTML5’s <video> tag is, well, stalled (to put it charitably), I learned that Sun’s OMS team has published at least 2 specs, the latest one being posted a few weeks ago on June 9, 2009. As he notes, the proposed Oracle buyout of Sun puts the OMS project’s status in limbo.
The spec page links to forum threads where interested parties can discuss issues in the spec. There aren’t many comments, but the ones that exist seem to revolve around the codec’s artificial resolution limitations. For my part, I wonder how to encapsulate it in a container format for transport. The format specifies a sequence header that is 96 bits (12 bytes) in length, though there are provisions for (currently unused) extended data as well as free-form user data. The sequence header would be categorized as extradata in an AVI file or in a MOV file’s video stsd atom. Successive frames should be straightforward to store in a container format since the coding method only seems to employ intra- and inter-frames. Each frame begins with a header which specifies a 37-bit timestamp in reference to a 90 kHz clock. This allows for a recording that’s a little over 2 weeks (roughly 17.7 days) in length. It’s also one of those items that becomes highly redundant if this format were to be stuffed into a more generalized container format.
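To make the timestamp handling concrete, here’s a minimal C sketch of reading that field and checking how long the counter can run before wrapping. The bit reader and the byte layout are hypothetical, invented purely for illustration; only the 37-bit width and the 90 kHz clock come from the spec as described above.

```c
#include <stdint.h>
#include <stdio.h>

/* Read 'bits' bits (MSB-first) starting at bit offset *pos. */
static uint64_t get_bits(const uint8_t *buf, unsigned *pos, unsigned bits)
{
    uint64_t val = 0;
    for (unsigned i = 0; i < bits; i++, (*pos)++)
        val = (val << 1) | ((buf[*pos >> 3] >> (7 - (*pos & 7))) & 1);
    return val;
}

int main(void)
{
    /* dummy frame-header bytes; 37 bits fit in 5 bytes */
    const uint8_t frame_header[5] = { 0x12, 0x34, 0x56, 0x78, 0x9A };
    unsigned pos = 0;

    /* hypothetical: assume the timestamp is the first field in the header */
    uint64_t pts = get_bits(frame_header, &pos, 37);
    printf("timestamp = %llu ticks = %.3f s\n",
           (unsigned long long)pts, pts / 90000.0);

    /* how long can 37 bits at 90 kHz count before wrapping? */
    uint64_t max_ticks = (1ULL << 37) - 1;
    printf("wraps after %.1f days\n", max_ticks / 90000.0 / 86400.0);
    return 0;
}
```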
Anyway, the main video algorithm uses arithmetic and Golomb coding for its entropy coding; 8×8-pixel macroblocks which can either be an entire block unto themselves or be subdivided into four 4×4 sub-blocks; a planar YUV 4:2:0 colorspace; bit-exact 2×2, 4×4, and 8×8 transforms and quantization procedures; spatial prediction using the left, top-left, top, and top-right blocks; and precisely-specified 1/4-pel motion compensation. All in all, it appears relatively simple; the 0.91 spec (annexes and all) weighs in at a mere 96 pages.
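Since the spec leans on Golomb coding for part of its entropy coding, here’s a generic unsigned Exp-Golomb reader (the k=0 variant familiar from H.264) as an illustration of the general technique. OMS may well use a different Golomb flavor, so treat this as a sketch rather than the spec’s actual method.

```c
#include <stdint.h>
#include <stdio.h>

/* Read a single bit (MSB-first) at bit offset *pos. */
static unsigned read_bit(const uint8_t *buf, unsigned *pos)
{
    unsigned bit = (buf[*pos >> 3] >> (7 - (*pos & 7))) & 1;
    (*pos)++;
    return bit;
}

/* ue(v): count leading zero bits, then read that many more bits. */
static unsigned read_ue(const uint8_t *buf, unsigned *pos)
{
    unsigned zeros = 0;
    while (read_bit(buf, pos) == 0)
        zeros++;
    unsigned suffix = 0;
    for (unsigned i = 0; i < zeros; i++)
        suffix = (suffix << 1) | read_bit(buf, pos);
    return (1u << zeros) - 1 + suffix;
}

int main(void)
{
    /* bitstream "00111 010 1" encodes the values 6, 1, 0 */
    const uint8_t bits[2] = { 0x3A, 0x80 };  /* 0011 1010 1000 0000 */
    unsigned pos = 0;
    for (int i = 0; i < 3; i++)
        printf("%u\n", read_ue(bits, &pos));
    return 0;
}
```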
Naturally, there are no reference implementations or samples available. This got me wondering how one goes about creating a reference implementation of a codec (or a production implementation, for that matter). The encoder needs to come first so that it can generate samples; only then can a decoder be written. Ideally, the initial encoder and decoder should be 2 totally different programs, written by 2 different people, though hopefully working closely together (speedy communication helps). There is wisdom in the FFmpeg community about not basing the primary encoder and decoder on the same base library; we learned this after reverse engineering one of the Real video codecs and finding a fairly obvious bug that was present on both sides of the codec.
I think I know one way to ensure that the encoder and decoder don’t share any pieces: write them in different computer languages.
I’m still wondering what kind of application would need to record video continuously for a couple of weeks at a stretch. How about a closed-circuit security camera? With a terabyte drive, it could easily store that much video at a bitrate of 1.5 Mbits/sec. That’s roughly the same bitrate as the original MPEG-1 standard. If this coding method compresses more efficiently than MPEG-1, this might be a plausible application.
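To put rough numbers on that security-camera idea (these are back-of-the-envelope figures of mine, not anything from the spec):

```c
#include <stdio.h>

/* Rough storage math: continuous recording at 1.5 Mbits/sec
 * versus a 1 TB drive. */
int main(void)
{
    const double bitrate = 1.5e6;          /* bits per second    */
    const double seconds = 14 * 86400.0;   /* two weeks          */
    const double bytes   = bitrate * seconds / 8.0;
    const double disk    = 1e12;           /* a "terabyte" drive */

    printf("two weeks at 1.5 Mbit/s: %.0f GB\n", bytes / 1e9);
    printf("the drive fills after %.1f days\n",
           disk / (bitrate / 8.0) / 86400.0);
    return 0;
}
```

Two weeks of recording works out to roughly 227 GB, so a terabyte drive has room to spare at that bitrate.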