Naive Sorenson Video 1 Encoder

(Yes, the word is “naive”, or rather “naïve”, not “native”. People always try to correct me when I use it. Strictly, it should be written with two dots over the ‘i’, but who has a keyboard that can easily do that?)

At the most primitive level, programming a video encoder is about writing out a sequence of bits that the corresponding video decoder will understand. It’s sort of like creating a program — represented as a stream of opcodes — that will run on a given microprocessor or virtual machine. In fact, reading a video codec bitstream specification will reveal a lot of terminology along the lines of “transmitting information to the decoder” or “signaling the decoder to do xyz.”
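To make the "sequence of bits" idea concrete, here is a minimal sketch of a bit writer in Python. It is not FFmpeg's code and does not follow the SVQ1 bitstream layout; the `BitWriter` class and its `put_bits` and `flush` methods are hypothetical names chosen for illustration, and the most-significant-bit-first packing order is an assumption.

```python
class BitWriter:
    """Hypothetical helper: accumulates bits MSB-first and packs them into bytes."""

    def __init__(self):
        self._buf = bytearray()
        self._acc = 0    # bits received but not yet written out
        self._nbits = 0  # how many bits are currently held in _acc

    def put_bits(self, value, count):
        """Append the low `count` bits of `value` to the stream."""
        if count < 0 or value < 0 or value >= (1 << count):
            raise ValueError("value does not fit in count bits")
        self._acc = (self._acc << count) | value
        self._nbits += count
        # Emit every complete byte that has accumulated.
        while self._nbits >= 8:
            self._nbits -= 8
            self._buf.append((self._acc >> self._nbits) & 0xFF)
        # Keep only the bits that have not been emitted yet.
        self._acc &= (1 << self._nbits) - 1

    def flush(self):
        """Pad the final partial byte with zero bits and return all bytes."""
        if self._nbits:
            self._buf.append((self._acc << (8 - self._nbits)) & 0xFF)
            self._acc = 0
            self._nbits = 0
        return bytes(self._buf)


writer = BitWriter()
writer.put_bits(0b101, 3)  # e.g. a 3-bit field
writer.put_bits(0xFF, 8)   # e.g. an 8-bit field
writer.put_bits(0, 1)      # e.g. a 1-bit flag
packed = writer.flush()
```

An encoder built on something like this is just a series of `put_bits` calls in the order the decoder expects to read the fields.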

Creating a good encoder that will deliver decent quality at a reasonable bitrate is difficult. Creating a naive encoder that produces a technically compliant bitstream, not so much.

When I wrote an FFmpeg encoder for Sorenson Video 1 (SVQ1), the first step was to just create a minimally compliant bitstream. The coarsest encoding mode that SVQ1 allows is to encode the average (mean) of each 16×16 block of samples. So I created an encoder that just encoded the mean of each block. Apple’s QuickTime Player was able to play the resulting video in all of its blocky glory. The result rather reminds me of the Super Nintendo’s mosaic effect.
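The mean-only step can be sketched in a few lines of Python. This is an illustration of the averaging itself, not the FFmpeg encoder: the function name `block_means`, the flat row-major list of 8-bit samples, the rounding, and the handling of partial blocks at the right and bottom edges are all assumptions made for the example.

```python
def block_means(plane, width, height, block=16):
    """Return the rounded mean of each block x block tile, one list per tile row.

    `plane` is assumed to be a flat, row-major sequence of sample values.
    Tiles that hang over the right or bottom edge are averaged over the
    samples that actually exist (an assumption for this sketch).
    """
    means = []
    for by in range(0, height, block):
        row = []
        for bx in range(0, width, block):
            total = 0
            count = 0
            for y in range(by, min(by + block, height)):
                for x in range(bx, min(bx + block, width)):
                    total += plane[y * width + x]
                    count += 1
            # Integer mean, rounded to nearest.
            row.append((total + count // 2) // count)
        means.append(row)
    return means


# A 32x16 test plane: left half dark (10), right half bright (200).
test_plane = ([10] * 16 + [200] * 16) * 16
coarse = block_means(test_plane, 32, 16)          # 16x16 blocks
finer = block_means(test_plane, 32, 16, block=8)  # 8x8 blocks
```

Filling every tile with its mean on the decoding side is what produces the mosaic look; passing `block=8` corresponds to the finer 8×8 case.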

Level 5 blocks (mean-only 16×16 encoding):

Level 3 blocks (mean-only 8×8 encoding):

It’s one thing for your own decoder (in this case, FFmpeg’s own decoder) to be able to decode the data. The big test is whether the official decoder (in this case, Apple QuickTime Player) can decode the file.

Now that’s a good feeling. After establishing that sort of baseline, it’s possible to adopt more and more features of the codec.

5 thoughts on “Naive Sorenson Video 1 Encoder”

  1. Aninhumer

    IMHO you don’t need to spell naive with the dots, any more than you need to spell cafe with an accent or facade with a cedilla. We’re not speaking French here, and it’s not like our pronunciation is regular enough to require diacritics to modify it.

    (I hope your comments thread isn’t *all* arguments about a throwaway in your topline… :P)

  2. Anonymous

    I don’t know about other platforms/regions; but on OS X (UK keyboard), Alt+U will produce a combining diacritic ¨. I (^), E (´) and N (~) also seem to produce various other diacritics.

    I can’t say for Linux or Windows; they’re probably different

  3. Reimar

    I haven’t tested that one specifically, but on Linux the us altgr-intl layout should give you access to many special characters without impacting your normal typing (the special i is actually on AltGr+j on that layout).
    Concerning the real subject: That’s a very baseline baseline there :-). But one of the big advantages of video codecs is that it’s easy to recognize _something_ even when you are still very far off.
    At least with my ears, if you make a tiny mistake with audio codecs you get something completely unrecognizable.

  4. Manabu

    Wow, Cinepak *_*

    > but who has a keyboard that can easily do that?
    I do. On ABNT-2 keyboards it is located over the 6. ^.^
    But yeah, I understand your problem.

    Thinking now, this comment ended up less relevant than I thought it would be when I started writing it… bah. ;)
