Category Archives: General

Naive Sorenson Video 1 Encoder

(Yes, the word is “naive” — or rather, “naïve” — not “native”. People always try to correct me when I use the word. Indeed, it should actually be written with 2 dots over the ‘i’ but who has a keyboard that can easily do that?)

At the most primitive level, programming a video encoder is about writing out a sequence of bits that the corresponding video decoder will understand. It’s sort of like creating a program — represented as a stream of opcodes — that will run on a given microprocessor or virtual machine. In fact, reading a video codec bitstream specification will reveal a lot of terminology along the lines of “transmitting information to the decoder” or “signaling the decoder to do xyz.”

Creating a good encoder that will deliver decent quality at a reasonable bitrate is difficult. Creating a naive encoder that produces a technically compliant bitstream, not so much.



When I wrote an FFmpeg encoder for Sorenson Video 1 (SVQ1), the first step was to just create a minimally compliant bitstream. The coarsest encoding mode that SVQ1 allows is to encode the average (mean) of each 16×16 block of samples. So I created an encoder that just encoded the mean of each block. Apple’s QuickTime Player was able to play the resulting video in all of its blocky glory. The result rather reminds me of the Super Nintendo’s mosaic effect.

Level 5 blocks (mean-only 16×16 encoding):



Level 3 blocks (mean-only 8×8 encoding):



It’s one thing for your own decoder (in this case, FFmpeg’s own decoder) to be able to decode the data. The big test is whether the official decoder (in this case, Apple QuickTime Player) can decode the file.



Now that’s a good feeling. After establishing that sort of baseline, it’s possible to adapt more and more features of the codec.

How Much H.264 In Each Encoder?

Thanks to my recent experiments with code coverage tools, I have a powerful new — admittedly somewhat specious — method of comparing programs. For example, I am certain that I have read on more than one occasion that Apple’s H.264 encoder sucks compared to x264 due, at least in part, to the Apple encoder’s alleged inability to exercise all of H.264’s features. I wonder how to test that claim?

Experiment
Use code coverage tools to determine which H.264 encoder uses the most features.

Assumptions

  • Movie trailers hosted by Apple will all be encoded with the same settings using Apple’s encoder.
  • Similarly, Yahoo’s movie trailers will be encoded with consistent settings using an unknown encoder.
  • Encoding a video using FFmpeg’s libx264-slow setting will necessarily throw a bunch of H.264’s features into the mix (I really don’t think this assumption holds much water, but I also don’t know what “standard” x264 settings are).

Methodology

  • Grab a random Apple-hosted 1080p movie trailer and random Yahoo-hosted 1080p movie trailer from Dave’s Trailer Page.
  • Use libx264/FFmpeg with the ‘slow’ preset to encode Big Buck Bunny 1080p from raw PNG files.
  • Build FFmpeg with code coverage enabled.
  • Decode each file to raw YUV, ignore audio decoding, generate code coverage statistics using gcovr, reset stats after each run by deleting *.gcda files.

Results

  • x264 1080p video: 9968 / 134203 lines
  • Apple 1080p trailer: 9968 / 134203 lines
  • Yahoo 1080p trailer: 9914 / 134203 lines

I also ran this old x264-encoded file (ImperishableNightStage6Low.mp4) through the same test. It demonstrated the most code coverage with 10671 / 134203 lines.

Conclusions
Conclusions? Ha! Go ahead and jump all over this test. I’m already fairly confident that it’s impossible (or maybe just very difficult) to build a single H.264-encoded video that exercises every feature that FFmpeg’s decoder supports. For example, is it possible for a file to use both CABAC and CAVLC entropy methods? If it’s possible, does any current encoder do that?

Linux Media Player Survey Circa 2001

Here’s a document I scavenged from my archives. It was dated September 1, 2001 and I now publish it 9 years later. It serves as sort of a time capsule for the state of media player programs at the time. Looking back on this list, I can’t understand why I couldn’t find MPlayer while I was conducting this survey, especially since MPlayer is the project I eventually started to work for a few months after writing this piece.

For a little context, I had been studying multimedia concepts and tech for a year and was itching to get my hands dirty with practical multimedia coding. But I wanted to tackle what I perceived as unsolved problems– like playback of proprietary codecs. I didn’t want to have to build a new media playback framework just to start working on my problems. So I surveyed the players available to see which ones I could plug into and use as a testbed for implementing new decoders.

Regarding Real Player, I wrote: “We’re trying to move away from the proprietary, closed-source ‘solutions'”. Heh. Was I really an insufferable open source idealist back in the day?

Anyway, here’s the text with some Where are they now? commentary [in brackets]:


Towards an All-Inclusive Media Playing Solution for Linux

I don’t feel that the media playing solutions for Linux set their sights high enough, even though they do tend to be quite ambitious. Continue reading

2 GB Should Be Enough For Me

My new EeePC 1201PN netbook has 2 GB of RAM. Call me shortsighted but I feel like “that ought to be enough for me”. I’m not trying to claim that it ought to be enough for everyone. I am, however, questioning the utility of swap space for those skilled in the art of computing.



Technology marches on: This ancient 128 MB RAM module is larger than my digital camera’s battery charger… and I just realized that comparison doesn’t make any sense

Does anyone else have this issue? It has gotten to the point where I deliberately disable swap partitions on Linux desktops I’m using ('swapoff -a'), and try not to allocate a swap partition during install time. I’m encountering Linux installers that seem to be making it tougher to do this, essentially pleading with you to create a swap partition– “Seriously, you might need 8 total gigabytes of virtual memory one day.” I’m of the opinion that if 2 GB of physical memory isn’t enough for my normal operation, I might need to re-examine my processes.

In the course of my normal computer usage (which is definitely not normal by the standard of a normal computer user), swap space is just another way for the software to screw things up behind the scenes. In this case, the mistake is performance-related as the software makes poor decisions about what needs to be kept in RAM.

And then there are the netbook-oriented Linux distributions that insisted upon setting aside as swap 1/2 gigabyte of the already constrained 4 gigabytes of my Eee PC 701’s on-board flash memory, never offering the choice to opt out of swap space during installation. Earmarking flash memory for swap space is generally regarded as exceptionally poor form. To be fair, I don’t know that SSD has been all that prevalent in netbooks since the very earliest units in the netbook epoch.

Am I alone in this? Does anyone else prefer to keep all of their memory physical in this day and age?