I have been reading way too many statements from people who confidently assert that Google will open source all of On2’s IP based on no more evidence than… the fact that they really, really hope it happens. Meanwhile, I have found myself pettily hoping it doesn’t happen simply due to the knowledge that the FSF will claim total credit for such a development (don’t believe me? They already claim credit for Apple dropping DRM from music purchases: “Our Defective by Design campaign has a successful history of targeting Apple over its DRM policies… and under the pressure Steve Jobs dropped DRM on music.”)
But for the sake of discussion, let’s run with the idea: Let’s assume that Google open sources any of On2’s intellectual property. Be advised that if you’re the type who believes that all engineering problems large and small can be solved by applying, not thought, but a mystical, nebulous force called “open source”, you can go ahead and skip this post.
The Stack
I like thinking about specific multimedia problems on this blog because people who read this blog often know a thing or two about multimedia technology and are equipped to think about the issues involved. So imagine that Google open sources On2’s video codecs. What does that mean? Free web video for all! Well, not so fast. There are a few issues to consider. While most observers can only see as far as the video codec portion, there are, at the very least, 3 things to consider when putting together a video for consumption:
- Video codec — how are the pictures compressed and encoded?
- Audio codec — presumably, you want some sound to go along with those moving pictures
- Container format — there has to be a method for tying together audio and video for delivery and playback
So, Google open sourcing On2’s video codecs would address the first need. What about the second need? Obviously, MP3 and AAC — which represent the standards today — would just get the open video argument back to square one. “VORBIS!!!” would be the reflexive cry of the open source checklister crowd. Indeed, this just might work. Vorbis has problems and shortcomings but is generally considered a reasonable replacement for MP3, WMA, and AAC, at least for desktop computing applications (we’ll think outside of the desktop arena a little later). Actually, there is another option: Did you know that On2 dabbled in audio codecs at one time? They developed 2 IMA ADPCM variants (DK4 and DK3) which wouldn’t be seriously considered for this application. They also have something they simply called Audio for Video Codec (inspired). I know we have samples of the latter somewhere.
How about the last component on the list, the container format? How about Ogg? Ogg is not usually considered a general-purpose container format. It would require extension to be capable of storing any new codec (I argue that a container is not general-purpose if a demuxer has to have special code to handle every codec that could possibly be inside, which is pretty much how Ogg works). The prevailing container format on the standard MPEG side is the QuickTime-derived MP4 format. This could hold new On2 data (and probably does so in various applications today), but I’m not sure about storing Vorbis data in it.
This is the part of the post where I learned something new (which is a big reason I like writing these blog posts). My theory was that there was no agreed-upon method for storing Vorbis audio data in QuickTime/MP4. But if there were something resembling a standard on this front, FFmpeg would implement it. So I tried transcoding an audio file to a Vorbis encoded .MOV or .MP4 and not only did both take, both played back in FFplay.
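For anyone who wants to repeat that experiment, here is a rough sketch in Python that builds the kind of FFmpeg command line involved. The file names are placeholders, the `libvorbis` encoder is only present if your FFmpeg build includes it, and whether a particular FFmpeg version agrees to write Vorbis into a given container is something to verify rather than assume.

```python
import shutil
import subprocess


def build_vorbis_mp4_command(src, dst):
    """Build an FFmpeg command that transcodes the audio of src to Vorbis in dst."""
    return [
        "ffmpeg",
        "-i", src,            # input file (placeholder name)
        "-vn",                # skip any video stream; audio only
        "-c:a", "libvorbis",  # assumes FFmpeg was built with libvorbis
        dst,                  # container is inferred from the file extension
    ]


def transcode(src, dst):
    """Run the command if an ffmpeg binary is on the PATH; return True on success."""
    if shutil.which("ffmpeg") is None:
        return False
    result = subprocess.run(build_vorbis_mp4_command(src, dst))
    return result.returncode == 0


# Show the command line without running it.
print(" ".join(build_vorbis_mp4_command("input.wav", "output.mp4")))
```

Calling `transcode("input.wav", "output.mp4")` and then opening the result in FFplay is the whole test; swapping the extension to `.mov` tries the QuickTime flavor.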
Mobile Considerations
Okay, so the MP4/VPn/Vorbis stack might just be crazy enough to work… in the desktop realm. There’s the emerging world out there collectively called “mobile” (cell phones, netbooks, tablet computers, and more), and these devices all want to be able to show video. They often do this with dedicated processors that are designed to decode the standard MPEG codecs (H.264 and AAC). I’ve read various observers who wave away this problem of decoding a new stack on mobile by invoking 3 magic letters: ‘DSP’. This is a good example of how a little knowledge can be dangerous. There seems to be a general misconception that DSPs (digital signal processors) are extraordinary devices that can be programmed to accelerate the decoding of any video or audio codec. The reality isn’t so simple.
I confess, I don’t know to what extent today’s generation of mobile devices employs general-purpose, programmable DSPs vs. custom MPEG ASICs, but that would make for an interesting survey. While I’m confident that any device in the “mobile” category could likely be programmed to play data encoded in the MP4/VPn/Vorbis stack, the issue then becomes battery usage: how long can the custom-programmed system play video vs. how long could the same unit play by pushing MPEG data through the custom ASICs? Again, I don’t know, but it would be interesting to find some numbers or develop experiments to test this.
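To make the battery question concrete, here is a back-of-the-envelope sketch in Python. Every number in it is invented for illustration (I have no measurements), and it assumes a constant power draw during playback, which real devices certainly don't have. The point is only the shape of the comparison: same battery, two different decode paths.

```python
def playback_hours(battery_wh, playback_watts):
    """Hours of playback from a full battery at a constant power draw."""
    if playback_watts <= 0:
        raise ValueError("power draw must be positive")
    return battery_wh / playback_watts


# All numbers below are made up for illustration, not measurements.
BATTERY_WH = 5.0       # hypothetical battery capacity in watt-hours
ASIC_WATTS = 0.5       # hypothetical draw when using the dedicated decoder
SOFTWARE_WATTS = 1.25  # hypothetical draw when decoding on the CPU or DSP

asic_hours = playback_hours(BATTERY_WH, ASIC_WATTS)
software_hours = playback_hours(BATTERY_WH, SOFTWARE_WATTS)
print(f"dedicated decoder: {asic_hours:.1f} h, programmed decoder: {software_hours:.1f} h")
```

An actual experiment would replace the two wattage constants with measured averages for the same clip played through each path.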
The Google/YouTube Factor
Observers obsess about the Google/YouTube angle in all this, which is understandable due to the large mindshare that YouTube controls when it comes to web video. “Control the YouTube video format, control the future of the internet,” or so the popular sentiment seems to go (be advised, however, that YouTube doesn’t necessarily represent as much online video viewing as you might think). One received bit of wisdom is that Google already stores multiple copies of every video in various formats, so one more shouldn’t be any big burden. Our thoughts are not Google’s thoughts, neither are Google’s ways our ways (I’m sure I read that somewhere). It’s hard to know exactly how YouTube operates, but we can always poke at it to find clues. Outside of Google, few people are as interested in YouTube’s operation as Dark Shikari of x264 fame and here are some interesting tidbits he has empirically determined. Of interest to this discussion is the following: “They could have saved 30-50% on bandwidth by offering an H.264 High stream for PC viewers, but they didn’t because it would have required they keep around a separate stream.” This indicates that Google doesn’t necessarily consider disk space to be limitless, at least not in the disk space vs. bandwidth trade-off calculation. I know for a fact that YouTube also keeps around the original copies of all the videos that have ever been uploaded (because certain videos that were not transcoded properly years ago are now correct). However, YouTube only needs to keep one copy of the original around (and maybe a backup somewhere) while transcoded videos need to be shipped off to any number of content distribution nodes.
This doesn’t rule out the possibility that Google could push an MP4/VPn/Vorbis stack as the new web video standard and maintain parallel MPEG & open copies of each YouTube video. But it’s not a foregone conclusion that it would be a drop in the bucket for Google’s resources. As a programmer for a large technology company, I endure more than my fair share of outside advice regarding what my company ought to do for the benefit of everyone else except the company. I can only imagine how frustrating it must be for Google people to constantly read declarations along the lines of, “Google needs to spend untold millions of its own money in order to do what I believe is right,” especially considering that whole “don’t be evil” charter/moral bludgeon.
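The storage side of that trade-off can be sketched with a toy model in Python. It assumes, purely for illustration, that every transcoded format is the same size and that every distribution node holds every format, while the original is stored once. None of the figures describe YouTube's real setup, which I can only guess at.

```python
def storage_mb(transcode_mb, formats, edge_nodes, original_mb):
    """Total storage for one video under the toy model described above.

    Each transcoded format is replicated to every edge node; the original
    upload is kept once.
    """
    return transcode_mb * formats * edge_nodes + original_mb


# Hypothetical figures, chosen only to show how replication multiplies cost.
TRANSCODE_MB = 50
EDGE_NODES = 20
ORIGINAL_MB = 200

current = storage_mb(TRANSCODE_MB, 3, EDGE_NODES, ORIGINAL_MB)
with_extra_format = storage_mb(TRANSCODE_MB, 4, EDGE_NODES, ORIGINAL_MB)
print(f"extra storage for one more format: {with_extra_format - current} MB per video")
```

Under this model the cost of one more format scales with the number of nodes, not with the size of the original, which is why "just keep another copy" is not automatically cheap.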
In Closing
Anyway, I just wanted to think about these issues from a purely technical standpoint, a perspective that I really don’t see discussed anywhere else (except when I read Dark Shikari try to talk sense into online forums where video discussions abound). I don’t have all the facts but I enjoy not only searching for them, but also figuring out how to search for them.
Check out Silvia Pfeiffer’s recent blog post: Google’s challenges of freeing VP8. In addition to wondering about some of the same technical issues I have outlined here, she addresses the tough legal and patent issues surrounding a possible open sourcing. And none of this even begins to address the political and social issues surrounding the adoption of such a standard.
“They often do this with dedicated processors that are designed to decode the standard MPEG codecs (H.264 and AAC).”
H.264 maybe, but not AAC. People don’t design ASICs to decode things that take ~25MHz on ARM cores. Not when you have a 500MHz core that has an idle clock of 50 or 100MHz.
“While I’m confident that any device in the “mobile” category could likely be programmed to play data encoded in the MP4/VPn/Vorbis stack, the issue then becomes battery usage: how long can the custom programmed system play video vs. how long could the same unit play by pushing MPEG data through the custom ASICs?”
Vorbis actually wouldn’t make much of any difference here. Its complexity is comparable to AAC (because it’s very similar to AAC in general) and it’s actually a little faster than MP3 on ARM cores. I don’t know about VP6. It’d come down to how much effort someone put into a DSP codec for it, and if any hardware IDCT stuff could be used for it.
Hmm, and I thought you were going to document the VP8 format so some foundation could turn it into Theora 2.
It wouldn’t be bad if they released sources for some obsolete codecs like the TrueMotion 2 variants, though (they radically differ from TM2).
Heh, seems they already work on the “taking credit” part :)
http://www.fsf.org/blogs/community/google-free-on2-vp8-for-youtube/
oops, nvm. I was thinking you might already be linking it, but somehow it escaped my notice.
@Kostya: I hate to say it, but if they do open up the VP8 source code (or any other unknown On2 IP), I’m probably going to be the first one to document it, just out of force of habit.