Category Archives: Outlandish Brainstorms

Entertaining bizarre ideas largely related to multimedia hacking and reverse engineering

Developing A Shader-Based Video Codec

Early last month, this thing called ORBX.js was in the news. It ostensibly has something to do with streaming video and codec technology, which naturally catches my interest. The hype was kicked off by Mozilla honcho Brendan Eich when he posted an article asserting that HD video decoding could be entirely performed in JavaScript. We’ve seen this kind of thing before using Broadway– an H.264 decoder implemented entirely in JS. But that exposes some very obvious limitations (notably CPU usage).

But this new video codec promises 1080p HD playback directly in JavaScript which is a lofty claim. How could it possibly do this? I got the impression that performance was achieved using WebGL, an extension which allows JavaScript access to accelerated 3D graphics hardware. Browsing through the conversations surrounding the ORBX.js announcement, I found this confirmation from Eich himself:

You’re right that WebGL does heavy lifting.

As of this writing, ORBX.js remains some kind of private tech demo. If there were a public demo available, it would necessarily be easy to reverse engineer the downloadable JavaScript decoder.

But the announcement was enough to make me wonder how it could be possible to create a video codec which effectively leverages 3D hardware.

Prior Art
In theorizing about this, it continually occurs to me that I can’t possibly be the first person to attempt to do this (or the ORBX.js people, for that matter). In googling on the matter, I found various forums and Q&A posts where people asked if it were possible to, e.g., accelerate JPEG decoding and presentation using 3D hardware, with no answers. I also found a blog post which describes a plan to use 3D hardware to accelerate VP8 video decoding. It was a project done under the banner of Google’s Summer of Code in 2011, though I’m not sure which open source group mentored the effort. The project did not end up producing the shader-based VP8 codec originally chartered but mentions that “The ‘client side’ of the VP8 VDPAU implementation is working and is currently being reviewed by the libvdpau maintainers.” I’m not sure what that means. Perhaps it includes modifications to the public API that supports VP8, but is waiting for the underlying hardware to actually implement VP8 decoding blocks in hardware.

What’s So Hard About This?
Video decoding is a computationally intensive task. GPUs are known to be really awesome at chewing through computationally intensive tasks. So why aren’t GPUs a natural fit for decoding video codecs?

Generally, it boils down to parallelism, or lack of opportunities thereof. GPUs are really good at doing the exact same operations over lots of data at once. The problem is that decoding compressed video usually requires multiple phases that cannot be parallelized, and the individual phases often cannot be parallelized. In strictly mathematical terms, a compressed data stream will need to be decoded by applying a function f(x) over each data element, x0 .. xn. However, the function relies on having applied the function to the previous data element, i.e.:

f(xn) = f(f(xn-1))

What happens when you try to parallelize such an algorithm? Temporal rifts in the space/time continuum, if you’re in a Star Trek episode. If you’re in the real world, you’ll get incorrect, unusuable data as the parallel computation is seeded with a bunch of invalid data at multiple points (which is illustrated in some of the pictures in the aforementioned blog post about accelerated VP8).

Example: JPEG
Let’s take a very general look at the various stages involved in decoding the ubiquitous JPEG format:

High level JPEG decoding flow

What are the opportunities to parallelize these various phases?
Continue reading

Sequencing MIDI From A Chiptune

The feature requests for my game music appreciation website project continue to pour in. Many of them take the form of “please add player support for system XYZ and the chiptune library to go with it.” Most of these requests are A) plausible, and B) in process. I have also received recommendations for UI improvements which I take under consideration. Then there are the numerous requests to port everything from Native Client to JavaScript so that it will work everywhere, even on mobile, a notion which might take a separate post to debunk entirely.

But here’s an interesting request about which I would like to speculate: Automatically convert a chiptune into a MIDI file. I immediately wanted to dismiss it as impossible or highly implausible. But, as is my habit, I started pondering the concept a little more carefully and decided that there’s an outside chance of getting some part of the idea to work.

Intro to MIDI
MIDI stands for Musical Instrument Digital Interface. It’s a standard musical interchange format and allows music instruments and computers to exchange musical information. The file interchange format bears the extension .mid and contains a sequence of numbers that translate into commands separated by time deltas. E.g.: turn key on (this note, this velocity); wait x ticks; turn key off; wait y ticks; etc. I’m vastly oversimplifying, as usual.

MIDI fascinated me back in the days of dialup internet and discrete sound cards (see also my write-up on the Gravis Ultrasound). Typical song-length MIDI files often ranged from a few kilobytes to a few 10s of kilobytes. They were significantly smaller than the MOD et al. family of tracker music formats mostly by virtue of the fact that MIDI files aren’t burdened by transporting digital audio samples.
Continue reading

Announcing the World’s Worst VP8 Encoder

I wanted to see if I could write an extremely basic VP8 encoder. It turned out to be one of the hardest endeavors I have ever attempted (and arguably one of the least successful).

I started with the Big Buck Bunny title image:

And this is the best encoding that this experiment could yield:

Squint hard enough and you can totally make out the logo. Pretty silly effort, I know. It should also be noted that the resultant .webm file holding that single 400×225 image was 191324 bytes. When FFmpeg decoded it to a PNG, it was only 187200 bytes.

The Story
Remember my post about a naive SVQ1 encoder? Long story short, I set out to do the same thing with VP8. (I wanted to do the same thing with VP3/Theora for years. But take a good look at what it would entail to create even the most basic bitstream. As involved as VP8 may be, its bitstream is absolutely trivial compared to VP3/Theora.)
Continue reading

My Own Offline RSS Reader

My current living situation saddles me with a rather lengthy commute. More time to work on my old Asus Eee PC 701 (still can’t think of a reason to get a better netbook). It would be neat if I could read RSS feeds offline using Ubuntu-based Linux on this thing. But with all the Linux software I can find, that’s just not to be. I think the best hope I had was Google Reader in offline mode using Google Gears. But I couldn’t get it installed in Firefox and the Linux version of Gears doesn’t support the Linux version of Chrome. I did a bunch of searching beside and all I could find were forum posts with similar laments: Offline RSS readers don’t allow you to read things offline. Actually, to be fair, I think these offline RSS readers operate exactly as advertised: They allow you to read the RSS feeds offline. The problem is that an RSS feed doesn’t usually contain much meat, just a title, a synopsis, and a link to the main content. What I (and, I suspect, most people) want in an “offline reader” is a program that follows those links, downloads the HTML pages, and downloads any supporting images and stylesheets, all for later browsing.

I didn’t want to have to reinvent this particular wheel, but here goes.

Here’s the pitch: Create a text file with a list of RSS feeds. Create a Python script that retrieves each. Use Python’s XML capabilities (which I have already had success with) to iterate through each item in an RSS feed. For each item, parse the corresponding link. Fetch the link and parse through the HTML. For each CSS, JS, or IMG reference, download that data as well. Compute a hash of that supporting data and replace the link with that hash. Dump that data in a local SQLite database (you knew that was coming). Dump the modified HTML page into that database as well.

Part 2 is to create a Python-based webserver that serves up this data from a localhost address.

One nifty aspect of this idea is that my Eee PC does not have to do the actual RSS updating. If the relevant scripts and the SQLite database are stored on a Flash drive, the updating process can be run on any system with standard Python.

See Also:

  • Part 2, where I get this idea to a minimally functioning state
  • GhettoRSS, what I eventually called the software when I released it