Category Archives: HTML5

Emscripten and Web Audio API

Ha! They said it couldn’t be done! Well, to be fair, I said it couldn’t be done. Or maybe that I just didn’t have any plans to do it. But I did it– I used Emscripten to cross-compile a CPU-intensive C/C++ codebase (Game Music Emu) to JavaScript. Then I leveraged the Web Audio API to output audio and visualize the audio using an HTML5 canvas.

Want to see it in action? Here’s a demonstration. Perhaps I will be able to expand the reach of my Game Music site when I can drop the odd Native Client plugin. This JS-based player works great on Chrome, Firefox, and Safari across desktop operating systems.

But this endeavor was not without its challenges.

Programmatically Generating Audio
First, I needed to figure out the proper method for procedurally generating audio and making it available to output. Generally, there are 2 approaches for audio output:

  1. Sit in a loop and generate audio, writing it out via a blocking audio call
  2. Implement a callback that the audio system can invoke in order to generate more audio when needed

Option #1 is not a good idea for an event-driven language like JavaScript. So I hunted through the rather flexible Web Audio API for a method that allowed something like approach #2. Callbacks are everywhere, after all.

I eventually found what I was looking for with the ScriptProcessorNode. It seems to be intended to apply post-processing effects to audio streams. A program registers a callback which is passed configurable chunks of audio for processing. I subverted this by simply overwriting the input buffers with the audio generated by the Emscripten-compiled library.

The ScriptProcessorNode interface is fairly well documented and works across multiple browsers. However, it is already marked as deprecated:

Note: As of the August 29 2014 Web Audio API spec publication, this feature has been marked as deprecated, and is soon to be replaced by Audio Workers.

Despite being marked as deprecated for 8 months as of this writing, there exists no appreciable amount of documentation for the successor API, these so-called Audio Workers.

Vive la web standards!

Visualize This
The next problem was visualization. The Web Audio API provides the AnalyzerNode API for accessing both time and frequency domain data from a running audio stream (and fetching the data as both unsigned bytes or floating-point numbers, depending on what the application needs). This is a pretty neat idea. I just wish I could make the API work. The simple demos I could find worked well enough. But when I wired up a prototype to fetch and visualize the time-domain wave, all I got were center-point samples (an array of values that were all 128).

Even if the API did work, I’m not sure if it would have been that useful. Per my reading of the AnalyserNode API, it only returns data as a single channel. Why would I want that? My application supports audio with 2 channels. I want 2 channels of data for visualization.

How To Synchronize
So I rolled my own visualization solution by maintaining a circular buffer of audio when samples were being generated. Then, requestAnimationFrame() provided the rendering callbacks. The next problem was audio-visual sync. But that certainly is not unique to this situation– maintaining proper A/V sync is a perennial puzzle in real-time multimedia programming. I was able to glean enough timing information from the environment to achieve reasonable A/V sync (verify for yourself).

Pause/Resume
The next problem I encountered with the Web Audio API was pause/resume facilities, or the lack thereof. For all its bells and whistles, the API’s omission of such facilities seems most unusual, as if the design philosophy was, “Once the user starts playing audio, they will never, ever have cause to pause the audio.”

Then again, I must understand that mine is not a use case that the design committee considered and I’m subverting the API in ways the designers didn’t intend. Typical use cases for this API seem to include such workloads as:

  • Downloading, decoding, and playing back a compressed audio stream via the network, applying effects, and visualizing the result
  • Accessing microphone input, applying effects, visualizing, encoding and sending the data across the network
  • Firing sound effects in a gaming application
  • MIDI playback via JavaScript (this honestly amazes me)

What they did not seem to have in mind was what I am trying to do– synthesize audio in real time.

I implemented pause/resume in a sub-par manner: pausing has the effect of generating 0 values when the ScriptProcessorNode callback is invoked, while also canceling any animation callbacks. Thus, audio output is technically still occurring, it’s just that the audio is pure silence. It’s not a great solution because CPU is still being used.

Future Work
I have a lot more player libraries to port to this new system. But I think I have a good framework set up.

The First Problem

A few years ago, The Linux Hater made the following poignant observation regarding Linux driver support:

Drivers are only just the beginning… But for some reason y’all like to focus on the drivers. You know why lusers do that? Because it just happens to be the problem that people notice first.

And so it is with the HTML5 video codec debate, re-invigorated in the past week by Google’s announcement of dropping native H.264 support in their own HTML5 video tag implementation. As I read up on the fiery debate, I kept wondering why people are so obsessed with this issue. Then I remembered the Linux Hater’s post and realized that the video codec issue is simply the first problem that most people notice regarding HTML5 video.

I appreciate that the video codec debate has prompted Niedermayer to post on his blog once more. Otherwise, I’m just munching popcorn on the sidelines, amused and mildly relieved that the various factions are vociferously attacking each other rather than that little project I help with at work.

Getting back to the “first problem” aspect– there’s so much emphasis on the video codec; I wonder why no one ever, ever mentions word one about an audio codec. AAC is typically the codec that pairs with H.264 in the MPEG stack. Dark Shikari once mentioned that “AAC’s licensing terms are exponentially more onerous than H.264?s. If Google didn’t want to use H.264, they would sure as hell not want to use AAC.” Most people are probably using “H.264” to refer to the entire MPEG/H.264/AAC stack, even if they probably don’t understand what all of those pieces mean.

Anyway, The Linux Hater’s driver piece continues:

Once y’all have drivers, the fight will move to the next layer up. And like I said, it’s a lot harder at that layer.

A few months ago, when I wanted to post the WebM output of my new VP8 encoder and thought it would be a nice touch to deliver it via a video tag, I ignored the video codec problem (just encoded a VP8/WebM file) only to immediately discover a problem at a different layer– specifically, embedding a file using a video tag triggers a full file download when the page is loaded, which is unacceptable from end user and web hosting perspectives. This is a known issue but doesn’t get as much attention, I guess because there are bigger problems to solve first (c.f. video codec issue).

For other issues, check out the YouTube blog’s HTML5 post or Hulu’s post that also commented on HTML5. Issues such as video streaming flexibility, content protection, fullscreen video, webcam/microphone input, and numerous others are rarely mentioned in the debates. Only “video codec” is of paramount importance.

But I’m lending too much weight to the cacophony of a largely uninformed internet debate. Realistically, I know there are many talented engineers down in the trenches working to solve at least some of these problems. To tie this in with the Linux driver example, I’m consistently stunned these days regarding how simple it is to get Linux working on a new computer– most commodity consumer hardware really does just work right out of the box. Maybe one day, we’ll wake up and find that HTML5 video has advanced to the point that it solves all of the relevant problems to make it the simple and obvious choice for delivering web video in nearly all situations.

It won’t be this year.

VP8: The Savior Codec

This past week, the internet picked up — and subsequently sprinted like a cheetah withan unsourced and highly unsubstantiated rumor that Google will open source the VP8 video codec, recently procured through their On2 acquisition. I wager that the FSF is already working on their press release claiming full credit should this actually come to pass. I still retain my “I’ll believe it when I see it” attitude. However, I thought this would be a good opportunity to consolidate all of the public knowledge regarding On2’s VP8 codec.



Pictured: All the proof you need that VP8 is superior to H.264
Update: The preceding comment is meant in sarcastic jest. Read on

The Official VP8 Facts:
Continue reading

HTMLOL5 Video

Last week brought us a lot of news in the web browser space: Mozilla released Firefox 3.6 (nice fullscreen video, BTW, especially on Linux); YouTube and Vimeo grabbed headlines by announcing HTML5 video support for their video sites.

I resolved a few months ago to not bother reading so many tech news sites since they consist of 99% misinformed drivel, and I’m a happier person for that decision. But when there’s big news that can be seen as tangentially related to what I do at my day job, it gets hard to resist.

From everything I read, there was surprisingly little Flash hatred in the wake of these announcements. Really, the situation just erupted into an all-out war between the devotees of Firefox (and to a lesser extent, Opera) and supporters of Google (and to a lesser extent Apple and their Safari browser). It gets boring and repetitive in a hurry when you start reading these discussions since they all go something like this:


HTML5 Video Tag Arguments

As you can see from the infographic, at least both sides can agree on something. I would also like to state my emphatic support for Mozilla’s principled, hardline stance against the MPEG stack for HTML5 video. Please don’t budge on your position. Stand firm on the moral high ground.

That graphic is just the beginning; there are so many problems with HTML5 video that it’s hard to know where to even begin. That’s why I need to remember to just laugh gently at its mention and move along. I only get a headache trying to understand how HTML5 video could ever have the slightest chance of mattering in the grand scheme of things.

However, a pleasant side effect of this attention is that more and more people are actually being exposed to the video tag. One nagging detail people invariably notice is that the video tag performs exceptionally poorly, likely because browsers have to deal with the exact same limitations that the Flash Player does, namely, converting decoded YUV data to RGB so that it can be plopped on a browser page. And if you try to claim that you can just download the media and use a standalone player, you continue to miss the entire point of web video.

Another aspect I have to appreciate about the debate surrounding HTML5 video is the way that it brings out the positive spirit in people. Online discussions are normally overwhelmingly negative. But advocates of the HTML5/Xiph approach truly believe this could all work out: If Apple decides to adopt the Xiph stack, and if some benevolent hardware company would churn out custom ASICs for decoding Xiph codecs, and if those ASICs were adopted in next quarter’s array of mobile computing devices and netbooks, and if Google transcodes their zillobytes of YouTube videos to the Xiph stack, and if Google throws the switch and forces the 60% of IE-using stragglers to either change browsers or go without YouTube, and if Google thereby forgoes many opportunities to monetize their videos, then absolutely! HTML5 video could totally unseat Flash video.

Okay, that’s it for me. I’m going to go back to ignoring the insular, elitist tech world at large except for the few domains in which I have some influence.

See Also: