April, 2015 | Breaking Eggs And Making Omelettes

Ha! They said it couldn’t be done! Well, to be fair, I said it couldn’t be done. Or maybe that I just didn’t have any plans to do it. But I did it– I used Emscripten to cross-compile a CPU-intensive C/C++ codebase (Game Music Emu) to JavaScript. Then I leveraged the Web Audio API to output audio and visualize the audio using an HTML5 canvas.

Want to see it in action? Here’s a demonstration. Perhaps I will be able to expand the reach of my Game Music site when I can drop the odd Native Client plugin. This JS-based player works great on Chrome, Firefox, and Safari across desktop operating systems.

But this endeavor was not without its challenges.

Programmatically Generating Audio
First, I needed to figure out the proper method for procedurally generating audio and making it available to output. Generally, there are 2 approaches for audio output:

Sit in a loop and generate audio, writing it out via a blocking audio call
Implement a callback that the audio system can invoke in order to generate more audio when needed

Option #1 is not a good idea for an event-driven language like JavaScript. So I hunted through the rather flexible Web Audio API for a method that allowed something like approach #2. Callbacks are everywhere, after all.

I eventually found what I was looking for with the ScriptProcessorNode. It seems to be intended to apply post-processing effects to audio streams. A program registers a callback which is passed configurable chunks of audio for processing. I subverted this by simply overwriting the input buffers with the audio generated by the Emscripten-compiled library.

The ScriptProcessorNode interface is fairly well documented and works across multiple browsers. However, it is already marked as deprecated:

Note: As of the August 29 2014 Web Audio API spec publication, this feature has been marked as deprecated, and is soon to be replaced by Audio Workers.

Despite being marked as deprecated for 8 months as of this writing, there exists no appreciable amount of documentation for the successor API, these so-called Audio Workers.

Vive la web standards!

Visualize This
The next problem was visualization. The Web Audio API provides the AnalyzerNode API for accessing both time and frequency domain data from a running audio stream (and fetching the data as both unsigned bytes or floating-point numbers, depending on what the application needs). This is a pretty neat idea. I just wish I could make the API work. The simple demos I could find worked well enough. But when I wired up a prototype to fetch and visualize the time-domain wave, all I got were center-point samples (an array of values that were all 128).

Even if the API did work, I’m not sure if it would have been that useful. Per my reading of the AnalyserNode API, it only returns data as a single channel. Why would I want that? My application supports audio with 2 channels. I want 2 channels of data for visualization.

How To Synchronize
So I rolled my own visualization solution by maintaining a circular buffer of audio when samples were being generated. Then, requestAnimationFrame() provided the rendering callbacks. The next problem was audio-visual sync. But that certainly is not unique to this situation– maintaining proper A/V sync is a perennial puzzle in real-time multimedia programming. I was able to glean enough timing information from the environment to achieve reasonable A/V sync (verify for yourself).

Pause/Resume
The next problem I encountered with the Web Audio API was pause/resume facilities, or the lack thereof. For all its bells and whistles, the API’s omission of such facilities seems most unusual, as if the design philosophy was, “Once the user starts playing audio, they will never, ever have cause to pause the audio.”

Then again, I must understand that mine is not a use case that the design committee considered and I’m subverting the API in ways the designers didn’t intend. Typical use cases for this API seem to include such workloads as:

Downloading, decoding, and playing back a compressed audio stream via the network, applying effects, and visualizing the result
Accessing microphone input, applying effects, visualizing, encoding and sending the data across the network
Firing sound effects in a gaming application
MIDI playback via JavaScript (this honestly amazes me)

What they did not seem to have in mind was what I am trying to do– synthesize audio in real time.

I implemented pause/resume in a sub-par manner: pausing has the effect of generating 0 values when the ScriptProcessorNode callback is invoked, while also canceling any animation callbacks. Thus, audio output is technically still occurring, it’s just that the audio is pure silence. It’s not a great solution because CPU is still being used.

Future Work
I have a lot more player libraries to port to this new system. But I think I have a good framework set up.

Breaking Eggs And Making Omelettes

Topics On Multimedia Technology and Reverse Engineering

Monthly Archives: April 2015

Emscripten and Web Audio API