
Game Music Appreciation, One Year Later

I released my game music website last year about this time. It was a good start and had potential to grow in a lot of directions. But I’m a bit disappointed that I haven’t evolved it as quickly as I would like to. I have made a few improvements, like adjusting the play lengths of many metadata-less songs and revising the original atrocious design of the website using something called Twitter Bootstrap (and, wow, once you know what Bootstrap is, you start noticing it everywhere on the modern web). However, here are a few of the challenges that have slowed me down over the year:

Problems With Native Client – Build System
The technology which enables this project — Google’s Native Client (NaCl) — can be troublesome. One of my key frustrations with the environment is that every single revision of the NaCl SDK seems to adopt a completely new build system layout. If you want to port your NaCl project forward to newer revisions, you have to spend time wrapping your head around whatever the favored build system is. When I first investigated NaCl, I think it was using vanilla GNU Make. Then it switched to SCons. Then I forgot about NaCl for about a year and when I came back, the SDK had reverted to GNU Make. While Make has remained the tool since then, the layout of the SDK still changes from release to release, and each time a different example Makefile shows the way.

The very latest version of the API has required me to really overhaul the Makefile and to truly understand the zen of Makefile programming. I’m even starting to grasp the relationship it has to functional programming.

Problems With Native Client – API Versions and Chrome Bugs
I built the original Salty Game Music Player when NaCl API version 16 was current. By the time I published the v16 version, v19 was available. I made the effort to port forward (a few APIs had superficially changed, nothing too dramatic). However, when I would experiment with this new player, I would see intermittent problems on my Windows 7 desktop. Because of this, I was hesitant to make a new player release.

Around the end of May, I started getting bug reports from site users that their Chrome browsers weren’t allowing them to activate the Salty Game Music Player — the upshot was that they couldn’t play music unless they manually flipped a setting in their browser configuration. It turns out that Chrome 27 introduced a bug that caused this problem. Not only that, but my player was one of only 2 known NaCl apps that used the problematic feature (the other was developed by the Google engineer who entered the bug).

After feeling negligent for a long while about not doing anything to fix the bug, I made a concerted and creative effort to work around the bug and pushed out a new version of the player (based on API v25). My effort didn’t work and I had to roll it back somewhat (but still using the new player binaries). The bug was something that I couldn’t work around. However, at about the same time that I was attempting to do this, Google was rolling out Chrome 28 which fixed the bug, rendering my worry and effort moot.

Problems With Native Client – Still Not In The Clear
I felt reasonably secure about releasing the updated player since I couldn’t make my aforementioned problem occur on my Windows 7 setup anymore. I actually have a written test plan for this player, believe it or not. However, I quickly started receiving new bug reports from Windows users, mostly Windows 8 users. The player basically doesn’t work at all for them now. One user reports the problem on Windows 7 (and another on Windows 2008 Server, I think), but I can’t reproduce it on any of my own machines.

I have a theory about what might be going wrong, but of course I’ll need to test it, and determine how to fix it.

Database Difficulties
The player is only half of the site; the other half is the organization of music files. Working on this project has repeatedly reminded me of my fundamental lack of skill concerning databases. I have a ‘production’ database, and now I’m afraid to do anything with it for fear of messing it up. It’s an SQLite3 database, so it’s easy to make backups and to create a copy in order to test and debug a new script. Still, I feel like I’m missing an entire career path’s worth of database best practices.
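
For what it’s worth, making that kind of throwaway copy of a live SQLite database doesn’t require much: the library’s online backup API does it in a few calls. Here is a minimal C sketch (the file names are placeholders, not the site’s actual database):

/*
 * Copy a live SQLite database into a scratch file using the online
 * backup API. Build with: gcc backup.c -lsqlite3
 */
#include <stdio.h>
#include <sqlite3.h>

int main(void)
{
    sqlite3 *src = NULL, *dst = NULL;

    if (sqlite3_open("gamemusic.sqlite3", &src) != SQLITE_OK ||
        sqlite3_open("gamemusic-scratch.sqlite3", &dst) != SQLITE_OK) {
        fprintf(stderr, "failed to open databases\n");
        return 1;
    }

    /* copy every page of src's "main" database into dst */
    sqlite3_backup *bk = sqlite3_backup_init(dst, "main", src, "main");
    if (bk) {
        sqlite3_backup_step(bk, -1);   /* -1 means copy all pages at once */
        sqlite3_backup_finish(bk);
    }
    int rc = sqlite3_errcode(dst);

    sqlite3_close(src);
    sqlite3_close(dst);
    return rc == SQLITE_OK ? 0 : 1;
}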

There is also the matter of ongoing database maintenance. There are graphical frontends for SQLite3 which make casual updates easier and obviate the need for anything more sophisticated (like a custom web app). However, I have a slightly more complicated database entry task that I fear will require, well, a custom web app in order to smoothly process hundreds, if not thousands, of new song files (their quirks prohibit the easy mass processing I have been able to get away with so far).

Going Forward
I remain hopeful that I’ll gradually overcome these difficulties. I still love this project and I have received nothing but positive feedback over the past year (modulo the assorted recommendations that I port the entire player to pure JavaScript).

You would think I would learn a lesson about building anything on top of a Google platform in the future, especially Native Client. Despite all this, I have another NaCl project planned.

Managing Music Playback Channels

My Game Music Appreciation site allows users to interact with old video game music by toggling various channels, as long as the underlying synthesizer engine supports it.


5 NES voices

Users often find their way to the Nintendo DS section pretty quickly. This is when they notice an obnoxious quirk with the channel toggling feature: specifically, one channel doesn’t seem to map to a particular instrument or track.

When it comes to computer music playback methodologies, I have long observed that there are 2 general strategies: fixed channel allocation and dynamic channel allocation.

Fixed Channel Approach
One of my primary sources of computer-based entertainment used to be watching music. Sure, I listened to it as well. But for things like Amiga MOD files and related tracker formats, there was a rich ecosystem of fun music playback programs that visualized the music. There are music visualization modes in various music players these days (such as iTunes and Windows Media Player), but those largely just show you a single waveform. Tracker modules, by contrast, were synthesized in real time from multiple audio channels, and the players usually showed some form of analysis for each channel. My personal favorite was Cubic Player:


Open Cubic Player -- oscilloscopes

Most of these players supported the concept of masking individual channels. In doing so, the user could isolate, study, and enjoy different components of the song. For many 4-channel Amiga MOD files, I observed that the common arrangement was to use the 4 channels for beat (percussion track), bass line, chords, and melody. Thus, it was easy to just listen to, e.g., the bass line in isolation.
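
Conceptually, the masking feature is as simple as it sounds. Here is a sketch of what a fixed-channel player’s mixing loop might do (the names are purely illustrative, not taken from any real player):

/* Mix one output sample from a fixed-channel module, honoring the user's
 * per-channel mask. With a fixed arrangement, channel ch always carries
 * the same musical part, so muting it reliably silences that part. */
int mix_sample(int num_channels, const int muted[], const int channel_out[])
{
    int mix = 0;
    for (int ch = 0; ch < num_channels; ch++)
        if (!muted[ch])                /* user toggled this channel off? */
            mix += channel_out[ch];
    return mix;
}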

MODs and similar formats specified precisely which digital audio sample to play at what time and on which specific audio channel. Viewing the internals of one of these formats, one gets the impression that they embody an extremely computer-centric view of music.
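
To make that concrete, here is roughly what one decoded pattern cell of a 4-channel MOD boils down to. This is a conceptual C sketch with field names of my own; the actual on-disk format packs these fields into 4 bytes per cell:

/* One decoded cell of a ProTracker-style MOD pattern. */
struct mod_cell {
    unsigned char  sample;   /* which digital sample to trigger (0 = none) */
    unsigned short period;   /* Amiga period value, i.e. the pitch to play it at */
    unsigned char  effect;   /* effect command (arpeggio, volume slide, ...) */
    unsigned char  param;    /* parameter byte for the effect */
};

/* One row of a 4-channel MOD: cell[0] always plays on channel 0,
 * cell[1] on channel 1, and so on -- the mapping is fixed. */
struct mod_row {
    struct mod_cell cell[4];
};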

Dynamic Channel Allocation Algorithm
MODs et al. enjoyed a lot of popularity, but the standard for computer music is MIDI. While MOD and friends took a computer-centric view of music, MIDI takes, well, a music-centric view of music.

There are MIDI visualization programs as well. The one that came with my Gravis Ultrasound was called PLAYMIDI.EXE. It looked like this…


Gravis Ultrasound PLAYMIDI.EXE application

… and it confused me. There are 16 distinct channels being visualized but some channels are shown playing multiple notes. When I dug into the technical details, I learned that MIDI just specifies what notes need to be played, at what times and frequencies and using which instrument samples, and it was the MIDI playback program’s job to make it happen.

Thus, if a MIDI file specifies that track 1 should play a C major chord consisting of notes C, E, and G, it would transmit events “key-on C; delta time 0; key-on E; delta time 0; key-on G; delta time …; [other commands]”. If the playback program has access to multiple channels (say, up to 32, in the case of the GUS), the intuitive approach would be to maintain a pool of all available channels. Then, when it’s time to process the “key-on C” event, fetch the first available channel from the pool, mark it as in-use, play C on the channel, and return that channel to the pool when either the sample runs its course or the corresponding “key-off C” event is encountered in the MIDI command stream.
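
The intuitive pool approach described above might look something like this in C. This is a simplified sketch with names of my own invention; a real engine would also deal with voice stealing, velocities, and per-instrument state:

#define NUM_CHANNELS 32   /* e.g., the GUS's 32 digital channels */

struct channel {
    int in_use;
    int note;    /* MIDI note number currently sounding on this channel */
};

struct channel channels[NUM_CHANNELS];

/* key-on: grab the first free channel and start the note on it */
int note_on(int note)
{
    for (int i = 0; i < NUM_CHANNELS; i++) {
        if (!channels[i].in_use) {
            channels[i].in_use = 1;
            channels[i].note = note;
            /* start_voice(i, note, instrument, velocity); */
            return i;
        }
    }
    return -1;   /* all channels busy; a real engine would steal a voice */
}

/* key-off: find whichever channel happens to be playing this note
 * and return it to the pool */
void note_off(int note)
{
    for (int i = 0; i < NUM_CHANNELS; i++) {
        if (channels[i].in_use && channels[i].note == note) {
            /* stop_voice(i); */
            channels[i].in_use = 0;
            return;
        }
    }
}

The important consequence: the same melody line can land on a different hardware channel every time a note is struck, which is exactly why toggling a single channel on a dynamically-allocated engine doesn’t isolate a single instrument.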

About That Game Music
Circling back around to my game music website, numerous supported systems use the fixed channel approach for playback while others use the dynamic channel allocation approach, including every Nintendo DS game I have analyzed so far.

Which approach is better? As in many technical matters, there are trade-offs either way. For many older audio synthesis systems, the fixed channel approach is necessary because the different channels had very specific purposes. The 8-bit NES had 5 channels: 2 square wave generators (used musically for melody/treble), 1 triangle wave generator (usually used for the bass line), a noise generator (subverted for all manner of percussive sounds), and a limited digital channel (sometimes assigned richer percussive sounds). Dynamic channel allocation wouldn’t work here.

But the dynamic approach works great on hardware with 16 general-purpose digital channels, such as the Nintendo DS. What about the SNES, with its 8 digital channels? Either approach could work. In practice, most games used a fixed channel approach: games might use 4-6 channels for music while reserving the remainder for various in-game sound effects. Some notable exceptions to this pattern were David Wise’s compositions for Rare’s SNES games (think Battletoads and the various Donkey Kong Country titles). These clearly use some form of dynamic channel allocation, since masking all but one channel yields a variety of instrument sounds.

Epilogue
There! That took a long time to explain but I find it fascinating for some reason. I need to distill it down to far fewer words because I want to make it a FAQ on my website for “Why can’t I isolate specific tracks for Nintendo DS games?”

Actually, perhaps I should remove the ability to toggle Nintendo DS channels in the first place. Here’s a funny tale of needless work: I found the Vio2sf engine for synthesizing Nintendo DS music and incorporated it into the program. It didn’t support toggling of individual channels, so I figured out a way to add that feature to the engine. And then I noticed that most Nintendo DS games render that feature moot. After I released the webapp, I learned that I was out of date on the Vio2sf engine. The final insult was that the latest version already supports channel toggling. So I did the work for nothing. And since I want to remove that feature from the UI anyway, it was doubly pointless.

Developing A Shader-Based Video Codec

Early last month, this thing called ORBX.js was in the news. It ostensibly has something to do with streaming video and codec technology, which naturally catches my interest. The hype was kicked off by Mozilla honcho Brendan Eich when he posted an article asserting that HD video decoding could be entirely performed in JavaScript. We’ve seen this kind of thing before with Broadway, an H.264 decoder implemented entirely in JS. But that approach exposes some very obvious limitations (notably CPU usage).

But this new video codec promises 1080p HD playback directly in JavaScript which is a lofty claim. How could it possibly do this? I got the impression that performance was achieved using WebGL, an extension which allows JavaScript access to accelerated 3D graphics hardware. Browsing through the conversations surrounding the ORBX.js announcement, I found this confirmation from Eich himself:

You’re right that WebGL does heavy lifting.

As of this writing, ORBX.js remains some kind of private tech demo. If there were a public demo available, it would necessarily be easy to reverse engineer the downloadable JavaScript decoder.

But the announcement was enough to make me wonder how it could be possible to create a video codec which effectively leverages 3D hardware.

Prior Art
In theorizing about this, it continually occurs to me that I can’t possibly be the first person to attempt it (nor the ORBX.js people, for that matter). Googling the matter, I found various forum and Q&A posts where people asked whether it was possible to, e.g., accelerate JPEG decoding and presentation using 3D hardware, with no answers. I also found a blog post which describes a plan to use 3D hardware to accelerate VP8 video decoding. It was a project done under the banner of Google’s Summer of Code in 2011, though I’m not sure which open source group mentored the effort. The project did not end up producing the shader-based VP8 codec originally chartered, but it mentions that “The ‘client side’ of the VP8 VDPAU implementation is working and is currently being reviewed by the libvdpau maintainers.” I’m not sure what that means. Perhaps it includes modifications to the public API that supports VP8, but is waiting for the underlying hardware to actually implement VP8 decoding blocks.

What’s So Hard About This?
Video decoding is a computationally intensive task. GPUs are known to be really awesome at chewing through computationally intensive tasks. So why aren’t GPUs a natural fit for decoding video codecs?

Generally, it boils down to parallelism, or rather the lack of opportunities for it. GPUs are really good at doing the exact same operations over lots of data at once. The problem is that decoding compressed video usually requires multiple phases that must run in sequence, and the work within an individual phase often cannot be parallelized either. In strictly mathematical terms, a compressed data stream needs to be decoded by applying a function f over each data element, x_0 .. x_n. However, the result for one element relies on the result already computed for the previous element, i.e.:

f(x_n) = f(f(x_{n-1}))

What happens when you try to parallelize such an algorithm? Temporal rifts in the space/time continuum, if you’re in a Star Trek episode. If you’re in the real world, you get incorrect, unusable data, because the parallel computation is seeded with invalid values at multiple points (which is illustrated in some of the pictures in the aforementioned blog post about accelerated VP8).
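
For a concrete miniature of that kind of serial dependency, consider decoding a delta-coded stream, a toy stand-in for the predictors that real codecs rely on. Each output value depends on the previously decoded output, so iteration n can’t begin until iteration n-1 has finished:

/* Decode a delta-coded stream: out[n] depends on out[n-1], so there is
 * nothing here for a GPU's thousands of threads to chew on in parallel. */
void delta_decode(const int *in, int *out, int count)
{
    int previous = 0;
    for (int n = 0; n < count; n++) {
        out[n] = previous + in[n];   /* needs the previous *decoded* value */
        previous = out[n];
    }
}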

Example: JPEG
Let’s take a very general look at the various stages involved in decoding the ubiquitous JPEG format:


High level JPEG decoding flow

What are the opportunities to parallelize these various phases?

Making Sure The PNG Gets There

Rewind to 1999. I was developing an HTTP-based remote management interface for an embedded device. The device sat on an ethernet LAN and you could point a web browser at it. The pitch was to transmit an image of the device’s touch screen so the user could click on the picture to interact with the device. For that, we needed an image format. If you were computing at the time, you know that the web was insufferably limited back then. Our choice basically came down to GIF or JPEG. Being the office’s annoying free software zealot, I was championing a little-known, up-and-coming format named PNG.

So the challenge was to create our own PNG encoder (incorporating a library like libpng wasn’t an option for this platform). I seem to remember being annoyed at having to implement an integrity check (CRC) for the PNG encoder. It’s part of the PNG spec, after all. It just seemed so redundant. At the time, I reasoned that there were 5 layers of integrity validation in play.
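
For anyone curious what that obligation amounts to: PNG’s check value is the standard CRC-32 (reflected polynomial 0xEDB88320, initialized to all ones, complemented at the end), and a table-driven version only takes a few lines of C:

#include <stdint.h>
#include <stddef.h>

/* The same CRC-32 used by gzip and ethernet. Call crc32_init() once
 * before calling crc32(). */
static uint32_t crc32_table[256];

void crc32_init(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        crc32_table[n] = c;
    }
}

uint32_t crc32(const uint8_t *buf, size_t len)
{
    uint32_t c = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        c = crc32_table[(c ^ buf[i]) & 0xFF] ^ (c >> 8);
    return c ^ 0xFFFFFFFFu;
}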

I don’t know why, but I was reflecting on this episode recently and decided to revisit it. Here are all the encapsulation layers of a PNG file when flung over an ethernet network:


PNG Network Encapsulation

So there are up to 5 encapsulations for the data in this situation. At the innermost level is the image data, which is compressed with the zlib DEFLATE method. At first, I thought that this layer also had a CRC or checksum. Strictly speaking, the raw DEFLATE stream doesn’t carry one, but the zlib wrapper that PNG uses does append an Adler-32 checksum of the uncompressed data. Further, I don’t think we bothered to actually compress the PNG data in this project long ago. It was a small, monochrome image being transferred over a LAN, so the encoder could get away with signaling uncompressed data.
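
Signaling uncompressed data turns out to be easy. Here is a sketch of emitting the image bytes as a single stored DEFLATE block inside the zlib wrapper, assuming the payload fits in one block of at most 64 KB (the source buffer would already contain PNG’s per-scanline filter bytes):

#include <stdint.h>
#include <stddef.h>

/* Wrap src (up to 65535 bytes) in a zlib stream containing one stored
 * (uncompressed) DEFLATE block. Returns the number of bytes written to dst,
 * which must have room for len + 11 bytes. */
size_t zlib_store(const uint8_t *src, uint16_t len, uint8_t *dst)
{
    size_t n = 0;
    uint32_t s1 = 1, s2 = 0;          /* Adler-32 running sums */
    uint16_t nlen = (uint16_t)~len;

    dst[n++] = 0x78;                  /* CMF: deflate, 32K window */
    dst[n++] = 0x01;                  /* FLG: header check bits, no preset dict */

    dst[n++] = 0x01;                  /* BFINAL=1, BTYPE=00 (stored block) */
    dst[n++] = len & 0xFF;            /* LEN, least significant byte first */
    dst[n++] = len >> 8;
    dst[n++] = nlen & 0xFF;           /* NLEN = one's complement of LEN */
    dst[n++] = nlen >> 8;

    for (uint16_t i = 0; i < len; i++) {
        dst[n++] = src[i];
        s1 = (s1 + src[i]) % 65521;
        s2 = (s2 + s1) % 65521;
    }

    /* Adler-32 of the uncompressed data, most significant byte first */
    uint32_t adler = (s2 << 16) | s1;
    dst[n++] = adler >> 24;
    dst[n++] = adler >> 16;
    dst[n++] = adler >> 8;
    dst[n++] = adler & 0xFF;
    return n;
}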

The graphical data gets wrapped up in a PNG chunk, and all PNG chunks have a CRC. To transmit via the network, it goes into a TCP segment, which also has a checksum. That goes into an IP packet. I previously believed that this represented another integrity check, but while an IP packet does have a checksum, it only covers the IP header and not the payload. So that doesn’t really count towards this goal.
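
For completeness, the chunk layer itself is simple: a 4-byte big-endian length, a 4-byte chunk type, the payload, and then a CRC-32 computed over the type and payload (but not the length). Here is a sketch of an encoder’s chunk writer, reusing the crc32() routine from earlier:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

uint32_t crc32(const uint8_t *buf, size_t len);   /* from the CRC sketch above */

/* Write one PNG chunk: length, type, data, then the CRC over type + data. */
int write_png_chunk(FILE *f, const char type[4], const uint8_t *data, uint32_t len)
{
    uint8_t *scratch = malloc(4 + len);
    uint8_t be[4];

    if (!scratch)
        return -1;
    memcpy(scratch, type, 4);
    if (len)
        memcpy(scratch + 4, data, len);
    uint32_t crc = crc32(scratch, 4 + len);
    free(scratch);

    be[0] = len >> 24; be[1] = len >> 16; be[2] = len >> 8; be[3] = len;
    fwrite(be, 1, 4, f);              /* 4-byte big-endian payload length */
    fwrite(type, 1, 4, f);            /* chunk type, e.g. "IHDR", "IDAT" */
    if (len)
        fwrite(data, 1, len, f);

    be[0] = crc >> 24; be[1] = crc >> 16; be[2] = crc >> 8; be[3] = crc;
    fwrite(be, 1, 4, f);              /* CRC-32 of type + data */
    return 0;
}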

Finally, the data gets encapsulated into an ethernet frame which has — you guessed it — a CRC.

I see that other link layer protocols like PPP and wireless ethernet (802.11) also feature frame CRCs. So I guess what I’m saying is that, if you transfer a PNG file over the network, you can be confident that the data will be free of any errors.