
Managing Music Playback Channels

My Game Music Appreciation site allows users to interact with old video game music by toggling various channels, as long as the underlying synthesizer engine supports it.


[Image: 5 NES voices]

Users often find their way to the Nintendo DS section pretty quickly, and that's when they notice an obnoxious quirk in the channel toggling feature: no single channel seems to map to a particular instrument or track.

When it comes to computer music playback methodologies, I have long observed that there are 2 general strategies: fixed channel assignment and dynamic channel allocation.

Fixed Channel Approach
One of my primary sources of computer-based entertainment used to be watching music. Sure, I listened to it as well. But for things like Amiga MOD files and related tracker formats, there was a rich ecosystem of fun music playback programs that visualized the music. Various music players these days (such as iTunes and Windows Media Player) have visualization modes, but those largely just show you a single waveform. Tracker playback, by contrast, was real-time synthesis across multiple audio channels, and the players usually showed some form of analysis for each channel. My personal favorite was Cubic Player:


[Image: Open Cubic Player -- oscilloscopes]

Most of these players supported the concept of masking individual channels. In doing so, the user could isolate, study, and enjoy different components of the song. For many 4-channel Amiga MOD files, I observed that the common arrangement was to use the 4 channels for beat (percussion track), bass line, chords, and melody. Thus, it was easy to just listen to, e.g., the bass line in isolation.
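A minimal sketch of how that masking works in a tracker-style mixer: each channel renders its own sample stream, and muted channels are simply left out of the final sum (the channel roles and placeholder audio below are invented for illustration):

    def mix(channels, enabled):
        # Sum the per-channel sample streams, skipping masked channels.
        length = len(channels[0])
        return [
            sum(ch[i] for ch, on in zip(channels, enabled) if on)
            for i in range(length)
        ]

    beat, bass, chords, melody = ([0.0] * 8 for _ in range(4))  # placeholder audio
    # Solo the bass line by masking the other 3 channels:
    solo_bass = mix([beat, bass, chords, melody], [False, True, False, False])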

MODs and similar formats specified precisely which digital audio sample to play, at what time, and on which specific audio channel. Viewing the internals of one of these files gives the impression that they encode an extremely computer-centric view of music.

Dynamic Channel Allocation Algorithm
MODs et al. enjoyed a lot of popularity, but the standard for computer music is MIDI. While MOD and friends took a computer-centric view of music, MIDI takes, well, a music-centric view of music.

There are MIDI visualization programs as well. The one that came with my Gravis Ultrasound was called PLAYMIDI.EXE. It looked like this…


[Image: Gravis Ultrasound PLAYMIDI.EXE application]

… and it confused me. There are 16 distinct channels being visualized, but some channels are shown playing multiple notes. When I dug into the technical details, I learned that MIDI just specifies which notes need to be played, at what times and pitches and using which instrument samples, and that it is the MIDI playback program's job to make it all happen.

Thus, if a MIDI file specifies that track 1 should play a C major chord consisting of notes C, E, and G, it would transmit events “key-on C; delta time 0; key-on E; delta time 0; key-on G; delta time …; [other commands]”. If the playback program has access to multiple channels (say, up to 32, in the case of the GUS), the intuitive approach would be to maintain a pool of all available channels. Then, when it’s time to process the “key-on C” event, fetch the first available channel from the pool, mark it as in-use, play C on the channel, and return that channel to the pool when either the sample runs its course or the corresponding “key-off C” event is encountered in the MIDI command stream.
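Here is a minimal sketch of that pool-based allocation scheme (Python for illustration; the 32-channel count is just the GUS example from above, and a real player would also handle velocity, instrument selection, and voice stealing when the pool runs dry):

    class ChannelPool:
        def __init__(self, num_channels=32):       # e.g., up to 32 on a GUS
            self.free = list(range(num_channels))  # channels not currently in use
            self.active = {}                       # note -> channel playing it

        def key_on(self, note):
            channel = self.free.pop(0)             # fetch first available channel
            self.active[note] = channel            # mark it as in-use
            return channel                         # caller starts the sample here

        def key_off(self, note):
            channel = self.active.pop(note)
            self.free.append(channel)              # return channel to the pool
            return channel                         # caller stops the sample here

    # "key-on C; delta time 0; key-on E; delta time 0; key-on G" grabs
    # 3 channels from the pool; the matching key-off events return them.
    pool = ChannelPool()
    for note in ("C", "E", "G"):
        pool.key_on(note)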

About That Game Music
Circling back around to my game music website, numerous supported systems use the fixed channel approach for playback while others use the dynamic channel allocation approach, including every Nintendo DS game I have analyzed so far.

Which approach is better? As in many technical matters, there are trade-offs either way. For many older audio synthesis systems, the fixed channel approach is necessary because different channels had very specific purposes. The 8-bit NES had 5 channels: 2 square wave generators (used musically for melody/treble), 1 triangle wave generator (usually used for the bass line), a noise generator (subverted for all manner of percussive sounds), and a limited digital channel (sometimes assigned richer percussive sounds). Dynamic channel allocation wouldn't work here.

But the dynamic approach works great on hardware with 16 digital channels available, like the Nintendo DS. Digital channels are very general-purpose. What about the SNES, with its 8 digital channels? Either approach could work. In practice, most games used a fixed channel approach: games might use 4-6 channels for music while reserving the remainder for various in-game sound effects. Some notable exceptions to this pattern were David Wise's compositions for Rare's SNES games (think Battletoads and the various Donkey Kong Country titles). These clearly use some dynamic channel approach, since masking all but one channel still yields a variety of instrument sounds on that channel.

Epilogue
There! That took a long time to explain, but I find it fascinating for some reason. I need to distill it down to far fewer words, because I want to make it an entry in my website's FAQ answering “Why can't I isolate specific tracks for Nintendo DS games?”

Actually, perhaps I should remove the ability to toggle Nintendo DS channels in the first place. Here's a funny tale of needless work: I found the Vio2sf engine for synthesizing Nintendo DS music and incorporated it into the program. It didn't support toggling of individual channels, so I figured out a way to add that feature to the engine. And then I noticed that most Nintendo DS games render that feature moot. After I released the webapp, I learned that I was out of date on the Vio2sf engine. The final insult was that the latest version already supports channel toggling. So I did the work for nothing. And since I plan to remove the feature from the UI anyway, it was doubly for naught.

Making Sure The PNG Gets There

Rewind to 1999. I was developing an HTTP-based remote management interface for an embedded device. The device sat on an ethernet LAN and you could point a web browser at it. The pitch was to transmit an image of the device's touch screen, and the user could click on the picture to interact with the device. So we needed an image format. If you were computing at the time, you know that the web was insufferably limited back then. Our choice basically came down to GIF and JPEG. Being the office's annoying free software zealot, I was championing a little-known, up-and-coming format named PNG.

So the challenge was to create our own PNG encoder (incorporating a library like libpng wasn’t an option for this platform). I seem to remember being annoyed at having to implement an integrity check (CRC) for the PNG encoder. It’s part of the PNG spec, after all. It just seemed so redundant. At the time, I reasoned that there were 5 layers of integrity validation in play.
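For the record, the integrity check in question is the standard CRC-32 defined in the PNG spec, and it only takes a few lines. Here is a sketch in Python following the spec's table-driven sample code, along with how the check slots into a chunk (the CRC covers the chunk type and data fields, but not the length):

    import struct
    import zlib

    def make_crc_table():
        # Precompute the 256-entry table for the reflected polynomial
        # 0xEDB88320, as the PNG spec's sample code does.
        table = []
        for n in range(256):
            c = n
            for _ in range(8):
                c = (0xEDB88320 ^ (c >> 1)) if c & 1 else c >> 1
            table.append(c)
        return table

    CRC_TABLE = make_crc_table()

    def png_crc(data):
        c = 0xFFFFFFFF                    # initialize all bits to 1
        for byte in data:
            c = CRC_TABLE[(c ^ byte) & 0xFF] ^ (c >> 8)
        return c ^ 0xFFFFFFFF             # final one's complement

    def png_chunk(chunk_type, data):
        # A chunk is: 4-byte big-endian length, 4-byte type, data,
        # then the CRC computed over the type and data.
        crc = png_crc(chunk_type + data)
        return (struct.pack(">I", len(data)) + chunk_type + data
                + struct.pack(">I", crc))

    # Sanity check: this is the same CRC-32 that zlib exposes.
    assert png_crc(b"IEND") == zlib.crc32(b"IEND")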

I don’t know why, but I was reflecting on this episode recently and decided to revisit it. Here are all the encapsulation layers of a PNG file when flung over an ethernet network:


[Image: PNG Network Encapsulation]

So there are up to 5 encapsulations for the data in this situation. At the innermost level is the image data, which is compressed with the zlib DEFLATE method. At first, I thought this layer had no integrity check of its own, but it turns out that the zlib container format (RFC 1950) that PNG uses ends with an Adler-32 checksum of the uncompressed data; only a raw DEFLATE stream (RFC 1951) lacks a check. In any case, I don't think we bothered to compress the PNG data in this project long ago. It was a small, monochrome image transferred over a LAN, so the encoder could get away with signaling uncompressed data (DEFLATE's stored-block mode).

The graphical data gets wrapped up in a PNG chunk, and all PNG chunks have a CRC. To transmit via the network, the data goes into a TCP segment, which also has a checksum. That goes into an IP packet. I previously believed that this represented another integrity check, but while an IP packet does have a checksum, it covers only the IP header and not the payload. So that doesn't really count towards this goal.

Finally, the data gets encapsulated into an ethernet frame which has — you guessed it — a CRC.

I see that other link layer protocols like PPP and wireless ethernet (802.11) also feature frame CRCs. So I guess what I’m saying is that, if you transfer a PNG file over the network, you can be confident that the data will be free of any errors.

How To Play Hardware Accelerated Video on A Mac

I have a friend who was considering purchasing a Mac Mini recently. At the time of this writing, there are 3 desktop models (and 2 more “server” models).


[Image: Apple Mac Mini]

The cheapest one is a Core i5 2.5 GHz. Then there are 2 Core i7 models: 2.3 GHz and 2.6 GHz. The latter 2 are separated by US$100, the only appreciable technical difference being the extra 0.3 GHz, so the choice came down to those 2.

He asked me which one would be able to play HD video at full frame rate. I found this query puzzling. But then, I have been “in the biz” for a bit too long. Whether or not a computer or device can play a video well depends on a lot of factors.

Hardware Support
First of all, looking at the raw speed of the general-purpose CPU inside a computer as a gauge of video playback performance is generally misguided in this day and age. In general, we have a dominant video standard (H.264, which I'll focus on for this post) and many bits of hardware are able to accelerate its decoding. So the question is not whether the CPU can decode the data in real time, but whether any other hardware in the device (likely the graphics hardware) can handle it. These machines have Intel HD 4000 graphics and, per my reading of the literature, that chip is capable of accelerating H.264 video decoding.

Great, so the hardware supports accelerated decoding. So it’s a done deal, right? Not quite…

Operating System Support
An application can’t do anything pertaining to hardware without permission from the operating system. So the next question is: Does Mac OS X allow an application to access accelerated video decoding hardware if it’s available? This used to be a contentious matter (notably, Adobe Flash Player was unable to accelerate H.264 playback on Mac in the absence of such an API) but then Apple released an official API detailed in Technical Note TN2267.

So, does this mean that video is magically accelerated? Nope, we’re still not there yet…

Application Support
It’s great that all of these underlying pieces are in place, but if an individual application chooses to decode the video directly on the CPU, it’s all for naught. An application needs to query the facilities and direct data through the API if it wants to leverage the acceleration. Obviously, at this point it becomes a matter of “which application?”

My friend eventually opted to get the pricier of the desktop Mac Mini models and we ran some ad-hoc tests since I was curious how widespread the acceleration support is among Mac multimedia players. Here are some programs I wanted to test, playing 1080p H.264:

  • Apple QuickTime Player
  • VLC
  • YouTube with Flash Player (any browser)
  • YouTube with Safari/HTML5
  • YouTube with Chrome/HTML5
  • YouTube with Firefox/HTML5
  • Netflix

I didn't take exhaustive notes, but my impromptu tests revealed that QuickTime Player was, far and away, the most performant player, occupying only around 5% of the CPU according to the Mac OS X System Profiler graph (most of which is likely spent on audio decoding).

VLC consistently required 20-30% CPU, so it’s probably leveraging some acceleration facilities. I think that Flash Player and the various HTML5 elements performed similarly (their multi-process architectures can make such a trivial profiling test difficult).

The outlier was Netflix running in Firefox via Microsoft's Silverlight plugin. The inner workings of Netflix's technology are opaque to outsiders; I have never seen any data one way or another about how Netflix encodes its video, and we don't even know if it uses H.264. It may very well use Microsoft's VC-1, which is not a capability provided by the Mac OS X acceleration API (it doesn't look like the Intel HD 4000 chip can handle it either). Whatever the codec, I was able to see that Netflix required an enormous amount of CPU muscle on the Mac platform.

Conclusion
The foregoing is a slight simplification of the video playback pipeline. There are some other considerations, most notably how the video is displayed afterwards. To circle back around to the original question: Can the Mac Mini handle full HD video playback? As my friend found, the meager Mac Mini can do an admirable job at playing full HD video without loading down the CPU.

Survey of CD Image Formats

In the course of exploring and analyzing the impressive library of CD images curated at the Internet Archive's Shareware CD collection, one encounters a wealth of methods for copying a complete CD image onto other media for transport. In researching the formats, I have found that many of them are native to various proprietary, binary-format CD programs that run under Windows. Since I have an interest in interpreting these image formats, and I would like to do so outside of Windows, I thought I would conduct a survey to determine whether enough information exists to write processing tools of my own.

Remember from my Grand Unified Theory of Compact Disc that CDs, from a high enough level of software abstraction, are just strings of 2352-byte sectors broken up into tracks. The difference among various types of CDs comes down to the specific meaning of these 2352 bytes.

Most imaging formats rip these strings of sectors into one giant file and then record some metadata about the tracks and sectors.
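To make the model concrete, here is a tiny Python sketch: raw sector n simply lives at byte offset n * 2352 in such a rip, and for a Mode 1 data sector the 2048 bytes of user data sit behind a 16-byte header (the filename is hypothetical):

    RAW_SECTOR = 2352   # a full raw CD sector
    USER_DATA = 2048    # payload of a Mode 1 data sector

    def read_raw_sector(f, n):
        f.seek(n * RAW_SECTOR)
        return f.read(RAW_SECTOR)

    with open("disc.bin", "rb") as f:     # hypothetical whole-disc rip
        raw = read_raw_sector(f, 16)      # ISO-9660 volume descriptors start at sector 16
        # Mode 1 layout: 12 sync bytes + 4 header bytes, then 2048 data bytes
        user_data = raw[16:16 + USER_DATA]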

ISO
This is perhaps the most common method for storing CD images, though it is generally only applicable to data CD-ROMs. Image files typically end with a .iso extension, which refers to ISO-9660, the standard CD filesystem.

Sometimes, disc images ripped from other types of discs (like Xbox/360 or GameCube discs) bear the extension .iso, which is a bit of a misnomer since they aren’t formatted using the ISO-9660 filesystem. But the extension sort of stuck.

BIN / CUE
I see the BIN & CUE file format combination quite frequently. Reportedly, a program named CDRWIN deployed this format first. This format can handle a mixed mode CD (e.g., one that starts with a data track followed by a series of audio tracks), whereas ISO can only handle the data track. The BIN file contains the raw data while the CUE file is a text file that describes how the BIN file is laid out (how many bytes per sector, which sectors belong to each individual track).
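For illustration, a hypothetical CUE sheet for such a mixed mode disc might look like this (filenames and timestamps invented; INDEX times are minutes:seconds:frames, with 75 frames per second):

    FILE "GAME.BIN" BINARY
      TRACK 01 MODE1/2352
        INDEX 01 00:00:00
      TRACK 02 AUDIO
        PREGAP 00:02:00
        INDEX 01 23:47:30
      TRACK 03 AUDIO
        INDEX 01 26:02:15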

CDI
This originates from a program called DiscJuggler. It is extremely prevalent in the Sega Dreamcast hobbyist community for some reason. I studied the raw hex dumps of some sample CDI files but found no obvious metadata near the front (mostly 0s). There is an open source utility called cdi2iso which is able to extract an ISO image from a CDI file, and the program's source clued me in that the metadata actually sits at the end of the image file. This makes sense when you consider how a ripping program needs to operate: copy the tracks, sector by sector, and then do something with the metadata after the fact. The options are:

  • write the metadata at the end of the file (as seen here);
  • write the metadata into a separate file (seen in other formats on this list);
  • write the metadata at the beginning of the file, which would require a full rewrite of the entire (usually large) image file (I haven't seen this yet).
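Short of a full parser, a quick Python snippet along these lines can be used to eyeball the trailer of an image (the 4 KB window is an arbitrary choice; the actual layout of the CDI trailer is more involved than this sketch lets on):

    import sys

    TAIL_BYTES = 4096  # arbitrary window, enough to inspect the trailer

    with open(sys.argv[1], "rb") as f:
        f.seek(0, 2)                       # seek to the end of the file
        size = f.tell()
        f.seek(max(0, size - TAIL_BYTES))  # back up to the last 4 KB
        tail = f.read()

    print(tail.hex())                      # poor man's hex dump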

Anyway, I believe I have enough information to write a program that can interpret a CDI file. The reason this format is favored for Dreamcast disc images is likely due to the extreme weirdness of Dreamcast discs (it’s complicated, but eventually fits into my Grand Unified Theory of CDs, if you look at it from a high level).

MDF / MDS
MDF and MDS pairs come from a program called Alcohol 120%. The MDF file has the data while the MDS file contains the metadata. The metadata is in an opaque binary format, though. Thankfully, the Wikipedia page links to a description of the format. That’s another image format down.

CCD / SUB / IMG
The CloneCD Control File is one I just ran across today thanks to a new image posted at the IA Shareware Archive (see Super Duke Volume 2). I haven't found any definitive documentation on this format, but it doesn't seem too complicated. The .ccd file is a text file that is pretty self-explanatory. The sample linked above, however, only has a .ccd file and a .sub file. I'm led to believe that the .sub file contains subchannel information while a .img file is supposed to contain the binary data. So this rip might be incomplete (nope, the .img file is on the page, in the sidebar; thanks to Phil in the comments for pointing this out). The .sub file is a bit short compared to the Archive's description of the disc's contents (only about 4.6 MB of data), and when I briefly scrolled through it, it didn't look like it contained any real computer data. So it probably is just the disc's subchannel data (something I glossed over in my Grand Unified Theory).

CSO
I have dealt with the CISO (compressed ISO) format before. It’s basically the same as a .iso file described above except that each individual 2048-byte data sector is compressed using zlib. The format boasts up to 9 compression levels, which shouldn’t be a big surprise since that correlates to zlib’s own compression tiers.
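Here is a hedged sketch of reading one such block back out, based on my understanding of the common CISO layout (a 24-byte header, then a table of 32-bit block offsets in which the top bit flags an uncompressed block; compressed blocks are typically raw DEFLATE without the zlib wrapper). Treat the field details as assumptions:

    import struct
    import zlib

    def ciso_read_block(f, table, n, block_size, align):
        entry, next_entry = table[n], table[n + 1]
        offset = (entry & 0x7FFFFFFF) << align        # low 31 bits: file offset
        end = (next_entry & 0x7FFFFFFF) << align
        f.seek(offset)
        raw = f.read(end - offset)
        if entry & 0x80000000:                        # top bit: stored as-is
            return raw[:block_size]
        # Compressed blocks: raw DEFLATE, hence the negative window bits
        return zlib.decompressobj(-15).decompress(raw)

    with open("image.cso", "rb") as f:                # hypothetical file
        magic, header_size, total_bytes, block_size, version, align = \
            struct.unpack("<4sIQIBB2x", f.read(24))
        assert magic == b"CISO"
        count = total_bytes // block_size + 1         # one extra terminating entry
        table = struct.unpack("<%dI" % count, f.read(4 * count))
        first_sector = ciso_read_block(f, table, 0, block_size, align)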

Others
Wikipedia has a category for optical disc image formats. Of course, there are numerous others. However, I haven’t encountered them in the wild for the purpose of broad image distribution.