Category Archives: General

Adding C64 SID Music

I have been working on adding support for SID files (the music format for the Commodore 64) to the game music website for a while. I feel a bit out of my element since I’m not that familiar with the C64. But why should I let that slow me down? Allow me to walk through the steps I previously outlined for making this happen.



I will also need to decide what picture should represent the system on the search results page. The foregoing picture should be fine, but I’m getting way ahead of myself.

Phase 1 is finding adequate player software. The most venerable contender in this arena is libsidplay, or so I first thought. It turns out that there’s libsidplay (originally hosted at Geocities, apparently, and no longer on the net) and also libsidplay2. Both are kind of old (libsidplay2 was last updated in 2004). I tried to compile libsidplay2 and the C++ didn’t agree with the current version of g++.

However, a recent effort named libsidplayfp is carrying on the SID emulation tradition. It works rather well, notwithstanding the fact that compiling the entire library has a habit of apparently hanging the Linux VM where I develop this stuff.

Phase 2 is to develop a testbench app around the playback library. With the help of the libsidplayfp library maintainers, I accomplished this. The testbench app consistently requires about 15% of a single core of a fairly powerful Core i7. So I look forward to recommendations that I port that playback library to pure JavaScript.

Phase 3 is plugging the library into the web player. I haven’t worked on this yet, but I’m confident that it will work since phase 2 worked (plus, I have a plan to combine phases 2 and 3).

One interesting issue that has arisen is that proper operation of libsidplayfp requires that 3 C64 ROM files be present (the, ahem, KERNAL, BASIC interpreter, and character generator). While these are copyrighted ROMs, they are easily obtainable on the internet. The goal of my project is to eliminate as much friction as possible for enjoying these old tunes. To that end, I will just bake the ROM files directly into the player.
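Baking the ROMs in amounts to compiling their bytes into the player as constant arrays. Below is a minimal sketch of a Python generator that emits C source for the 3 images (similar in spirit to what `xxd -i` produces). The input file paths and array names are hypothetical; the script also checks the standard C64 ROM sizes (8 KB KERNAL, 8 KB BASIC, 4 KB character generator) so that a bad dump is caught at build time rather than at playback.

```python
from pathlib import Path

# Standard C64 ROM image sizes in bytes.
ROM_SIZES = {"kernal": 8192, "basic": 8192, "chargen": 4096}

def rom_to_c_array(name: str, data: bytes, per_line: int = 12) -> str:
    """Render binary data as a C array definition named <name>_rom."""
    lines = [f"static const unsigned char {name}_rom[{len(data)}] = {{"]
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    lines.append("};")
    return "\n".join(lines) + "\n"

def bake_roms(rom_files: dict) -> str:
    """rom_files maps 'kernal', 'basic' and 'chargen' to ROM dump paths (hypothetical)."""
    parts = ["/* Generated file: C64 ROM images baked into the player. */\n"]
    for name, size in ROM_SIZES.items():
        data = Path(rom_files[name]).read_bytes()
        if len(data) != size:
            raise ValueError(f"{name}: expected {size} bytes, got {len(data)}")
        parts.append(rom_to_c_array(name, data))
    return "\n".join(parts)
```

Usage might look like `Path("c64_roms.h").write_text(bake_roms({"kernal": "kernal.bin", "basic": "basic.bin", "chargen": "chargen.bin"}))`, with the file names again being placeholders. The trailing comma after the last element is legal in a C initializer, which keeps the generator simple.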

Phase 4 is collecting a SID song corpus. This is the simplest part of the whole process thanks to the remarkable curation efforts of the High Voltage SID Collection (HVSC). Anyone can download a giant archive of every known SID file. So that’s a done deal.

Or is it? One small issue is that I was hoping that the first iteration of my game music website would focus on, well, game music. There is a lot of music in the HVSC that consists of original compositions or comes from demos. The way that the archive is organized makes it difficult to automatically discern whether a particular SID file comes from a game or not.

Phase 5 is munging the metadata. The good news here is that the files have the metadata built in. The not-so-great news is that there isn’t quite as much as I might like. Each file is tagged with title, author, and publisher/copyright. If there is more than one song in a file, they all have the same metadata. Fortunately, if I can import them all into my game music database, there is an opportunity to add a lot more metadata.

Further, there is no play length metadata for these files. This means I will need to set each to a default length like 2 minutes and do something like I did before in order to automatically determine if any songs terminate sooner.
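Since the metadata lives in a fixed binary header, pulling it out is mostly a matter of unpacking bytes. Here is a minimal sketch, assuming the standard PSID/RSID header layout: big-endian fields, the song count at offset 0x0E, and three 32-byte NUL-padded strings (name, author, released/copyright) at 0x16, 0x36 and 0x56. It leaves the strings as raw bytes, since decoding them is a separate question, and expands a file into one row per subsong, all sharing the file-level metadata plus a placeholder length of about 2 minutes. The function names and row layout are hypothetical.

```python
import struct

# PSID/RSID header, big-endian, up to the end of the three text fields:
# magic, version, dataOffset, loadAddress, initAddress, playAddress,
# songs, startSong, speed, name, author, released.
SID_HEADER = struct.Struct(">4s7HI32s32s32s")

def parse_sid_header(data: bytes) -> dict:
    """Return the descriptive header fields; strings stay as raw NUL-padded bytes."""
    if len(data) < SID_HEADER.size:
        raise ValueError("too short to be a SID file")
    fields = SID_HEADER.unpack_from(data)
    magic = fields[0]
    if magic not in (b"PSID", b"RSID"):
        raise ValueError(f"bad magic {magic!r}")
    name, author, released = fields[9:12]
    return {"magic": magic, "songs": fields[6], "start_song": fields[7],
            "name": name, "author": author, "released": released}

def song_rows(header: dict, default_length_ms: int = 120_000):
    """Yield one row per subsong; every subsong inherits the file's metadata.

    default_length_ms is a placeholder, because SID files carry no play length.
    """
    for number in range(1, header["songs"] + 1):
        yield (number, header["name"], header["author"], header["released"],
               default_length_ms)
```

Typical use would be `header = parse_sid_header(Path(path).read_bytes())` followed by inserting each tuple from `song_rows(header)` into the database.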

Oddly, the issue I’m most concerned about is character encoding. This is the first project for which I’m making certain that I understand character encoding, since I can’t reasonably get away with assuming that everything is ASCII. So far, based on a random sampling of the SID files I have checked, there is a good chance of encountering metadata strings with characters outside the lower (7-bit) ASCII set. From what I have observed, these byte values map directly onto the Unicode code points with the same numbers (which is exactly how ISO-8859-1, a.k.a. Latin-1, behaves). So I finally get to learn how to manipulate strings in a way that preserves the character encoding. At the very least, I need Python to rip the strings out of the binary SID files and make sure the Unicode remains intact while being inserted into an SQLite3 database.
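A minimal sketch of that last step, under the assumption that the header strings are Latin-1 (consistent with the observation that byte values map straight to code points). The idea is to decode once, at the boundary, keep Python `str` values from then on, and let `sqlite3` parameter binding store them as TEXT. The table name `sid_metadata` and the helper names are hypothetical.

```python
import sqlite3

def sid_text(raw: bytes, encoding: str = "latin-1") -> str:
    """Decode one NUL-padded string field from a SID header.

    ASSUMPTION: the bytes are ISO-8859-1 (Latin-1). Latin-1 maps byte value N
    straight to code point U+00NN, so decoding never fails and loses nothing.
    """
    return raw.split(b"\x00", 1)[0].decode(encoding)

def save_metadata(conn: sqlite3.Connection, path: str,
                  title: bytes, author: bytes, released: bytes) -> None:
    # Parameter binding hands str values to SQLite as TEXT, so non-ASCII
    # characters are stored intact; no manual encoding or string splicing.
    conn.execute(
        "INSERT INTO sid_metadata (path, title, author, released) VALUES (?, ?, ?, ?)",
        (path, sid_text(title), sid_text(author), sid_text(released)),
    )

# Demo with a HYPOTHETICAL table and made-up header bytes (0xE9 = é, 0xF6 = ö).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sid_metadata (path TEXT, title TEXT, author TEXT, released TEXT)")
save_metadata(conn, "Example.sid", b"Caf\xe9 Theme\x00\x00", b"J\xf6rg\x00", b"1988 Example\x00")
row = conn.execute("SELECT title, author FROM sid_metadata").fetchone()
# row == ('Café Theme', 'Jörg')
```

If the strings turn out to be Windows-1252 instead, pass `encoding="cp1252"`; note that Python's cp1252 codec leaves 5 byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, so decoding can raise where Latin-1 never does.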

Trouble with CoCCA Registry

I’ve been rather despondent all week. People who see me daily could readily identify this fact. Unfortunately, the exact reason was difficult to adequately explain. The problems that nerds deal with…

When A Domain Expires
As a few people noticed, the multimedia.cx domain and all of its subdomains stopped working last week. The problem started on Monday, October 1. Whose fault? Well, fundamentally, I neglected to renew the domain name in time. However, I prefer to place the blame on the .cx domain registrar, CoCCA Registry. You see, they have never developed the technology to email a domain holder with a notice that their domain is about to expire or has already expired.

This domain is the only one I have ever held, so I don’t have a lot of experience in this matter. I wondered if I was crazy for thinking it would be normal for a registrar to send an email or 2 with status updates about your domain. From speaking with others, I get the impression that this is indeed normal. I have 3 different email addresses listed under my account at the registrar: 2 at multimedia.cx and a backup Gmail account. I checked spam folders after this incident. Then I remembered that I have never received any email notifications from them (although password reset emails show up, so that part thankfully works). Also, email sent to their support address vanishes into a black hole.

So, I guess the moral is: be wary of dealing with CoCCA Registry. However, they seem to be the only way to register domains under a wide variety of uncommon country codes.

By Friday, the domain appeared to have been reinstated, even though its status was still officially listed as “renewal-pending” in the web-based management console. Over the course of the day, as cached DNS results timed out, I started seeing subdomains come back. I excitedly used the ‘dig’ command to count down the seconds until gamemusic.multimedia.cx was accessible on the network I was on (the number after the domain name is the time-to-live, or ‘TTL’, value in seconds):

$ dig +nocmd gamemusic.multimedia.cx +noall +answer
gamemusic.multimedia.cx. 3      IN      A       174.143.152.251
$ dig +nocmd gamemusic.multimedia.cx +noall +answer
gamemusic.multimedia.cx. 2      IN      A       174.143.152.251
$ dig +nocmd gamemusic.multimedia.cx +noall +answer
gamemusic.multimedia.cx. 1      IN      A       174.143.152.251
$ dig +nocmd gamemusic.multimedia.cx +noall +answer
gamemusic.multimedia.cx. 12962  IN      A       207.45.186.114
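The TTL column is what drives that countdown: a caching resolver keeps serving the old record until its TTL reaches 0, then fetches a fresh one (here with a new address and a TTL of 12962 seconds). A minimal Python sketch that automates the same watch by shelling out to `dig` with the flags used above. The helper names are hypothetical, and it assumes `dig` is installed and that each answer line has the 5 whitespace-separated columns shown.

```python
import subprocess
import time
from typing import List, NamedTuple

class Answer(NamedTuple):
    name: str
    ttl: int      # seconds a caching resolver may keep serving this record
    rclass: str
    rtype: str
    rdata: str

def parse_answer(line: str) -> Answer:
    """Parse one record line of `dig +noall +answer` output."""
    name, ttl, rclass, rtype, rdata = line.split(None, 4)
    return Answer(name, int(ttl), rclass, rtype, rdata.strip())

def lookup(host: str) -> List[Answer]:
    result = subprocess.run(["dig", "+nocmd", host, "+noall", "+answer"],
                            capture_output=True, text=True, check=True)
    return [parse_answer(line) for line in result.stdout.splitlines()
            if line.strip() and not line.startswith(";")]

def wait_for_new_address(host: str, old_address: str) -> Answer:
    """Re-query until the cached A record stops pointing at old_address."""
    while True:
        answers = lookup(host)
        for answer in answers:
            if answer.rtype == "A" and answer.rdata != old_address:
                return answer
        # The resolver keeps returning the cached record until its TTL runs
        # out, so sleep for that long (at least 1 second) before retrying.
        ttls = [a.ttl for a in answers]
        time.sleep(max(1, min(ttls)) if ttls else 1)
```

For the situation above, the call would be `wait_for_new_address("gamemusic.multimedia.cx", "174.143.152.251")`.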

Finally, today (Saturday), I received a receipt confirming that the domain has been renewed.

8 Years Old
Incidentally, happy eighth birthday to multimedia.cx. It was September 2004 when I decided to branch out from a simple ISP-based web presence.

People often ask why I went with the .cx TLD. When I decided I wanted a proper domain name 8 years ago, I found that multimedia.X was already taken for just about every TLD value of X. .cx was a notable exception and was distinctive enough (speaking of .X, though, I see that multimedia.xxx is still up for grabs as of this writing; I imagine that would come with a whole other set of problems).

It’s funny that tech nerds often rail against outsourcing too much (email, storage, computing power, web hosting) to some type of cloud provider, on the premise that it could easily be taken away. But this episode teaches me that even having your own domain name is no guarantee of a solid online presence.

Meanwhile, I have taken proactive steps to prevent this situation from arising again:



In the absence of automated emails from the registrar, I hope a Google Calendar reminder set for a month ahead of expiration will do the trick.

Chiptune Database and API

So I set out to create a website that allows people to easily listen to video game music directly through their web browser. I succeeded in that goal. However, I must admit that the project has limited appeal: the web player is delivered via Chrome’s Native Client (NaCl) technology, which restricts its audience to Chrome users. I’m not certain if anyone really expects NaCl to take off in any serious way, but I still have a few other projects in mind.

I recently realized that, as a side effect of this project, I accidentally created something of significant value to fans of old video games and associated music: a searchable database of chiptune music and metadata. To my knowledge, no one else has endeavored to create such a thing. I figured that I might as well make the database easily accessible with an API and see where it leads.

To that end, I created 2 API entry points. First, there is the search API located at http://gamemusic.multimedia.cx/api/search/. To use it, append a URL-encoded search string to the URL, e.g.: http://gamemusic.multimedia.cx/api/search/super+mario. This returns JSON data containing an array of results in decreasing order of relevance. Each result has a game title, database ID, media URL, system type, and SHA-1 hash. This is the same API that the site’s own search page uses.

The database ID can be plugged into http://gamemusic.multimedia.cx/api/metadata/ to retrieve the song’s metadata in JSON format. E.g., the ID for Super Mario Bros. 3 on the NES is 161: http://gamemusic.multimedia.cx/api/metadata/161.
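A minimal Python client sketch for the 2 endpoints, using only the standard library. The URL patterns come from the examples above; the helper names are hypothetical, the response is assumed to be UTF-8 JSON, and since the exact JSON key names are not spelled out here, the sketch returns the parsed data without assuming any particular keys.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "http://gamemusic.multimedia.cx/api"

def search_url(query: str) -> str:
    # quote_plus turns spaces into '+' and percent-encodes other unsafe characters.
    return f"{API_BASE}/search/{urllib.parse.quote_plus(query)}"

def metadata_url(game_id: int) -> str:
    return f"{API_BASE}/metadata/{int(game_id)}"

def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and parse the body as JSON (assumed UTF-8)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would be `results = fetch_json(search_url("super mario"))` for a relevance-ordered list, then `fetch_json(metadata_url(161))` for the Super Mario Bros. 3 metadata.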

I recently read an article about sins against true RESTful API principles which led me to believe I’m almost certainly doing this web API stuff wrong. I don’t think it’s a huge deal, though, since I don’t think anyone actually listens to chiptunes any more. But if there are offline chiptune music players that are still in service and actively maintained, perhaps the authors would like to implement this API. It would require some type of HTTP networking library, a JSON parser, the embedded XZ decoder, and some new code to parse through my .gamemusic and .psfarchive formats.
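Two of those client requirements, the XZ decoder and a use for the SHA-1 hash in each search result, are covered by Python's standard library. A minimal sketch, assuming (not confirmed here) that the downloaded media file is XZ-compressed and that the hash covers that file exactly as downloaded; parsing the .gamemusic and .psfarchive contents is not shown, and `unpack_verified` is a hypothetical name.

```python
import hashlib
import lzma

def unpack_verified(blob: bytes, expected_sha1: str) -> bytes:
    """Check a downloaded file against its SHA-1, then XZ-decompress it.

    ASSUMPTION: the hash covers the compressed bytes; if it actually covers
    the decompressed payload, hash the return value instead.
    """
    actual = hashlib.sha1(blob).hexdigest()
    if actual != expected_sha1.lower():
        raise ValueError(f"SHA-1 mismatch: expected {expected_sha1}, got {actual}")
    return lzma.decompress(blob, format=lzma.FORMAT_XZ)
```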

This database could be a significant value-add to chiptune playback software, and could help people experience classic game music much more easily.

Adjusting The Timetable and SQL Shame

My Game Music Appreciation website has a big problem that many visitors quickly notice and comment upon. The problem looks like this:



The problem is that all of these songs are 2m30s long. During the initial import process, unless a chiptune file already had curated length metadata attached, my metadata utility emitted a default play length of 150 seconds. This is a problem if you want to listen to an entire soundtrack without interacting with the player page: short songs (think “game over” or other quick jingles) that are over in a few seconds still get padded out to 150 seconds with silence.

So I needed to correct this. Possible solutions:

  1. Manually: At first, I figured I could ask the database which songs needed fixing and listen to them to determine the proper lengths. Then I realized that there were well over 1400 games affected by this problem. This just screams “automated solution”.
  2. Automatically: Ask the database which songs need fixing and then somehow ask the computer to listen to the songs and decide their proper lengths. This sounds like a winner, provided that I can figure out how to programmatically determine if a song has “finished”.

SQL Shame
This play adjustment task has been on my plate for a long time. A key factor that has blocked me is that I couldn’t figure out a single SQL query to feed to the SQLite database underlying the site which would give me all the songs I needed. To be clear, it was very simple and obvious to me how to write a program that would query the database in phases to get all the information. However, I felt that it would be impure to proceed with the task unless I could figure out one giant query to get all the information.

This always seems to come up whenever I start interacting with a database in any serious way. I call it SQL shame. This task got some traction when I got over this nagging doubt and told myself that there’s nothing wrong with the multi-step query program if it solves the problem at hand.

Suddenly, I had a flash of inspiration about why the so-called NoSQL movement exists. Maybe there are a lot more people who don’t like trying to derive such long queries and are happy to allow other languages to pick up the slack.

Estimating Lengths
Anyway, my solution involved writing a Python script to iterate through all the games whose metadata was output by a certain engine (the one that makes the default play length 150 seconds). For each of those games, the script queries the song table and determines if each song is exactly 150 seconds. If it is, then go to work trying to estimate the true length.

The foregoing paragraph describes what I figured was possible with only a single (possibly large) SQL query.
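For contrast with the single-query ideal, here is a minimal sketch of the multi-step approach described above: one query for the games produced by the default-length engine, then one query per game for its songs. The real schema is not reproduced in this excerpt, so the sketch assumes a HYPOTHETICAL layout of games(id, title, metadata_engine) and songs(game_id, song_number, length_ms).

```python
import sqlite3

DEFAULT_LENGTH_MS = 150_000

def songs_to_fix(conn: sqlite3.Connection, engine: str):
    """Return (game_id, song_number) pairs whose length is the 150 s default.

    HYPOTHETICAL schema: games(id, title, metadata_engine),
    songs(game_id, song_number, length_ms).
    """
    to_fix = []
    # Step 1: every game whose metadata came from the given engine.
    games = conn.execute(
        "SELECT id FROM games WHERE metadata_engine = ? ORDER BY id", (engine,)
    ).fetchall()
    # Step 2: for each game, check its songs one by one.
    for (game_id,) in games:
        rows = conn.execute(
            "SELECT song_number, length_ms FROM songs WHERE game_id = ? ORDER BY song_number",
            (game_id,),
        )
        for song_number, length_ms in rows:
            if length_ms == DEFAULT_LENGTH_MS:
                to_fix.append((game_id, song_number))
    return to_fix
```

Nothing about this is impure: each step is easy to read and debug, and for a corpus of about 1400 games the extra round trips to a local SQLite file are negligible.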

For each song represented in the chiptune file, I ran it through a custom length estimator program. My brilliant (err, naïve) solution to the length estimation problem was to synthesize audio one second at a time, up to a maximum of 120 seconds (tightening up the default length just a bit), and count how many of those seconds consisted entirely of 0 samples. Once the count reached 5 consecutive seconds of silence, the estimator rewound the running length by 5 seconds and declared that to be the proper length. Then the script updated the database.
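A minimal sketch of that silence-based estimator. The `render_second` callable is a hypothetical stand-in for whatever chiptune engine synthesizes the next second of samples; the loop follows the rules in the text (120-second cap, 5 consecutive silent seconds, rewind by 5). The `threshold` parameter is an addition that anticipates low-amplitude noise; at its default of 0 it reproduces the strict all-zero test.

```python
from typing import Callable, Sequence

def estimate_length(render_second: Callable[[], Sequence[int]],
                    max_seconds: int = 120,
                    silence_seconds: int = 5,
                    threshold: int = 0) -> int:
    """Return the estimated song length in whole seconds.

    A second counts as silent when every sample's magnitude is <= threshold.
    """
    silent_run = 0
    for elapsed in range(1, max_seconds + 1):
        samples = render_second()
        if all(abs(s) <= threshold for s in samples):
            silent_run += 1
            if silent_run == silence_seconds:
                # Rewind past the silence: the song ended where it began.
                return elapsed - silence_seconds
        else:
            silent_run = 0
    return max_seconds

def fake_song(audible_seconds: int, tail_level: int = 0, rate: int = 100):
    """Demo stand-in for a renderer: loud for audible_seconds, then tail_level forever."""
    elapsed = 0
    def render() -> list:
        nonlocal elapsed
        elapsed += 1
        level = 1000 if elapsed <= audible_seconds else tail_level
        return [level] * rate
    return render

print(estimate_length(fake_song(3)))                             # 3
print(estimate_length(fake_song(3, tail_level=2)))               # 120 (never silent)
print(estimate_length(fake_song(3, tail_level=2), threshold=2))  # 3
```

A song that plays for 3 seconds is silent from second 4 onward; the silent run reaches 5 at second 8, and rewinding by 5 gives 3.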

There were about 1430 chiptune files whose songs needed updates. Some files had a single song; some had over 100. When I let the script run, it took nearly 65 minutes to process all the files. That was a single-threaded solution, of course. Even though I already had the data I needed, I wanted to try my hand at parallelizing the script. So I went to work with Python’s multiprocessing module and quickly refactored the script to use all 4 CPU threads on the machine where the files live. Results:

  • Single-threaded solution: 64m42s to process corpus (22 games/minute)
  • Multi-threaded solution: 18m48s with 4 CPU threads (75 games/minute)

That is about a 3.4x speedup (3882 seconds down to 1128 seconds) across 4 CPU threads, which is decent for a primarily CPU-bound operation.
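A minimal sketch of the serial-versus-pool structure using `multiprocessing.Pool`. `run_all` and its parameters are hypothetical, and the real per-file worker (render each song, estimate its length) is not shown; any module-level function that takes one file path will do.

```python
import multiprocessing
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

def run_all(worker: Callable[[T], R], items: Iterable[T], processes: int = 4) -> List[R]:
    """Apply worker to every item and return the results in input order.

    processes=1 runs serially in the current process (the single-threaded
    baseline). worker must be a module-level function so it can be pickled.
    """
    items = list(items)
    if processes == 1:
        return [worker(item) for item in items]
    with multiprocessing.Pool(processes=processes) as pool:
        # chunksize=1: per-file cost varies widely (1 song vs. 100+ songs), so
        # hand out one file at a time to keep every worker busy.
        return pool.map(worker, items, chunksize=1)
```

One design choice worth considering: have the workers only compute lengths and return them, and do all the SQLite UPDATEs in the parent process, so that 4 processes never contend for the database's write lock. On platforms that start workers with spawn or forkserver, call `run_all` from under an `if __name__ == "__main__":` guard.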

Epilogue
I suspect that this task will require some refinement or manual intervention. Maybe there are songs which actually have more than 5 legitimate seconds of silence. Also, I entertained the possibility that some songs would generate very low amplitude noise rather than being perfectly silent. In that case, I could refine the script to stipulate that amplitudes below a certain threshold count as 0. Fortunately, I marked which games were modified by this method, so I can run a new script as necessary.

SQL Schema
Here is the schema of my SQLite3 database, for those who want to try their hand at a proper query. I am confident that it’s possible; I just didn’t have the patience to work it out. The task is to retrieve all the rows from the games table where all of the corresponding songs in the songs table are 150000 milliseconds long.
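One way to express the task as a single query is a pair of EXISTS tests: the game must have at least one song, and there must be no song whose length is anything other than 150000. Since the actual schema is not reproduced here, this sketch assumes a HYPOTHETICAL games(id, title) / songs(game_id, song_number, length_ms) layout; the names would need adjusting to match.

```python
import sqlite3

# HYPOTHETICAL schema (the real one is not reproduced here):
#   games(id INTEGER PRIMARY KEY, title TEXT)
#   songs(game_id INTEGER, song_number INTEGER, length_ms INTEGER)
ALL_DEFAULT_LENGTH_GAMES = """
SELECT g.*
FROM games AS g
WHERE EXISTS (SELECT 1 FROM songs AS s WHERE s.game_id = g.id)
  AND NOT EXISTS (
      SELECT 1 FROM songs AS s
      WHERE s.game_id = g.id
        AND s.length_ms IS NOT 150000  -- NULL-safe: a missing length also disqualifies
  )
ORDER BY g.id
"""

def games_with_only_default_lengths(conn: sqlite3.Connection):
    return conn.execute(ALL_DEFAULT_LENGTH_GAMES).fetchall()
```

The first EXISTS matters: without it, a game with no songs at all would vacuously satisfy "all of its songs are 150000 ms." `IS NOT` is SQLite's NULL-safe inequality. A `GROUP BY g.id HAVING MIN(s.length_ms) = 150000 AND MAX(s.length_ms) = 150000` join is a common alternative, but MIN and MAX skip NULLs, so it would let a game with a missing song length through.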