Monthly Archives: February 2010

What’s So Hard About 0xA9?

No matter how much I think I know about character encoding or trying to work around issues arising from the same, I’ll always get bitten.

For a long time, one particular FATE configuration has shown 87/127 tests succeeding, even though the total number of test specs has crept up to over 300. I investigated this errant configuration on the client side and concluded that it was, in fact, executing all of the tests and sending all the results over to the server in one neat package. Apparently, the problem was on the server side. Since it was an older Intel C compiler configuration, I didn’t care about investigating much further.

At one point, some bad bit of code was checked into FFmpeg and all of the results started showing xy/127 tests succeeded. This made the issue a bit more pressing. Mans discovered that the problem had to do with the svq3 test spec failing. The bad code affecting the SVQ3 test was quickly fixed so I didn’t worry about it again until yesterday when, once again, FATE’s various configs were only reporting that 127 tests had been run.

Here’s what was happening: FATE stores the stderr output of a test only if the test spec fails. This is a key data point since everything is fine when the test is successful and FATE tosses the output. The sample used for the SVQ3 test outputs the following metadata (among other data) in the stderr (seen, for example, in this test result):

    copyright       : ? Vertical Online 2001
    copyright-eng   : ? Vertical Online 2001

Those mystery characters map to the byte 0xA9, which is the c-in-a-circle copyright symbol according to the Unicode tables I can find (at least, U+00A9 is). That byte is making the system choke somewhere along the line, which annoys me greatly. When the client-side Python script executes the test and stuffs the stdout and stderr into the SQLite database, the relevant field is supposed to be a blob, i.e., a binary large object. The receiving PHP script on the server is also supposed to honor that blob schema.
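To make the byte-level picture concrete, here is a minimal Python 3 sketch of how 0xA9 behaves under two different interpretations. It assumes the sample's metadata is Latin-1 (ISO-8859-1) text, which is a guess on my part; the variable names are made up for illustration and are not taken from the FATE scripts.

```python
# The raw bytes as they would appear in the stderr output, with 0xA9
# standing in for the "?" shown above.
raw = b"\xa9 Vertical Online 2001"

# Assumption: the metadata is Latin-1. Under that encoding, 0xA9 is the
# copyright sign, U+00A9.
as_latin1 = raw.decode("latin-1")

# In UTF-8, 0xA9 is only valid as a continuation byte, so a strict
# decode of the same bytes fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Encoding U+00A9 as UTF-8 yields the two-byte sequence 0xC2 0xA9.
as_utf8_bytes = as_latin1.encode("utf-8")

print(repr(as_latin1))
print(utf8_ok)
print(as_utf8_bytes)
```

Anything along the pipeline that treats the field as UTF-8 text rather than opaque bytes could plausibly trip over that lone 0xA9.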

Mans’ solution is to explicitly encode the stdout/stderr blobs as UTF-8 strings in the client-side Python script. That fixes this problem. But I’m confused as to why this is necessary in the first place. Was the PHP script doing its best to interpret the data inside the blob and falling over? Or was the SQLite engine on the server confused by the 0xA9 character in the blob?

Also, I suddenly find myself wondering how the A9 search company got its name.

Update: Thanks to MichaelK. for pointing out the problem. While I was properly converting stdout/stderr from build records to binary type (since that’s necessary), I never did the same for test result stdout/stderr. I had to do it for the build record output since that was compressed before going into the database. Since the test result data is “just strings”, or so I thought, I saw no reason to do so.
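For illustration, this is roughly what "converting to binary type" looks like with Python 3's sqlite3 module, where passing a bytes object binds the value as a BLOB. It is a sketch only: the table and column names are hypothetical, and the actual FATE script at the time would have been Python 2, where the binary wrapper type worked differently.

```python
import sqlite3

# Hypothetical schema for illustration; the real FATE tables are not
# shown in this post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_result (id INTEGER PRIMARY KEY, stderr BLOB)")

# Captured stderr kept as raw bytes, including the troublesome 0xA9.
stderr_output = b"copyright       : \xa9 Vertical Online 2001\n"

# Binding a bytes object stores the value as a BLOB, so SQLite never
# tries to interpret it as text.
conn.execute("INSERT INTO test_result (stderr) VALUES (?)", (stderr_output,))

# typeof() reports the storage class SQLite actually used for the value.
stored_type, stored_value = conn.execute(
    "SELECT typeof(stderr), stderr FROM test_result"
).fetchone()

print(stored_type)
print(stored_value == stderr_output)
conn.close()
```

The point of the round trip is that the bytes come back exactly as they went in, regardless of what encoding they happen to be in.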

Creative Nomad Zen Reflections

In the middle of 2004 I purchased a Creative Nomad Zen Xtra portable MP3 player. “MP3 player” was not quite a commonplace concept yet but the word “iPod” was beginning to catch on. When describing this new toy to people, I usually described it as “about the same as an iPod but about 1/2 the price” which was absolutely true when I purchased it.

Here is my Nomad compared to a 1st generation Apple iPod Touch, my current MP3 player (and more):


Creative Nomad Zen Xtra compared to Apple iPod Touch (1st gen)

The Nomad Zen Xtra served me well for 3 solid years until I finally got a proper iPod in summer of 2007. I have kept the unit around since then for no particular reason. I decided to disassemble and photograph it before I send the battery and electronics off to their respective recycling destinations.


Creative Nomad Zen Xtra with its front plate and battery removed

The Nomad Zen Xtra was highly user-serviceable and upgradeable. At the time I put it out of service, the battery could barely run for 5 hours (whereas 10-12 hours was no problem when it was new). A replacement battery would be easy enough to order from assorted battery shops on the internet.


Creative Nomad Zen Xtra with back plate and hard drive removed

Have more than 40 GB of songs? Take off the back of the unit, remove the 2.5″ 40 GB IDE HD and replace with a larger one. That never proved to be necessary for me; in fact, I soon realized after I bought it that the lower-end 30 GB model would have been well more than enough.

The 40 GB HD from the unit is still perfectly good. I decided to hook it up to a Linux computer and see if there was anything I could work out about the filesystem. Before I got too far into it, a little Googling led me to a Python utility called zenrecover.py. Works famously:

$ python zenrecover.py /dev/sdc songs /home/melanson/mnt/zen
0% 3.6MB/s "Bizet_Intermezzo_from_Carmen.mp3" (6.8MB)

Just for fun, I dumped all the songs from the unit. I discovered a few things that I had long forgotten and that had never made the transition to my iPod. Curiously, the very first items that the utility dumped (likely because they occupied the first parts of the filesystem) were a selection of classical tunes as played by the “Beijing Central Phil Orchestra”. These songs came with the unit. It’s notable that the utility transferred them off because the packaged software did not allow the user to do so (I’m pretty sure it allowed all music that was downloaded to be transferred off).

Ugh, that packaged software had to be the worst part about the Nomad Zen Xtra. I know lots of users like to chastise iTunes over a range of pet peeves. I think such people have simply never been exposed to anything worse, like this software.

Multimedia Document Management System

Someone recently updated a link in the MultimediaWiki page for mirrored documents. Naturally, that doesn’t automatically update the mirrored copy @ multimedia.cx (having me poll the page history and manually update the mirrored copy hardly counts as an automated process). I suddenly thought that it would be desirable to have a content management system that allows authorized users to upload and organize documents, particularly PDF documents which comprise many of these mirrored documents. Sort of a… document management system.

“Document Management System.” Sounds enterprise-y. Here’s what I want:

  • Free, open source solution in which I do not have to modify a single line of code
  • Allows me to create a list of users who have permissions to upload or delete PDF documents
  • Allows authorized users to upload or delete PDF documents
  • Manages at least a minimum of metadata

The key thing here is to allow authenticated users to upload and manage these mirrored documents. I know many will say, “Drupal/WordPress/MyFaveCMS can be coerced to do just that!” And I don’t dispute any such claims. It’s also true that nearly any program you need to write can be written in straight C, eschewing any higher level languages. I was just hoping for a more turnkey solution that doesn’t require me to learn a lot or do my own coding or customization.

I guess the problem here is that no one sets out to just write such a simple CMS. A CMS might start out simple but eventually grows into the next Drupal. I probably need to come to terms with the fact that there is no prepackaged solution that exactly fits this simple need without at least some tinkering.