Monthly Archives: November 2007

Bash Too Smart For Its Own Good

So I’m running the latest Ubuntu release in a virtual machine under my main Windows XP machine (it’s a blessing that I no longer need to choose between operating systems these days– I can run them all simultaneously on the same machine if I wish). Today I noticed a feature of Bash that for all I know has been in place for over a decade. I just have never seen it enabled by default in any distribution before. I had to do some research — indeed, one of the first problems in Unix is learning how to ask the right question — but I eventually figured out that the feature is called “Programmable Completion”.

Tab completion is God’s gift to command line navigation. It’s something I always take 10 seconds to carefully explain to anyone who is struggling with the Unix command line for the first time, since navigating through a filesystem is agonizing without it. Programmable completion takes tab completion and tries to make it smarter. I first noticed this peculiar behavior on my Ubuntu VMware session today. If there are 2 files in the current directory, xyz.tar.gz and xyz.tar.bz2, then typing ‘tar zxvf x<TAB>’ automatically completes to xyz.tar.gz, while ‘tar jxvf x<TAB>’ automatically completes to xyz.tar.bz2. I noticed this, thought it was an interesting touch, thought for a moment about how it might work, and immediately wondered what would happen if the underlying mechanism isn’t aware of a mapping that I need.

It didn’t take long before I ran into that eventuality. I was trying to play some impossibly obscure formats with xine. I navigated to the necessary directories only to find that they were apparently empty of any files. I never realized just how important tab completion is to me. Now I’m reading pages of documentation trying to figure out how to disable programmable completion entirely. (It would probably be enough to ask the desktop terminal emulator to run my own Bash config files on startup– I don’t understand why I have to specifically configure such programs to do this.)

Effectively, Bash is making a decision here about what files it thinks I want to see. I’m surprised a Linux distribution would program that kind of behavior by default. How condescending is it when a Microsoft OS tells you that there are entire sets of files that you don’t need to see? Maybe that’s not a fair comparison, as the programmable completion feature is more akin to a list of file filters in a “File Open…” dialog box. Thing is, there’s no obvious way to select “show all files”.

Computer programs get into trouble when they try to anticipate what a normal user needs. More often, such programs have the effect of shaping end user behavior to account for the educated guesses that the computer makes.

ISO Compromise

Engineering is about trade-offs and compromises. One of the most fundamental trade-offs to be made when designing a storage format is whether multi-byte numbers will be encoded as little or big endian numbers. But have you ever studied the data structures involved in ISO-9660, the standard filesystem format for optical discs? It seems that the committee tasked with developing this standard was unwilling to make this one tough decision and specified all multi-byte numbers as omni-endian. I just made that term up. Maybe it could be called bi-endian or multi-endian. The raw detail is that each multi-byte number is stored twice: first in little endian format, then in big endian. For example, 0x11223344 is stored using 8 bytes: 0x44 0x33 0x22 0x11 0x11 0x22 0x33 0x44.
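For the curious, here is a minimal sketch in C of how a reader might handle one of these both-endian fields. The function name and the disagreement check are my own illustration, not taken from any particular ISO-9660 implementation.

```c
#include <stdint.h>
#include <stdio.h>

/* Read an ISO-9660 "both-endian" 32-bit field: 4 bytes little-endian,
 * followed by the same value repeated as 4 bytes big-endian. */
static uint32_t read_both_endian_32(const uint8_t *p)
{
    uint32_t le = (uint32_t)p[0]       | (uint32_t)p[1] << 8 |
                  (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    uint32_t be = (uint32_t)p[4] << 24 | (uint32_t)p[5] << 16 |
                  (uint32_t)p[6] << 8  | (uint32_t)p[7];

    if (le != be)
        fprintf(stderr, "warning: little- and big-endian copies disagree\n");

    return le;  /* either copy will do when they agree */
}

int main(void)
{
    /* 0x11223344 stored omni-endian, exactly as described above */
    const uint8_t field[8] = { 0x44, 0x33, 0x22, 0x11, 0x11, 0x22, 0x33, 0x44 };
    printf("0x%08X\n", (unsigned)read_both_endian_32(field));  /* prints 0x11223344 */
    return 0;
}
```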


Do any other filesystems take this compromise? I am not that well versed. I have studied the odd game-related optical filesystem; I had to write a manual ext2 searching tool for a sysadmin class; I also had to try to recover a virus-corrupted FAT16 filesystem (to no avail; that virus cleanly chewed up some of the most important sectors).

Anyway, if I were to go ahead and try for a new FUSE driver for ISO-9660 (or modify an existing driver), I would want to go after the main format. Plus, I would want to natively interpret the CISO (compressed ISO) format on the fly. Further, I would use this as a platform to understand what is so special about the apparent ISO-9660 data ripped from a Sega Dreamcast GD-ROM.
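To give an idea of what interpreting CISO on the fly would involve, here is a rough sketch in C of pulling one block out of a CISO image and inflating it with zlib. The header layout and index semantics below are how I remember the format working (a small header, then a table of 32-bit offsets, with the top bit flagging blocks stored uncompressed); treat them as assumptions to verify against a real CISO tool before building on top of this.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <zlib.h>

/* CISO header as I understand it: 24 bytes on disk, little-endian.
 * Read it field by field rather than fread()ing the struct directly. */
struct ciso_header {
    char     magic[4];      /* "CISO" */
    uint32_t header_size;
    uint64_t total_bytes;   /* uncompressed size of the underlying ISO */
    uint32_t block_size;    /* usually 2048, i.e. one ISO-9660 sector */
    uint8_t  version;
    uint8_t  align;         /* index entries are shifted left by this */
};

/* Read uncompressed block 'n' into 'out' (h->block_size bytes). 'index' is
 * the table of 32-bit entries that follows the header; the top bit of an
 * entry marks a block that was stored without compression. */
static int ciso_read_block(FILE *f, const struct ciso_header *h,
                           const uint32_t *index, uint32_t n, uint8_t *out)
{
    int  plain  = index[n] & 0x80000000;
    long start  = (long)(index[n]     & 0x7FFFFFFF) << h->align;
    long end    = (long)(index[n + 1] & 0x7FFFFFFF) << h->align;
    long stored = end - start;
    uint8_t buf[16 * 1024];

    if (stored <= 0 || stored > (long)sizeof(buf))
        return -1;
    if (fseek(f, start, SEEK_SET) < 0 ||
        fread(buf, 1, stored, f) != (size_t)stored)
        return -1;

    if (plain) {            /* block was stored as-is */
        if (stored < (long)h->block_size)
            return -1;
        memcpy(out, buf, h->block_size);
        return 0;
    }

    /* compressed blocks are raw deflate streams (no zlib wrapper),
     * hence the negative window bits */
    z_stream z;
    memset(&z, 0, sizeof(z));
    if (inflateInit2(&z, -15) != Z_OK)
        return -1;
    z.next_in   = buf;
    z.avail_in  = stored;
    z.next_out  = out;
    z.avail_out = h->block_size;
    int ret = inflate(&z, Z_FINISH);
    inflateEnd(&z);
    return (ret == Z_STREAM_END) ? 0 : -1;
}
```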

Are there any other ISO bastardizations to target for such a tool?

Ian’s Blog

Ian Farquhar expressed surprise the other day that anyone reads his blog. So I thought I would point out that his new blog is kind of nifty. I don’t know what backend software is powering the blog, though, and I’m mistrustful of all the Javascript goings-on. The page transitions frighten me.


My Nero logo

Anyway, he likes to talk about hard multimedia-type stuff. Which is apropos since he probably works for Nero (hey Ian: an ‘About Me’ page — about you — would do wonders). So you should check it out. We niche multimedia blogs need to stick together. I hope his blog stays fresh.

Revenge Of PAVC

Remember my old PAVC idea? I have been thinking about it again. As a refresher, this idea concerns efficiently and losslessly compressing RGB video frames output from an emulator for early video game systems such as the Nintendo Entertainment System (NES) and the Super NES. This time, I have been considering backing off my original generalized approach and going with a PPU-specific approach. (PPU stands for picture processing unit, which is what they used to call the video hardware in these old video game systems.) Naturally, I would want to start this experiment (again) with my favorite — nay — the greatest video game console of all time, the NES. Time for more obligatory, if superfluous, NES screenshots.


Little Samson (NES) game map
Little Samson, all-around awesome game

Here’s the pitch: step 1 is to modify an emulator (I’m working with FCE-Ultra) to dump PPU data to a file; step 2 is to take that data and run it through a compression tool. What kind of data would I care about for step 1? On the first frame, dump out all of the interesting areas of the PPU memory space. This may sound huge, but it is only about 9-12 kilobytes, depending on the cartridge hardware. Also, dump the initial states of a few key PPU registers that are mapped into the CPU’s memory space. As the game runs, watch all of these memory and register values and log changes. This really isn’t as difficult as it sounds since FCEU already cares deeply when one of these values changes. When something changes, mark it as “dirty” and dump that value during the next scanline update.
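To make that concrete, here is one way the per-change log record might look. The names, field sizes, and breakdown of PPU areas below are my own guesses for illustration; they are not anything FCE-Ultra actually defines.

```c
#include <stdint.h>

/* The PPU areas worth snapshotting on frame 0 and watching thereafter;
 * together they account for the ~9-12 KB mentioned above. */
enum pavc_target {
    PAVC_PATTERN_TABLE,   /* CHR data, 8 KB */
    PAVC_NAME_TABLE,      /* name + attribute tables, up to 4 KB of address space */
    PAVC_PALETTE,         /* 32 bytes of palette RAM */
    PAVC_SPRITE_RAM,      /* 256 bytes of OAM */
    PAVC_PPU_REGISTER     /* the $2000/$2001/$2005/$2006-style state */
};

/* One logged change, emitted when the emulator flags a watched byte as
 * dirty; a batch of these would be written out at each scanline update. */
struct pavc_change {
    uint32_t frame;       /* frame on which the change takes effect */
    uint16_t scanline;    /* scanline after which it applies */
    uint8_t  target;      /* one of enum pavc_target */
    uint16_t address;     /* offset within that target space */
    uint8_t  value;       /* the new byte value */
};
```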

With that data captured, the next challenge is to compress it. I am open to suggestions on how best to encode this change data. I would hope that we could come up with something a little better than shoving a frame of change data through zlib.
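As a strawman baseline, “shoving a frame of change data through zlib” would look roughly like this: pack the frame’s change records into a buffer and hand it to compress2(). Any smarter scheme (delta-coding addresses, grouping changes by target area, run-length coding repeated values) would have to beat this to be worth the trouble.

```c
#include <stddef.h>
#include <stdint.h>
#include <zlib.h>

/* Naive per-frame compression: one zlib pass over the packed change
 * records. 'out' should be at least compressBound(changes_len) bytes. */
static int compress_frame(const uint8_t *changes, size_t changes_len,
                          uint8_t *out, size_t out_capacity, size_t *out_len)
{
    uLongf dest_len = out_capacity;
    if (compress2(out, &dest_len, changes, changes_len,
                  Z_BEST_COMPRESSION) != Z_OK)
        return -1;
    *out_len = dest_len;
    return 0;
}
```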

Decompression and playback would entail unraveling whatever was performed in step 2 above. Then the decoder simulates the NES PPU, drawing the image scanline by scanline and applying state-change data between scanlines.
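In the same hypothetical terms as the dumping sketch above (reusing that pavc_change record), the playback loop might look like the following; ppu_render_scanline() stands in for whatever routine actually rasterizes a line of pixels from the current PPU state.

```c
#include <stddef.h>
#include <stdint.h>

/* In-memory copy of the PPU state that the decoder mutates as it replays
 * the change log; the layout mirrors the pavc_target areas sketched earlier. */
struct pavc_ppu_state {
    uint8_t pattern_tables[8192];
    uint8_t name_tables[4096];
    uint8_t palette[32];
    uint8_t sprite_ram[256];
    uint8_t registers[8];
};

/* Placeholder for the real scanline renderer. */
void ppu_render_scanline(const struct pavc_ppu_state *ppu, int scanline);

static void pavc_apply_change(struct pavc_ppu_state *ppu,
                              const struct pavc_change *c)
{
    switch (c->target) {
    case PAVC_PATTERN_TABLE: ppu->pattern_tables[c->address] = c->value; break;
    case PAVC_NAME_TABLE:    ppu->name_tables[c->address]    = c->value; break;
    case PAVC_PALETTE:       ppu->palette[c->address]        = c->value; break;
    case PAVC_SPRITE_RAM:    ppu->sprite_ram[c->address]     = c->value; break;
    case PAVC_PPU_REGISTER:  ppu->registers[c->address & 7]  = c->value; break;
    }
}

/* Render one frame: replay the changes logged for each scanline, then draw it. */
static void pavc_render_frame(struct pavc_ppu_state *ppu,
                              const struct pavc_change *changes, size_t count,
                              uint32_t frame)
{
    size_t i = 0;
    for (int scanline = 0; scanline < 240; scanline++) {
        while (i < count && changes[i].frame == frame &&
               changes[i].scanline <= (uint16_t)scanline)
            pavc_apply_change(ppu, &changes[i++]);
        ppu_render_scanline(ppu, scanline);
    }
}
```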

What are the benefits to this approach? Ideally, I am aiming not only for lossless compression, but for better compression than what is ordinarily achieved with the large files distributed via BitTorrent and coordinated at tasvideos.org. When I first started investigating this idea over 2 years ago, MPEG-4 variants were still popular for compressing the videos. These days, H.264 seems to have taken over, and it performs much better, even on this type of data (allegedly, H.264’s 4×4 transform produces fewer artifacts on sharp-edged material such as footage from old video game consoles).


Sword Master (NES)
Sword Master, mediocre game with great graphics

There are also some benefits from the perspective of NES purists. The most flexible NES emulators allow the user to switch palettes in order to get one that is “just right”. A decoder for this type of data could offer the same benefits.

Of course, an encoder is not much use without an analogous decoder that end users can easily install and use. I think this is less of an issue due to the possibility of creating a decoder in Flash or Java.