Category Archives: Outlandish Brainstorms

Entertaining bizarre ideas largely related to multimedia hacking and reverse engineering

Designing a Codec For Parallelized Entropy Decoding

When I was banging my head hard on FFmpeg’s VP3/Theora VLC decoder, I kept wondering if there’s a better way to do it. Libtheora shows us the way in that there obviously must be, since it can decode so much faster than FFmpeg.

In the course of brainstorming better VLC approaches, I wondered about the concept of parallelizing the decoding operation somehow. This sounds awfully crazy, I know, and naturally, it’s intractable to directly parallelize VLC decoding since you can’t know where item (n+1) begins in the bitstream until you have decoded item (n). I thought it might be plausible to make a pass through the bitstream decoding all of the tokens and ancillary information into fixed lenth arrays and then make a second pass in order to sort out the information in some kind of parallelized manner. Due to the somewhat context-adaptive nature of Theora’s VLCs, I don’t know if I can make a go of that.

Why can’t VLC decoding be done in parallel, either through SIMD or thread-level parallelization? I think the answer is clearly, “because the codecs weren’t designed for it”. Here’s the pitch: Create a codec bitstream format where the frame header contains indices pointing into the bitstream. For example, Sorenson Video 1 encodes the Y, U, and V planes in order and could be effectively decoded in parallel if only the frame header encoded offsets to the start of the U and V planes. Similarly, if a Theora frame header encoded offsets to certain coefficient groups in the header, that could facilitate multithreaded coefficient VLC decoding. Some other modifications would need to occur in the encoder, e.g., guaranteeing that no end-of-block runs cross coefficient group boundaries as they often do now.

Taking this a step farther, think about arithmetic/range coding. Decoding a bitstream encoded with such an algorithm notoriously requires a multiplication per value decoded. How about designing the bitstream in such a way that 2 or 4 multiplications can be done in parallel through any number of SIMD instruction sets?

Do these ideas have any plausibility? I have this weird feeling that if they do, they’re probably already patented. If they aren’t, well, maybe this post can serve as prior art.

Better Compression For ISOs

Pursuant to my project idea of archiving a bunch of obscure, old, CD-ROM-based games, I got an idea about possibly compressing ISO-9660 filesystems more efficiently than could be done with standard lossless compressors such as gzip, bzip2, and lzma. This is filed under “outlandish brainstorms” since I don’t intend to get around to doing this anytime in the near future, but I wanted to put the idea out there anyway.

Game developers throughout the era of optical disc-based games have been notoriously lazy about data efficiency. A typical manifestation of this is when one opens a disc to find dozens or hundreds of sizable, uncompressed WAV audio files. Same goes for uncompressed graphical assets. General lossless compressors are able to squeeze some bytes out of uncompressed audio and video data but specialized algorithms (like FLAC for audio and PNG for images) perform better.

Here’s the pitch: Create a format that analyzes individual files in an ISO filesystem and uses more appropriate compression algorithms. If there’s a 16-bit PCM WAV file, use FLAC to compress it. If there’s an uncompressed BMP file, compress as a PNG file. Stuff them all in a new file with an index and a mapping to their original files.

As an additional constraint, it obviously needs to be possible to reconstruct a bit-exact ISO from one of these compressed images. So the index will need to store information about what sector ranges different files occupied.

The only other attempt I have seen for a specialized ISO compression format is the CISO format used for compressing ISO filesystems for storage on flash memory to be read in PSP units. That format uses zlib to compress sector by sector which has the advantage of being able to mount and use a CISO image in place without decompressing the entire filesystem. That should be possible for this filesystem a well by using a FUSE driver. Mount the filesystem and the driver presents the file tree upon request and decompressed the files on demand.

Many compression algorithms have assorted methods of operation depending on what is appropriate for a particular range of data. This scheme is merely an extension of that principle. I have no idea if this idea would hold water in the general case. But thanks to my archiving brainstorm, I expect I will have quite a lot of data to analyze.

See Also:

FATE Opportunities In The Cloud

Just a few months ago, I couldn’t see any value in this fad known as cloud computing. But then, it was in the comments for that blog post that Tomer Gabel introduced me to the notion of using cloud computing for, well, computing. Prior to that, I had only heard of cloud computing as it pertained to data storage. A prime example of cloud computing resources is Amazon’s Elastic Compute Cloud (EC2).

After reading up on EC2, my first thought vis-à-vis FATE was to migrate my 32- and 64-bit x86 Linux compilation duties to a beefy instance in the cloud. However, the smallest instance currently costs $73/month to leave running continously. That doesn’t seem cost effective.

My next idea was to spin up an instance for each 32- or 64-bit x86 Linux build on demand when there was new FFmpeg code in need of compilation. That would mean 20 separate instances right now, instances that each wouldn’t have to run very long. This still doesn’t seem like a very good idea since instance computing time is billed by the hour and it’s rounded up. Thus, even bringing up an instance for 8 minutes of build/test time incurs a full billable hour. I’m unclear on whether bring-up/tear-down cycles of, say, 1 minute each, are each billed as separate compute hours, but it still doesn’t sound like a worthwhile solution no matter how I slice it.

A different architecture occurred to me recently: For each new code revision, spin up a new, single core compute instance and run all 20 build/test configurations serially. This would at long last guarantee that each revision is being built, at least for 32- and 64-bit x86 Linux. It’s interesting to note that I could easily quantify the financial cost of an SVN commit– if it took, say, 2-3 hours to build and test the 20 configs, that would amount to $.30 per code commit.

Those aren’t the only costs, though. There are additional costs for bandwidth, both in and out (at different rates depending on direction). Fortunately, I designed FATE to minimize bandwidth burden, so I wouldn’t be worried about that cost. Understand, though, that data on these compute instances is not persistent. If you need persistent storage, that’s a separate service called Elastic Block Store. I can imagine using this for ccache output. Pricing for this service is fascinating: Amazon charges for capacity used on a monthly basis, naturally, but also $.10 per million I/O requests.

This is filed under “Outlandish Brainstorms” for the time being, mostly because of the uncertain and possibly intractable costs (out of my own pocket). But the whole cloud thing is worth keeping an eye on since it’s a decent wager that prices will only decline from this point on and make these ideas and others workable. How about a dedicated instance loaded with the mphq samples archive, iterating through it periodically and looking for crashers? How about loading up a small compute instance’s 160 GB of space with a random corpus of multimedia samples found all over the web (I think I know people who could hook us up) which would start periodically and process random samples from the corpus while tracking and logging statistics about performance?


For my latest silly little project, I created an offline database containing information about Apple’s iPhone App Store, made it browseable offline with full text search, and even generated some nifty charts and tables about the apps. Wanna see?

Go here for all of the juicy data.

Have you ever gotten one little spark of an idea and started to research and prototype it, only to have it snowball into something absolutely unrecognizable in a short period of time? As I write up this idea, I can scarcely remember how I started down the path of creating an offline, web browser-accessible version of Apple’s iPhone App Store. On the way, I learned a bunch about modern web programming and even full text search.

What’s wrong with me anyway? Why can’t I do simple exercises to come up to speed on certain well-established concepts? I think normal people would start out by developing trivial websites showcasing pictures of their cats when trying to get up to speed on modern web development. But no, not me. No, I just have to punish myself by dreaming up the most outlandish scenario in which to learn some technology, purely as an ancillary goal to a bizarre primary focus.

The Pitch
Look, here’s how it happened: Remember, I contribute heavily to this video game database called MobyGames. The database excels in archiving data about arcane systems and obscure, archaic titles. That, and Barbie games, thanks to my tireless efforts. The database has challenges keeping up with all the latest and greatest releases for all kinds of systems. However, this whole iPhone App Store thing is really throwing us a curveball. In just a little over a year, over 13,000 unique titles categorized as “games” have accumulated for the iPhone/iPod Touch system. At the time of this writing, MobyGames has cataloged a meager 111 of these titles.

Continue reading