Monthly Archives: September 2006

My App A Day

This ambitious software developer, the Software Jedi, wants to write an app a day for a month and he is soliciting suggestions.



Here is one idea that I dreamed up just the other day as I was plodding through the hex dump of yet another freshly discovered, FourCC-chunked multimedia file format. Maybe he will find the proposal interesting enough to write up in C#, maybe I will have to do it myself, or maybe someone else will beat me to it:

A lot of multimedia files use what I like to call the “chunked-FourCC” format:

  chunk 0
  chunk 1
   ..
  chunk n

Chunks are formatted as:

  preamble
  payload

The preamble invariably consists of:

  chunk identifier-- usually 4 ASCII chars (FourCC)
  length

When I stumble on a new chunked-FourCC-type file format, I want to know all of the possible chunk types. I want a simple tool that could walk through all the chunks in the file and print the various types.

At issue is the preamble format: sometimes the FourCC comes first, sometimes the length comes first; sometimes the length is big endian, sometimes it’s little endian; sometimes the preamble carries an extra “flags” component; sometimes the length includes the preamble itself, sometimes it doesn’t.
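The variations above boil down to a handful of switches. Here is a minimal Python sketch of a chunk walker that takes them as parameters. The names (walk_chunks, count_types, flags_bytes and so on) are made up for illustration, and two things are pure assumptions: that any flags field sits after the FourCC and length, and that chunks follow each other with no padding bytes in between.

```python
import struct
from collections import Counter


def walk_chunks(data, length_first=False, big_endian=True,
                flags_bytes=0, length_includes_preamble=False):
    """Yield (fourcc, payload_offset, payload_size) for each top-level chunk.

    Assumes a 4-byte FourCC, a 4-byte length, and (hypothetically) an
    optional flags field of flags_bytes bytes following those two.
    """
    fmt = ">I" if big_endian else "<I"
    preamble_size = 8 + flags_bytes
    pos = 0
    while pos + preamble_size <= len(data):
        first, second = data[pos:pos + 4], data[pos + 4:pos + 8]
        fourcc, raw_len = (second, first) if length_first else (first, second)
        (length,) = struct.unpack(fmt, raw_len)
        if length_includes_preamble:
            payload_size = length - preamble_size
        else:
            payload_size = length
        if payload_size < 0:
            break  # the length makes no sense under these settings; give up
        # latin-1 never fails to decode, so non-ASCII identifiers still print
        yield fourcc.decode("latin-1"), pos + preamble_size, payload_size
        pos += preamble_size + payload_size


def count_types(data, **options):
    """Tally how often each chunk type appears."""
    return Counter(fourcc for fourcc, _, _ in walk_chunks(data, **options))
```

If the chunk list comes out as garbage or stops after one entry, the settings are probably wrong for that format, which is itself a useful hint when staring at an unknown file.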

So I am thinking of a utility where I can specify all of these parameters from the command line and the tool would print info about the chunks based on those instructions. A good starting point would be any Apple QuickTime (.mov) file. The chunk (“atom”) format is as follows (all multi-byte numbers are big endian):

  bytes 0-3    atom size (including 8-byte size and type preamble)
  bytes 4-7    atom type (ASCII chars, usually)
  bytes 8..    data

There is also a special case for large atoms:

  bytes 0-3    always 0x00000001
  bytes 4-7    atom type
  bytes 8-15   atom size (including 16-byte size and type preamble)
  bytes 16..n  data
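The two atom layouts above can be walked with a short loop. The following is only a sketch in Python, with a made-up function name (walk_atoms); it lists top-level atoms and does not descend into container atoms. It stops at any atom whose size is smaller than its own preamble, a case the layouts above do not describe.

```python
import struct


def walk_atoms(data):
    """Yield (atom_type, data_offset, data_size) for each top-level atom."""
    pos = 0
    while pos + 8 <= len(data):
        # bytes 0-3: big-endian size, bytes 4-7: atom type
        size, atom_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:
            # large-atom case: the real size is a 64-bit number in bytes 8-15
            if pos + 16 > len(data):
                break
            (size,) = struct.unpack(">Q", data[pos + 8:pos + 16])
            header = 16
        if size < header:
            break  # size does not even cover the preamble; stop walking
        yield atom_type.decode("latin-1"), pos + header, size - header
        pos += size  # the size includes the preamble, so this lands on the next atom


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        for atom_type, offset, data_size in walk_atoms(f.read()):
            print(f"{atom_type!r}: {data_size} data bytes at offset {offset}")
```

Reading the whole file into memory keeps the sketch short; a real tool would want to seek past each atom's data instead, since media data atoms can be very large.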

Sega Leftovers Entry

I’m back on the case with a new entry in my Multimedia Exploration Journal. I just processed 10 Sega CD & Saturn titles. It was all quite predictable, save for one new format that I can only generically refer to as the Amazing Spider-Man BIN format, based on the extension of the FMV files and the game they come from. As with so many multimedia formats, I find this one absolutely fascinating. The reason is that it’s a neatly chunked FourCC format with a custom video codec apparently designed to map neatly onto the Sega CD/Genesis video hardware (I still get confused about exactly how the Sega CD extended the Genesis’s video capabilities). The format appears to define tile chunks, tile layouts, and palette RAM in discrete blocks.

Why So Many?

A multimedia colleague posed the quandary: “why are there 13 different lossless formats out there?” My best answer: Because there were at least 13 organizations or individuals that wanted their own flavor. We will probably discover that the underlying algorithms for all 13 are nearly indistinguishable, just with slightly tweaked parameters. Indeed, the individual who reverse engineered Apple’s ALAC figured out portions by first understanding similar portions of FLAC.

I once reverse engineered an audio decoder from binary code only to find that it was a stock IMA ADPCM decoder. I didn’t see how it offered any advantage whatsoever over another available, free solution. I later had the opportunity to talk to someone involved with this variant’s creation. I asked why they chose to create their own format since it brought nothing new to the table; did they just want to have their own format for the sake of it?

The response: “Doesn’t everybody?”