Breaking Eggs And Making Omelettes

Topics On Multimedia Technology and Reverse Engineering


FFmpeg Autobuilds

September 21st, 2006 by Multimedia Mike

I have been sitting on this for at least a month. As brainstormed in this post, I have developed a system on a spare, headless, always-on x86 Linux box that automatically updates its FFmpeg SVN copy and compiles it with 6 different gcc versions, essentially the latest in each of the 2.95, 3.2, 3.3, 3.4, 4.0, and 4.1 series. That’s all working quite well. The part that I’m trying to resolve right now is what to do with the results. I would like to aggregate the results into a concise format for easy web reading. Plus, it would be good to have a history of build successes/failures. I think an RSS aggregation would be useful as well. And for bonus points, some halfway intelligent system that figures out which warnings occur in all or most builds. This would reveal good code janitor work for aspiring FFmpeg developers.

This is just the first phase. Of course, I want to add functional tests later, such as the standard regression. I am still trying to get the infrastructure up.
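
The warning-aggregation idea could be sketched roughly like this. It is only a sketch, assuming each build's output has been saved as text and that warnings follow the `file:line: warning: message` shape that gcc prints. The names `extract_warnings` and `common_warnings` are made up for illustration and are not part of the autobuild system. Messages whose wording differs between gcc versions would count as separate warnings here.

```python
import re
from collections import Counter

# Assumed diagnostic shape: "path/file.c:123: warning: message".
# The optional group tolerates a column number after the line number.
WARNING_RE = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+):(?:\d+:)? warning: (?P<msg>.*)$"
)


def extract_warnings(log_text):
    """Return the set of (file, line, message) warnings found in one build log."""
    found = set()
    for line in log_text.splitlines():
        match = WARNING_RE.match(line.strip())
        if match:
            found.add((match.group("file"), int(match.group("line")), match.group("msg")))
    return found


def common_warnings(logs_by_compiler, min_builds=None):
    """Return (warning, count) pairs seen in at least min_builds builds.

    logs_by_compiler maps a compiler label to that build's log text.
    When min_builds is None, a warning must appear in every build.
    """
    if min_builds is None:
        min_builds = len(logs_by_compiler)
    counts = Counter()
    for log_text in logs_by_compiler.values():
        # Each build contributes a warning at most once (a set per log).
        counts.update(extract_warnings(log_text))
    return [(warning, n) for warning, n in counts.most_common() if n >= min_builds]
```

Passing `min_builds=4` with six compilers would give the "most builds" list, while the default gives the "all builds" list.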

Posted in FATE Server, Open Source Multimedia | 1 Comment »

Branching Out

September 18th, 2006 by Multimedia Mike

Cyril Zorin wanted a good place to document a 3D model format used in a LucasArts computer game. I thought this might be a good time to expand the charter of the MultimediaWiki to include more general game formats, not just the FMV-oriented ones. That is, unless there are other wikis devoted to that same charter. Perhaps I’m not asking the right questions, but a quick Google for “game format wiki” brings up this domain and the XentaxWiki on the first page.


BTW, Cyril wants to know if anyone has anything to add to the inaugural 3D format page in the MultimediaWiki.

Posted in Game Hacking | 7 Comments »

Reverse Engineering Artwork

September 16th, 2006 by Multimedia Mike

VAG directed me to a curious visualization of console game disassemblies filed under distellamap. This particular experiment targets Atari games and helps to highlight some of the graphics contained inside. Nearer and dearer to my heart is the research upon which this was based, called dismap, which maps the flow of selected Nintendo Entertainment System games.

Posted in Reverse Engineering | No Comments »

Investigating Hachoir

September 14th, 2006 by Multimedia Mike

In response to yesterday's brainstorm, Mjules tipped me off regarding another tool that falls squarely into the "I wish I had thought of that" category-- Hachoir (wish I knew how to pronounce it). It's a Python-based framework for writing file parsers.


Finally! I have a compelling reason to learn Python.*** Python has long been on my list of languages to figure out, along with Prolog. Tonight, I wrote a very basic extension to Hachoir to parse the BIN FMV format discovered in my most recent exploration journal entry. And look-- this WordPress plugin for code syntax highlighting also does Python:

PYTHON:
  from hachoir.field import (Parser, ParserError,
      UInt8, UInt16, UInt32, String, RawBytes)
  from hachoir.endian import BIG_ENDIAN
  from hachoir.text_handler import hexadecimal

  class SpiderManBinFile(Parser):
      tags = {
          "file_ext": "bin",
          "min_size": 8*8,
          "description": "The Amazing Spider-Man vs. The Kingpin (Sega CD) FMV"
      }

      # All multi-byte fields in this format are read as big endian.
      endian = BIG_ENDIAN

      def validate(self):
          # Accept the file only if it opens with the 'CONF' FourCC.
          return (self.stream.readBytes(0, 4) == 'CONF')

      def createFields(self):
          # Preamble of the first chunk: a FourCC followed by a 32-bit length.
          yield String(self, "chunk type", 4, "FourCC")
          yield UInt32(self, "chunk length", "4 bytes", text_handler=hexadecimal)

Right now, this produces the output:

root (The Amazing Spider-Man vs. The Kingpin (Sega CD) FMV)
0) chunk type= "CONF": FourCC (size 4 bytes)
4) chunk length= 0x00000028: 4 bytes (size 4 bytes)
8) raw[]= "\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0(...)" (size 3.3 MB)
[ q to quit - move with arrows, page up/down, home/end ]

I still have a lot to learn about both Python and the existing framework facilities provided by Hachoir for parsing chunked file formats. The program already includes parsers for an impressive array of file formats. One that is of particular interest to me is a QuickTime file parser that the authors concede is rather incomplete. I see real promise for this parser as a research and troubleshooting tool for one of the most involved multimedia formats available.

*** (Proviso: No disrespect meant to anyone's favorite language. I'm as fascinated with new programming languages as the next hardcore Linux geek. But it always helps me to learn a new language when I have a clear goal outlined for doing so.)

Posted in Python, Reverse Engineering | 1 Comment »

My App A Day

September 13th, 2006 by Multimedia Mike

This ambitious software developer, the Software Jedi, wants to write an app a day for a month and he is soliciting suggestions.


Here is one idea that I dreamed up just the other day as I was plodding through the hex dump of yet another freshly discovered, FourCC-chunked multimedia file format. This is the proposal-- maybe he will find it interesting enough to write up in C#, maybe I will have to do it instead, or maybe someone else will beat me to it:

A lot of multimedia files use what I like to call the "chunked-FourCC" format:

  chunk 0
  chunk 1
   ..
  chunk n

Chunks are formatted as:

  preamble
  payload

The preamble invariably consists of:

  chunk identifier-- usually 4 ASCII chars (FourCC)
  length

When I stumble on a new chunked-FourCC-type file format, I want to know all of the possible chunk types. I want a simple tool that could walk through all the chunks in the file and print the various types.

At issue is the preamble format-- sometimes the FourCC is first, sometimes the length is first; sometimes the length is big endian, sometimes it's little endian; sometimes there is an extra "flags" component to the preamble; sometimes the length includes the preamble itself, sometimes it doesn't.
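
To make those variations concrete, here is a minimal Python sketch of such a chunk walker. It is only one way to do it, under some assumptions: the length is always 32 bits, any extra "flags" bytes come after the FourCC and length, and there is no alignment padding between chunks. The function name `walk_chunks` and its parameter names are invented for this example.

```python
import struct


def walk_chunks(stream, fourcc_first=True, big_endian=True,
                length_includes_preamble=False, flags_bytes=0):
    """Yield (offset, fourcc, payload_size) for each chunk in a seekable binary stream."""
    length_format = ">I" if big_endian else "<I"
    preamble_size = 8 + flags_bytes  # FourCC + 32-bit length + optional flags
    while True:
        offset = stream.tell()
        preamble = stream.read(preamble_size)
        if len(preamble) < preamble_size:
            return  # end of data (or a truncated trailing preamble)
        if fourcc_first:
            fourcc, raw_length = preamble[0:4], preamble[4:8]
        else:
            raw_length, fourcc = preamble[0:4], preamble[4:8]
        (length,) = struct.unpack(length_format, raw_length)
        # Normalize to the payload size regardless of how the format counts.
        payload_size = length - preamble_size if length_includes_preamble else length
        if payload_size < 0:
            raise ValueError("bad chunk length %d at offset %d" % (length, offset))
        yield offset, fourcc.decode("ascii", errors="replace"), payload_size
        stream.seek(payload_size, 1)  # skip over the payload to the next preamble
```

Collecting the second element of each tuple into a set gives the list of chunk types in a file. If the parameters are wrong for the format at hand, the walk tends to run off into garbage FourCCs almost immediately, which is itself a useful hint.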

So I am thinking of a utility where I can specify all of these parameters from the command line and the tool would print info about the chunks based on those instructions. A good starting point would be any Apple QuickTime (.mov) file. The chunk ("atom") format is (and all multi-byte numbers are big endian):

  bytes 0-3    atom size (including 8-byte size and type preamble)
  bytes 4-7    atom type (ASCII chars, usually)
  bytes 8..    data

There is also a special case for large atoms:

  bytes 0-3    always 0x00000001
  bytes 4-7    atom type
  bytes 8-15   atom size (including 16-byte size and type preamble)
  bytes 16..n  data
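
The two atom layouts above can be walked with a few lines of Python. This is a sketch rather than a full QuickTime parser: it handles only the two layouts described here, visits top-level atoms without descending into containers, and rejects any size smaller than the preamble instead of interpreting it. The name `walk_atoms` is made up for the example.

```python
import struct


def walk_atoms(stream):
    """Yield (offset, atom_type, data_size) for each top-level atom in a seekable stream."""
    while True:
        offset = stream.tell()
        header = stream.read(8)
        if len(header) < 8:
            return  # end of data
        # bytes 0-3: big-endian size, bytes 4-7: atom type
        size, atom_type = struct.unpack(">I4s", header)
        header_size = 8
        if size == 1:
            # Large atom: the real size is a 64-bit number in bytes 8-15.
            extended = stream.read(8)
            if len(extended) < 8:
                return
            (size,) = struct.unpack(">Q", extended)
            header_size = 16
        if size < header_size:
            raise ValueError("unsupported atom size %d at offset %d" % (size, offset))
        yield offset, atom_type.decode("latin-1"), size - header_size
        stream.seek(offset + size)  # the size counts the preamble, so jump from the atom start
```

Opening a .mov file in binary mode and printing each tuple from `walk_atoms(f)` lists the top-level atoms with their offsets and data sizes.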

Posted in Open Source Multimedia | 2 Comments »

Sega Leftovers Entry

September 12th, 2006 by Multimedia Mike

I'm back on the case with a new entry of my Multimedia Exploration Journal. I just processed 10 Sega CD & Saturn titles. It was all quite predictable, save for one new format that I can only generically refer to as the Amazing Spider-Man BIN format, based on the extension of the FMV files and the game that they come from. As with so many multimedia formats, I find this one absolutely fascinating. The reason is that it's a neatly chunked FourCC format with a custom video codec apparently designed to map neatly onto the Sega CD/Genesis video hardware (I still get confused about exactly how the Sega CD extended the Genesis's video capabilities). The format appears to define tile chunks, tile layouts, and palette RAM in discrete blocks.

Posted in General | No Comments »

Zelda Quests On

September 10th, 2006 by Multimedia Mike

I do hope to one day follow up on more ZeldaClassic hacking. Meanwhile, John Berry/Ulf Magnet is starting from some of my research and putting together his own Python utilities to work with the game's data files.

Posted in Nintendo, ZeldaClassic | No Comments »

Why So Many?

September 3rd, 2006 by Multimedia Mike

A multimedia colleague posed the quandary: "why are there 13 different lossless [audio] formats out there?" My best answer: Because there were at least 13 organizations or individuals that wanted their own flavor. We will probably discover that the underlying algorithms for all 13 are nearly indistinguishable, just with slightly tweaked parameters. Indeed, the individual who reverse engineered Apple's ALAC figured out portions by first understanding similar portions of FLAC.

I once reverse engineered an audio decoder from binary code only to find that it was a stock IMA ADPCM decoder. I didn't see how it offered any advantage whatsoever over another available, free solution. I later had the opportunity to talk to someone involved with this variant's creation. I asked why they chose to create their own format since it brought nothing new to the table; did they just want to have their own format for the sake of it?

The response: "Doesn't everybody?"

Posted in Reverse Engineering | 1 Comment »

NEC V60

September 1st, 2006 by Multimedia Mike

Does anyone happen to have any technical documentation (e.g., software developer's manual) for the NEC V60 CPU?

Posted in Game Hacking | 2 Comments »