Investigating Hachoir

In response to yesterday’s brainstorm, Mjules tipped me off to another tool that falls squarely into the “I wish I had thought of that” category: Hachoir (wish I knew how to pronounce it). It’s a Python-based framework for writing file parsers.

Hachoir mascot appliance

Finally! I have a compelling reason to learn Python.*** Python has long been on my list of languages to figure out, along with Prolog. Tonight, I wrote a very basic extension to Hachoir to parse the BIN FMV format discovered in my most recent exploration journal entry. And look: this WordPress plugin for code syntax highlighting also does Python:

Right now, this produces the output:

root (The Amazing Spider-Man vs. The Kingpin (Sega CD) FMV)
0) chunk type= "CONF": FourCC (size 4 bytes)
4) chunk length= 0x00000028: 4 bytes (size 4 bytes)
8) raw[]= "\0\0\0\0\0\0\0\0\0\0\0\0\0\0(...)" (size 3.3 MB)
[ q to quit - move with arrows, page up/down, home/end ]
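For anyone who wants to see the same three fields without installing Hachoir, here is a minimal standalone sketch using only Python's standard `struct` module. It is not the Hachoir extension itself. The function name `parse_chunk_header` and the sample bytes are made up for illustration, and the big-endian length field is an assumption on my part (the output above shows the value but not the byte order).

```python
import struct


def parse_chunk_header(data: bytes):
    """Split a buffer into (chunk type, chunk length, remaining raw bytes).

    Assumption: a 4-character ASCII tag followed by a big-endian
    32-bit length, mirroring the fields in the output above.
    """
    if len(data) < 8:
        raise ValueError("need at least 8 bytes for a chunk header")
    fourcc, length = struct.unpack(">4sI", data[:8])
    return fourcc.decode("ascii"), length, data[8:]


# Hypothetical sample: a CONF tag, a length of 0x28, then zero padding.
sample = b"CONF" + struct.pack(">I", 0x28) + bytes(16)

chunk_type, chunk_length, raw = parse_chunk_header(sample)
print(f'chunk type= "{chunk_type}"')
print(f"chunk length= 0x{chunk_length:08X}")
print(f"raw[] size= {len(raw)} bytes")
```

The appeal of a framework like Hachoir is that it handles this kind of bookkeeping (offsets, sizes, display) for every field, so the parser author only has to describe the layout.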

I still have a lot to learn about both Python and the existing framework facilities provided by Hachoir for parsing chunked file formats. The program already includes parsers for an impressive array of file format types. One that is of particular interest to me is a QuickTime file parser that the authors concede is rather incomplete. I see real promise for this parser as a research and troubleshooting tool for one of the most involved multimedia formats available.

*** (Proviso: No disrespect meant to anyone’s favorite language. I’m as fascinated with new programming languages as the next hardcore Linux geek. But it always helps me to learn a new language when I have a clear goal outlined for doing so.)

One thought on “Investigating Hachoir”

  1. sigdrak

    That would be ‘hashwar’ [ɑʃwaːɹ]

    As far as I know, one of your fellow ffmpeg developers has contributed to the AVI parser.

    The Hachoir developer, Victor Stinner, is, like you, involved in game stuff, although he develops games himself, for instance Wormux.
