Musings of a File Format Hacker

Some anonymous hacker recently made a name for himself (wait, can you make a name for yourself anonymously?) by ranting against the difficulty of working with certain Adobe file formats. It got me wondering if we could draw more attention to the FFmpeg program by littering the code with rants against the file formats we’re trying to re-implement. I’d like to think we’re above that, if only because the average FFmpeg hacker intuitively understands that no one ever meant for us to be able to re-implement these formats, at least not the proprietary formats.

Ironically, I think you’ll hear the most complaints from the crew where free, open formats are concerned.

You can look at a seemingly bizarre format and generally experience one of two reactions:

  1. What kind of moron thought of this format?! What were they thinking?!
  2. What did the creator of this format know that I don’t know? What were the original design goals, and what problems were they trying to solve?

Over the years, I’ve trained myself to have reaction #2. I’m not saying I’m superior; on the contrary, the philosophy leads me down the wrong path sometimes when it turns out that the format’s originator honestly didn’t know what they were doing. In that case, I end up giving them too much credit.

Along these same lines, Joel Spolsky’s take is absolutely fascinating: he describes how the Microsoft Office formats evolved to their present complexity, not out of spite for third-party programmers, but to meet the needs of the applications’ features.

Actually, an impromptu and unscientific audit of the FFmpeg code (grep’ing for certain keywords) does seem to indicate a high level of animosity towards Microsoft.

Suddenly, I feel revitalized regarding the MultimediaWiki and its charter. Someday, someone somewhere is going to want to know about that UMV format on Orpheo’s Curse.

4 thoughts on “Musings of a File Format Hacker”

  1. Diego “Flameeyes” Pettenò

    If you had the time, you could explain to me why the FLAC format uses packed 12-bit data fields in the metadata instead of padding them to 16 like almost any other sane format out there, because I never understood it.

    It’s not like it’s a streaming format that is used in low-bandwidth conditions like RTP; a FLAC file is usually well over 10 MB…

  2. astrange

    He’s still not an anonymous hacker when his name is in Xee’s about window….

    Anyway, I’d like to know how people come up with all these weird timestamps. At least MKV’s inaccurate decimal system might be unique.

  3. Reimar

    Well, the rants about ogg on the mailing list I think are at least as bad as that comment. And I think most who ever reverse-engineered something at least know the feeling that leads to such a comment. At best we have learned to use that feeling more “constructively” ;-).
