Breaking Eggs And Making Omelettes

Topics On Multimedia Technology and Reverse Engineering


Parsing In Python

December 9th, 2008 by Multimedia Mike

I wanted to see if the video frames inside these newly discovered ACDV-AVI files were just regular JPEG frames stuffed inside an AVI file. JPEG is a picky matter and many companies have derived their own custom bastardizations of the format. So I just wanted to separate out the data frames into individual JPEG files and see if they could be decoded with other picture viewers. Maybe FFmpeg can already do it using the right combination of command line options. Or maybe it’s trivial to hook up the ‘ACDV’ FourCC to the JPEG decoder in the source code. What can I say? FFmpeg intimidates me just as much as it does any of you mere mortals.

Plus, I’m getting a big kick out of writing little tools in Python. For a long time, I had a fear of processing binary data in very high level languages like Perl, believing that they should be left to text processing tasks. This needn’t be the case. pack() and unpack() make binary data manipulation quite simple in Perl and Python (in Python, they live in the struct module). Here’s a naive utility that loads an AVI file in one go, digs through it until it finds a video frame marker (either '00dc' or '00AC', a marker I have never seen before) and writes the frame to its own file.


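from struct import *
import sys

# NOTE: the usage string, the unpack() call, and everything after it were
# cut off in the original listing. Those parts are reconstructed here,
# assuming the standard AVI chunk layout (4-byte ID, then a 4-byte
# little-endian size) and a made-up output filename pattern.
if len(sys.argv) < 2:
    print "Usage: %s <file.avi>" % sys.argv[0]
    sys.exit(1)

data = open(sys.argv[1], "rb").read()

fileno = 0
# stop 7 bytes early so a marker always has a full size field after it
for i in xrange(len(data) - 7):
    if data[i:i+4] == '00dc' or data[i:i+4] == '00AC':
        size = unpack('<I', data[i+4:i+8])[0]
        open("frame-%06d.jpg" % fileno, "wb").write(data[i+8:i+8+size])
        fileno += 1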
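
For readers on Python 3, the same naive scan could be sketched as a function and tried on a synthetic buffer. This is only a sketch under assumptions, not the original utility: the names `extract_frames`, `make_chunk` and `FRAME_MARKERS` are hypothetical, and it assumes the usual AVI chunk layout of a 4-byte ID followed by a 4-byte little-endian payload size.

```python
import struct

# Chunk IDs treated as video frames: '00dc' is the usual compressed-video
# marker for stream 0; '00AC' is the one seen in the ACDV samples.
FRAME_MARKERS = (b"00dc", b"00AC")


def make_chunk(marker, payload):
    """Build one chunk: 4-byte ID, 4-byte little-endian size, then payload."""
    return marker + struct.pack("<I", len(payload)) + payload


def extract_frames(data):
    """Return the payload of every chunk whose ID is in FRAME_MARKERS.

    This scans byte by byte rather than walking the RIFF structure, so a
    marker that happens to occur inside a payload would produce a false
    match.
    """
    frames = []
    # Stop 7 bytes early so every candidate has a complete size field.
    for i in range(len(data) - 7):
        if data[i:i + 4] in FRAME_MARKERS:
            (size,) = struct.unpack_from("<I", data, i + 4)
            frames.append(data[i + 8:i + 8 + size])
    return frames


if __name__ == "__main__":
    # Synthetic stand-in for an AVI file: two video chunks, one audio chunk.
    sample = (b"junk" + make_chunk(b"00dc", b"\xff\xd8first")
              + make_chunk(b"01wb", b"audio")
              + make_chunk(b"00AC", b"\xff\xd8second"))
    for number, frame in enumerate(extract_frames(sample)):
        print(number, frame)
```

Returning the payloads instead of writing them straight to disk keeps the scan easy to test; dumping each one to its own file is then a short loop over the result with open(name, "wb").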

Posted in Programming, Python | 10 Comments »

10 Responses

  1. Kostya Says:

    ffmpeg -i inputfile -f image2 {-vn -acodec copy | -an -vcodec copy} %06d.frm will put audio or video frames into separate files. I use it extensively for investigating formats.

  2. Multimedia Mike Says:

    Thanks. That’s going in a cheat sheet somewhere.

  3. Multimedia Mike Says:

    Thanks again for the tip. I already put it to use trying to figure out this format tonight:

  4. compn Says:

    the quick and easy way to test for a decoder is
    mplayer -vc +ffmjpeg file.avi

    the + forces the codec to decode the file when the fourcc is not listed in its codecs.conf entry.
    i forced many codecs on the new samples.

  5. Hari Jayaram Says:

    Any plans for writing a Python wrapper API for FFmpeg, like the ffmpeg-php library? It would be great to manipulate video files without learning to use the FFmpeg C API.
    Also, just curious: what plugin are you using to display your code on this blog?

  6. Multimedia Mike Says:

    I would certainly like to see Python bindings for FFmpeg. It’s not on my short-term TODO list, though.

    For syntax highlighting, I use a plugin called iG:Syntax Hiliter. It supports a wide array of languages and is extremely convenient to use: just surround the code block with square brackets that name the language.

  7. m Says:

    For parsing binary files with Python, the Hachoir project can be interesting

  8. Multimedia Mike Says:

    Oh yes, I’ve already done some work with Hachoir:

    I would like to do more since I like hacking file formats so much.

  9. Hari Jayaram Says:

    Thanks, Mike. I have the iG:Syntax Hiliter going with the hack courtesy of shantanu-goel that allows you to edit code in the visual editor itself.
    Also looking forward to some Python bindings. Currently I am using Python to automate ffmpeg transcoding of several files via system calls. It sure would be great to talk to FFmpeg directly, but sadly I lack the coding skills to do this.

  10. Multimedia Mike Says:

    What kind of deep manipulation do you need to do? What do you hope to be able to do that the shell script combined with ‘ffmpeg’ can’t already do? My entire FATE testing system is built on Python simply invoking ‘ffmpeg’ through a shell.