Monthly Archives: December 2008

Processing The Unknowns

This is the general process I have been using for working through the unknown video codec samples (but not always in this order):

  • Starting with the FourCC (which is usually how the samples are sorted, thanks to my download method), look up the codec in the MultimediaWiki to see whether anything is already known about it
  • Check the mphq archive to see if similar examples are already cataloged in the V-codecs directory
  • Check the FourCC list to see if they have any knowledge about the codec
  • Consult Google
  • Study the raw bytes of the file to see if there are any obvious free-form userdata strings in the header that would give away information
  • Run ‘ffmpeg -i <sample> -an -f image2 -vcodec copy %05d.frm’ on the sample to break up the frames into individual files
  • Observe the sizes of the frames. If they are all the same, do some math based on the size of each frame and the resolution of the video and try to guess the format. Make other educated guesses based on frame sizes: frames that are all roughly the same size may indicate an intra-coded (i.e., all keyframes) codec; an enormous first frame followed by many extremely small frames may, combined with other intelligence, indicate a screen capture codec, which is my current hypothesis for Microsoft Camcorder Video
  • Upload samples to mphq and file appropriately; preferred strategy for samples: try to catalog at least 2 samples for a format, but no more than 5; make them each less than 5 megabytes if possible; if there is a choice, try to grab samples from different sources rather than grabbing multiple samples from one server (which were likely created with the same version of the same software using the same parameters); create readme.txt file that lists the original URLs for the files
  • Create a new MultimediaWiki page for the format; create a FourCC redirect page so that the video FourCC is automatically categorized
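The frame-size reasoning in the list above can be turned into a rough Python helper. This is a sketch, not part of any existing tool. It assumes the frames were dumped to *.frm files with the ffmpeg command shown and that you supply the width and height yourself. The thresholds (a first frame 10x the median, a 10% spread) and the table of bytes-per-pixel guesses are arbitrary, illustrative choices.

```python
import os
import statistics

# Bytes per pixel for a few common uncompressed layouts (illustrative, not exhaustive).
RAW_GUESSES = {
    1.0: "8-bit paletted or greyscale",
    1.5: "planar YUV 4:2:0 (e.g. I420/YV12)",
    2.0: "packed YUV 4:2:2 (e.g. YUY2/UYVY) or 16-bit RGB",
    3.0: "24-bit RGB/BGR",
    4.0: "32-bit RGB with alpha or padding",
}

def frame_sizes(directory, suffix=".frm"):
    """Return the sizes of the dumped frame files, in file-name order."""
    names = sorted(n for n in os.listdir(directory) if n.endswith(suffix))
    return [os.path.getsize(os.path.join(directory, n)) for n in names]

def guess_from_sizes(sizes, width, height):
    """Apply rough frame-size heuristics; the thresholds are arbitrary."""
    if not sizes:
        return "no frames"
    if len(set(sizes)) == 1:
        # Every frame is the same size: compare against the pixel count.
        bpp = sizes[0] / (width * height)
        return "constant size, %.2f bytes/pixel: %s" % (
            bpp, RAW_GUESSES.get(bpp, "unknown layout"))
    first, rest = sizes[0], sizes[1:]
    if first > 10 * statistics.median(rest):
        return "huge first frame, tiny follow-ups: maybe a screen capture codec"
    # Relative spread of frame sizes (population std. dev. / mean).
    spread = statistics.pstdev(sizes) / statistics.mean(sizes)
    if spread < 0.1:
        return "similar sizes: maybe intra-only (all keyframes)"
    return "varying sizes: probably inter-frame coding"
```

For example, a 320x240 file whose frames are all 115200 bytes works out to exactly 1.5 bytes per pixel, which points toward planar YUV 4:2:0. Calling `guess_from_sizes(frame_sizes("."), 320, 240)` in the directory holding the dumped frames applies the same checks to a real sample.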

Also, compn demonstrates that it’s important to try forcing the video data through several common codecs, most notably ISO MPEG-4 part 2 (a.k.a. DIVX/XVID) and JPEG.
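Feeding the data through real decoders is the definitive test. A cheaper first pass is to look at the first few bytes of each dumped frame; this is my own heuristic, not compn's method. JPEG data begins with the SOI marker FF D8, immediately followed by the FF that opens the next marker. MPEG-4 Part 2 frames usually begin with a 00 00 01 start code. The helper names below are hypothetical. A match is only a hint, since MPEG-1/2 video shares the 00 00 01 prefix.

```python
import glob
from collections import Counter

# MPEG-4 Part 2 start code values that commonly open a frame: visual object
# sequence (0xB0), visual object (0xB5), VOP (0xB6), plus video object /
# video object layer codes (0x00-0x2F).
MPEG4_CODES = {0xB0, 0xB5, 0xB6} | set(range(0x00, 0x30))

def sniff_frame(data):
    """Guess what a frame payload might be from its first bytes (a hint only)."""
    if data[:3] == b"\xff\xd8\xff":
        # JPEG SOI marker (FF D8) followed by the 0xFF that opens the next marker
        return "JPEG"
    if len(data) > 3 and data[:3] == b"\x00\x00\x01" and data[3] in MPEG4_CODES:
        # MPEG-1/2 video share this prefix, so treat this as a candidate only
        return "MPEG-4 Part 2 (candidate)"
    return "unknown"

def sniff_directory(pattern="*.frm"):
    """Tally sniff_frame() results over the frames dumped by the ffmpeg command."""
    counts = Counter()
    for name in sorted(glob.glob(pattern)):
        with open(name, "rb") as f:
            counts[sniff_frame(f.read(8))] += 1
    return counts
```

Running `print(sniff_directory())` in the directory of dumped frames gives a quick tally. A sample whose frames are all "JPEG" is a good candidate for the JPEG decoder.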

I would like to hear other basic strategies for analyzing unknown formats.

FATE on MIPS

I have an interest in testing FFmpeg on a wide diversity of platforms via FATE. Following up on an earlier post about non-x86 architectures, I learned that there are MIPS-based Linux laptops in the works.

One such laptop is the HiVision miniNote; another is the Skytone Alpha-400. From what I have learned, they are pretty much the same machine and go by other names depending on the regions in which they are marketed. However, the Skytone Alpha-400 is something I could order today if I wanted to (Geeks.com sells the MIPS Alpha-400 machines). It’s also worth noting that the latest incarnation of the series uses Intel XScale CPUs rather than MIPS derivatives.

Unfortunately, the Alpha-400 and related laptops really aren’t made for general hacking. Allegedly, someone in .nl has figured out how to get a root prompt, but it would require knowing Netherlish to decode the instructions.

In the course of the previous discussion, I also learned of the Gdium, which features a slightly more powerful MIPS CPU. This might make a better platform for FATE testing. There isn’t much information on their site about the possibility of purchasing one, but there is a blog post about their desire to attract open source developers. Hey! I’m attracted! Perhaps someone who knows people involved with these products could find out whether they would be interested in performing automated QA for FATE?

Meanwhile, I am tweaking the core FATE script to support cross compiling and remote execution of tests. Think of it as phase 2.

What Platform Would You Like To Test?

What platforms should FATE test? So far, it is testing both x86_32 and x86_64 under Linux and Mac OS X, and PowerPC under Linux. These are the platforms I care about most, and the ones I have access to.

What other platforms would people like to see tested through FATE? Windows? *BSD? Solaris? Linux running on exotic bits of hardware? Game consoles? Well, I have great news: after many months of occasional work on the FATE infrastructure, I am confident that the system is in good enough shape for other people to run the core FATE script and submit results back to the main server.

I have released the first public version of the script at the core of FATE: fate-script.py. Anyone can run it locally on their own platforms, but it requires a few credentials (assigned by me) in order to submit results to the server.

Feedback is very welcome, as are offers to run FATE continuously on other platforms. Also, I would love to know how to properly version something with git. All I can say is that the currently posted version of fate-script.py is version 67eac48073a24deece52cb28fbb25c14858b6c23.

Parsing In Python

I wanted to see if the video frames inside these newly discovered ACDV-AVI files were just regular JPEG frames stuffed inside an AVI file. JPEG is a picky matter and many companies have derived their own custom bastardizations of the format. So I just wanted to separate out the data frames into individual JPEG files and see if they could be decoded with other picture viewers. Maybe FFmpeg can already do it using the right combination of command line options. Or maybe it’s trivial to hook up the ‘ACDV’ FourCC to the JPEG decoder in the source code. What can I say? FFmpeg intimidates me just as much as it does any of you mere mortals.

Plus, I’m getting a big kick out of writing little tools in Python. For a long time, I was afraid to process binary data in very high level languages like Perl, believing that they should be left to text processing tasks. That needn’t be the case: pack() and unpack() (in Python, struct.pack() and struct.unpack()) make binary data manipulation quite simple in both Perl and Python. Here’s a naive utility that loads an AVI file in one go, digs through it until it finds a video frame marker (either '00dc' or '00AC', a marker I had never seen before), and writes the frame to its own file.
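Here is a small illustration of the struct module. It packs and unpacks the 8-byte header of an AVI (RIFF) chunk, which is a four-character code followed by a 32-bit little-endian length. The helper names are hypothetical.

```python
import struct

# An AVI (RIFF) chunk header: 4-byte FourCC, then a 32-bit little-endian size.
CHUNK_HEADER = struct.Struct("<4sI")

def pack_chunk_header(fourcc, size):
    """Build the 8-byte header for a chunk with the given FourCC and payload size."""
    return CHUNK_HEADER.pack(fourcc, size)

def unpack_chunk_header(data, offset=0):
    """Read (fourcc, size) from the chunk header starting at data[offset]."""
    return CHUNK_HEADER.unpack_from(data, offset)

header = pack_chunk_header(b"00dc", 1234)
print(header)                       # b'00dc\xd2\x04\x00\x00'
print(unpack_chunk_header(header))  # (b'00dc', 1234)
```

The `<` prefix forces little-endian byte order with no alignment padding, so the header is always exactly 8 bytes.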

acdv.py:

BTW, the experiment revealed that, indeed, the ACDV video frames can each stand alone as separate JPEG files.
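For anyone who wants to repeat the experiment, here is a minimal sketch of such a naive extractor. It is not the original acdv.py, just an illustration of the approach described above: load the whole AVI, hunt for '00dc'/'00AC' chunks, and write each payload to its own file. It assumes the frames are ordinary RIFF chunks (FourCC plus 32-bit little-endian size). It also assumes that skipping past the 'movi' tag and stopping at the 'idx1' index is good enough, and the %05d.jpg output names are hypothetical choices. Like the original, it is naive: marker bytes that happen to appear inside other data can still fool it.

```python
import re
import struct
import sys

# Video frame chunk IDs: the usual '00dc' plus the unusual '00AC' seen in ACDV-AVI files.
FRAME_MARKER = re.compile(rb"00dc|00AC")

def extract_frames(avi_bytes):
    """Yield each video frame payload, scanning naively for frame chunk markers."""
    movi = avi_bytes.find(b"movi")
    pos = movi + 4 if movi != -1 else 0  # start after the 'movi' list tag if present
    limit = avi_bytes.rfind(b"idx1")      # stop before the index, whose entries reuse '00dc'
    if limit == -1:
        limit = len(avi_bytes)
    while True:
        match = FRAME_MARKER.search(avi_bytes, pos, limit)
        if match is None or match.start() + 8 > len(avi_bytes):
            return
        start = match.start()
        (size,) = struct.unpack_from("<I", avi_bytes, start + 4)
        yield avi_bytes[start + 8:start + 8 + size]
        # Jump past this chunk; RIFF pads odd-sized chunks to an even length.
        pos = start + 8 + size + (size & 1)

def main(path):
    with open(path, "rb") as f:
        data = f.read()  # load the entire file in one go
    count = 0
    for count, frame in enumerate(extract_frames(data), 1):
        with open("%05d.jpg" % count, "wb") as out:
            out.write(frame)
    print("wrote %d frames" % count)

if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Saved under any name, the script takes the AVI path as its only argument and writes 00001.jpg, 00002.jpg, and so on to the current directory. Each file can then be opened in an ordinary picture viewer.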