
Processing Those Crashers

I know my solutions to certain ad-hoc problems might seem a bit excessive. But I think I defended my methods pretty well in my previous post, though I do appreciate the opportunity to learn alternate ways to approach the same real-world problems. If you thought my methods for downloading multiple files were overkill, just wait until you see my solution for processing a long list of files just to learn, yes or no, which ones crash FFmpeg.

So we got this lengthy list of files from Picsearch that crash FFmpeg, or were known to do so circa mid-2007. Now that I have downloaded as many as are still accessible (about 4400), we need to know which files still crash or otherwise exit FFmpeg with a non-zero return code. You’ll be happy to know that I at least know enough shell scripting to pull off a naive solution for this.
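
That shell script isn’t part of this excerpt, but the idea reduces to: run FFmpeg on every file and record the return code. Here is the same idea as a minimal Python sketch; the list filename and the timeout are my own assumptions:

    import subprocess

    # Hypothetical list: one downloaded sample filename per line.
    with open('crasher-files.txt') as file_list:
        samples = [line.strip() for line in file_list if line.strip()]

    for sample in samples:
        try:
            # Decode the sample and discard the output; all that matters
            # here is whether FFmpeg exits with a zero return code.
            result = subprocess.run(
                ['ffmpeg', '-i', sample, '-f', 'null', '-'],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=60,  # assumed limit; a bad sample may hang instead
            )
        except subprocess.TimeoutExpired:
            print('%s: timed out' % sample)
            continue
        if result.returncode != 0:
            print('%s: exited with code %d' % (sample, result.returncode))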

Designing A Download Strategy

The uncommon video codecs list mentioned in the last post is amazing. Here are some FourCCs I have never heard of before: 3ivd, abyr, acdv, aura, brco, bt20, bw10, cfcc, cfhd, digi, dpsh, dslv, es07, fire, g2m3, gain, geox, imm4, inmc, mohd, mplo, qivg, suvf, ty0n, xith, xplo, and zdsv. There are several that have been found to be variations of other codecs. And there are some that were only rumored to exist, such as aflc as a codec for storing FLIC data in an AVI container, and azpr as an alternate FourCC for rpza. We now have samples. The existence of many of these FourCCs has, in fact, been cataloged on FourCC.org. But I was always reluctant to document the FourCCs in the MultimediaWiki unless I could find either samples or a binary codec.

But how to obtain all of these samples?

Do you ever download files from the internet? Of course you do. Do you ever download a bunch of files at a time? Maybe. But have you ever had to download a few thousand files?

I have some experience to guide me in this.
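
As a rough illustration of what downloading a few thousand URLs entails, here is a minimal Python sketch of a resumable bulk fetch; the URL list filename is hypothetical, and a real job would also want retries, duplicate-name handling, and politeness delays:

    import os
    import urllib.error
    import urllib.request

    # Hypothetical list: one URL per line, as supplied by Picsearch.
    with open('uncommon-video-urls.txt') as url_list:
        urls = [line.strip() for line in url_list if line.strip()]

    for url in urls:
        # Name the local file after the last path component; this simple
        # scheme does not handle two URLs that end in the same name.
        filename = url.rsplit('/', 1)[-1] or 'unnamed'
        if os.path.exists(filename):
            continue  # fetched on a previous run; lets the job resume
        try:
            urllib.request.urlretrieve(url, filename)
        except (urllib.error.URLError, OSError) as error:
            # Plenty of year-old URLs are dead by now; log and move on.
            print('failed: %s (%s)' % (url, error))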

I (Heart) Picsearch And Python

I don’t know much about Picsearch. I don’t know what differentiates them from Google’s image search. And I certainly don’t know what they’re doing scouring the internet for video. But I know what I like, and I like the fact that Picsearch has submitted back to the FFmpeg development team 3 gargantuan lists of URLs:

  1. A list of 5100+ URLs linking to videos that crash FFmpeg
  2. A list of 3200 URLs linking to videos that have relatively uncommon video codecs
  3. A list of 1600+ URLs linking to videos that have relatively uncommon audio codecs


That first list is a quality engineer’s dream come true. I was able to download a little more than 4400 of the crasher URLs. The list was collected sometime last year, and the good news is that FFmpeg has fixed enough problems that over half of the alleged crashers no longer crash. There are still a lot of problems, but I think most of them will cluster around a small set of bugs, particularly concerning the RealMedia demuxer.

I am currently downloading the uncommon video and audio format files. Given my interests, if processing the crashers is akin to having to eat my vegetables, processing a few thousand files with heretofore unknown codecs is like dessert!

So far, the challenge has been to download and process this huge number of samples efficiently. The “download and manually test” protocol usually followed when a problem sample is reported does not scale to this situation. Invariably, I first try some half-hearted shell-based solutions. But… who really likes shell programming?

So I moved swiftly on to custom Python scripts for downloading and testing these files. Once I tighten up the scripts a little more and successfully process as many samples as I can, I will share them here, if only so I have a place where I can easily refer to the scripts again should I need them in the future (scripts are easily misplaced on my systems).

Actual Regression Test Output

I resisted adding ‘make test’ as an individual FATE test for a long time because it was too big, took too long to run, and was an all-or-nothing proposition (i.e., if one test fails, the whole suite is marked as a failure). Plus, I eventually want to break up each individual test in the ‘make test’ regression suite into its own FATE test. But a few months ago, I relented, at least until I make the big test split, and entered test spec #128, which runs ‘make test > /dev/null 2>&1’. The redirection was necessary because ‘make test’ generates way more data than I care to track. It was suggested that I try some shell magic to capture the final n lines of output in case of failure. That is a nice idea, but I couldn’t figure out how to do it because I can’t stand shell scripting.

I was recently trying to process a huge number of problematic multimedia samples using shell scripting commands. It didn’t work well, so I resorted to a custom Python script, which worked much better. This little episode reminded me of some other shell workarounds I deployed in FATE, such as the {FILESIZE} and {MD5} custom strings. Instead of using shell commands to obtain information for certain tests, I map these special strings to cleaner, more portable Python code. And it finally occurred to me to do the same for the full regression suite.
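
To give a sense of that mapping, here is a sketch of what such custom strings can reduce to in Python; the function names are mine for illustration, not the actual FATE code:

    import hashlib
    import os

    def filesize(path):
        # Stand-in for the {FILESIZE} custom string.
        return os.path.getsize(path)

    def md5(path):
        # Stand-in for the {MD5} custom string; hash in chunks so huge
        # samples never need to fit in memory at once.
        digest = hashlib.md5()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(65536), b''):
                digest.update(chunk)
        return digest.hexdigest()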

Say hello to the new test spec #128: {MAKETEST}. The FATE script recognizes this command and knows to run ‘make test’, capture stdout and stderr, clear both if the suite succeeds, or trim both to the last 30 lines if anything in the regression suite fails. This will help developers study why the suite is failing on various systems.
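
The real FATE code isn’t shown here, but the {MAKETEST} handling amounts to something like this Python sketch (the function name and the exact capture details are my assumptions):

    import subprocess

    def run_maketest(build_dir):
        # Run the full regression suite with stderr folded into stdout.
        result = subprocess.run(
            ['make', 'test'],
            cwd=build_dir,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
        )
        output = result.stdout.decode('utf-8', errors='replace')
        if result.returncode == 0:
            # Success: no need to keep megabytes of log output.
            return result.returncode, ''
        # Failure: keep only the last 30 lines for the report.
        return result.returncode, '\n'.join(output.splitlines()[-30:])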

At least until I finally develop a good plan for breaking the master regression suite into several hundred little tests.