
Processing Those Crashers

I know my solutions to certain ad-hoc problems might seem a bit excessive. But I think I defended my methods pretty well in my previous post, though I do appreciate the opportunity to learn alternate ways to approach the same real-world problems. If you thought my methods for downloading multiple files were overkill, just wait until you see my solution for processing a long list of files just to learn, yes or no, which ones crash FFmpeg.

So we got this lengthy list of files from Picsearch that crash FFmpeg, or were known to do so circa mid-2007. Now that I have downloaded as many as are still accessible (about 4400), we need to know which files still crash or otherwise exit FFmpeg with a non-zero return code. You’ll be happy to know that I at least know enough shell scripting to pull off a naive solution for this.
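
That shell script isn’t part of this excerpt, but the idea reduces to: run FFmpeg on every file and record the return code. Here is the same idea as a minimal Python sketch; the list filename and the timeout are my own assumptions:

    import subprocess

    # Hypothetical list: one downloaded sample filename per line.
    with open('crasher-files.txt') as file_list:
        samples = [line.strip() for line in file_list if line.strip()]

    for sample in samples:
        try:
            # Decode the sample and discard the output; all that matters
            # here is whether FFmpeg exits with a zero return code.
            result = subprocess.run(
                ['ffmpeg', '-i', sample, '-f', 'null', '-'],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=60,  # assumed limit; a bad sample may hang instead
            )
        except subprocess.TimeoutExpired:
            print('%s: timed out' % sample)
            continue
        if result.returncode != 0:
            print('%s: exited with code %d' % (sample, result.returncode))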

Designing A Download Strategy

The uncommon video codecs list mentioned in the last post is amazing. Here are some FourCCs I have never heard of before: 3ivd, abyr, acdv, aura, brco, bt20, bw10, cfcc, cfhd, digi, dpsh, dslv, es07, fire, g2m3, gain, geox, imm4, inmc, mohd, mplo, qivg, suvf, ty0n, xith, xplo, and zdsv. There are several that have been found to be variations of other codecs. And there are some that were only rumored to exist, such as aflc as a codec for storing FLIC data in an AVI container, and azpr as an alternate FourCC for rpza. We now have samples. The existence of many of these FourCCs has, in fact, been cataloged on FourCC.org. But I was always reluctant to document the FourCCs in the MultimediaWiki unless I could find either samples or a binary codec.

But how to obtain all of these samples?

Do you ever download files from the internet? Of course you do. Do you ever download a bunch of files at a time? Maybe. But have you ever had to download a few thousand files?

I have some experience to guide me in this.
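
As a rough illustration of what downloading a few thousand URLs entails, here is a minimal Python sketch of a resumable bulk fetch; the URL list filename is hypothetical, and a real job would also want retries, duplicate-name handling, and politeness delays:

    import os
    import urllib.error
    import urllib.request

    # Hypothetical list: one URL per line, as supplied by Picsearch.
    with open('uncommon-video-urls.txt') as url_list:
        urls = [line.strip() for line in url_list if line.strip()]

    for url in urls:
        # Name the local file after the last path component; this simple
        # scheme does not handle two URLs that end in the same name.
        filename = url.rsplit('/', 1)[-1] or 'unnamed'
        if os.path.exists(filename):
            continue  # fetched on a previous run; lets the job resume
        try:
            urllib.request.urlretrieve(url, filename)
        except (urllib.error.URLError, OSError) as error:
            # Plenty of year-old URLs are dead by now; log and move on.
            print('failed: %s (%s)' % (url, error))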

I (Heart) Picsearch And Python

I don’t know much about Picsearch. I don’t know what differentiates them from Google’s image search. And I certainly don’t know what they’re doing scouring the internet for video. But I know what I like, and I like the fact that Picsearch has submitted back to the FFmpeg development team 3 gargantuan lists of URLs:

  1. A list of 5100+ URLs linking to videos that crash FFmpeg
  2. A list of 3200 URLs linking to videos that have relatively uncommon video codecs
  3. A list of 1600+ URLs linking to videos that have relatively uncommon audio codecs


That first list is a quality engineer’s dream come true. I was able to download a little more than 4400 of the crasher URLs. The list was collected sometime last year, and the good news is that FFmpeg has fixed enough problems that over half of the alleged crashers no longer crash. There are still a lot of problems, but I think most of them will cluster around a small set of bugs, particularly concerning the RealMedia demuxer.

I am currently downloading the uncommon video and audio format files. Given my interests, if processing the crashers is akin to having to eat my vegetables, processing a few thousand files with heretofore unknown codecs is like dessert!

So far, the challenge has been to download and process this huge number of samples efficiently. The “download and manually test” protocol usually followed when a problem sample is reported does not scale to this situation. Invariably, I first try some half-hearted shell-based solutions. But… who really likes shell programming?

So I moved swiftly on to custom Python scripts for downloading and testing these files. Once I tighten up the scripts a little more and successfully process as many samples as I can, I will share them here, if only so I have a place where I can easily refer to the scripts again should I need them in the future (scripts are easily misplaced on my systems).

Actual Regression Test Output

I resisted adding ‘make test’ as an individual FATE test for a long time because it was too big, took too long to run, and was an all-or-nothing proposition (i.e., if one test fails, the whole suite is marked as a failure). Plus, I eventually want to break up each individual test in the ‘make test’ regression suite into its own FATE test. But a few months ago, I relented, at least until I make the big test split, and entered test spec #128, which runs ‘make test > /dev/null 2>&1’. The redirection was necessary because ‘make test’ generates way more data than I care to track. It was suggested that I try some shell magic to capture the final n lines of output in case of failure. That is a nice idea, but I couldn’t figure out how to do it because I can’t stand shell scripting.

I was recently trying to process a huge number of problematic multimedia samples using shell scripting commands. It didn’t work well, so I resorted to a custom Python script, which worked much better. This little episode reminded me of some other shell workarounds I deployed in FATE, such as the {FILESIZE} and {MD5} custom strings. Instead of using shell commands to obtain information for certain tests, I map these special strings to cleaner, more portable Python code. And it finally occurred to me to do the same for the full regression suite.
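
To give a sense of that mapping, here is a sketch of what such custom strings can reduce to in Python; the function names are mine for illustration, not the actual FATE code:

    import hashlib
    import os

    def filesize(path):
        # Stand-in for the {FILESIZE} custom string.
        return os.path.getsize(path)

    def md5(path):
        # Stand-in for the {MD5} custom string; hash in chunks so huge
        # samples never need to fit in memory at once.
        digest = hashlib.md5()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(65536), b''):
                digest.update(chunk)
        return digest.hexdigest()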

Say hello to the new test spec #128: {MAKETEST}. The FATE script recognizes this command and knows to run ‘make test’, capture stdout and stderr, clear both if the suite succeeds, or trim both to the last 30 lines if anything in the regression suite fails. This will help developers study why the suite is failing on various systems.
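
The real FATE code isn’t shown here, but the {MAKETEST} handling amounts to something like this Python sketch (the function name and the exact capture details are my assumptions):

    import subprocess

    def run_maketest(build_dir):
        # Run the full regression suite with stderr folded into stdout.
        result = subprocess.run(
            ['make', 'test'],
            cwd=build_dir,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
        )
        output = result.stdout.decode('utf-8', errors='replace')
        if result.returncode == 0:
            # Success: no need to keep megabytes of log output.
            return result.returncode, ''
        # Failure: keep only the last 30 lines for the report.
        return result.returncode, '\n'.join(output.splitlines()[-30:])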

At least until I finally develop a good plan for breaking the master regression suite into several hundred little tests.