I know my solutions to certain ad-hoc problems might seem a bit excessive. But I think I was able to defend my methods pretty well in my previous post, though I do appreciate the opportunity to learn alternate ways to approach the same real-world problems. But if you thought my methods for downloading multiple files were overkill, just wait until you see my solution for processing a long list of files just to learn — yes or no — which ones crash FFmpeg.
So we got this lengthy list of files from Picsearch that crash FFmpeg, or were known to do so circa mid-2007. Now that I have downloaded as many as are still accessible (about 4400), we need to know which files still crash or otherwise exit FFmpeg with a non-zero return code. You’ll be happy to know that I at least know enough shell scripting to pull off a naive solution for this:
for file in *
do
  ffmpeg -i "$file" -f framecrc - > /dev/null 2>&1
  echo $? "$file"
done > errors.txt
Feed each file into FFmpeg, ignoring both stdout and stderr, and dump the return code and the filename to errors.txt. Seems straightforward enough. Except, perhaps, when a particular file causes FFmpeg to get stuck in an infinite loop. My first idea at that point was to interrupt the script and investigate further… oops, now I have to do the whole list again. Instead, how about dumping all the filenames to a file and then piping them through a “while read” loop, so that I can interrupt the script, deal with the offending file, delete the already-processed files from the list, and restart? Although I realized it also works to simply “killall -9 ffmpeg” once I have decided that a process has been running too long.
Unfortunately, that still requires a degree of babysitting. Further, I could never seem to get more than about halfway through the samples before Mac OS X (which was hosting this process) stalled on me. Maybe it crashed, but now that I think about it, the OS was probably just becoming extremely unresponsive because an errant file was causing FFmpeg to leak as much memory, including virtual memory, as it could. So I needed to reboot the Mac, after which I didn’t feel I could trust the contents of errors.txt, given the nature of OS file caching.
So my solution? Enter Python, assisted by a guaranteed-synced database named SQLite. Look, it’s hard to resist such a robust solution when Python and SQLite make it so easy to program. Go ahead, you stupid samples: take down my computer using FFmpeg as your attack vector. At least I can be sure that the results recorded prior to the crash are accurate. Further, I can interrupt the script and resume it exactly where it left off.
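To make the resume-after-crash idea concrete, here is a minimal sketch of how such bookkeeping could look. It is not the actual script: the table name `results`, its columns, and the helper names `open_results`, `pending_files`, and `record_result` are all made up for illustration. The point is that each result is committed as soon as it is known, and a restart only looks at files with no row yet.

```python
import sqlite3

def open_results(db_path):
    """Open (or create) the results database. Table and column names are hypothetical."""
    conn = sqlite3.connect(db_path)
    # Ask SQLite to sync to disk on every commit, so a committed row
    # should survive a subsequent OS crash or forced reboot.
    conn.execute("PRAGMA synchronous = FULL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "filename TEXT PRIMARY KEY, return_code INTEGER)"
    )
    conn.commit()
    return conn

def pending_files(conn, filenames):
    """Return the files that have no recorded result yet, in their original order."""
    done = {row[0] for row in conn.execute("SELECT filename FROM results")}
    return [name for name in filenames if name not in done]

def record_result(conn, filename, return_code):
    """Store one result and commit immediately, one transaction per file."""
    conn.execute(
        "INSERT OR REPLACE INTO results (filename, return_code) VALUES (?, ?)",
        (filename, return_code),
    )
    conn.commit()
```

The main loop would then call `pending_files` once at startup and `record_result` after each FFmpeg run, so interrupting the script at any point loses at most the file that was in flight.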
The script follows. For good measure, I decided to run it confined in a Linux VMware session to mitigate damage to my main machine. Also, I disabled swap in the VMware session so that FFmpeg can’t go overboard on the memory. I lifted and modified the run_process method from my FATE scripts. It’s a little overkill for this purpose, but it works and easily allows me to kill a process after a minute. Something I learned about FFmpeg thanks to this exercise: when I execute a new process with FFmpeg inside, for some reason, FFmpeg actually runs in a separate child process (pid = one more than I expected). My original run_process method tries to kill the child a little more responsibly using the pid. The “killall -9 ffmpeg” is an obvious hack that I added when I noticed the pid discrepancies.
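For comparison, here is a sketch of one way to enforce the one-minute limit without resorting to killall. This is not the run_process method from the FATE scripts; it assumes Python 3 on a POSIX system, and the name `run_with_timeout` is invented for the example. A pid that is off by one is what you would typically see if the command is launched through an intermediate shell, though that is only a guess about the cause here. Starting the command in its own session and killing the whole process group sidesteps the question either way.

```python
import os
import signal
import subprocess

def run_with_timeout(args, timeout_seconds=60):
    """Run a command, discarding its output; kill it if it exceeds the time limit.

    Returns (return_code, timed_out). Hypothetical helper, POSIX only.
    """
    proc = subprocess.Popen(
        args,                       # argument list, so no intermediate shell
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,     # child becomes leader of its own process group
    )
    try:
        return proc.wait(timeout=timeout_seconds), False
    except subprocess.TimeoutExpired:
        # Kill the whole group, which covers any children the command spawned.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()                 # reap the process so it does not linger as a zombie
        return proc.returncode, True
```

A call might look like `run_with_timeout(["ffmpeg", "-i", filename, "-f", "framecrc", "-"])`, with the returned code going straight into the database. A process killed by a signal reports a negative return code, so timeouts and crashes stay distinguishable from ordinary non-zero exits.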