I know my solutions to certain ad-hoc problems might seem a bit excessive. But I think I was able to defend my methods pretty well in my previous post, though I do appreciate the opportunity to learn alternate ways to approach the same real-world problems. But if you thought my methods for downloading multiple files were overkill, just wait until you see my solution for processing a long list of files just to learn — yes or no — which ones crash FFmpeg.
So we got this lengthy list of files from Picsearch that crash FFmpeg, or were known to do so circa mid-2007. Now that I have downloaded as many as are still accessible (about 4400), we need to know which files still crash FFmpeg, or otherwise cause it to exit with a non-zero return code. You’ll be happy to know that I at least know enough shell scripting to pull off a naive solution for this:
    for file in *
    do
        ffmpeg -i "$file" -f framecrc - > /dev/null 2>&1
        echo $? "$file"
    done > errors.txt
Feed each file into FFmpeg, ignoring both stdout and stderr, and dump the return code and the filename to errors.txt. Seems straightforward enough, except when a particular file causes FFmpeg to get stuck in an infinite loop. My first idea at that point was to break the script and investigate further… oops, now I have to do the whole list again. Instead, how about dumping all the filenames to a file and piping them through a “while read” loop? That way, I can break the script, deal with the offending file, delete the already-processed files from the list, and restart. Although I eventually realized it also works to simply “killall -9 ffmpeg” once I have decided that a process has been running too long.
Unfortunately, that still requires a degree of babysitting. Further, I could never seem to get more than about halfway through the samples before Mac OS X (which was hosting this process) stalled on me. Maybe it crashed, but now that I think about it, the OS was probably just becoming extremely unresponsive because an errant file caused FFmpeg to leak as much memory, virtual memory included, as it could get. So I needed to reboot the Mac, after which I didn’t feel I could trust the contents of errors.txt, given the nature of OS file caching.
So my solution? Enter Python, assisted by SQLite, a database that guarantees its committed writes are synced to disk. Look, it’s hard to resist such a robust solution when Python and SQLite make it so easy to program. Go ahead, you stupid samples: take down my computer using FFmpeg as your attack vector. At least I can be sure that the results recorded prior to the crash are accurate. Further, I can break the script and resume it exactly where it left off.
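The core of the approach looks something like this (a minimal sketch rather than the actual script; the results.db filename and table layout here are simplified for illustration):

    import glob
    import os
    import sqlite3
    import subprocess

    conn = sqlite3.connect("results.db")
    conn.execute("CREATE TABLE IF NOT EXISTS results "
                 "(filename TEXT PRIMARY KEY, retcode INTEGER)")

    devnull = open(os.devnull, "wb")
    for filename in sorted(glob.glob("*")):
        if filename == "results.db":
            continue
        # skip anything that a previous, interrupted run already recorded
        if conn.execute("SELECT 1 FROM results WHERE filename = ?",
                        (filename,)).fetchone():
            continue
        retcode = subprocess.call(
            ["ffmpeg", "-i", filename, "-f", "framecrc", "-"],
            stdout=devnull, stderr=devnull)
        conn.execute("INSERT INTO results VALUES (?, ?)", (filename, retcode))
        # commit after every file so the result is safely on disk before
        # the next sample gets a chance to take the machine down
        conn.commit()

Committing after every single file is slow, but that per-file sync is exactly what makes the results trustworthy after a hard reboot.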
The script follows. For good measure, I decided to run it confined in a Linux VMware session to mitigate damage to my main machine. Also, I disabled swap in the VMware session so that FFmpeg can’t go overboard on the memory. I lifted and modified the run_process method from my FATE scripts. It’s a little overkill for this purpose, but it works and easily allows me to kill a process after a minute. Something I learned about FFmpeg thanks to this exercise: when I spawn a new process that runs FFmpeg, FFmpeg somehow ends up in a separate child process (its pid is one higher than the pid I was handed). My original run_process method tries to kill the child a little more responsibly using that pid. The “killall -9 ffmpeg” is an obvious hack I resorted to when I noticed the pid discrepancy.
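The basic shape of the kill-after-a-timeout idea is something like this (a simplified sketch, not the actual run_process code):

    import os
    import signal
    import subprocess
    import time

    def run_with_timeout(command, timeout=60):
        """Run command, killing it if it outlives timeout seconds.
        Returns the return code, or None if the process had to die."""
        devnull = open(os.devnull, "wb")
        # note: with shell=True, proc.pid would belong to the intermediate
        # shell rather than to ffmpeg itself
        proc = subprocess.Popen(command, stdout=devnull, stderr=devnull)
        start_time = time.time()
        while proc.poll() is None:
            if time.time() - start_time > timeout:
                os.kill(proc.pid, signal.SIGKILL)
                proc.wait()
                return None
            time.sleep(0.25)
        return proc.returncode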
test-sample-directory-with-ffmpeg.py:
ulimit -t 60 ?
Nice, more shell black magic, I assume. I wish I had several hundred terse Unix commands and shell built-ins at my immediate recall.
So I disabled swap in this VMware image running Kubuntu with KDE 4 and only a half gig of RAM. Smart. My best guess this time around is that the Linux OOM killer decided that KDE needed to die.
Checking the output of dmesg should tell you what process was killed by the OOM killer.
Calling ulimit “shell magic” is not really right (and it seems it might be a bashism, too).
It’s just an interface to the “man 3 ulimit” POSIX function; I’d think you can also set this from Python somehow.
setrlimit (also POSIX) is the recommended way, though, and I’m sure it lets you set all kinds of limits you might like.
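For instance, something along these lines should cap FFmpeg’s CPU time at 60 seconds from within Python (a sketch; the sample filename is just a placeholder):

    import os
    import resource
    import subprocess

    def limit_cpu_time():
        # soft limit 60s (the child gets SIGXCPU), hard limit 65s (SIGKILL)
        resource.setrlimit(resource.RLIMIT_CPU, (60, 65))

    devnull = open(os.devnull, "wb")
    # preexec_fn runs in the child between fork() and exec(), so the limit
    # applies to ffmpeg and not to the Python script itself
    retcode = subprocess.call(
        ["ffmpeg", "-i", "sample.avi", "-f", "framecrc", "-"],
        stdout=devnull, stderr=devnull,
        preexec_fn=limit_cpu_time)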
Also, concerning the OOM killer: you might want to read Documentation/vm/overcommit-accounting and others.
echo 2 > /proc/sys/vm/overcommit_memory
means that it is impossible to ever malloc()/mmap() more memory than is available, so the OOM killer will never run.
The multiple PID problem could be related to shell=True in your options to Popen.
And why not redirect stdout and stderr to /dev/null so you can get rid of the select loop?
Good question. I was trying to simplify that function after lifting it from my FATE system, but really I just wanted to get something working. I couldn’t remember what “shell=True” was for, and I either couldn’t find or couldn’t understand the documentation on the matter. But thanks for the lead. Since this code also affects FATE, I definitely want to solve this.
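If the lead pans out, the fix presumably looks something like this (a sketch with a placeholder filename, untested):

    import os
    import subprocess

    devnull = open(os.devnull, "wb")
    # passing an argument list with the default shell=False execs ffmpeg
    # directly, so proc.pid really is ffmpeg's pid; with shell=True the pid
    # belongs to the intermediate shell, hence the off-by-one pid mystery
    proc = subprocess.Popen(["ffmpeg", "-i", "sample.avi", "-f", "framecrc", "-"],
                            stdout=devnull, stderr=devnull)
    retcode = proc.wait()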