{"id":690,"date":"2008-12-07T21:02:20","date_gmt":"2008-12-08T05:02:20","guid":{"rendered":"http:\/\/multimedia.cx\/eggs\/?p=690"},"modified":"2020-07-25T22:24:14","modified_gmt":"2020-07-26T05:24:14","slug":"processing-those-crashers","status":"publish","type":"post","link":"https:\/\/multimedia.cx\/eggs\/processing-those-crashers\/","title":{"rendered":"Processing Those Crashers"},"content":{"rendered":"<p>I know <a href=\"http:\/\/multimedia.cx\/eggs\/designing-a-download-strategy\/\">my solutions to certain ad-hoc problems<\/a> might seem a bit excessive. But I think I was able to defend my methods pretty well in my previous post, though I do appreciate the opportunity to learn alternate ways to approach the same real-world problems. But if you thought my methods for downloading multiple files were overkill, just wait until you see my solution for processing a long list of files just to learn &#8212; yes or no &#8212; which ones crash <a href=\"http:\/\/ffmpeg.org\/\">FFmpeg<\/a>.<\/p>\n<p>So we got this <a href=\"http:\/\/multimedia.cx\/eggs\/i-heart-picsearch-and-python\/\">lengthy list of files from Picsearch<\/a> that crash FFmpeg, or were known to do so circa mid-2007. Now that I have downloaded as many as are still accessible (about 4400), we need to know which files still crash or otherwise exit FFmpeg with a non-zero return code. You&#8217;ll be happy to know that I at least know enough shell scripting to pull off a naive solution for this: <!--more--><\/p>\n<pre>\r\nfor file in *\r\ndo \r\n  ffmpeg -i \"$file\" -f framecrc - &gt; \/dev\/null 2&gt;&amp;1 \r\n  echo $? \"$file\" \r\ndone &gt; errors.txt\r\n<\/pre>\n<p>Feed each file into FFmpeg, ignoring both stdout and stderr, and dump the return code and the filename to errors.txt. Seems straightforward enough. Except, perhaps, when a particular file causes FFmpeg to get stuck in an infinite loop. 
My first idea at that point was to break the script and investigate further&#8230; oops, now I have to do the whole list again. Instead, how about dumping all the filenames to a file and then piping them through a &#8220;while read&#8221; loop so that I can break the script, kill the offending file, delete the processed files from the list, and restart? Then I realized it also works to simply run &#8220;killall -9 ffmpeg&#8221; once I have decided that a process has been running too long.<\/p>\n<p>Unfortunately, that still requires a degree of babysitting. Further, I could never seem to get more than about halfway through the samples before Mac OS X (which was hosting this process) stalled on me. Maybe it crashed, but now that I think about it, the OS was probably just becoming extremely unresponsive due to an errant file causing FFmpeg to leak as much memory, including virtual memory, as it could. So I needed to reboot the Mac, after which I didn&#8217;t feel I could trust the contents of errors.txt due to the nature of OS file caching.<\/p>\n<p>So my solution? Enter <strong>Python<\/strong> assisted by a guaranteed-synced database named <strong>SQLite<\/strong>. Look, it&#8217;s hard to resist such a robust solution when Python and SQLite make it so easy to program. Go ahead, you stupid samples&#8211; take down my computer using FFmpeg as your attack vector. At least I can be sure that the results prior to the crash are accurate. Further, I can break the script and resume it exactly where it left off.<\/p>\n<p>The script follows. For good measure, I decided to run it confined in a Linux VMware session to mitigate damage to my main machine. Also, I disabled swap in the VMware session so that FFmpeg can&#8217;t go overboard on the memory. I lifted and modified the run_process method from my <a href=\"http:\/\/fate.multimedia.cx\/\">FATE<\/a> scripts. 
It&#8217;s a little overkill for this purpose, but it works and easily allows me to kill a process after a minute. Something I learned about FFmpeg thanks to this exercise: When I execute a new process with FFmpeg inside, for some reason, FFmpeg actually runs in a separate child process (pid = one more than I expected). My original run_process method tries to kill the child a little more responsibly using the pid. The &#8220;killall -9 ffmpeg&#8221; call is an obvious hack that I added when I noticed the pid discrepancies.<\/p>\n<p><strong>test-sample-directory-with-ffmpeg.py:<\/strong><br \/>\n<script src=\"https:\/\/gist.github.com\/multimediamike\/4632e8b8faac2b2a11d4791e00afaa99.js\"><\/script><\/p>\n","protected":false},"excerpt":{"rendered":"<p>How to outsmart and ultimately beat a list of errant multimedia samples into submission using Python and SQLite.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[55],"tags":[287,285],"class_list":["post-690","post","type-post","status-publish","format-standard","hentry","category-python","tag-fate-server","tag-python"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/690","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/comments?post=690"}],"version-history":[{"count":22,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/690\/revisions"}],"predecessor-version":[{"id":4586,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/690\/revisions\/4586"}],"wp:attachment":[{"href":"https:\/\/multimedia.cx\/eggs\/
wp-json\/wp\/v2\/media?parent=690"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/categories?post=690"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/tags?post=690"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}