Breaking Eggs And Making Omelettes

Topics On Multimedia Technology and Reverse Engineering


A Better Process Runner

December 31st, 2010 by Multimedia Mike

I was recently processing a huge corpus of data. It went like this: For each file in a large set, run 'cmdline-tool <file>', capture the output and log results to a database, including whether the tool crashed. I wrote it in Python. I have done this exact type of thing enough times in Python that I’m starting to notice a pattern.

Every time I start writing such a program, I begin by using Python’s commands module because it’s the easiest thing to do. Then I have to abandon the module when I remember, the hard way, that whatever ‘cmdline-tool’ is, it might go astray and try to run forever. That’s when I import (rather, copy over) my process runner from FATE, the one that is able to kill a process after it has been running too long. I have used this module enough times that I wonder if I should spin it off into a new Python module.
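For illustration, here is a minimal sketch of a runner that kills a process after a time limit. It is not the FATE process runner; it assumes Python 3.5 or later, where subprocess.run accepts a timeout argument (the commands module exists only in Python 2). The name run_with_timeout and the return convention are made up for this example.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_s):
    """Run argv and kill it if it runs longer than timeout_s seconds.

    Returns (returncode, output, timed_out). returncode is None when the
    process was killed for running too long. On POSIX, a negative
    returncode means the process was terminated by a signal.
    """
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into the captured output
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run has already killed and reaped the child here.
        return None, exc.output or b"", True
    return proc.returncode, proc.stdout, False


if __name__ == "__main__":
    # Stand-in for 'cmdline-tool <file>': a child that sleeps past the limit.
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 1))
```

One caveat: the timeout only kills the direct child. If the tool spawns its own subprocesses, those may need separate handling.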

Or maybe I’m going about this the wrong way. Perhaps when the data set reaches a certain size, I’m really supposed to throw it on some kind of distributed cluster rather than task it to a Python script (a multithreaded one, to be sure, but one that runs on a single machine). Running the job on a distributed architecture wouldn’t obviate the need for such early termination. But hopefully, such architectures already have that functionality built in. It’s something to research in the new year.
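As a rough sketch of the single-machine, multithreaded version of this job: worker threads run the tool, and the main thread logs each result to a database. Everything here is an assumption for illustration: SQLite as the database, the results table layout, the process_corpus and run_one names, and the rule that a negative return code (death by signal on POSIX) counts as a crash.

```python
import sqlite3
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(tool_argv, path, timeout_s):
    """Run the tool on one file; return (path, returncode, timed_out, output)."""
    try:
        proc = subprocess.run(
            tool_argv + [path],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,
        )
        return path, proc.returncode, False, proc.stdout
    except subprocess.TimeoutExpired as exc:
        return path, None, True, exc.output or b""


def process_corpus(tool_argv, paths, db, timeout_s=60, workers=4):
    """Run the tool over every path and log one row per file into db."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(path TEXT, returncode INTEGER, crashed INTEGER, "
        "timed_out INTEGER, output BLOB)"
    )
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Threads only run the subprocesses; all database writes happen
        # here on the calling thread, which owns the connection.
        results = pool.map(lambda p: run_one(tool_argv, p, timeout_s), paths)
        for path, rc, timed_out, output in results:
            # Assumption: killed by a signal (negative code on POSIX) = crash.
            crashed = rc is not None and rc < 0
            db.execute(
                "INSERT INTO results VALUES (?, ?, ?, ?, ?)",
                (path, rc, int(crashed), int(timed_out), output),
            )
    db.commit()
```

Threads are enough here because each worker spends its time waiting on a child process, not running Python code.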

I guess there are also process limits, enforced by the shell. I don’t think I have ever gotten those to work correctly, though.
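The same kind of limit can be set from Python instead of the shell. The sketch below assumes a POSIX system and uses the standard resource module to cap the child's CPU time before it starts; run_with_cpu_limit is a made-up name. Note that this caps CPU time, not wall-clock time, so it would not stop a process that hangs while sleeping or blocked on I/O.

```python
import resource
import subprocess
import sys


def run_with_cpu_limit(argv, cpu_seconds):
    """Run argv with a cap on CPU time (POSIX only).

    The kernel signals the child once it has used cpu_seconds of CPU time,
    so a runaway process shows up as a negative returncode.
    """
    def set_limits():
        # Runs in the child after fork and before exec.
        # Soft limit triggers SIGXCPU; the hard limit is a one-second backstop.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds + 1))

    return subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        preexec_fn=set_limits,
    )


if __name__ == "__main__":
    # Stand-in for a runaway tool: a busy loop that never exits on its own.
    proc = run_with_cpu_limit([sys.executable, "-c", "while True: pass"], 1)
    print(proc.returncode)
```

The Python documentation warns that preexec_fn is not safe to use when the parent process has multiple threads, so this approach fits a single-threaded driver better than a thread pool.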


4 Responses

  1. nine Says:

    The simple shell way is:
    command args & ; sleep 3600 ; kill $!
    (in some shells, the first semicolon is a syntax error, but you can use a newline instead to keep it clear)

  2. Multimedia Mike Says:

    @nine: Nice, simple, elegant. How about the bit about logging the results to a database? I suppose that could be an implicit part of the command line tool, but that’s sometimes out of scope.

  3. nine Says:

    To log the output, the clear way I’d use is to substitute ‘command’ with a new shell script ‘’:
    command >/tmp/log.$$ 2>&1
    echo $! >> /tmp/

    Then run ` & ; sleep 3600 ; kill $!`

    Of course, you can add proper PID handling now too:
    echo $$ >> $1
    command >/tmp/log.$$ 2>&1
    echo $! >> /tmp/
    rm $1

    ` /tmp/pid.$$ & ; sleep 3600 ; kill $(cat /tmp/pid.$$)`

  4. nine Says:

    echo $! >> /tmp/
    should be:
    echo $? >> /tmp/
    to log the exit status. I just wrote those and the output to a file; doing whatever you want with them is implied.