Process Runner Redux

Pursuant to yesterday’s conundrum of creating a portable process runner in Python for FATE that can reliably kill a process when it exceeds its time constraints, I settled on a solution. As Raymond Tau reminded us in the ensuing discussion, Python won’t use a shell to launch the process if the program supplies the command and its arguments as a sequence data structure. I knew this but was intentionally avoiding it. It seems like a simple problem to break up a command line into a sequence of arguments: just split on spaces. However, I hope to eventually test metadata options, which could include arguments such as ‘-title “Hey, is this thing on?”’, where splitting on spaces clearly isn’t the right solution.

I got frustrated enough with the problem that I decided to split on spaces anyway. Hey, I control this system from top to bottom, so new rule: no command line argument in a test spec may contain spaces, quoted or not. I already enforce the rule that no sample files can have spaces in their filenames since that causes trouble with remote testing. When I get to the part about testing metadata, said metadata will take the form of ‘-title “HeyIsThisThingOn?”’ (which will then fail to catch myriad bugs related to FFmpeg’s incorrect handling of whitespace in metadata arguments, but this is all about trade-offs).
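To make the trade-off concrete, here is a minimal sketch comparing a plain split on whitespace with shell-style splitting. It uses shlex.split from the Python standard library, which is not what upr.py does (the post settles on splitting on spaces); the command line is just the metadata example from above.

```python
import shlex

# The metadata example from the post, written as one command line string.
command_line = 'ffmpeg -title "Hey, is this thing on?"'

# Naive approach: split on whitespace. The quoted title is torn apart
# and the quote characters stay glued to the first and last words.
naive_args = command_line.split()

# Shell-style approach (NOT what upr.py does): shlex.split honors the
# quotes and strips them, so the title stays a single argument.
shell_style_args = shlex.split(command_line)

print(naive_args)
print(shell_style_args)
```

With the "no spaces in arguments" rule in place, both approaches produce the same list, which is why the simple split is good enough for the test specs.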

So the revised Python process runner seems to work correctly on Linux. The hangaround.c program simulates a badly misbehaving program by eating the TERM signal, so it must be dealt with using the KILL signal. The last line in each of these examples lists the return code, stdout, stderr, and CPU time. For Linux:

$ ./upr.py 
['./hangaround', '40']
process ID = 2645
timeout, sending TERM
timeout, really killing
[-9, '', '', 0]

The unmodified code works the same on Mac OS X:

$ ./upr.py
['./hangaround', '40']
process ID = 94866
timeout, sending TERM
timeout, really killing
[-9, '', '', 0]
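The TERM-then-KILL escalation visible in these two runs can be sketched roughly as follows. This is not the actual upr.py source, which is not shown here; it is a minimal sketch assuming Python 3 and its subprocess module. The function name run_with_timeout and the grace parameter are made up for illustration, and CPU time accounting is left out.

```python
import subprocess
import sys


def run_with_timeout(args, timeout, grace=2.0):
    """Run args (a sequence, so no shell is involved) with a time limit.

    Hypothetical helper: returns (return code, stdout, stderr).
    """
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # First ask politely: send TERM.
        proc.terminate()
        try:
            out, err = proc.communicate(timeout=grace)
        except subprocess.TimeoutExpired:
            # The process ate the TERM signal: really kill it.
            proc.kill()
            out, err = proc.communicate()
    return proc.returncode, out, err


if __name__ == "__main__":
    # Stand-in for ./hangaround: a child that just sleeps too long.
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(sleeper, timeout=1, grace=1))
```

On Unix-like systems, a negative return code means the child was ended by that signal number, which matches the -9 in the output above.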

Now a bigger test: Running the upr.py script on Linux in order to launch the hangaround process remotely on Mac OS X via SSH:

$ ./upr.py 
['/usr/bin/ssh', 'foster-home', './hangaround', '40']
process ID = 2673
timeout, sending TERM
[143, '', '', 50]

So that’s good… sort of. Monitoring the process on the other end reveals that hangaround is still doing just that, even after SSH goes away. This occurs whether or not hangaround is ignoring the TERM signal. This is still suboptimal.

It would be possible to open a separate SSH session to send a TERM or KILL signal to the original process… except that I wouldn’t know the PID of the remote process. Or could I? I’m open to Unix shell magic tricks on this problem since anything responding to SSH requests is probably going to be acceptably Unix-like. I would rather not go the ‘killall ffmpeg’ route because that could interfere with some multiprocessing ideas I’m working on.

Here’s a brute force brainstorm: When operating in remote-SSH mode, prefix the command with ‘ln -s ffmpeg ffmpeg-<unique-key>’ and then execute the symbolic link instead of the main binary. Then the script should be able to open a separate SSH session and execute ‘killall ffmpeg-<unique-key>’ without interfering with other processes. Outlandish but possibly workable.
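A rough sketch of how the two SSH command lines for this brainstorm might be built. Everything here is hypothetical: the function name build_remote_commands, the assumption that the binary sits in the remote working directory, and the 8-character key. The key is kept short because process names can be truncated to 15 characters on Linux, which would otherwise blur the match for killall. Removing the symlink afterwards is not shown.

```python
import shlex
import uuid


def build_remote_commands(host, binary, args, key=None):
    """Return (run_cmd, kill_cmd) argument lists for the symlink trick.

    Hypothetical helper; assumes `binary` lives in the remote working
    directory and that the remote side has ln and killall.
    """
    # Short unique key so "ffmpeg-<key>" stays within 15 characters.
    key = key or uuid.uuid4().hex[:8]
    alias = "%s-%s" % (binary, key)

    # The remote shell parses this string, so quote each argument.
    quoted_args = " ".join(shlex.quote(a) for a in args)
    remote_run = "ln -s %s %s && ./%s %s" % (binary, alias, alias, quoted_args)

    run_cmd = ["/usr/bin/ssh", host, remote_run]
    # Issued from a separate SSH session when the timeout expires.
    kill_cmd = ["/usr/bin/ssh", host, "killall %s" % alias]
    return run_cmd, kill_cmd


if __name__ == "__main__":
    run_cmd, kill_cmd = build_remote_commands(
        "foster-home", "ffmpeg", ["-i", "sample.avi"])
    print(run_cmd)
    print(kill_cmd)
```

Both lists can be handed to the process runner as-is, since they are already sequences and need no local shell.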

12 thoughts on “Process Runner Redux”

  1. Kostya

    In my experience all SSH commands die some time after the connection closes. Maybe just configuring sshd to shorten the time it waits for the SSH client would do the trick?

  2. SvdB

    The common way to do this on *nix systems is to write the pid into a file (under /var/run/ for system processes), from a shell wrapper or a process itself.

  3. Multimedia Mike Post author

    @SvdB: That still runs into the problem of possibly wanting to run multiple concurrent tests remotely (multicore ARMs? it could happen) and having the program step on other PID files. However, we could also enforce that remote testing shall only be performed serially.

  4. Reimar

    Well, as for the parsing…
    You could just guess how many arguments you’ll want at most and execute
    echo "$1"
    echo "$2"
    etc. to get all the arguments, split in exactly the way the shell would do it.

  5. dionoea

    Couldn’t you just use a small program to let the shell parse your arguments? Like in python “import sys\nprint repr(sys.argv[2:])” and call “python thescript.py ./ffmpeg …” to get your list of arguments which you can then pass to the relevant python function.

  6. Raymond Tau

    Nothing splits shell parameters better than the shell.
    #!/bin/sh

    while [ ${#} -gt 0 ]
    do
    echo "${1}"
    shift
    done

  7. SvdB

    @Mike: You could always decide to add a prefix or suffix to the pid file, determined by the supervising process.

    @Reimar, Raymond: You have to look out with “echo”. What if one of the arguments is “-n”? The way I usually do it is either:

    cat << EOF
    $1
    EOF

    or
    printf '%s\n' "$1"
    The former is slightly more portable.

    P.S. @Mike: It would still be nice to have a preview option for posting comments. :D

  8. Multimedia Mike Post author

    @SvdB: Yeah, comment preview would be awesome. Please know that I have looked into it and none of the options I found could be made to work. Silly WordPress plugin ecosystem.

  9. Lars

    Hi there!

    I’m not a python programmer, but I like regular expressions.
    splitting a potential commandline:

    $ python
    >>> import re
    >>> commandline='this is a textwith "double quotes" "'
    >>> re.compile('([^ "]+|"[^"]*")').findall(commandline)
    ['this', 'is', 'a', 'textwith', '"double quotes"']

    I’m not sure if the double quotes should be removed, but this should be easy.

    good night
    Lars

  10. Raymond Tau

    @Lars: Wow, I think that’s quite good. Quote removal should be performed (from the bash manpage: “After the preceding expansions, all unquoted occurrences of the characters \, ', and " that did not result from one of the above expansions are removed.”). Furthermore, quotes preceded by a backslash (\) are not handled properly as is, so the regex still needs to change in some way.

    I feel that’s quite troublesome already. If I were Mike, I’d leave that to the shell or, if you prefer Python, to Python as dionoea suggested.

  11. Lars

    yes, dionoea had the best idea.
    anyway here is a new version of my idea. it splits better but escaped quotes are not allowed in this version too
    it seems python doesn’t like nested re:

    $ python
    >>> import re
    >>> line='this is a te"xt with" spaces, "double quotes" and \'simple quotes\''
    >>> re.compile('([^ \'"]*\'[^\']*\'|[^ \'"]*"[^"]*"|[^\' "]+)').findall(line)
    ['this', 'is', 'a', 'te"xt with"', 'spaces,', '"double quotes"', 'and', "'simple quotes'"]

    now just remove the quotes.
    Lars

Comments are closed.