FATE Process Multi-Runner

At long last, I have created a more efficient version of the FATE script, one that can leverage multiple CPU threads. After that, I created pretty graphs to demonstrate how much more efficiently multiple cores can operate vs. a single-threaded testing program.

Background
I was trying to develop a much more ambitious approach to leveraging multiple processors on a machine for the sake of automatically testing FFmpeg. The current method for leveraging, e.g., 2 cores is to create 2 separate FATE installations and divide the configurations between them. My revised idea was to automate that process such that one installation of FATE launched 2 (or n) parallel, single-threaded build/test operations.

Then Thibaut came along and offered some of his hardware for running FATE cycles. Among these machines is a 6-way Sun SPARC box. While 6 cores may sound impressive, they aren’t especially fast. Thibaut indicated that he would be willing to commit more than 1 core to FATE. That’s when I decided it might be more tractable to run build/test cycles for a particular configuration in a multithreaded manner. Multithreading the build portion is easy: “make -j<n>”. I have decided to put this into service rather than stubbornly sticking to serial builds. I especially like that this solves a huge problem with my previous brainstorm: I will always be able to count on using ccache, something that would have been complicated by building multiple configurations in parallel due to the necessity of keeping separate source trees.

What about multithreading the test portion? That’s what I’m thinking about in this post. Or was thinking about. I often use these blog posts as scratch pads as I work through ideas. I worked through a lot of concepts to arrive at a simple solution that works well.

How to write a multithreaded tester?
I initially thought I would have to completely revise FATE’s process runner, one piece of the system that I’m quite proud of. The process runner just runs a command line, monitors and collects the stdout and stderr on separate channels, and axes processes that run too long. Sounds simple, I know, but it took a while to get right. I feared I would have to create a multi-process runner which would launch multiple commands and monitor how long each ran while simultaneously monitoring stdout/stderr channels for each one (this is an essential point, as processes will stall if the relatively small IPC pipes carrying this data fill up).
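
To illustrate the general shape of such a runner (this is just a sketch, not FATE’s actual code; the function name and the timeout are made up), the trick is to drain stdout and stderr in reader threads so the pipes can never fill up, while the main loop watches the clock and kills the child if it runs too long:

    import subprocess
    import threading
    import time

    def run_command(command, timeout=300):
        # launch the command with stdout and stderr captured on separate pipes
        child = subprocess.Popen(command, shell=True,
                                 stdout=subprocess.PIPE,
                                 stderr=subprocess.PIPE)
        output = {}

        def drain(stream, key):
            # read until EOF so the child never blocks on a full pipe
            output[key] = stream.read()

        readers = [threading.Thread(target=drain, args=(child.stdout, 'stdout')),
                   threading.Thread(target=drain, args=(child.stderr, 'stderr'))]
        for t in readers:
            t.start()

        start = time.time()
        while child.poll() is None:
            if time.time() - start > timeout:
                child.kill()    # axe the process that has run too long
                break
            time.sleep(0.1)

        for t in readers:
            t.join()
        child.wait()
        return child.returncode, output.get('stdout', ''), output.get('stderr', '')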

My discovery of Python’s multiprocessing library greatly simplified my design. New concept: Create n tester threads using these multiprocessing facilities and have each one run individual instances of the existing, tested, debugged process runner. Sure, this results in (2 * n + 1) processes, but only n should be doing any heavy lifting at one time.

The design in a picture



The main thread creates one queue for passing test specs and another for passing results. Then it launches the results thread, which immediately waits for results to come through the results queue. Next, it launches n tester threads, which wait for test specs. The main thread then starts stuffing tests into the test queue. When it has run out of tests, it stuffs in n thread-terminate signals, one for each tester thread, and then waits for the results thread to finish. The results thread finishes after it receives signals through the results queue indicating that each of the tester threads has finished.
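
Here is roughly what that design looks like with the multiprocessing module. To be clear, this is a simplified sketch rather than the actual FATE script; run_test() and record_result() are placeholders standing in for the real process runner and the results logging:

    import multiprocessing

    def run_test(test_spec):
        # placeholder: this would invoke the existing process runner for one test
        return (test_spec, 'pass')

    def record_result(result):
        # placeholder: this would log the result (e.g. to the FATE database)
        print(result)

    STOP = None   # sentinel: terminate signal for testers, finished signal for results

    def tester(test_queue, results_queue):
        while True:
            test_spec = test_queue.get()
            if test_spec is STOP:                # terminate signal from the main process
                results_queue.put(STOP)          # tell the results process this tester is done
                break
            results_queue.put(run_test(test_spec))

    def results_collector(results_queue, num_testers):
        finished = 0
        while finished < num_testers:
            result = results_queue.get()
            if result is STOP:
                finished += 1
            else:
                record_result(result)

    def run_suite(tests, num_testers=4):
        test_queue = multiprocessing.Queue()
        results_queue = multiprocessing.Queue()

        collector = multiprocessing.Process(target=results_collector,
                                            args=(results_queue, num_testers))
        collector.start()

        testers = [multiprocessing.Process(target=tester,
                                           args=(test_queue, results_queue))
                   for _ in range(num_testers)]
        for p in testers:
            p.start()

        for test_spec in tests:                  # stuff the tests into the test queue
            test_queue.put(test_spec)
        for _ in range(num_testers):             # one terminate signal per tester
            test_queue.put(STOP)

        for p in testers:
            p.join()
        collector.join()                         # returns once every tester has signaled completion

    if __name__ == '__main__':
        run_suite(['test-%d' % i for i in range(20)], num_testers=4)

The nice property is that all communication happens through the two queues, so the workers share no state and need no locking.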

Performance data
I promised pretty graphs and I deliver. I used the new prototype script to run through the existing FATE suite of 350 tests on 2 different machines (the full regression suite wasn’t implemented yet, which is why the remaining tests executed rather quickly).



Graphs generated and served by Google Spreadsheet. Much easier to work with than OpenOffice.



So I quickly hit the point of diminishing returns at (NUM_THREADS / 2). I believe both of these machines are hyperthreaded, and I suspect that factors into it.

Other considerations
Right now, none of the individual FATE tests have any dependencies on other tests. This will change going forward. E.g., one test will mux a format and another test will demux it. The demux test needs to be scheduled after the mux test. This gets trickier to do with all of these parallel testing threads but is by no means impossible. My new script is arranged so that the main thread feeds tests into a single queue from which all tester threads consume. Newer versions of the FATE database are going to have information about test dependencies, which will allow the main thread to arrange the tests such that a test doesn’t go into the queue if any of its dependencies are not yet satisfied.

For extra credit, the main thread should also determine which tests have dependencies on them and try to load those tests at the front of the queue.
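
To sketch how both of those ideas might look (hypothetical code, not part of the script; it assumes the dependency data arrives as a {test: set-of-dependency-names} mapping with an empty set for independent tests, and it only orders the queue rather than waiting for a dependency to actually finish running):

    def order_tests(dependencies):
        # dependencies: {test_name: set of test names it depends on}
        # count how many tests depend on each test so "popular" tests go first
        dependents = {}
        for test, deps in dependencies.items():
            for dep in deps:
                dependents[dep] = dependents.get(dep, 0) + 1

        ordered = []
        satisfied = set()
        remaining = dict(dependencies)
        while remaining:
            # ready = tests whose dependencies have all been scheduled already
            ready = [t for t, deps in remaining.items() if deps <= satisfied]
            if not ready:
                raise ValueError('circular dependency among remaining tests')
            # among the ready tests, schedule the ones with the most dependents first
            ready.sort(key=lambda t: dependents.get(t, 0), reverse=True)
            for t in ready:
                ordered.append(t)
                satisfied.add(t)
                del remaining[t]
        return ordered

    # e.g. order_tests({'mux-avi': set(), 'demux-avi': set(['mux-avi'])})
    # yields ['mux-avi', 'demux-avi']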

Another small issue is that none of my 3 FATE machines presently have Python 2.6, which is a requirement for the multiprocessing library. I suspect this may be a problem for other FATE team members as well.

5 thoughts on “FATE Process Multi-Runner”

  1. nine

    Does the Python multiprocessing library fork new Python processes? As I understand it, the Python interpreter is actually single-threaded, so all it *should* be able to offer is fibers.

  2. Multimedia Mike Post author

    Based on the ‘top’ listing, the multiprocessing library is spawning new processes. As the library’s introduction states, “multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.”
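
    A quick (made-up) snippet to see this for yourself: the pool workers report PIDs that differ from the parent’s, so these really are separate processes rather than threads or fibers.

        import multiprocessing
        import os

        def worker_pid(_):
            # each pool worker reports the PID of the process it is running in
            return os.getpid()

        if __name__ == '__main__':
            print('parent pid: %d' % os.getpid())
            pool = multiprocessing.Pool(processes=4)
            print('worker pids: %s' % set(pool.map(worker_pid, range(16))))
            pool.close()
            pool.join()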

  3. Reimar

    fork? I think that word is enough to cause certain Windows users to pull out the pitchforks. :-) It’s just horribly slow.
    Either way, I don’t really understand why you need threads; you launch a separate ffmpeg process anyway that runs in parallel. For a really stupid solution you could just Popen all tests at once and poll them all in the "while (process.poll() == None):" loop.
    A disadvantage of running multiple tests at once, though, is that the timing data gets a lot less reliable, and with hyperthreading it probably becomes meaningless.

  4. Multimedia Mike Post author

    @Michael Sabin: Thanks for the link. Fortunately, my design shouldn’t be affected by this since I explicitly send data around to different processes; I didn’t make any shared-state assumptions, since that would only have required a bunch of locking anyway. Of course I will have to test it on Windows to be sure.

    @Reimar: Not sure whom you’re addressing. My prototype solution is very much multi-process rather than multi-threaded. It creates a static pool of worker processes which each serially spawn a series of FFmpeg processes.

    BTW, your naive Popen/poll approach neglects the draining of the stdout/stderr channels. That’s a big detail. :-)
