{"id":2468,"date":"2010-05-17T21:55:21","date_gmt":"2010-05-18T04:55:21","guid":{"rendered":"http:\/\/multimedia.cx\/eggs\/?p=2468"},"modified":"2010-05-17T21:55:21","modified_gmt":"2010-05-18T04:55:21","slug":"fate-process-multi-runner","status":"publish","type":"post","link":"https:\/\/multimedia.cx\/eggs\/fate-process-multi-runner\/","title":{"rendered":"FATE Process Multi-Runner"},"content":{"rendered":"<p>At long last, I have created a more efficient version of the <a href=\"http:\/\/fate.multimedia.cx\/\">FATE<\/a> script which can leverage multiple CPU threads. After that, I created pretty graphs to demonstrate how much more efficiently multiple cores can operate vs. a singly-threaded testing program.<\/p>\n<p><!--more--><\/p>\n<p><strong>Background<\/strong><br \/>\nI was <a href=\"http:\/\/multimedia.cx\/eggs\/better-parallelization-and-scalability\/\">trying to develop a much more ambitious approach<\/a> to leveraging multiple processors on a machine for the sake of automatically testing <a href=\"http:\/\/ffmpeg.org\">FFmpeg<\/a>. The current method for leveraging, e.g., 2 cores is to create 2 separate FATE installations and divide the configurations between them. My revised idea was to automate that process such that one installation of FATE launched 2 (or <em>n<\/em>) parallel, singly-threaded build\/test operations.<\/p>\n<p>Then <a href=\"http:\/\/multimedia.cx\/eggs\/a-lot-of-new-fate-machines\/\">Thibaut came along and offered some of his hardware<\/a> for running FATE cycles. Among these machines is a 6-way Sun Sparc box. While 6 cores may sound impressive, they aren&#8217;t especially fast. Thibaut indicated that he would be willing to commit more than 1 core to FATE. That&#8217;s when I decided it might be more tractable to run build\/test cycles for a particular configuration in a multithreaded manner. Multithreading the build portion is easy&#8211; &#8220;make -j<em>&lt;n&gt;<\/em>&#8221;. 
I have decided to put this into service rather than stubbornly sticking to the serial builds. I especially like that this solves a huge problem with my previous brainstorm&#8211; I will always be able to count on using ccache, something that would have been complicated by building multiple configurations in parallel due to the necessity of keeping separate source trees.<\/p>\n<p>What about multithreading the test portion? That&#8217;s what I&#8217;m thinking about in this post. Or <em>was<\/em> thinking about. I often use these blog posts as scratch pads as I work through ideas. I worked through a lot of concepts to arrive at a simple solution that works well.<\/p>\n<p><strong>How to write a multithreaded tester?<\/strong><br \/>\nI initially thought I would have to completely revise FATE&#8217;s process runner, one piece of the system that I&#8217;m quite proud of. The process runner just runs a command line, monitors and collects the stdout and stderr on separate channels, and axes processes that run too long. Sounds simple, I know, but it took a while to get right. I feared I would have to create a multi-process runner which would launch multiple commands and monitor how long each ran while simultaneously monitoring stdout\/stderr channels for each one (this is an essential point as processes will stall if the relatively small IPC pipes carrying this data fill up).<\/p>\n<p>My discovery of <a href=\"http:\/\/docs.python.org\/library\/multiprocessing.html\">Python&#8217;s multiprocessing library<\/a> greatly simplified my design. New concept: Create <em>n<\/em> tester threads using these multiprocessing facilities and have each one run individual instances of the existing, tested, debugged process runner. 
Sure, this results in (2 * n + 1) processes, but only <em>n<\/em> should be doing any heavy lifting at one time.<\/p>\n<p><strong>The design in a picture<\/strong><br \/>\n<center><br \/>\n<img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/multimedia.cx\/eggs\/wp-content\/uploads\/2010\/05\/multithreaded-fate.png\" alt=\"\" title=\"Multithreaded FATE program flow\" width=\"471\" height=\"340\" class=\"aligncenter size-full wp-image-2472\" srcset=\"https:\/\/multimedia.cx\/eggs\/wp-content\/uploads\/2010\/05\/multithreaded-fate.png 471w, https:\/\/multimedia.cx\/eggs\/wp-content\/uploads\/2010\/05\/multithreaded-fate-300x216.png 300w\" sizes=\"auto, (max-width: 471px) 100vw, 471px\" \/><br \/>\n<\/center><\/p>\n<p>The main thread creates one queue for passing test specs and another for passing results. Then it launches the results thread which immediately waits for results to come through the results queue. Then it launches <em>n<\/em> tester threads which wait for test specs. The main thread then starts stuffing tests in the test queue. When it has run out of tests, it stuffs <em>n<\/em> thread-terminate signals, one for each tester thread, and then waits for the results thread to finish. The results thread finishes after it receives signals through the results queue that each of the tester threads has finished.<\/p>\n<p><strong>Performance data<\/strong><br \/>\nI promised pretty graphs and I deliver. I used the new prototype script to run through the existing FATE suite of 350 tests on 2 different machines (the full regression suite wasn&#8217;t implemented which is why the remaining tests executed rather quickly).<\/p>\n<p><center><br \/>\n<img decoding=\"async\" src=\"http:\/\/spreadsheets.google.com\/oimg?key=0AjHexWy1UYqidEdJS0pJZDRtcEhaWmVGRjNaRjNSSVE&#038;oid=1&#038;zx=mmz2jd-zdf2yv\" \/><br \/>\n<\/center><\/p>\n<p>Graphs generated and served by Google Spreadsheet. 
Much easier to work with than OpenOffice.<\/p>\n<p><center><br \/>\n<img decoding=\"async\" src=\"http:\/\/spreadsheets.google.com\/oimg?key=0AjHexWy1UYqidEdJS0pJZDRtcEhaWmVGRjNaRjNSSVE&#038;oid=2&#038;zx=g1gfjo-tlhxb\" \/><br \/>\n<\/center><\/p>\n<p>So I quickly hit the point of diminishing returns at (NUM_THREADS \/ 2). I think that both of these machines are hyperthreaded and I have a feeling that might factor into it.<\/p>\n<p><strong>Other considerations<\/strong><br \/>\nRight now, none of the individual FATE tests have any dependencies on other tests. This will change going forward. E.g., one test will mux a format and another test will demux it. The demux test needs to be scheduled after the mux test. This gets trickier to do with all of these parallel testing threads but is by no means impossible. My new script is arranged so that the main thread feeds tests into a single queue from which all tester threads consume. Newer versions of the FATE database are going to have information about test dependencies which will allow the main thread to arrange the tests such that a test doesn&#8217;t go into the queue if any of its dependencies are not yet satisfied.<\/p>\n<p>For extra credit, the main thread should also determine which tests have dependencies on them and try to load those tests at the front of the queue.<\/p>\n<p>Another small issue right now is that none of my 3 FATE machines presently have Python 2.6 which is a requirement for the multiprocessing library. 
I suspect this may be a problem for other FATE team members.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>FATE&#8217;s client-side testing component can finally be multi-threaded<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[101],"tags":[],"class_list":["post-2468","post","type-post","status-publish","format-standard","hentry","category-fate-server"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/2468","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/comments?post=2468"}],"version-history":[{"count":7,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/2468\/revisions"}],"predecessor-version":[{"id":2476,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/2468\/revisions\/2476"}],"wp:attachment":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/media?parent=2468"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/categories?post=2468"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/tags?post=2468"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}