{"id":2586,"date":"2010-06-25T20:47:58","date_gmt":"2010-06-26T03:47:58","guid":{"rendered":"http:\/\/multimedia.cx\/eggs\/?p=2586"},"modified":"2020-07-25T22:50:48","modified_gmt":"2020-07-26T05:50:48","slug":"multiprocess-fate-revisited","status":"publish","type":"post","link":"https:\/\/multimedia.cx\/eggs\/multiprocess-fate-revisited\/","title":{"rendered":"Multiprocess FATE Revisited"},"content":{"rendered":"<p>I thought I had <a href=\"http:\/\/multimedia.cx\/eggs\/fate-process-multi-runner\/\">brainstormed a simple, elegant, multithreaded, deadlock-free refactoring for FATE in a previous post<\/a>. However, I sort of glossed over the test ordering logic which I had not yet prototyped. The grim, possibly deadlock-afflicted reality is that the main thread needs to be notified as tests are completed. So, the main thread sends test specs through a queue to be executed by <em>n<\/em> tester threads and those threads send results to a results aggregator thread. Additionally, the results aggregator will need to send completed test IDs back to the main thread.<\/p>\n<p><center><br \/>\n<img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/multimedia.cx\/eggs\/wp-content\/uploads\/2010\/06\/3-fate-threads-300x259.png\" alt=\"\" title=\"3 FATE threads\" width=\"300\" height=\"259\" class=\"aligncenter size-medium wp-image-2589\" srcset=\"https:\/\/multimedia.cx\/eggs\/wp-content\/uploads\/2010\/06\/3-fate-threads-300x259.png 300w, https:\/\/multimedia.cx\/eggs\/wp-content\/uploads\/2010\/06\/3-fate-threads.png 304w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><br \/>\n<\/center><\/p>\n<p>But when I step back and look at the graph, I can&#8217;t rationalize why there should be a separate results aggregator thread. That was added to cut down on deadlock possibilities since the main thread and the tester threads would not be waiting for data from each other. Now that I&#8217;ve come to terms with the fact that the main and the testers need to exchange data in realtime, I think I can safely eliminate the result thread. Adding more threads is not the best way to guard against race conditions and deadlocks. Ask <a href=\"http:\/\/xine-project.org\/\">xine<\/a>.<\/p>\n<p><center><br \/>\n<img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/multimedia.cx\/eggs\/wp-content\/uploads\/2010\/06\/2-fate-threads.png\" alt=\"\" title=\"2 FATE threads\" width=\"164\" height=\"264\" class=\"aligncenter size-full wp-image-2590\" \/><br \/>\n<\/center><\/p>\n<p><!--more--><\/p>\n<p>I&#8217;m still hung up on the deadlock issue. I have these queues through which the threads communicate. At issue is the fact that they can cause a thread to block when inserting an item if the queue is &#8220;full&#8221;. How full is full? Immaterial; seeking to answer such a question is not how you guard against race conditions. Rather, it seems to me that one side should be doing non-blocking queue operations.<\/p>\n<p>This is how I&#8217;m planning to revise the logic in the main thread:<\/p>\n<pre>\r\ntest_set = set of all tests to execute\r\ntests_pending = test_set\r\ntests_blocked = empty set\r\ntests_queue = multi-consumer queue to send test specs to tester threads\r\nresults_queue = multi-producer queue through which tester threads send results\r\nwhile there are tests in tests_pending:\r\n  pop a test from test_set\r\n  if test depends on any tests that appear in tests_pending:\r\n    add test to tests_blocked\r\n  else:\r\n    add test to tests_queue in a non-blocking manner\r\n    if tests_queue is full, add test to tests_blocked\r\n\r\n  while there are results in the results_queue:\r\n    get a result from result_queue in non-blocking manner\r\n    remove the corresponding test from tests_pending\r\n\r\nif tests_blocked is non-empty:\r\n  sleep for 1 second\r\n  test_set = tests_blocked\r\n  tests_blocked = empty set\r\nelse:\r\n  insert <em>n<\/em> shutdown signals, one from each thread\r\n\r\ngo to the top of the loop and repeat until there are no more tests\r\n\r\nwhile there are results in the results_queue:\r\n  get a result from result_queue in a blocking manner\r\n<\/pre>\n<p>Not mentioned in the pseudocode (so it doesn&#8217;t get too verbose) is logic to check whether the retrieved test result is actually an end-of-thread signal. These are accounted and the whole test process is done when one is received for each thread.<\/p>\n<p>On the tester thread side, it&#8217;s safe for them to do blocking test queue retrievals and blocking result queue insertions. The reason for the 1-second delay before resetting tests_blocked and looping again is because I want to guard against the situation where tests A and B are to be run, A depends of B running first, and while B is running (and happens to be a long encoding test), the main thread is spinning about, obsessively testing whether it&#8217;s time to insert A into the tests queue.<\/p>\n<p>It all sounds just crazy enough to work. In fact, I coded it up and it does work, sort of. The queue gets blocked pretty quickly. Instead of sleeping, I decided it&#8217;s better to perform the put operation using a 1-second timeout.<\/p>\n<p>Still, I&#8217;m paranoid about the precise operation of the IPC queue mechanism at work here. What happens if I try to stuff in a test spec that&#8217;s a bit too large? Will the module take whatever I give it and serialize it through the queue as soon as it can? I think an impromptu science project is in order.<\/p>\n<p>big-queue.py:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/multimediamike\/bcc228a96bba76e461cd547e7f8fa5e5.js\"><\/script><\/p>\n<pre>\r\n$ .\/big-queue.py \r\nreader function got a string of 100000000 characters\r\n<\/pre>\n<p>Since 100 MB doesn&#8217;t even make it choke, FATE&#8217;s little test specs shouldn&#8217;t pose any difficulty.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>More rambling about how I&#8217;m revising FATE&#8217;s process runner<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[101,55],"tags":[],"class_list":["post-2586","post","type-post","status-publish","format-standard","hentry","category-fate-server","category-python"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/2586","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/comments?post=2586"}],"version-history":[{"count":7,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/2586\/revisions"}],"predecessor-version":[{"id":4602,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/2586\/revisions\/4602"}],"wp:attachment":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/media?parent=2586"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/categories?post=2586"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/tags?post=2586"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}