{"id":2133,"date":"2010-01-26T07:53:49","date_gmt":"2010-01-26T15:53:49","guid":{"rendered":"http:\/\/multimedia.cx\/eggs\/?p=2133"},"modified":"2010-01-26T07:53:49","modified_gmt":"2010-01-26T15:53:49","slug":"systematic-benchmarking-adjunct-to-fate","status":"publish","type":"post","link":"https:\/\/multimedia.cx\/eggs\/systematic-benchmarking-adjunct-to-fate\/","title":{"rendered":"Systematic Benchmarking Adjunct to FATE"},"content":{"rendered":"<p><a href=\"http:\/\/multimedia.cx\/eggs\/benchmark-bitch-no-more\/\">Pursuant to my rant<\/a> on the futility of comparing, performance-wise, the output of various compilers, I wholly acknowledge the utility of systematically benchmarking <a href=\"http:\/\/ffmpeg.org\/\">FFmpeg<\/a>. <a href=\"http:\/\/fate.multimedia.cx\/\">FATE<\/a> is not an appropriate mechanism for doing so, at least not in its normal mode of operation. The &#8220;normal mode&#8221; would have each and every configuration (60 or so) running certain extended test specs during every cycle. Quite a waste.<\/p>\n<p><strong>Hypothesis:<\/strong> By tracking the performance of a single x86_64 configuration, we should be able to catch performance regressions in FFmpeg.<\/p>\n<p><strong>Proposed methodology:<\/strong> Create a new script that watches for SVN commits. For each and every commit (no skipping), check out the code, build it, and run a series of longer tests. Log the results and move on to the next revision.<\/p>\n<p>What compiler to use? I&#8217;m thinking about using gcc 4.2.4 for this. In my (now abandoned) controlled benchmarks, it was the worst performer by a notable margin. I suspect that the low performance might help to accentuate performance regressions. Is this a plausible theory? Two years of testing via FATE haven&#8217;t revealed any other major problems with this version.<\/p>\n<p>What kind of samples to test? 
Thankfully, <a href=\"http:\/\/www.bigbuckbunny.org\/index.php\/download\/\">Big Buck Bunny is available in 4 common formats<\/a>:<\/p>\n<ul>\n<li>MP4\/MPEG-4 part 2 video\/AC3 audio<\/li>\n<li>MP4\/H.264 video\/AAC audio<\/li>\n<li>Ogg\/Theora video\/Vorbis audio<\/li>\n<li>AVI\/MS MPEG-4 video\/MP3 audio<\/li>\n<\/ul>\n<p>I have the 1080p versions of all those files, though I&#8217;m not sure if it&#8217;s necessary to decode all 10 minutes of each. It depends on what kind of hardware I select to run this on.<\/p>\n<p>Further, I may wish to rip an entire audio CD as a single track, encode it with MP3, Vorbis, AAC, WMA, FLAC, and ALAC, and decode each of those.<\/p>\n<p>What other common formats would be useful to track? Note that I only wish to benchmark decoding. My reasoning for this is that decoding should, on the whole, only ever get faster, never slower. Encoding might justifiably get slower as algorithmic trade-offs are made.<\/p>\n<p>I&#8217;m torn on the matter of whether to validate the decoding output during the benchmarking test. The case against validation says that computing framecrc&#8217;s is going to impact the overall benchmarking process; further, validation is redundant since that&#8217;s FATE&#8217;s main job. The case for validation says that since this will always be run on the same configuration, there is no need to worry about off-by-1 rounding issues; further, if a validation fails, that data point can be scrapped (which will also happen if a build fails) and will not count towards the overall trend. An errant build could throw off the performance data. 
Back on the &#8216;against&#8217; side, that&#8217;s exactly what statistical methods like <a href=\"http:\/\/multimedia.cx\/eggs\/weighted-moving-averages\/\">weighted moving averages<\/a> are supposed to help smooth out.<\/p>\n<p>I&#8217;m hoping that graphing this data for all to see will be made trivial thanks to <a href=\"http:\/\/code.google.com\/apis\/visualization\/documentation\/gallery.html\">Google&#8217;s Visualization API<\/a>.<\/p>\n<p>The script would run continuously, waiting for new SVN commits. When it&#8217;s not busy with new code, it would work backwards through FFmpeg&#8217;s history to backfill performance data.<\/p>\n<p><strong>So, does this whole idea hold water?<\/strong><\/p>\n<p>If I really want to run this on every single commit, I&#8217;m going to have to do a little analysis to determine a reasonable average number of FFmpeg SVN commits per day over the past year and perhaps what the rate of change is (I&#8217;m almost certain the rate of commits has been increasing). If anyone would like to take on that task, that would be a useful exercise (&#8216;svn log&#8217;, some text manipulation tools, and a spreadsheet should do the trick; you could even put it in a Google Spreadsheet and post a comment with a link to the published document).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Pursuant to my rant on the futility of comparing, performance-wise, the output of various compilers, I wholly acknowledge the utility of systematically benchmarking FFmpeg. FATE is not an appropriate mechanism for doing so, at least not in its normal mode of operation. 
The &#8220;normal mode&#8221; would have each and every configuration (60 or so) running [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[101],"tags":[],"class_list":["post-2133","post","type-post","status-publish","format-standard","hentry","category-fate-server"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/2133","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/comments?post=2133"}],"version-history":[{"count":2,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/2133\/revisions"}],"predecessor-version":[{"id":2135,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/2133\/revisions\/2135"}],"wp:attachment":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/media?parent=2133"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/categories?post=2133"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/tags?post=2133"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}