{"id":1826,"date":"2009-09-19T23:30:04","date_gmt":"2009-09-20T06:30:04","guid":{"rendered":"http:\/\/multimedia.cx\/eggs\/?p=1826"},"modified":"2009-09-22T14:46:48","modified_gmt":"2009-09-22T21:46:48","slug":"optimizing-away-arrows","status":"publish","type":"post","link":"https:\/\/multimedia.cx\/eggs\/optimizing-away-arrows\/","title":{"rendered":"Optimizing Away Arrows"},"content":{"rendered":"<p>Google released the third version of their year-old <a href=\"http:\/\/www.google.com\/chrome\">Chrome browser<\/a> this past week. This reminded me that they incorporate <a href=\"http:\/\/ffmpeg.org\/\">FFmpeg<\/a> into the software (and thanks to the devs for <a href=\"http:\/\/src.chromium.org\/viewvc\/chrome\/trunk\/deps\/third_party\/ffmpeg\/patches\/\">making various fixes available to us<\/a>). Chrome uses FFmpeg for decoding HTML5\/video tag-type video and accompanying audio. This always makes me wonder, <em>why would they use FFmpeg&#8217;s Theora decoder? It sucks. I should know; I wrote it.<\/em><\/p>\n<p>Last year, <a href=\"http:\/\/lists.mplayerhq.hu\/pipermail\/ffmpeg-devel\/2008-June\/047993.html\">Reimar discovered<\/a> that the VP3\/Theora decoder spent the vast majority of its time decoding the coefficient stream. He proposed a fix that made it faster. I got a chance to check out the decoder tonight and profile it with <a href=\"http:\/\/oprofile.sf.net\">OProfile<\/a> and FFmpeg&#8217;s own internal timer facilities. It turns out that the function named unpack_vlcs() is still responsible for 44-50% of the decoding time, depending on machine and sample file. This is mildly disconcerting considering the significant amount of effort I put forth to even make it that fast (it took a lot of VLC magic).<\/p>\n<p>So a function in a multimedia program is slow? Well, throw assembly language and SIMD instructions at the problem! Right? 
It&#8217;s not that simple with entropy decoders.<\/p>\n<p>Reimar had a good idea in his patch and I took it to its logical conclusion: Optimize away the arrows, i.e., the structure dereferences (the -&gt; operator in C). The function insists on repeatedly grabbing items out of arrays from a context structure. The fix: create local pointers to the same arrays up front and save a bunch of dereferences on each of the innumerable iterations.<\/p>\n<p>Results were positive: both OProfile and the TSC-based internal counter showed notable improvements.<\/p>\n<p>Ideas for further improvements: Multithreading is all the rage for video decoders these days. Unfortunately, entropy decoding is always a serial proposition. However, VP3\/Theora is in a unique position to take advantage of another multithreading opportunity: It could call reverse_dc_prediction() in a separate thread after all the DC coefficients are decoded. Finally, an upside to the algorithm&#8217;s unorthodox bitstream format! According to my OProfile reports, reverse_dc_prediction() consistently takes around 6-7% of the decode time. So it would probably help to move that work off the primary thread, which would be busy with the AC coefficients.<\/p>\n<p>Multiple threads would likely help with the render_slice() function as well. One thing at a time, though. 
Wish me luck with presenting the de-dereferencing patch to the list.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Banish dereferences from inner loops<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[28,33],"tags":[],"class_list":["post-1826","post","type-post","status-publish","format-standard","hentry","category-programming","category-vp3theora"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/1826","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/comments?post=1826"}],"version-history":[{"count":8,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/1826\/revisions"}],"predecessor-version":[{"id":1834,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/1826\/revisions\/1834"}],"wp:attachment":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/media?parent=1826"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/categories?post=1826"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/tags?post=1826"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}