{"id":3041,"date":"2010-11-24T19:24:12","date_gmt":"2010-11-25T03:24:12","guid":{"rendered":"http:\/\/multimedia.cx\/eggs\/?p=3041"},"modified":"2010-11-24T19:24:12","modified_gmt":"2010-11-25T03:24:12","slug":"greed-is-good-greed-works","status":"publish","type":"post","link":"https:\/\/multimedia.cx\/eggs\/greed-is-good-greed-works\/","title":{"rendered":"Greed is Good; Greed Works"},"content":{"rendered":"<p><a href=\"http:\/\/en.wikipedia.org\/wiki\/Gordon_Gekko\"><em>Greed, for lack of a better word, is good; Greed works.<\/em><\/a> Well, most of the time. Maybe.<\/p>\n<p><strong>Picking Prediction Modes<\/strong><br \/>\nVP8 uses one of 4 prediction modes to predict a 16&#215;16 luma block or 8&#215;8 chroma block before processing it (for luma, a block can also be broken into 16 4&#215;4 blocks for individual prediction using even more modes).<\/p>\n<p>So, how to pick the best predictor mode? I had no idea when I started writing my VP8 encoder. I did not read any literature on the matter; I just sat down and thought of a brute-force approach. According to the comments in my code:<\/p>\n<pre>\r\n\/\/ naive, greedy algorithm:\r\n\/\/   residual = source - predictor\r\n\/\/   mean = mean(residual)\r\n\/\/   residual -= mean\r\n\/\/   find the max diff between the mean and the residual\r\n\/\/ the thinking is that, post-prediction, the best block will\r\n\/\/ be comprised of similar samples\r\n<\/pre>\n<p>After removing the predictor from the macroblock, individual 4&#215;4 subblocks are put through a forward DCT and quantized. Optimal compression in this scenario results when all samples are the same since only the DC coefficient will be non-zero. Failing that, when the input samples are at least similar to each other, few of the AC coefficients will be non-zero, which helps compression. When the samples are all over the scale, there aren&#8217;t a whole lot of non-zero coefficients unless you crank up the quantizer, which results in poor quality in the reconstructed subblocks.<\/p>\n<p>Thus, my goal was to pick a prediction mode that, when applied to the input block, resulted in a residual in which each element would feature the least deviation from the mean of the residual (relative to other prediction choices).<\/p>\n<p><strong>Greedy Approach<\/strong><br \/>\nI realized that this algorithm falls into the broad general category of <a href=\"http:\/\/en.wikipedia.org\/wiki\/Greedy_algorithm\">&#8220;greedy&#8221; algorithms<\/a>&#8212; one that makes locally optimal decisions at each stage. There are most likely smarter algorithms. But this one was good enough for making an encoder that just barely works.<\/p>\n<p><strong>Compression Results<\/strong><br \/>\nI checked the total file compression size on my usual 640&#215;360 Big Buck Bunny logo image while forcing prediction modes vs. using my greedy prediction picking algorithm. In this very simple test, DC-only actually resulted in slightly better compression than the greedy algorithm (which says nothing about overall quality).<\/p>\n<table border=\"1\" cellpadding=\"3\">\n<tr>\n<th>prediction mode<\/th>\n<th>quantizer index = 0 (minimum)<\/th>\n<th>quantizer index = 10<\/th>\n<\/tr>\n<tr>\n<td>greedy<\/td>\n<td>286260<\/td>\n<td>98028<\/td>\n<\/tr>\n<tr>\n<td>DC<\/td>\n<td>280593<\/td>\n<td>95378<\/td>\n<\/tr>\n<tr>\n<td>vertical<\/td>\n<td>297206<\/td>\n<td>105316<\/td>\n<\/tr>\n<tr>\n<td>horizontal<\/td>\n<td>295357<\/td>\n<td>104185<\/td>\n<\/tr>\n<tr>\n<td>TrueMotion<\/td>\n<td>311660<\/td>\n<td>113480<\/td>\n<\/tr>\n<\/table>\n<p>As another data point, in both quantizer cases, my greedy algorithm selected a healthy mix of prediction modes:<\/p>\n<ul>\n<li>quantizer index 0: DC = 521, VERT = 151, HORIZ = 183, TM = 65<\/li>\n<li>quantizer index 10: DC = 486, VERT = 167, HORIZ = 190, TM = 77<\/li>\n<\/ul>\n<p><strong>Size vs. Quality<\/strong><br \/>\nAgain, note that this ad-hoc test only measures one property (a highly objective one)&#8211; compression size. It did not account for quality which is a far more controversial topic that I have yet to wade into.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Producing a greedy, naive algorithm for choosing intra prediction modes in a VP8 video encoder<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[219],"tags":[],"class_list":["post-3041","post","type-post","status-publish","format-standard","hentry","category-vp8"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/3041","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/comments?post=3041"}],"version-history":[{"count":7,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/3041\/revisions"}],"predecessor-version":[{"id":3048,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/3041\/revisions\/3048"}],"wp:attachment":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/media?parent=3041"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/categories?post=3041"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/tags?post=3041"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}