{"id":3822,"date":"2012-05-30T23:26:23","date_gmt":"2012-05-31T06:26:23","guid":{"rendered":"http:\/\/multimedia.cx\/eggs\/?p=3822"},"modified":"2012-05-31T08:52:55","modified_gmt":"2012-05-31T15:52:55","slug":"rar-is-still-a-contender","status":"publish","type":"post","link":"https:\/\/multimedia.cx\/eggs\/rar-is-still-a-contender\/","title":{"rendered":"RAR Is Still A Contender"},"content":{"rendered":"<p><a href=\"http:\/\/en.wikipedia.org\/wiki\/RAR\">RAR (Roshal ARchive)<\/a> is still a popular format in some corners of the internet. In fact, I procured a set of nearly 1500 RAR files that I want to use in a little project. But I didn&#8217;t want my program to have to operate directly on the RAR files which meant that I would need to recompress them to another format. Surely, one of the usual lossless compressors commonplace with Linux these days would perform better. Probably not gzip. Maybe not bzip2 either. Perhaps xz, though?<\/p>\n<p><strong>Conclusion<\/strong><br \/>\nAt first, I concluded that xz beat RAR on every single file in the corpus. But then I studied the comparison again and realized it wasn&#8217;t quite apples to apples. So I designed a new experiment.<\/p>\n<p>New conclusion: RAR still beats xz on every sample in this corpus (for the record, the data could be described as executable program data mixed with reduced quality PCM audio samples).<\/p>\n<p><strong>Methodology<\/strong><br \/>\nMy experiment involved first reprocessing the archive files into a new resource archive file format and only compressing that file (rather than a set of files) using gzip, bzip2, xz, and rar at the maximum compression settings. 
<\/p>\n<pre>\r\necho filesize,gzip,bzip2,xz,rar,filename > compressed-sizes.csv\r\nfor f in \/path\/to\/files\/*\r\ndo\r\n  gzip -9 --stdout \"$f\" > out.gz\r\n  bzip2 -9 --stdout \"$f\" > out.bz2\r\n  xz -9 --stdout --check=crc32 \"$f\" > out.xz\r\n  rar a -m5 out.rar \"$f\"\r\n  stat --printf \"%s,\" \"$f\" out.gz out.bz2 out.xz out.rar >> compressed-sizes.csv\r\n  echo \"$f\" >> compressed-sizes.csv\r\n  rm -f out.gz out.bz2 out.xz out.rar\r\ndone\r\n<\/pre>\n<p>Note that xz gets the option <code>'--check=crc32'<\/code> since I&#8217;m using the <a href=\"http:\/\/tukaani.org\/xz\/embedded.html\">XZ Embedded<\/a> library which requires it. It really doesn&#8217;t make a huge difference in filesize.<\/p>\n<p><strong>Experimental Results<\/strong><br \/>\nThe preceding command line generates compressed-sizes.csv which goes into <a href=\"https:\/\/docs.google.com\/spreadsheet\/pub?key=0AjHexWy1UYqidGRNd09xb0lhcUxId05FVlFLZm9zcmc&#038;output=html\">a Google Spreadsheet<\/a> (<a href=\"https:\/\/docs.google.com\/spreadsheet\/pub?key=0AjHexWy1UYqidGRNd09xb0lhcUxId05FVlFLZm9zcmc&#038;output=csv\">export as CSV<\/a>).<\/p>\n<p>Here are the full results of the bake-off, graphed:<\/p>\n<p><center><br \/>\n<img decoding=\"async\" src=\"https:\/\/docs.google.com\/spreadsheet\/oimg?key=0AjHexWy1UYqidGRNd09xb0lhcUxId05FVlFLZm9zcmc&#038;oid=2&#038;zx=9gun4u1mp2j2\" \/><br \/>\n<\/center><\/p>\n<p>That&#8217;s not especially useful. Here are the top 2 contenders compared directly:<\/p>\n<p><center><br \/>\n<img decoding=\"async\" src=\"https:\/\/docs.google.com\/spreadsheet\/oimg?key=0AjHexWy1UYqidGRNd09xb0lhcUxId05FVlFLZm9zcmc&#038;oid=8&#038;zx=2tg878h3sp\" \/><br \/>\n<\/center><\/p>\n<p><strong>Action<\/strong><br \/>\nObviously, I&#8217;m unmoved by the data. There is no way I&#8217;m leaving these files in their RAR form for this project, marginal space and bandwidth savings be darned. There are other trade-offs in play here. 
I know there is free source code available for decompressing RAR files but the license wouldn&#8217;t mesh well with GPL source code libraries that form the core of the same project. Plus, the XZ Embedded code is already integrated and painstakingly debugged.<\/p>\n<p>During this little exercise, I learned of a little site called <a href=\"http:\/\/www.maximumcompression.com\/index.html\">Maximum Compression<\/a> which takes experiments like the foregoing to their logical conclusion by comparing over 200 compression programs on a standard data corpus. According to <a href=\"http:\/\/www.maximumcompression.com\/data\/summary_sf.php\">the site&#8217;s summary page<\/a>, there&#8217;s a library called <a href=\"http:\/\/mattmahoney.net\/dc\/\">PAQ8PX<\/a> which posts the best overall scores.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I held an impromptu bake-off between 4 lossless compression algorithms over a corpus of 1500 files; RAR won in size but I&#8217;m still going to use 
xz<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[218],"tags":[262,52,261,135,260,259],"class_list":["post-3822","post","type-post","status-publish","format-standard","hentry","category-science-projects","tag-bzip2","tag-compression","tag-gzip","tag-lossless","tag-rar","tag-xz"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/3822","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/comments?post=3822"}],"version-history":[{"count":12,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/3822\/revisions"}],"predecessor-version":[{"id":3834,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/3822\/revisions\/3834"}],"wp:attachment":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/media?parent=3822"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/categories?post=3822"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/tags?post=3822"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}