{"id":1179,"date":"2009-03-03T02:15:17","date_gmt":"2009-03-03T10:15:17","guid":{"rendered":"http:\/\/multimedia.cx\/eggs\/?p=1179"},"modified":"2010-01-22T14:42:45","modified_gmt":"2010-01-22T22:42:45","slug":"intel-beats-up-gcc","status":"publish","type":"post","link":"https:\/\/multimedia.cx\/eggs\/intel-beats-up-gcc\/","title":{"rendered":"Intel Beats Up GCC"},"content":{"rendered":"<p><strong>Executive Summary:<\/strong> Showcased by FFmpeg, Intel&#8217;s C Compiler beats gcc&#8217;s C compiler. Handily. Decisively. I stop just short of brutal dismemberment metaphors because that just seems so tasteless, and because I know there must be options to explore in order to improve gcc&#8217;s numbers.<\/p>\n<p>Pursuant to <a href=\"http:\/\/multimedia.cx\/eggs\/compiler-performance-profiling-with-ffmpeg\/\">my last post where I found results all over the map<\/a> when comparing <a href=\"http:\/\/ffmpeg.org\/\">FFmpeg<\/a>&#8217;s performance when built with different compilers, with Intel&#8217;s C compiler (icc) barely edging out gcc 4.1.2, <a href=\"http:\/\/multimedia.cx\/eggs\/compiler-performance-profiling-with-ffmpeg\/#comment-142370\">David Conrad recommended<\/a> that I try building FFmpeg with all ASM and manual SIMD optimizations disabled. In doing so, the compilers would have a chance to really shine in optimizing plain C code for a computationally intensive &#8212; not to mention commonplace &#8212; task. And so I repeated the same test, only I configured the builds with these options:<\/p>\n<pre>\r\n.\/configure --disable-yasm --disable-mmx \\\r\n--disable-mmx2 --disable-sse --disable-ssse3\r\n<\/pre>\n<p>I also built static binaries with no swscale, if that makes any difference. 
After each build, I manually audited the resulting binary for leftover SIMD code using the command:<\/p>\n<p>objdump -d ffmpeg_g | grep movq<\/p>\n<p>This method is predicated on the observation that x86 SIMD code blocks nearly always involve at least a movq (move quadword) instruction.<\/p>\n<p>Then I did 2 runs back to back with each build. The results are thus:<\/p>\n<p><center><br \/>\n<img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/multimedia.cx\/eggs\/wp-content\/uploads\/2009\/03\/icc-beats-gcc.png\" alt=\"icc savagely destroys gcc\" title=\"icc savagely destroys gcc\" width=\"478\" height=\"252\" class=\"aligncenter size-full wp-image-1182\" srcset=\"https:\/\/multimedia.cx\/eggs\/wp-content\/uploads\/2009\/03\/icc-beats-gcc.png 478w, https:\/\/multimedia.cx\/eggs\/wp-content\/uploads\/2009\/03\/icc-beats-gcc-300x158.png 300w\" sizes=\"auto, (max-width: 478px) 100vw, 478px\" \/><br \/>\n<\/center><\/p>\n<p>It&#8217;s interesting to note that icc&#8217;s build tested positive for movq instructions; they appear to be generated by the compiler, not present due to FFmpeg code. If the compiler was smart enough to build a binary that uses SIMD where appropriate, I count that as fair game for this exercise. Note that I didn&#8217;t specify a CPU type to icc. Meanwhile, the optimization level for the gcc builds is cranked up to -O3 (same with icc).<\/p>\n<p>I&#8217;m eager to hear how gcc&#8217;s numbers might be improved in this case (especially for the latest gcc versions). For reference, I built every one of these gcc versions from source. Did I neglect to configure with some --turbo option? Also, fairness dictates that I field suggestions about how to coax icc into building an FFmpeg binary that further embarrasses gcc &#8212; and, by extension, free software &#8212; in this matter.<\/p>\n<p>Hey, wanna hear something really creepy? 
<em>Just<\/em> as I was finishing this post, an <em>apparently<\/em> automated email arrived from Intel, asking for my feedback on icc.<\/p>\n<p><strong>See Also:<\/strong><\/p>\n<ul>\n<li><a href=\"http:\/\/multimedia.cx\/eggs\/icc-vs-gcc-smackdown-round-3\/\">Next smackdown in the series<\/a><\/li>\n<li><a href=\"http:\/\/multimedia.cx\/eggs\/compiler-performance-profiling-with-ffmpeg\/\">Previous smackdown in the series<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>On the suggestion of a commenter, I repeated my icc vs. gcc experiment and found that icc soundly thrashes gcc<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[28],"tags":[],"class_list":["post-1179","post","type-post","status-publish","format-standard","hentry","category-programming"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/1179","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/comments?post=1179"}],"version-history":[{"count":9,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/1179\/revisions"}],"predecessor-version":[{"id":2105,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/1179\/revisions\/2105"}],"wp:attachment":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/media?parent=1179"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/categories?post=1179"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/tags?post=1179
"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}