Monthly Archives: March 2009

Intel Beats Up GCC

Executive Summary: Showcased by FFmpeg, Intel’s C Compiler beats gcc’s C compiler. Handily. Decisively. I stop just short of brutal dismemberment metaphors because that just seems so tasteless, and because I know there must be options to explore in order to improve gcc’s numbers.

Pursuant to my last post where I found results all over the map when comparing FFmpeg‘s performance when built with different compilers, with Intel’s C compiler (icc) barely edging out gcc 4.1.2, David Conrad recommended that I try building FFmpeg with all ASM and manual SIMD optimizations disabled. In doing so, the compilers would have a chance to really shine in optimizing plain C code for a computationally intensive — not to mention commonplace — task. And so I repeated the same test, only I configured the builds with these options:

./configure --disable-yasm --disable-mmx \ 
--disable-mmx2 --disable-sse --disable-ssse3

I also built static binaries with no swscale, if that makes any difference. After each build, I manually audited the resulting binary using the command:

objdump -d ffmpeg_g | grep movq

This method is predicated on the observation that x86 SIMD code blocks nearly always involve at least a movq (move quadword) instruction.

Then I did 2 runs back to back with each build. The results are thus:


icc savagely destroys gcc

It’s interesting to note that icc’s build tested positive for movq instructions– they appear to be generated by the compiler, not present due to FFmpeg code. If the compiler was smart enough to build a binary that uses SIMD where appropriate, I count that as fair game for this exercise. Note that I didn’t specify any specific CPU type to icc. Meanwhile, the optimization level for the gcc builds is cranked up to -O3 (same with icc).

I’m eager to hear how gcc’s numbers might be improved in this case (especially for the latest gcc versions). For reference, every one of these gcc compiler versions was built from source by me. Did I neglect to configure with some –turbo option? Also, fairness dictates that I field suggestions about how to coax icc into building an FFmpeg binary that further embarrasses gcc — and, by extension, free software — in this matter.

Hey, wanna hear something really creepy? Just as I was finishing this post, an apparently automated email arrived from Intel, asking for my feedback on icc.

See Also:

Compiler Performance Profiling With FFmpeg

A Slashdot story named High Performance Linux Kernel Project discusses an effort — called LinuxDNA — to make the Linux kernel compilable with Intel’s C compiler (icc) in an effort to hopefully create a higher performing Linux kernel. The ensuing comments showcased a lot of back and forth about whether icc actually offers any performance gains over recent (or even ancient) gcc versions (“ICC really no longer has the performance lead that it once did over gcc.”). I was also curious about the claim that “it’s well known that gcc 2.95.3 generates much better code on a lot of platforms”; is that why we continually test 2.95.3 via FATE?

Yet another comment stated that, “We tried ICC on our simulator. The result: 8% slower than GCC. On intel chips.”. That’s when I realized that I’m in a position to offer some controlled testing using a CPU-intensive application: FFmpeg. At any given time, I have access to the latest builds compiled for 20 different configurations. This includes a copy built with icc 10.1.017.

So I ran some tests. Executive summary: icc finishes neck and neck with gcc 4.1.2 (a tiny bit ahead of gcc in my test), while both put most of the rest of the compilers in the test to shame, especially the latest gcc compilers. I have a chart to back up my claims, so there:

Followup: Be sure to see the results of this same exercise run without any manual ASM/SIMD optimizations.

Compiler performance when decoding MPEG-4 video and MP3 audio with FFmpeg

Small aside: I hope you appreciate that chart. You wouldn’t believe how long it took me to coerce OpenOffice.org to create it, nor how grotesquely volatile OOo 3 is on Mac OS X. In the end, the program didn’t play ball and I had to use Mac’s screenshot feature to capture the goods for publishing.

Methodology: I took a 104-minute movie that has been encoded with ISO MPEG-4 part 2 (a.k.a. DivX/XviD) video and MP3 audio and fed it through the following command:

$ time ffmpeg -i file.avi -f framecrc – > /dev/null

I used the ‘user’ output from the time prefix (out of the real, user, and sys times) which counts the approximate seconds that the process spent on the CPU. This should exclude I/O access and, really, probably just counts the number of 10ms time slices that the OS allocates to the process. I ran the test once for each compiler configuration, then ran through the configurations a second time and graphed the minimum time between the pair of runs for each configuration.

One day, I will have graphing working in FATE so that we can obtain continuous and historical performance data that will help us analyze trends, both in FFmpeg and in the compilers that build it.

Another comment from the Slashdot thread asserted that “it is simply healthy for the kernel to be compilable across more compilers,” to which another commenter challenged, “Prove it.” Again, I think I’m in a position to help here. While it may be more common for a test to break on all PowerPC configurations due to endian considerations, or for the build to break on the icc or gcc 2.95.3 configurations for reasons related to C99 arcana, there have been a few instances where FATE tests have inexplicably broken on very specific configurations. The latest example of this is when a recent code change in FFmpeg randomly caused the wc3movie-xan test spec to fail, but only on the Linux / x86_32 / gcc 4.2.4 configuration. Huh? Well, thanks to Vitor who promptly went to work with valgrind and found that the subsystem was doing some bad things in the first place and in a way that finally manifested on one configuration. (Incidentally, I’m pretty sure that the WC3 playback system was the first bit of code I ever contributed to FFmpeg.)

See Also: