A Slashdot story named High Performance Linux Kernel Project discusses an effort — called LinuxDNA — to make the Linux kernel compilable with Intel’s C compiler (icc) in an effort to hopefully create a higher performing Linux kernel. The ensuing comments showcased a lot of back and forth about whether icc actually offers any performance gains over recent (or even ancient) gcc versions (“ICC really no longer has the performance lead that it once did over gcc.”). I was also curious about the claim that “it’s well known that gcc 2.95.3 generates much better code on a lot of platforms”; is that why we continually test 2.95.3 via FATE?
Yet another comment stated that, “We tried ICC on our simulator. The result: 8% slower than GCC. On intel chips.”. That’s when I realized that I’m in a position to offer some controlled testing using a CPU-intensive application: FFmpeg. At any given time, I have access to the latest builds compiled for 20 different configurations. This includes a copy built with icc 10.1.017.
So I ran some tests. Executive summary: icc finishes neck and neck with gcc 4.1.2 (a tiny bit ahead of gcc in my test), while both put most of the rest of the compilers in the test to shame, especially the latest gcc compilers. I have a chart to back up my claims, so there:
Followup: Be sure to see the results of this same exercise run without any manual ASM/SIMD optimizations.
Small aside: I hope you appreciate that chart. You wouldn’t believe how long it took me to coerce OpenOffice.org to create it, nor how grotesquely volatile OOo 3 is on Mac OS X. In the end, the program didn’t play ball and I had to use Mac’s screenshot feature to capture the goods for publishing.
Methodology: I took a 104-minute movie that has been encoded with ISO MPEG-4 part 2 (a.k.a. DivX/XviD) video and MP3 audio and fed it through the following command:
$ time ffmpeg -i file.avi -f framecrc – > /dev/null
I used the ‘user’ output from the time prefix (out of the real, user, and sys times) which counts the approximate seconds that the process spent on the CPU. This should exclude I/O access and, really, probably just counts the number of 10ms time slices that the OS allocates to the process. I ran the test once for each compiler configuration, then ran through the configurations a second time and graphed the minimum time between the pair of runs for each configuration.
One day, I will have graphing working in FATE so that we can obtain continuous and historical performance data that will help us analyze trends, both in FFmpeg and in the compilers that build it.
Another comment from the Slashdot thread asserted that “it is simply healthy for the kernel to be compilable across more compilers,” to which another commenter challenged, “Prove it.” Again, I think I’m in a position to help here. While it may be more common for a test to break on all PowerPC configurations due to endian considerations, or for the build to break on the icc or gcc 2.95.3 configurations for reasons related to C99 arcana, there have been a few instances where FATE tests have inexplicably broken on very specific configurations. The latest example of this is when a recent code change in FFmpeg randomly caused the wc3movie-xan test spec to fail, but only on the Linux / x86_32 / gcc 4.2.4 configuration. Huh? Well, thanks to Vitor who promptly went to work with valgrind and found that the subsystem was doing some bad things in the first place and in a way that finally manifested on one configuration. (Incidentally, I’m pretty sure that the WC3 playback system was the first bit of code I ever contributed to FFmpeg.)