
Compiler Performance Profiling With FFmpeg

A Slashdot story titled “High Performance Linux Kernel Project” discusses LinuxDNA, an effort to make the Linux kernel compilable with Intel’s C compiler (icc) in the hope of producing a higher-performing kernel. The ensuing comments went back and forth about whether icc actually offers any performance gain over recent (or even ancient) gcc versions (“ICC really no longer has the performance lead that it once did over gcc.”). I was also curious about the claim that “it’s well known that gcc 2.95.3 generates much better code on a lot of platforms”; is that why we continually test 2.95.3 via FATE?

Yet another comment stated, “We tried ICC on our simulator. The result: 8% slower than GCC. On intel chips.” That’s when I realized that I’m in a position to offer some controlled testing using a CPU-intensive application: FFmpeg. At any given time, I have access to the latest builds compiled for 20 different configurations. This includes a copy built with icc 10.1.017.

So I ran some tests. Executive summary: icc finishes neck and neck with gcc 4.1.2 (a tiny bit ahead of gcc in my test), while both put most of the rest of the compilers in the test to shame, especially the latest gcc compilers. I have a chart to back up my claims, so there:

Followup: Be sure to see the results of this same exercise run without any manual ASM/SIMD optimizations.

Compiler performance when decoding MPEG-4 video and MP3 audio with FFmpeg

Small aside: I hope you appreciate that chart. You wouldn’t believe how long it took me to coerce OpenOffice.org to create it, nor how grotesquely volatile OOo 3 is on Mac OS X. In the end, the program didn’t play ball and I had to use Mac’s screenshot feature to capture the goods for publishing.

Methodology: I took a 104-minute movie that has been encoded with ISO MPEG-4 part 2 (a.k.a. DivX/XviD) video and MP3 audio and fed it through the following command:

$ time ffmpeg -i file.avi -f framecrc - > /dev/null

I used the ‘user’ figure reported by the time prefix (out of the real, user, and sys times), which approximates the number of seconds the process spent running on the CPU in user mode. This should exclude time spent waiting on I/O; in practice, it probably just counts the number of 10ms time slices that the OS allocates to the process. I ran the test once for each compiler configuration, then ran through all the configurations a second time and graphed the lower of the two times for each configuration.
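A minimal sketch of that best-of-two measurement, written in Python instead of using the shell’s time prefix. The function names are made up for illustration, and reading the user time from resource.getrusage() is an assumption about how one might automate this, not a description of how the numbers in the chart were gathered. It only works on Unix-like systems.

```python
import resource
import subprocess


def user_cpu_seconds(cmd):
    """Run cmd to completion and return the user-mode CPU seconds it consumed.

    RUSAGE_CHILDREN accumulates usage for child processes that have
    terminated and been waited for, so the difference between two readings
    taken around a single child is that child's own usage.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    # Discard the child's output, much like redirecting to /dev/null.
    subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=True,
    )
    after = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    return after - before


def best_of(cmd, runs=2):
    """Return the lowest user time seen across `runs` executions of cmd."""
    return min(user_cpu_seconds(cmd) for _ in range(runs))


# Example call (file.avi is the placeholder name from the command above):
#   best_of(["ffmpeg", "-i", "file.avi", "-f", "framecrc", "-"])
```

Taking the minimum rather than the mean is a deliberate choice here: background noise on the machine can only make a run slower, so the fastest run is the closest estimate of what the binary can do.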

One day, I will have graphing working in FATE so that we can obtain continuous and historical performance data that will help us analyze trends, both in FFmpeg and in the compilers that build it.

Another comment from the Slashdot thread asserted that “it is simply healthy for the kernel to be compilable across more compilers,” which another commenter challenged with, “Prove it.” Again, I think I’m in a position to help here. While it may be more common for a test to break on all PowerPC configurations due to endian considerations, or for the build to break on the icc or gcc 2.95.3 configurations for reasons related to C99 arcana, there have been a few instances where FATE tests have inexplicably broken on very specific configurations. The latest example is a recent code change in FFmpeg that randomly caused the wc3movie-xan test spec to fail, but only on the Linux / x86_32 / gcc 4.2.4 configuration. Huh? Well, thanks go to Vitor, who promptly went to work with valgrind and found that the subsystem was doing some bad things in the first place, in a way that finally manifested on just one configuration. (Incidentally, I’m pretty sure that the WC3 playback system was the first bit of code I ever contributed to FFmpeg.)


About That 32-bit Chroot

Pursuant to my earlier frustration with building and running 32-bit binaries on a 64-bit Linux installation, I have returned to the chroot suggestions set forth in the comments section for that post. I found the DebootstrapChroot HOWTO on the Ubuntu Wiki, which seems to be a fairly authoritative solution. Except that it didn’t work right on any of the many occasions I tried to set it up some weeks ago.

I finally got the 32-bit chroot to work tonight. Thus, I am working to migrate the 8 x86_32 configurations over from the VMware machine. All of the gcc versions work when transplanted directly (2.95.3, though, is only happy living in the same path where it originally resided). I am rebuilding a new version of gcc-svn for x86_32 (no reason to migrate an old version when I am constantly updating from gcc SVN anyway). I could probably migrate the Intel C compiler wholesale, but it would probably be better to take this opportunity to finally upgrade from .15 to .17 in the 10.1 series, at least until Carl Eugen Hoyos gives the all-clear to upgrade to the later series (last I heard, it can’t handle the FFmpeg source).

For my future reference, as well as for the benefit of other confused Ubuntu users, I am documenting how I managed to set up the 32-bit chroot environment. I started with the instructions at https://wiki.ubuntu.com/DebootstrapChroot and found them to be mostly accurate but in the wrong order in some places. Mostly, it had to do with the mount points and when to activate them. The original Wiki describes chrooting as root, performing a bunch of apt-get package maintenance (section: “Setting up your chroot with debootstrap”), and only sometime later (section: “Getting stuff (…) working automagically”) setting up special mount points. Then later on (section: “Setting up a dchroot (non-root) environment”), the document recommends replacing the earlier mount points with a new set. Following those instructions in that order always left me with a confused and corrupted chroot setup. I solved the problem by setting up the second set of mount points before performing the initial package maintenance (and never using the first set).

Also, I found it very useful and bandwidth-saving to make a backup copy (‘cp -a /var/chroot/intrepid /var/chroot/fresh.intrepid’) right after the initial debootstrap command. That way, if it doesn’t work out quite right the first, second, … tenth time, you won’t have to wipe the chroot directory and download all the packages again from scratch.