Tag Archives: icc

Latest Compiler News

I’ve been doing compiler stuff tonight:

  • Thanks to Carl Eugen Hoyos as well as my compiler contact inside Intel for advising me on how to procure icc version 11.0.083 (vs. .081 previously) along with an unlimited, non-commercial compiler license. Looks like I won’t have to worry about the 31-day limit now (though there might still be a problem with the Mac OS X version). Further, the 32-bit compiler runs from the 64-bit kernel prompt (I am trying to move away from the 32-bit chroot setup and am meeting with some success). Both 32- and 64-bit versions are now in FATE.
  • After a brief respite following the release of gcc 4.4.0, I have updated the gcc SVN snapshots and reinstated the configurations tracking the FFmpeg build status on experimental gcc 4.5 builds. I’m especially proud, though, that I managed to build one C compiler binary that runs under 64-bit Linux but can build both 32- and 64-bit binaries.
  • A lot of people wonder about this, so I wanted to make it known that I have been briefed on how to use LLVM, the rising star of compiler technology. Thus far, I have not been able to create a build that compiles FFmpeg. I hear that they’re working on it.

New performance smackdowns to come, at least for those that can currently build FFmpeg (the current rev of gcc 4.5-SVN, rev 147090, isn’t doing so well– failing across platforms).

Performance Smackdown: The Latest in 64-bit From GCC and Intel

Since gcc 4.4.0 has been formally released, it’s time to re-run the compiler output benchmarks. Further, I finally sat down and put my mind toward getting the latest Intel C compiler installed and operational. I met with limited success. I haven’t been able to get the 32-bit compiler working. After the tedious rigmarole of getting version 11.0.081 installed, I launched the program without any parameters:

$ /opt/intel/Compiler/11.0/081/bin/ia32/icc
Segmentation fault

Grrrrr… why do I even bother? Fortunately, the intel64 (x86_64) compiler is operational. At the same time I was grabbing the Linux version, I noticed that there is a Mac OS X version, though it is somewhat down-rev at 11.0.059. I still downloaded that and tried it out. I was able to get it to build 32-bit binaries but not 64-bit.

So the upshot, FATE-wise, is that I have put 11.0.081/Linux/x86_64 and 11.0.059/Mac OS X/x86_32 into the system for continuous building and testing. At the time of this writing, they’re not doing so well. Lots of H.264 tests fail. The regressions pass for the most part, though.

But I stubbornly proceeded with the output benchmarks anyway. This is how the compilers are performing, per my usual method (best time out of 2 runs on the same, long, HD file; no hand-crafted ASM optimizations enabled):

64-bit compiler output performance chart, round 2

The gcc versions demonstrate similar performance to the first round of 64-bit tests. As for the icc 64-bit results, well, I don’t think I need to interpret that for you. I will tell you that I first ran it with no special options. Then I ran it with “–cpu=core2” which improved its run time by about 3 seconds. The gcc configurations used no special options.

However, there is a deeper issue. As indicated by the FATE tests, icc is incorrectly decoding H.264 video. Thanks to the 10-second validation files generated during the benchmarks, I am able to see that, what should look like this (from gcc 4.4.0):

64-bit validation file, generated by gcc 4.4.0

turns out like this (icc 11.0.081):

64-bit validation file, generated by icc 11.0.081

This makes me wonder what is so special about the FFmpeg H.264 decoder that icc has so much trouble digesting it. Is the code especially tricky? Or does it have a lot of tight loops that icc sees as opportunities for (mistaken) vectorization?

Another issue that concerns me regarding this latest series of Intel C compilers: I only have an evaluation license for 31 days. I’m not sure what happens after that. Presumably, I don’t get to use the compiler anymore. However, Intel seems to rev their compiler so often that I wonder if each minor update comes with a 31-day evaluation license.

See Also:

Performance Smackdown, Now With 64-bit

Another in my continuing series of compiler performance reports– that is, the performance of straight C code when compiled by assorted compilers. Pursuant to round 3, I downloaded the long, free, hi-def H.264/AAC movie to profile, as suggested by Reimar and profiled that. It takes 11-15 minutes to decode the entire thing on my 2.13 GHz Core 2. No matter; my machine is patient, and here are the results:

icc vs gcc performance chart when running FFmpeg, round 4

“gcc-svn” is gcc 4.4.0-svn, revision 143046, built on 2009-01-03, same as before.

All validations passed. Further, I used “march=pentium4” as suggested by Flameeyes, on compilers that supported the option but not “march=core2” (gcc 3.4.6, 4.0.4, 4.1.2, and 4.2.4). I think that improved performance for those, but I won’t know for sure unless I run with the original MPEG-4 part 2/MP3 movie from the previous tests.

I also took this opportunity to see how native 64-bit builds performed on the same machine. I hope one day to get Intel’s 64-bit compiler working so it can be included in the competition:

Profiling 64-bit code using FFmpeg

For this test, I didn’t specify any compiler optimizations from the command line. Let me know if that should change for the next round. “gcc-svn” is a little more up to date at gcc 4.4.0-svn, revision 144720, built on 2009-03-08.

Lingering TODO: Investigate if Acovea can help in this process.

See Also:

icc vs. gcc Smackdown, Round 3

How did I become the benchmark peon? Oh right, I actually dared to put forth some solid benchmarks and called for suggestions for possible improvements to the benchmark methodology. This is what I get.

Doing these benchmarks per all the suggestions I have received is time-consuming and error-prone. But if you know anything about me by now, you should know that I like automating time-consuming and error-prone tasks. This problem is looking more and more like a nail, so allow me to apply my new favorite hammer: Python!

Here’s the pitch: Write a Python script that iterates through a sequence of compiler configurations, each with its own path and unique cflags, and compiles FFmpeg. For each resulting build, decode a long movie twice, tracking the execution time in milliseconds. Also, for good measure, follow Reimar’s advice and validate that the builds are doing the right thing. To this end, transcode the first 10 seconds of the movie to a separate, unique file for later inspection. After each iteration, write the results to a CSV file for graphing.

And here’s the graph:

icc vs. gcc smackdown, round 3

Look at that! gcc 4.3.2 still isn’t a contender but gcc 4.4-svn is putting up a fight.

Here are the precise details of this run:

  • Movie file is the same as before: 104-minute AVI; ISO MPEG-4 part 2 video (a.k.a. DivX/XviD) at 512×224, 24 fps; 32 kbps, 48 kHz MP3
  • This experiment includes gcc 4.4.0-svn, revision 143046, built on 2009-01-03 (I’m a bit behind)
  • All validations passed
  • Machine is a Core 2 Duo, 2.13 GHz
  • All 8 configurations are compiled with –disable-amd3dnow –disable-amd3dnowext –disable-mmx –disable-mmx2 –disable-sse –disable-ssse3 –disable-yasm
  • icc configuration compiled with –cpu=core2 –parallel
  • gcc 4.3.2 and 4.4.0-svn configurations compiled with -march=core2 -mtune=core2
  • all other gcc versions compiled with no special options

See Also:

What’s in store for round 4? It sure would be nice to get icc 11.0 series working on my machine for once to see if it can do any better. And since I have the benchmark framework, it would be nice to stuff LLVM in there to see how it stacks up. I would also like to see how the various builds perform when decoding H.264/AAC. The problem with that is the tremendous memory leak that slows execution to a crawl during a lengthy transcode. Of course I would be willing to entertain any suggestions you have for compiler options in the next round.

Better yet, perhaps you would like to try out the framework yourself. As is my custom, I like to publish my ad-hoc Python scripts here on my blog or else I might never be able to find them again.

Continue reading