Breaking Eggs And Making Omelettes

Topics On Multimedia Technology and Reverse Engineering


Performance Smackdown: The Latest in 64-bit From GCC and Intel

April 26th, 2009 by Multimedia Mike

Since gcc 4.4.0 has been formally released, it’s time to re-run the compiler output benchmarks. Further, I finally sat down and put my mind toward getting the latest Intel C compiler installed and operational. I met with limited success. I haven’t been able to get the 32-bit compiler working. After the tedious rigmarole of getting version 11.0.081 installed, I launched the program without any parameters:

$ /opt/intel/Compiler/11.0/081/bin/ia32/icc
Segmentation fault

Grrrrr… why do I even bother? Fortunately, the intel64 (x86_64) compiler is operational. At the same time I was grabbing the Linux version, I noticed that there is a Mac OS X version, though it is somewhat down-rev at 11.0.059. I still downloaded that and tried it out. I was able to get it to build 32-bit binaries but not 64-bit.

So the upshot, FATE-wise, is that I have put 11.0.081/Linux/x86_64 and 11.0.059/Mac OS X/x86_32 into the system for continuous building and testing. At the time of this writing, they’re not doing so well. Lots of H.264 tests fail. The regressions pass for the most part, though.

But I stubbornly proceeded with the output benchmarks anyway. This is how the compilers are performing, per my usual method (best time out of 2 runs on the same, long, HD file; no hand-crafted ASM optimizations enabled):
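
For anyone who wants to reproduce the "best time out of N runs" idea, here is a minimal sketch in POSIX shell. It is not the actual benchmark script: the `best_of` helper is a hypothetical name, and it relies on `date +%s`, which is widely supported but only has whole-second resolution. That is coarse, though adequate for decodes that take tens of seconds.

```shell
#!/bin/sh
# best_of N CMD [ARGS...]
# Runs CMD N times and prints the smallest wall-clock time in whole seconds.
# Hypothetical helper for illustration; the command's output is discarded
# and its exit status is ignored.
best_of() {
    bo_runs=$1
    shift
    bo_best=
    bo_i=0
    while [ "$bo_i" -lt "$bo_runs" ]; do
        bo_start=$(date +%s)
        "$@" >/dev/null 2>&1
        bo_end=$(date +%s)
        bo_elapsed=$((bo_end - bo_start))
        # Keep the first measurement, then any faster one.
        if [ -z "$bo_best" ] || [ "$bo_elapsed" -lt "$bo_best" ]; then
            bo_best=$bo_elapsed
        fi
        bo_i=$((bo_i + 1))
    done
    echo "$bo_best"
}
```

Something like `best_of 2 ./decode-benchmark.sh` would then print the faster of two runs, where `decode-benchmark.sh` stands in for whatever wraps the decode of the HD sample. Taking the minimum rather than the mean filters out runs that were slowed down by unrelated system activity.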


64-bit compiler output performance chart, round 2

The gcc versions demonstrate similar performance to the first round of 64-bit tests. As for the icc 64-bit results, well, I don’t think I need to interpret that for you. I will tell you that I first ran it with no special options. Then I ran it with “--cpu=core2” which improved its run time by about 3 seconds. The gcc configurations used no special options.

However, there is a deeper issue. As indicated by the FATE tests, icc is incorrectly decoding H.264 video. Thanks to the 10-second validation files generated during the benchmarks, I can see that what should look like this (from gcc 4.4.0):


64-bit validation file, generated by gcc 4.4.0

turns out like this (icc 11.0.081):


64-bit validation file, generated by icc 11.0.081
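
A mismatch like this does not have to be spotted by eye. Below is a rough sketch of an automated check. It assumes the validation files are deterministic decoder output, so that a correct build reproduces the reference byte for byte. The `compare_outputs` helper and the file names in the usage note are hypothetical.

```shell
#!/bin/sh
# compare_outputs REFERENCE CANDIDATE
# Prints MATCH and returns 0 if the two files are byte-identical,
# otherwise prints MISMATCH and returns 1.
# Hypothetical helper; assumes deterministic decoder output.
compare_outputs() {
    co_ref=$1
    co_out=$2
    if cmp -s "$co_ref" "$co_out"; then
        echo "MATCH"
    else
        echo "MISMATCH"
        return 1
    fi
}
```

For example, `compare_outputs validation-gcc-4.4.0.raw validation-icc-11.0.081.raw` would flag the icc build as soon as its output diverges. Dropping the `-s` from `cmp` additionally reports the offset of the first differing byte.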

This makes me wonder what is so special about the FFmpeg H.264 decoder that icc has so much trouble digesting it. Is the code especially tricky? Or does it have a lot of tight loops that icc sees as opportunities for (mistaken) vectorization?

Another issue that concerns me regarding this latest series of Intel C compilers: I only have an evaluation license for 31 days. I’m not sure what happens after that. Presumably, I don’t get to use the compiler anymore. However, Intel seems to rev their compiler so often that I wonder if each minor update comes with a 31-day evaluation license.

Posted in FATE Server | 12 Comments »

12 Responses

  1. Diego “Flameeyes” Pettenò Says:

    Regarding the license, I remember there was an evaluation license for free software developers or something like that, without the 31-days limitation, but I’ll have to dig it up.

  2. Carl Eugen Hoyos Says:

    icc 10.1 64bit and 11.0 32bit pass all tests, afaik. (=I knew of the failures you found, and reported some of them to intel and your wiki, but I don’t know of any problems for the mentioned versions.)
    There is a free unlimited no-commercial-use license for all versions.

  3. Reimar Says:

    If it decodes incorrectly it will most likely also run some error concealment code, which is very slow. Sure that isn’t the reason for the bad benchmark results?

  4. Mans Says:

    If it decodes incorrectly, it’s obviously doing something it shouldn’t, which may well be taking a long time, error concealment or not.

  5. Michael Mol Says:

    Pardon my glib nature, but I do believe the icc output rocks…Its compiled output took a video of a map, and generated a rough video of a beach on a coastline in that map. This is a huge advancement in image processing and AI.

  6. avenger Says:

    That’s right, the water/mountains/sand are clearly recognisable.

  7. mat Says:

    Mike, could you add clang/llvm to your benchmarks?

    ffmpeg should be ok with the latest svn version.
    Some time ago I sent you a procedure to set it up [1]

    [1]

    After some bug hunting, I managed to get clang/llvm to pass ffmpeg codectest.
    ATM only the C-only version works because of some bugs in handling inline asm constraints [1].

    Could you add this compiler to your “x86_32 / Linux” machine?

    The compiler build procedure is quite easy [2].
    What I did was fetch the svn source code:
    $ svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
    $ cd llvm/tools
    $ svn co http://llvm.org/svn/llvm-project/cfe/trunk clang
    $ cd ..
    $ ./configure --enable-optimized
    $ cd ..

    Then I have a script to update it, rebuild it and build ffmpeg[3]

    Thanks

    Matthieu

    [1]http://llvm.org/bugs/show_bug.cgi?id=3812

    [2]http://clang.llvm.org/get_started.html

    [3]
    cd llvm
    svn up && svn up tools/clang/
    make -j3
    cd -
    export PATH=$PWD/llvm/Release/bin/:$PATH
    cd ffmpeg
    ./configure --cc=clang --enable-gpl --arch=c
    make -j 3 && make codectest

  8. No Says:

    Hmmm. I think you underestimate the icc compiler. I think it tried to make a Google map. To me it looks like it actually tried to create a height map with textures. Kudos to Intel engineers!

  9. Multimedia Mike Says:

    @mat: Okay, I got it set up. Regrettably, FFmpeg doesn’t build with your process (link problems).

  10. sean darcy Says:

    AIUI, the major optimization benefit of gcc 4.4 is “graphite”, which is turned off by default. To turn it on you need switches: -floop-interchange -floop-strip-mine -floop-block

    It’d be interesting to see what effect graphite has.

  11. Artem S. Tashkinov Says:

    Without graphite GCC 4.4 seems to be considerably slower than GCC 4.2.x in most situations.

    I find it abhorrent when a new major GCC release brings slowdowns instead of speedups. And Fedora 11 is already compiled using GCC 4.4.x. It’s a huge mess – using a new compiler before even trying to evaluate it.

    Something terribly wrong is happening on the GCC/GNU side. Two major GCC releases (4.3 and 4.4) bring mostly performance degradation …

  12. Multimedia Mike Says:

    @Artem: To be fair, 32-bit gcc 4.4 produces markedly faster code than gcc 4.2 or 4.3.
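
For reference, the Graphite switches listed in comment 10 could be tried on an FFmpeg build roughly as follows. This is only a sketch under assumptions: the compiler name `gcc-4.4` is a placeholder for wherever the 4.4 binary lives, and the gcc in question must have been configured with Graphite support, otherwise these loop transformations are not available.

```shell
# Hypothetical invocation from the top of an FFmpeg source tree.
# gcc-4.4 is a placeholder name; adjust it to the local installation.
./configure --cc=gcc-4.4 \
    --extra-cflags="-floop-interchange -floop-strip-mine -floop-block"
make
```

Running the same benchmark against a build configured without `--extra-cflags` would show how much, if anything, Graphite contributes.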