icc vs. gcc Smackdown, Round 3 | Breaking Eggs And Making Omelettes

How did I become the benchmark peon? Oh right, I actually dared to put forth some solid benchmarks and called for suggestions for possible improvements to the benchmark methodology. This is what I get.

Doing these benchmarks per all the suggestions I have received is time-consuming and error-prone. But if you know anything about me by now, you should know that I like automating time-consuming and error-prone tasks. This problem is looking more and more like a nail, so allow me to apply my new favorite hammer: Python!

Here’s the pitch: Write a Python script that iterates through a sequence of compiler configurations, each with its own path and unique cflags, and compiles FFmpeg. For each resulting build, decode a long movie twice, tracking the execution time in milliseconds. Also, for good measure, follow Reimar’s advice and validate that the builds are doing the right thing. To this end, transcode the first 10 seconds of the movie to a separate, unique file for later inspection. After each iteration, write the results to a CSV file for graphing.
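The timing and CSV steps of the pitch above might look roughly like the following. This is a minimal sketch rather than the script actually used for these runs: the helper names `time_command`, `write_results` and `run_all` are made up for illustration, and the commands passed in stand for whatever decode command each build would run.

```python
import csv
import subprocess
import sys
import time


def time_command(cmd, runs=2):
    """Run cmd `runs` times; return each wall-clock duration in milliseconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the child's output so that only execution time is measured
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(int((time.perf_counter() - start) * 1000))
    return durations


def write_results(path, results):
    """Write one CSV row per configuration: its name, then one column per run."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for name, durations in results.items():
            writer.writerow([name] + durations)


def run_all(commands, csv_path="output.csv"):
    """Time every command and rewrite the CSV after each iteration."""
    results = {}
    for name, cmd in commands.items():
        results[name] = time_command(cmd)
        write_results(csv_path, results)  # partial results survive a crash
    return results
```

Calling `run_all({"demo": [sys.executable, "-c", "pass"]})` times a trivial stand-in command twice and writes a one-row CSV. A real run would map each configuration name to the decode command for that build.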

And here’s the graph:

Look at that! gcc 4.3.2 still isn’t a contender but gcc 4.4-svn is putting up a fight.

Here are the precise details of this run:

Movie file is the same as before: 104-minute AVI; ISO MPEG-4 part 2 video (a.k.a. DivX/XviD) at 512×224, 24 fps; 32 kbps, 48 kHz MP3
This experiment includes gcc 4.4.0-svn, revision 143046, built on 2009-01-03 (I’m a bit behind)
All validations passed
Machine is a Core 2 Duo, 2.13 GHz
All 8 configurations are compiled with --disable-amd3dnow --disable-amd3dnowext --disable-mmx --disable-mmx2 --disable-sse --disable-ssse3 --disable-yasm
icc configuration compiled with --cpu=core2 --parallel
gcc 4.3.2 and 4.4.0-svn configurations compiled with -march=core2 -mtune=core2
all other gcc versions compiled with no special options


What’s in store for round 4? It sure would be nice to get icc 11.0 series working on my machine for once to see if it can do any better. And since I have the benchmark framework, it would be nice to stuff LLVM in there to see how it stacks up. I would also like to see how the various builds perform when decoding H.264/AAC. The problem with that is the tremendous memory leak that slows execution to a crawl during a lengthy transcode. Of course I would be willing to entertain any suggestions you have for compiler options in the next round.

Better yet, perhaps you would like to try out the framework yourself. As is my custom, I like to publish my ad-hoc Python scripts here on my blog or else I might never be able to find them again.

To configure this, modify SOURCE_DIR and BUILD_DIR (the latter is very important since the script will ‘rm -rf’ it before the build, just like in FATE). Find the 2 places in the code that state “/home/fate/movie.avi” and replace them with your favorite long media file. Then, monkey with the CONFIGURATIONS dictionary and the sub-dictionaries within (modifying the “cflags” and “compiler_path” keys). After the thing runs (BTW, you might want to redirect stdout to /dev/null), look for output.csv in the current working directory. Go ahead and modify the ‘make -j3’ part to an appropriate number of CPU threads for your machine. Oh, and run at your own risk, particularly due to the automatic ‘rm -rf’.
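For orientation, the configuration data described above might be shaped something like this. It is only a sketch under assumptions: the paths are placeholders, the configuration names are invented, and `configure_command` is a hypothetical helper. The one thing assumed about the real script is that it hands the compiler and cflags to FFmpeg's configure via `--cc` and `--extra-cflags`.

```python
SOURCE_DIR = "/path/to/ffmpeg-source"   # placeholder
BUILD_DIR = "/path/to/scratch-build"    # placeholder; this directory gets wiped

# One sub-dictionary per compiler configuration (names and paths are made up)
CONFIGURATIONS = {
    "gcc-4.3.2": {
        "compiler_path": "/usr/local/gcc-4.3.2/bin/gcc",
        "cflags": "-march=core2 -mtune=core2",
    },
    "gcc-default": {
        "compiler_path": "/usr/bin/gcc",
        "cflags": "",
    },
}

# Options shared by every configuration in this round
COMMON_OPTIONS = [
    "--disable-amd3dnow", "--disable-amd3dnowext", "--disable-mmx",
    "--disable-mmx2", "--disable-sse", "--disable-ssse3", "--disable-yasm",
]


def configure_command(source_dir, config):
    """Build the argument list for one ./configure invocation."""
    cmd = [source_dir + "/configure", "--cc=" + config["compiler_path"]]
    if config["cflags"]:
        # Only pass extra cflags when the configuration defines some
        cmd.append("--extra-cflags=" + config["cflags"])
    return cmd + COMMON_OPTIONS


for name, config in CONFIGURATIONS.items():
    print(name, " ".join(configure_command(SOURCE_DIR, config)))
```

Keeping the command as a list of arguments means cflags containing spaces stay in a single `--extra-cflags=` argument when the list is passed to `subprocess`.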

12 thoughts on “icc vs. gcc Smackdown, Round 3”

Robert Swain March 9, 2009 at 7:51 am

What’s this tremendous memory leak for H.264 decoding? Has it been reported…?

Multimedia Mike Post author March 9, 2009 at 8:10 am

It’s the leak discussed in this ffmpeg-devel thread: http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2009-February/061095.html

Tomer Gabel March 9, 2009 at 8:26 am

Cool, good to see GCC finally catching up.
One thing did cross my mind though: the GCC compiler flags that disable 3DNow/MMX etc.; are they specific to the FFmpeg codebase, or do they actually force GCC not to vectorize code? I expect the former, but if it’s the latter I don’t see why.

Diego "Flameeyes" Pettenò March 9, 2009 at 8:44 am

Tomer if you refer to --disable-mmx and similar, they disable the handwritten MMX asm code in FFmpeg.

Mike you should at least try a -march=pentium4 or similar for the older versions of GCC, otherwise you’re biasing against anything <4.3…

Reimar March 9, 2009 at 9:01 am

@Diego: -march=pentium-m (if available) or -march=pentium3 should work better for core2 than -march=pentium4
@Mike: I tried to reproduce your memory leak problem, but I obviously don’t have the same files as you. So I tried
ffmpeg -i Here\ Be\ Dragons.m4v -f framecrc - > /dev/null
and while the memory usage varied heavily between 12 and 25 MB, it did not seem to increase in the long term, and fps even slightly increased…
The video is the only suitable one I could find, it’s 40 minutes H.264/AAC, it is available here: http://herebedragonsmovie.com/
The md5sum of mine is 644c0c5ba6073163b32a985742562bae
(note that I tried on x86_64).

Multimedia Mike Post author March 9, 2009 at 10:34 am

Thanks for the link, Reimar. I will try out that movie in the next performance test.

astrange March 9, 2009 at 11:16 am

-mtune=pentium-m, not -march.

If you really, really love benchmarking, try backing down gcc svn until you find the performance decrease from 4.1->4.2. Depending on the part of the compiler, it might still apply to svn, but hidden by other improvements. Binary searching this is possible, but I’d expect compiling gcc to take just about forever…

Multimedia Mike Post author March 9, 2009 at 12:01 pm

@astrange: Not too far-fetched; I already have something similar in mind for finding when all of those H.264 tests started breaking with gcc-svn on PPC.

bkero March 12, 2009 at 9:02 am

There’s a tool that already exists to do this, but at a smaller scale called ACOVEA. It stands for Analysis of Compiler Options via Evolutionary Algorithm, and it measures intra-compiler flag sets, but is completely configurable to use any set of buildable and executable source.

Wyatt March 26, 2009 at 1:48 am

If you continue playing with LLVM, maybe see if Clang will compile it (I somehow feel that unlikely, as of yet). Also, what linkers are you using? llvm-gcc defaults to the GNU ld which I recall is fairly unintelligent. I think Intel’s linker does some optimisation, but I’ve recently been trying to coax llvm-ld to work as a replacement instead: seeing how the linkers affect the results might be neat.

Isaac Gouy April 24, 2009 at 8:02 am

Multimedia Mike > “Write a Python script that iterates through a sequence of compiler configurations, each with its own path and unique cflags, and compiles FFmpeg.”

Just use an existing Python script!

http://shootout.alioth.debian.org/u32q/faq.php#measurementscripts

Steve June 20, 2009 at 3:27 am

> -march=core2 -mtune=core2

That 2nd option is redundant. Per the GCC v4.3.2 man page:

`-march=CPU-TYPE’
Generate instructions for the machine type CPU-TYPE. The choices for CPU-TYPE are the same as for `-mtune’. Moreover, specifying `-march=CPU-TYPE’ implies `-mtune=CPU-TYPE’.
