Last Performance Smackdown For Awhile | Breaking Eggs And Making Omelettes

This is getting arduous. I think this will be my last performance smackdown for a while. First off, I put the latest in the icc 10.1 series (10.1.022) into FATE for both x86_32 and x86_64. It seems to work quite well: the 32-bit version has a little trouble with the regression suite, while the 64-bit version passes all of our tests.

For this test, I decided to use a much shorter video. The file in question has ~10700 frames of MPEG-4 part 2 video at 704×400, along with MP3 audio. The x86_32 performance trend shapes up precisely as we have seen in previous tests, and with a file that takes 1/10 the time to decode. FFmpeg SVN revision is 18737.

32-bit performance comparison, 2009-05-04

As usual, all handcrafted ASM optimizations are disabled. The x86_32 configurations were built with -march/--cpu set to core2 where available, else pentium4 where available.

Here is the 64-bit chart. It must be noted that FFmpeg compiled with 11.0.083 did not decode the file correctly.

64-bit performance comparison, 2009-05-04

Update: I finally got the dark horse contender — LLVM — to compile at SVN 70961 for x86_64. Out of 2 runs with this same file, it posts a best time of 33.6 seconds.
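For anyone who wants to reproduce a "best of N runs" number like the one above, here is a minimal sketch of how such a timing could be taken. It is not the harness used for these charts: the `best_of` helper is a hypothetical name, and the command it times is a placeholder that you would swap for your own decoder invocation.

```python
import subprocess
import sys
import time


def best_of(cmd, runs=2):
    """Run `cmd` `runs` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the program's output so terminal I/O does not skew the timing.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Placeholder command (assumption): a no-op Python process standing in for
# the real decode command line.
placeholder_cmd = [sys.executable, "-c", "pass"]
print(f"best of 2 runs: {best_of(placeholder_cmd, runs=2):.3f} s")
```

Taking the minimum rather than the mean is a common choice for this kind of benchmark, since outside interference (disk cache misses, other processes) can only make a run slower, never faster.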

The differences look severe, but they are actually within a few seconds of each other. And notice that all 64-bit configurations are demonstrably speedier than all 32-bit configurations.
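To put "a few seconds" in proportion, here is a quick back-of-the-envelope sketch. The only concrete total quoted in this post is the 33.6-second LLVM run, so that is used as the baseline; the 1-second gap is an illustrative input, not a measured result, and `percent_of_runtime` is just a hypothetical helper name.

```python
def percent_of_runtime(delta_seconds, total_seconds):
    """Express a timing gap as a percentage of the total decode time."""
    return 100.0 * delta_seconds / total_seconds


# A 1-second gap against the 33.6-second run quoted in this post.
print(f"{percent_of_runtime(1.0, 33.6):.1f}%")  # prints 3.0%
```

So with totals in this range, each second of difference is worth roughly 3% of the run.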

Somehow, it’s only now, as I prepare to publish this entry, that I realize something is amiss: how did my current gcc-svn build manage to build FFmpeg when FATE can’t do the same? It must be the configure options.

8 thoughts on “Last Performance Smackdown For Awhile”

Reimar May 5, 2009 at 1:17 am

Isn’t it a bit silly to make the x86_32 graph 0-based and the x86_64 one start at 29? Also, 1 second of decode time difference is 3% of overall speed. Of course, it should not matter much in most cases, where asm code will hide almost all the differences (in that respect, MIPS for example would be interesting; it is just hard to get a good test environment for that).

Carl Eugen Hoyos May 5, 2009 at 2:44 am

Please add --cpu=core2 to the icc 10.1 32-bit configuration line. That fixes the regression test (as do pentium3 and pentium4).

Multimedia Mike (post author) May 5, 2009 at 6:46 am

@Reimar: That’s just how OpenOffice made the graph. I’m lucky I got it to do that much without crashing.

@Carl: Done, thanks (will be reflected during the next SVN update).

Carl Eugen Hoyos May 5, 2009 at 8:45 am

Please remove '--extra-cflags="core2"' from FATE, and add '--cpu=core2'

Thank you, Carl Eugen

Multimedia Mike (post author) May 5, 2009 at 9:05 am

Oops, looks like I messed that one up good. :-) I’ll change that when I get access to the server again in a few hours.

Owen May 13, 2009 at 2:56 pm

Considered adding AMD’s Open64 suite to the test lineup?
http://developer.amd.com/CPU/OPEN64/Pages/default.aspx

Multimedia Mike (post author) May 13, 2009 at 9:17 pm

I’ve never even heard of Open64 (I wondered if AMD was active in the compiler arena). I’ll check it out, though. Thanks.

Owen May 14, 2009 at 3:23 am

Neither had I until two weeks ago :)
