Monthly Archives: April 2009

Performance Smackdown: PowerPC

Someone asked me for performance numbers for the PowerPC, i.e., how efficiently can a PowerPC CPU decode certain types of multimedia via FFmpeg. So I ran my compiler benchmark script on the 5 compiler configurations currently in FATE. I did 2 runs, one with and one without AltiVec optimizations. I used the 512×224 MPEG-4 part 2 video with MP3 audio (104 minutes, ~144,000 frames). These tests were run on a 1.25 GHz PowerPC G4 (Mac Mini running Linux). The FFmpeg source code was at SVN revision 18711.

PowerPC performance comparison

Interesting stuff: The performance trends do not parallel the chaos we have seen with x86_32 and x86_64. Instead, we see continuous improvement.

Suggestions for improvement welcome, though there don’t seem to be a lot of tunable parameters for PowerPC in gcc.

Progress On Those Crashers

Last December, I set about on the task of downloading and testing a huge number of files that were known, at one point, to crash FFmpeg. I devised a system for automatically running the files and determining whether they still crash in FFmpeg. Quite a few of them did. Then, I sort of let the project sit.

I got around to running a new round of tests with the utility I created in December and compared the results with those of 4 months ago. Today’s test was conducted with FFmpeg SVN-r18707 built with “gcc: 4.0.1 (Apple Inc. build 5484)”, 32-bit version, and run on Mac OS X.

Result               December 8, 2008   April 27, 2009
Success              2148               2781
FFmpeg error         1333               1389
SIGFPE               376                1
SIGKILL (timeouts)   16                 17
SIGSEGV              529                123

Great progress, especially on those floating point exceptions. I’m pretty sure nearly all of those were attributable to one or a few problems in the Real demuxer that have since been addressed. The only remaining problem in the FPE category is an AVI file.

The timeout category represents the number of files that ran longer than a minute (need to keep the process moving). The “FFmpeg error” category (return code 1) is on the rise. I surmise that’s because FFmpeg is getting better at rejecting errant files vs. crashing on them. I should really formulate a query that reveals which files’ status changed, and how, between runs.
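As a rough illustration of that status-change query, here is a minimal Python sketch. It assumes each run can be loaded as a mapping from filename to result category; the filenames and results below are made up for the example, and the real data presumably lives in whatever storage the test utility uses.

```python
from collections import Counter


def status_changes(old_run, new_run):
    """Return {filename: (old_status, new_status)} for files whose status differs.

    Only files present in both runs are compared.
    """
    return {
        name: (old_run[name], new_run[name])
        for name in old_run.keys() & new_run.keys()
        if old_run[name] != new_run[name]
    }


def summarize(changes):
    """Count how many files made each (old_status, new_status) transition."""
    return Counter(changes.values())


# Hypothetical sample data, not taken from the real test runs.
december = {"a.rm": "SIGFPE", "b.avi": "SIGSEGV", "c.asf": "SIGABRT", "d.mov": "Success"}
april = {"a.rm": "Success", "b.avi": "FFmpeg error", "c.asf": "SIGABRT", "d.mov": "Success"}

changes = status_changes(december, april)
for (old, new), count in sorted(summarize(changes).items()):
    print(f"{old} -> {new}: {count}")
```

A per-transition count like this would show directly whether the growth in “FFmpeg error” really comes from former crashers.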

A big reason I sat on this project for so long is that I didn’t know how to proceed. Should I start testing the problem files manually, collect stack traces, and flood the FFmpeg issue tracker with hundreds of new reports? I don’t want to deal with that kind of manual labor and I don’t think my co-devs want to deal with that volume of (possibly redundant) bug traffic.

Since December, I have developed another idea: Automatically running the problem files through gdb and looking for patterns. For example, I manually checked those 6 crashers that threw SIGABRT (the same 6 files from each run, BTW, and all ASF files). They all seem to fail as follows:

Program received signal SIGABRT, Aborted.
0x96dbbe42 in __kill ()
(gdb) bt
#0  0x96dbbe42 in __kill ()
#1  0x96dbbe34 in kill$UNIX2003 ()
#2  0x96e2e23a in raise ()
#3  0x96e3a679 in abort ()
#4  0x96e2f3db in __assert_rtn ()
#5  0x00026529 in ff_asf_parse_packet (s=0x1002600, pb=0xa00200, 
pkt=0xbfffe954) at /Users/melanson/ffmpeg/ffmpeg-main/libavformat/asfdec.c:709

It would be nice to create a script that recognizes that all 6 of those files suffer from the same or a similar problem and groups them together in a report. I am not sure whether gdb offers non-interactive options that are conducive to this situation. I know it has a -batch mode, but I’m not really sure what that’s for. If need be, I can always create a Python script that opens gdb in interactive mode and has a stdin/stdout conversation with it.
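For what it is worth, gdb's -batch mode runs the commands it is given and then exits, which may be enough to avoid the interactive conversation. The following is only a sketch under assumptions: the function names are invented for illustration, the "signature" is simply the first backtrace frame that carries source file and line information, and the command passed to gdb is whatever invocation the test utility already uses.

```python
import os
import re
import subprocess
from collections import defaultdict

# A frame with source info, after continuation lines have been joined, e.g.
# "#5  0x00026529 in ff_asf_parse_packet (s=..., pkt=...) at /path/asfdec.c:709"
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s+\(.*\)\s+at\s+(\S+):(\d+)$")


def gdb_backtrace(command, timeout=60):
    """Run `command` under gdb in batch mode and return gdb's stdout.

    Raises subprocess.TimeoutExpired if the run exceeds `timeout` seconds.
    """
    result = subprocess.run(
        ["gdb", "-batch", "-ex", "run", "-ex", "bt", "--args", *command],
        stdin=subprocess.DEVNULL, capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout


def split_frames(backtrace):
    """Split gdb output into frames, joining wrapped lines onto their frame."""
    frames = []
    for line in backtrace.splitlines():
        if re.match(r"#\d+\s", line):
            frames.append(line.strip())
        elif frames:
            frames[-1] += " " + line.strip()
    return frames


def crash_signature(backtrace):
    """Return (function, source file, line) of the first frame with source info."""
    for frame in split_frames(backtrace):
        match = FRAME_RE.match(frame)
        if match:
            function, path, line = match.groups()
            return function, os.path.basename(path), int(line)
    return None


def group_by_signature(backtraces):
    """Group {filename: backtrace text} into {signature: [filenames]}."""
    groups = defaultdict(list)
    for name, backtrace in backtraces.items():
        groups[crash_signature(backtrace)].append(name)
    return dict(groups)


# The backtrace quoted above, used here as sample input.
SAMPLE = r"""Program received signal SIGABRT, Aborted.
0x96dbbe42 in __kill ()
(gdb) bt
#0  0x96dbbe42 in __kill ()
#1  0x96dbbe34 in kill$UNIX2003 ()
#2  0x96e2e23a in raise ()
#3  0x96e3a679 in abort ()
#4  0x96e2f3db in __assert_rtn ()
#5  0x00026529 in ff_asf_parse_packet (s=0x1002600, pb=0xa00200, 
pkt=0xbfffe954) at /Users/melanson/ffmpeg/ffmpeg-main/libavformat/asfdec.c:709
"""

print(crash_signature(SAMPLE))
```

Keying on the first frame with source information skips the libc frames, so all files that trip the same assertion should land in one group. It is a crude heuristic, and crashes that occur inside library code without debug symbols would all fall into the `None` group.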

Performance Smackdown: The Latest in 64-bit From GCC and Intel

Since gcc 4.4.0 has been formally released, it’s time to re-run the compiler output benchmarks. Further, I finally sat down and put my mind toward getting the latest Intel C compiler installed and operational. I met with limited success. I haven’t been able to get the 32-bit compiler working. After the tedious rigmarole of getting version 11.0.081 installed, I launched the program without any parameters:

$ /opt/intel/Compiler/11.0/081/bin/ia32/icc
Segmentation fault

Grrrrr… why do I even bother? Fortunately, the intel64 (x86_64) compiler is operational. At the same time I was grabbing the Linux version, I noticed that there is a Mac OS X version, though it is somewhat down-rev at 11.0.059. I still downloaded that and tried it out. I was able to get it to build 32-bit binaries but not 64-bit.

So the upshot, FATE-wise, is that I have put 11.0.081/Linux/x86_64 and 11.0.059/Mac OS X/x86_32 into the system for continuous building and testing. At the time of this writing, they’re not doing so well. Lots of H.264 tests fail. The regressions pass for the most part, though.

But I stubbornly proceeded with the output benchmarks anyway. This is how the compilers are performing, per my usual method (best time out of 2 runs on the same, long, HD file; no hand-crafted ASM optimizations enabled):

64-bit compiler output performance chart, round 2

The gcc versions demonstrate similar performance to the first round of 64-bit tests. As for the icc 64-bit results, well, I don’t think I need to interpret that for you. I will tell you that I first ran it with no special options. Then I ran it with “--cpu=core2” which improved its run time by about 3 seconds. The gcc configurations used no special options.
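The "best time out of 2 runs" method is simple enough to sketch in a few lines of Python. This is not the actual benchmark script; the function name is made up, and the command timed at the bottom is a stand-in so the example runs anywhere.

```python
import subprocess
import sys
import time


def best_time(command, runs=2):
    """Run `command` `runs` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            command, check=True,
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in command; a real benchmark would time the decoder built by each compiler.
elapsed = best_time([sys.executable, "-c", "pass"])
print(f"best of 2 runs: {elapsed:.3f} s")
```

Taking the minimum rather than the average discards runs that were slowed by cache warm-up or other activity on the machine.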

However, there is a deeper issue. As indicated by the FATE tests, icc is incorrectly decoding H.264 video. Thanks to the 10-second validation files generated during the benchmarks, I am able to see that what should look like this (from gcc 4.4.0):

64-bit validation file, generated by gcc 4.4.0

turns out like this (icc 11.0.081):

64-bit validation file, generated by icc 11.0.081

This makes me wonder what is so special about the FFmpeg H.264 decoder that icc has so much trouble digesting it. Is the code especially tricky? Or does it have a lot of tight loops that icc sees as opportunities for (mistaken) vectorization?

Another issue that concerns me regarding this latest series of Intel C compilers: I only have an evaluation license for 31 days. I’m not sure what happens after that. Presumably, I don’t get to use the compiler anymore. However, Intel seems to rev their compiler so often that I wonder if each minor update comes with a 31-day evaluation license.

GCC 4.4.0

Someone notified me today that gcc 4.4.0 has been officially released. Then I went numb. Maybe it was because I would have to compile it several times over for various configurations for FATE. Maybe it was because this portends a new round of compiler output benchmarks. But I think it was mostly because I had neglected to investigate the problematic FFmpeg results that experimental versions of gcc 4.4.0 from SVN, compiled on PowerPC, have been producing for several months. The newly released 4.4.0 has the same problems.

I guess there’s always 4.4.1. Either that, or formal deprecation of 32-bit PowerPC.