How did I become the benchmark peon? Oh right, I actually dared to put forth some solid benchmarks and called for suggestions for possible improvements to the benchmark methodology. This is what I get.
Doing these benchmarks per all the suggestions I have received is time-consuming and error-prone. But if you know anything about me by now, you should know that I like automating time-consuming and error-prone tasks. This problem is looking more and more like a nail, so allow me to apply my new favorite hammer: Python!
Here’s the pitch: Write a Python script that iterates through a sequence of compiler configurations, each with its own path and unique cflags, and compiles FFmpeg. For each resulting build, decode a long movie twice, tracking the execution time in milliseconds. Also, for good measure, follow Reimar’s advice and validate that the builds are doing the right thing. To this end, transcode the first 10 seconds of the movie to a separate, unique file for later inspection. After each iteration, write the results to a CSV file for graphing.
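The loop described above can be sketched roughly as follows. This is a minimal sketch of the idea, not the author's published script. The paths and the two CONFIGURATIONS entries are hypothetical placeholders. `--cc` and `--extra-cflags` are real FFmpeg configure options, and the decode step assumes an ffmpeg build that has the `null` muxer. Timing is plain wall-clock time in milliseconds, and each command runner can be swapped out for testing.

```python
import csv
import os
import shutil
import subprocess
import time

# Hypothetical paths: change these for your machine.
SOURCE_DIR = "/home/user/ffmpeg"   # FFmpeg source checkout
BUILD_DIR = "/tmp/ffmpeg-build"    # WARNING: deleted before every build
MOVIE = "/home/user/movie.avi"     # any long media file

# Options that disable FFmpeg's hand-written SIMD code, as listed in this post.
COMMON_FLAGS = ["--disable-amd3dnow", "--disable-amd3dnowext", "--disable-mmx",
                "--disable-mmx2", "--disable-sse", "--disable-ssse3",
                "--disable-yasm"]

# Hypothetical configurations: name -> compiler path and extra cflags.
CONFIGURATIONS = {
    "gcc-4.3.2": {"compiler_path": "/usr/bin/gcc-4.3",
                  "cflags": "-march=core2 -mtune=core2"},
    "gcc-4.2.4": {"compiler_path": "/usr/bin/gcc-4.2", "cflags": ""},
}


def run(cmd, cwd=None):
    """Run a command, discarding its stdout; raise if it exits non-zero."""
    subprocess.run(cmd, cwd=cwd, check=True, stdout=subprocess.DEVNULL)


def configure_command(compiler_path, cflags, source_dir=SOURCE_DIR):
    """Assemble the FFmpeg configure invocation for one configuration."""
    cmd = [os.path.join(source_dir, "configure"), "--cc=" + compiler_path]
    if cflags:
        cmd.append("--extra-cflags=" + cflags)
    return cmd + COMMON_FLAGS


def timed_ms(runner, cmd):
    """Run cmd and return its wall-clock time in milliseconds."""
    start = time.perf_counter()
    runner(cmd)
    return round((time.perf_counter() - start) * 1000)


def benchmark(configurations, build_dir=BUILD_DIR, movie=MOVIE,
              csv_path="output.csv", runner=run):
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["config", "decode1_ms", "decode2_ms"])
        for name, cfg in configurations.items():
            # Start every build from an empty directory, as FATE does.
            shutil.rmtree(build_dir, ignore_errors=True)
            os.makedirs(build_dir)
            runner(configure_command(cfg["compiler_path"], cfg["cflags"]),
                   cwd=build_dir)
            runner(["make", "-j3"], cwd=build_dir)
            ffmpeg = os.path.join(build_dir, "ffmpeg")
            # Decode the whole movie twice, discarding output via the null muxer.
            decode = [ffmpeg, "-i", movie, "-f", "null", "-"]
            times = [timed_ms(runner, decode) for _ in range(2)]
            # Validation: transcode the first 10 seconds to a per-build file.
            runner([ffmpeg, "-y", "-i", movie, "-t", "10",
                    "validate-%s.avi" % name])
            writer.writerow([name] + times)
            f.flush()  # earlier results survive even if a later build fails


if __name__ == "__main__":
    benchmark(CONFIGURATIONS)
```

Passing `runner` as a parameter keeps the build/decode sequence testable without a compiler or a movie on hand. Flushing after each row mirrors the "write the results after each iteration" step, so a crash partway through a long run still leaves usable CSV data.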
And here’s the graph:
Look at that! gcc 4.3.2 still isn’t a contender but gcc 4.4-svn is putting up a fight.
Here are the precise details of this run:
- Movie file is the same as before: 104-minute AVI; ISO MPEG-4 part 2 video (a.k.a. DivX/XviD) at 512×224, 24 fps; 32 kbps, 48 kHz MP3
- This experiment includes gcc 4.4.0-svn, revision 143046, built on 2009-01-03 (I’m a bit behind)
- All validations passed
- Machine is a Core 2 Duo, 2.13 GHz
- All 8 configurations are compiled with --disable-amd3dnow --disable-amd3dnowext --disable-mmx --disable-mmx2 --disable-sse --disable-ssse3 --disable-yasm
- icc configuration compiled with --cpu=core2 --parallel
- gcc 4.3.2 and 4.4.0-svn configurations compiled with -march=core2 -mtune=core2
- all other gcc versions compiled with no special options
What’s in store for round 4? It sure would be nice to get the icc 11.0 series working on my machine for once to see if it can do any better. And since I have the benchmark framework, it would be nice to stuff LLVM in there to see how it stacks up. I would also like to see how the various builds perform when decoding H.264/AAC. The problem with that is the tremendous memory leak that slows execution to a crawl during a lengthy transcode. Of course, I would be willing to entertain any suggestions you have for compiler options in the next round.
Better yet, perhaps you would like to try out the framework yourself. As is my custom, I like to publish my ad-hoc Python scripts here on my blog or else I might never be able to find them again.
To configure this, modify SOURCE_DIR and BUILD_DIR (the latter is very important since the script will ‘rm -rf’ it before the build, just like in FATE). Find the 2 places in the code that state “/home/fate/movie.avi” and replace them with your favorite long media file. Then, monkey with the CONFIGURATIONS dictionary and the sub-dictionaries within (modifying the “cflags” and “compiler_path” keys). After the thing runs (BTW, you might want to redirect stdout to /dev/null), look for output.csv in the current working directory. Go ahead and modify the ‘make -j3’ part to an appropriate number of CPU threads for your machine. Oh, and run at your own risk, particularly due to the automatic ‘rm -rf’.
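Because the script wipes BUILD_DIR unconditionally, a small guard is cheap insurance against a mistyped path. The helper below is hypothetical (it is not part of the published script): it refuses to delete the home directory or any directory that is, or contains, the FFmpeg source tree, which also rules out "/". It also derives the make job count from `os.cpu_count()` instead of hard-coding `-j3`.

```python
import os
import shutil


def safe_reset_build_dir(build_dir, source_dir):
    """Empty build_dir (rm -rf + mkdir), refusing obviously dangerous targets."""
    build = os.path.realpath(build_dir)
    source = os.path.realpath(source_dir)
    home = os.path.realpath(os.path.expanduser("~"))
    # Never wipe the home directory.
    if build == home:
        raise ValueError("refusing to delete home directory %s" % build)
    # Refuse if the source tree is the build dir or lives inside it;
    # this also catches build_dir == "/".
    if os.path.commonpath([build, source]) == build:
        raise ValueError("refusing to delete %s: it contains %s" % (build, source))
    shutil.rmtree(build, ignore_errors=True)
    os.makedirs(build)
    return build


def make_command():
    """A make invocation with one job per CPU (os.cpu_count() may return None)."""
    return ["make", "-j%d" % (os.cpu_count() or 1)]
```

Resolving both paths with `realpath` first means symlinks and `..` components can't sneak the source tree past the check.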
What’s this tremendous memory leak for H.264 decoding? Has it been reported…?
It’s the leak discussed in this ffmpeg-devel thread: http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2009-February/061095.html
Cool, good to see GCC finally catching up.
One thing did cross my mind, though: are the GCC flags that disable 3DNow/MMX etc. specific to the FFmpeg codebase, or do they actually force GCC not to vectorize code? I expect the former, but if it’s the latter I don’t see why.
Tomer, if you are referring to --disable-mmx and similar options, they disable the handwritten MMX asm code in FFmpeg.
Mike, you should at least try -march=pentium4 or similar for the older versions of GCC; otherwise you’re biasing against anything <4.3…
@Diego: -march=pentium-m (if available) or -march=pentium3 should work better for core2 than -march=pentium4
@Mike: I tried to reproduce your memory leak problem, but I obviously don’t have the same files as you. So I tried
ffmpeg -i Here\ Be\ Dragons.m4v -f framecrc - > /dev/null
and while the memory usage varied heavily between 12 and 25 MB, it did not seem to increase in the long term, and fps even slightly increased…
The video is the only suitable one I could find, it’s 40 minutes H.264/AAC, it is available here: http://herebedragonsmovie.com/
The md5sum of mine is 644c0c5ba6073163b32a985742562bae
(note that I tried on x86_64).
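A leak like this can be checked more systematically by sampling the decoder's resident memory while it runs and comparing the start of the run with the end. The sketch below is Linux-only (it reads `VmRSS` from `/proc/<pid>/status`), and the function names and the 1.5× growth threshold are hypothetical choices, not anything from the thread.

```python
import subprocess
import time


def read_rss_kb(pid):
    """Resident set size of pid in kB, from /proc (Linux only); None if unavailable."""
    try:
        with open("/proc/%d/status" % pid) as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except FileNotFoundError:
        pass  # the process already exited and was reaped
    return None


def sample_rss(cmd, interval=1.0):
    """Run cmd, sampling its RSS every `interval` seconds until it exits."""
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL)
    samples = []
    while proc.poll() is None:
        rss = read_rss_kb(proc.pid)
        if rss is not None:
            samples.append(rss)
        time.sleep(interval)
    return samples


def looks_like_leak(samples_kb, growth=1.5):
    """Heuristic: is the mean RSS of the last quarter `growth` times the first?

    Comparing averages rather than single samples tolerates large short-term
    swings such as the 12 to 25 MB range reported above.
    """
    n = len(samples_kb) // 4
    if n == 0:
        return False  # too few samples to judge
    first = sum(samples_kb[:n]) / n
    last = sum(samples_kb[-n:]) / n
    return last > first * growth
```

For example, `looks_like_leak(sample_rss(["ffmpeg", "-i", "movie.m4v", "-f", "framecrc", "-"]))` would rerun Reimar's experiment and flag steady growth while ignoring the oscillation he saw.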
Thanks for the link, Reimar. I will try out that movie in the next performance test.
-mtune=pentium-m, not -march.
If you really, really love benchmarking, try backing down gcc svn until you find the performance decrease from 4.1->4.2. Depending on the part of the compiler, it might still apply to svn, but hidden by other improvements. Binary searching this is possible, but I’d expect compiling gcc to take just about forever…
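Binary searching keeps the number of gcc builds manageable because each test halves the range of suspect revisions: N candidate revisions need only about log2(N) builds (roughly 14 if there were 10,000). A minimal sketch follows, assuming revisions are sorted oldest first, the oldest is known good, the newest is known bad, and the slowdown persists once introduced. As noted above, later improvements can mask it, which would break that assumption. `measure_ms` is a hypothetical callback that builds gcc at a revision, builds FFmpeg with it, and returns the decode time.

```python
def first_bad_revision(revisions, is_bad):
    """Return the earliest revision for which is_bad(revision) is True.

    Assumes revisions are sorted oldest first, revisions[0] is good,
    revisions[-1] is bad, and every revision after a bad one is also bad.
    """
    lo, hi = 0, len(revisions) - 1  # invariant: revisions[lo] good, revisions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]


def make_is_slow(measure_ms, baseline_ms, tolerance=0.05):
    """Predicate: the revision decodes more than `tolerance` slower than baseline.

    measure_ms(revision) is hypothetical: build gcc at that revision,
    build FFmpeg with it, and return the decode time in milliseconds.
    """
    return lambda rev: measure_ms(rev) > baseline_ms * (1 + tolerance)
```

A tolerance around the baseline matters here: run-to-run timing noise would otherwise send the search down the wrong half.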
@astrange: Not too far-fetched; I already have something similar in mind for finding when all of those H.264 tests started breaking with gcc-svn on PPC.
There’s a tool that already does this, but at a smaller scale, called ACOVEA. It stands for Analysis of Compiler Options via Evolutionary Algorithm; it measures flag sets within a single compiler, but it is completely configurable to use any buildable and executable source.
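To illustrate the evolutionary idea (this is a toy genetic algorithm, not ACOVEA's actual code), each candidate is a bit vector saying which flags are on. The faster half of each generation survives, and children are produced by one-point crossover plus random bit flips. `measure_ms` is a hypothetical callback that would build and time FFmpeg with the given flags; results are cached so each flag set is built only once.

```python
import random


def evolve_flags(flags, measure_ms, population=8, generations=10,
                 mutation_rate=0.1, seed=0):
    """Toy genetic search for a fast subset of `flags` (lower time is better).

    measure_ms(enabled_flags) is a hypothetical callback that builds and
    runs the benchmark with those flags and returns its time.
    Requires at least two flags and a population of at least four.
    """
    rng = random.Random(seed)
    cache = {}  # each flag set is built and timed only once

    def enabled(genome):
        return [f for f, on in zip(flags, genome) if on]

    def fitness(genome):
        if genome not in cache:
            cache[genome] = measure_ms(enabled(genome))
        return cache[genome]

    # A genome is a tuple of booleans: flag i on or off.
    pop = [tuple(rng.random() < 0.5 for _ in flags) for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[:population // 2]  # the faster half survives unchanged
        children = []
        while len(parents) + len(children) < population:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(flags))  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: flip each flag with probability mutation_rate.
            child = tuple(bit != (rng.random() < mutation_rate) for bit in child)
            children.append(child)
        pop = parents + children
    best = min(pop, key=fitness)
    return enabled(best), fitness(best)
```

Keeping the faster half unchanged (elitism) means the best time found never gets worse from one generation to the next, which matters when every evaluation is a full FFmpeg build and decode.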
If you continue playing with LLVM, maybe see if Clang will compile it (I somehow feel that’s unlikely as of yet). Also, what linkers are you using? llvm-gcc defaults to GNU ld, which I recall is fairly unintelligent. I think Intel’s linker does some optimisation, but I’ve recently been trying to coax llvm-ld into working as a replacement instead; seeing how the linkers affect the results might be neat.
Multimedia Mike > “Write a Python script that iterates through a sequence of compiler configurations, each with its own path and unique cflags, and compiles FFmpeg.”
Just use an existing Python script!
http://shootout.alioth.debian.org/u32q/faq.php#measurementscripts
> -march=core2 -mtune=core2
That 2nd option is redundant. Per the GCC v4.3.2 man page:
`-march=CPU-TYPE’
Generate instructions for the machine type CPU-TYPE. The choices for CPU-TYPE are the same as for `-mtune’. Moreover, specifying `-march=CPU-TYPE’ implies `-mtune=CPU-TYPE’.