How did I become the benchmark peon? Oh right, I dared to publish some solid benchmarks and invited suggestions for improving the methodology. This is what I get.
Doing these benchmarks per all the suggestions I have received is time-consuming and error-prone. But if you know anything about me by now, you should know that I like automating time-consuming and error-prone tasks. This problem is looking more and more like a nail, so allow me to apply my new favorite hammer: Python!
Here’s the pitch: Write a Python script that iterates through a sequence of compiler configurations, each with its own path and unique cflags, and compiles FFmpeg. For each resulting build, decode a long movie twice, tracking the execution time in milliseconds. Also, for good measure, follow Reimar’s advice and validate that the builds are doing the right thing. To this end, transcode the first 10 seconds of the movie to a separate, unique file for later inspection. After each iteration, write the results to a CSV file for graphing.
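The timing-and-logging part of that loop could be sketched in Python roughly as follows. This is a minimal sketch, not the script published with this post. The helper names `time_command_ms`, `benchmark` and `append_csv_row` are hypothetical, and wall-clock time is measured with `time.monotonic()`.

```python
import csv
import subprocess
import time


def time_command_ms(argv):
    """Run argv once, discarding its output, and return wall-clock time in ms."""
    start = time.monotonic()
    subprocess.run(argv, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return int((time.monotonic() - start) * 1000)


def benchmark(name, argv, runs=2):
    """Return a CSV-ready row: the configuration name, then one timing per run."""
    return [name] + [time_command_ms(argv) for _ in range(runs)]


def append_csv_row(path, row):
    """Append a single row so earlier results survive if a later build fails."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(row)
```

For the decode step, `argv` might be something like `["./ffmpeg", "-i", "/home/fate/movie.avi", "-f", "null", "/dev/null"]`. The 10-second validation transcode could add `-t 10` and a per-configuration output filename. Both command lines are assumptions, not necessarily the original script's exact invocations. Writing one row per configuration as soon as it finishes means a crash partway through still leaves the earlier results in the CSV.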
And here’s the graph:
Look at that! gcc 4.3.2 still isn’t a contender but gcc 4.4-svn is putting up a fight.
Here are the precise details of this run:
- Movie file is the same as before: 104-minute AVI; ISO MPEG-4 part 2 video (a.k.a. DivX/XviD) at 512×224, 24 fps; 32 kbps, 48 kHz MP3
- This experiment includes gcc 4.4.0-svn, revision 143046, built on 2009-01-03 (I’m a bit behind)
- All validations passed
- Machine is a Core 2 Duo, 2.13 GHz
- All 8 configurations are compiled with --disable-amd3dnow --disable-amd3dnowext --disable-mmx --disable-mmx2 --disable-sse --disable-ssse3 --disable-yasm
- icc configuration compiled with --cpu=core2 --parallel
- gcc 4.3.2 and 4.4.0-svn configurations compiled with -march=core2 -mtune=core2
- All other gcc versions compiled with no special options
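The shared flags in this list and the per-compiler options could be combined into a single `configure` command line along these lines. This sketch assumes the compiler is selected with FFmpeg's `--cc=` option and the extra flags go through `--extra-cflags=`. `configure_argv` and `COMMON_CONFIGURE_FLAGS` are hypothetical names, and any compiler paths you pass in are placeholders.

```python
# Flags shared by every configuration, copied from the list above.
COMMON_CONFIGURE_FLAGS = [
    "--disable-amd3dnow",
    "--disable-amd3dnowext",
    "--disable-mmx",
    "--disable-mmx2",
    "--disable-sse",
    "--disable-ssse3",
    "--disable-yasm",
]


def configure_argv(source_dir, compiler_path, cflags="", extra_flags=()):
    """Build an argument list for FFmpeg's configure script (hypothetical helper)."""
    argv = [source_dir.rstrip("/") + "/configure", "--cc=" + compiler_path]
    argv += COMMON_CONFIGURE_FLAGS
    argv += list(extra_flags)  # e.g. ["--cpu=core2"] for a particular configuration
    if cflags:
        # Passed as one argument, so the embedded spaces need no shell quoting.
        argv.append("--extra-cflags=" + cflags)
    return argv
```

Returning a list instead of a shell string keeps a multi-word value such as `-march=core2 -mtune=core2` intact as one argument. Running the result from inside the build directory, for example with `subprocess.run(argv, cwd=BUILD_DIR, check=True)`, would keep the build out of the source tree.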
What’s in store for round 4? It sure would be nice to get the icc 11.0 series working on my machine for once to see if it can do any better. And since I have the benchmark framework, it would be nice to stuff LLVM in there to see how it stacks up. I would also like to see how the various builds perform when decoding H.264/AAC. The problem there is a tremendous memory leak that slows execution to a crawl during a lengthy transcode. Of course, I’m willing to entertain any suggestions you have for compiler options in the next round.
Better yet, perhaps you would like to try out the framework yourself. As is my custom, I like to publish my ad-hoc Python scripts here on my blog or else I might never be able to find them again.
To configure this, modify SOURCE_DIR and BUILD_DIR (the latter is very important since the script will ‘rm -rf’ it before each build, just like FATE does). Find the two places in the code that reference “/home/fate/movie.avi” and replace them with your favorite long media file. Then monkey with the CONFIGURATIONS dictionary and the sub-dictionaries within it (modifying the “cflags” and “compiler_path” keys). Also change the ‘make -j3’ part to an appropriate number of CPU threads for your machine. After the thing runs (BTW, you might want to redirect stdout to /dev/null), look for output.csv in the current working directory. Oh, and run at your own risk, particularly due to the automatic ‘rm -rf’.
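Here is one way those configurable pieces might look. SOURCE_DIR, BUILD_DIR, CONFIGURATIONS and the “cflags”/“compiler_path” keys come from the description above; the paths, the two sample entries, and the helpers `fresh_build_dir` and `make_argv` are hypothetical. Because the build directory gets wiped, the sketch adds a guard that refuses to delete any directory containing the source tree, and it defaults the `make -j` job count to `os.cpu_count()`.

```python
import os
import shutil

# Hypothetical paths: change these for your machine.
SOURCE_DIR = "/home/fate/ffmpeg"
BUILD_DIR = "/home/fate/ffmpeg-build"  # wiped before every build!

# Each sub-dictionary carries the two keys the text mentions.
# Compiler paths here are placeholders, not the author's actual setup.
CONFIGURATIONS = {
    "gcc 4.3.2": {
        "compiler_path": "/usr/bin/gcc-4.3",
        "cflags": "-march=core2 -mtune=core2",
    },
    "gcc 4.4.0-svn": {
        "compiler_path": "/opt/gcc-4.4/bin/gcc",
        "cflags": "-march=core2 -mtune=core2",
    },
}


def fresh_build_dir(build_dir, source_dir):
    """Like 'rm -rf build_dir && mkdir build_dir', with one safety check."""
    build_dir = os.path.abspath(build_dir)
    source_dir = os.path.abspath(source_dir)
    # Refuse to delete a directory that is, or contains, the source tree
    # (this also rejects "/" since every absolute path lives under it).
    if os.path.commonpath([build_dir, source_dir]) == build_dir:
        raise ValueError("refusing to delete %s: it contains %s"
                         % (build_dir, source_dir))
    shutil.rmtree(build_dir, ignore_errors=True)
    os.makedirs(build_dir)
    return build_dir


def make_argv(jobs=None):
    """Build the make command; defaults to one job per CPU instead of -j3."""
    if jobs is None:
        jobs = os.cpu_count() or 1
    return ["make", "-j%d" % jobs]
```

A per-configuration build step would call `fresh_build_dir(BUILD_DIR, SOURCE_DIR)` first, run FFmpeg’s `configure` inside the returned directory, and then run `make_argv()` there with `subprocess.run(..., cwd=build_dir, check=True)`.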