Benchmark Bitch No More

You know what? I’m tired of being the benchmark bitch.

I have come to realize that benchmarks are exclusively the domain of people who have a vested interest in the results. I’m not financially or emotionally invested in the results of these compiler output performance comparisons; they were purely a matter of academic curiosity. As such, it’s hard to keep the proper motivation. No matter how carefully I set up a test and regardless of how many variables I control, the feedback is always the same: “Try tweaking this parameter! I’m sure that will make my favorite compiler clean up over all the other options!” “Your statistical methods are all wrong!” “Your graphs are fraudulently misleading!” Okay, I wholeheartedly agree with that last one, but blame OpenOffice for creating completely inept graphs by default.

Look, people, compilers can’t work magic. Deal with it. “--tweak-optim-parm=62” isn’t going to net the 10% performance increase needed to make the ethically pure, free software compiler leapfrog over the evil, proprietary compiler (and don’t talk to me about profile-guided optimization; I’m pretty sure that’s an elaborate hoax, and even if it’s not, the proprietary compilers advertise the same feature). Don’t put your faith in the compiler to make your code suck less (don’t I know it). Investigate some actual computer science instead. It’s especially foolhardy to count on compiler optimizations in open source software. Not necessarily because of gcc’s quality (as you know, I think gcc does remarkably well considering its charter), but because there are so many versions of compilers that are expected to compile Linux and open source software in general. The more pressing problem (addressed by FATE) is making sure that a key piece of free software continues to compile and function correctly on a spectacular diversity of build and computing environments.

If anyone else has a particular interest in controlled FFmpeg benchmarks, you may wish to start with my automated Python script in this blog post. It’s the only thing that kept me sane when running these benchmarks up to this point.
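(For anyone who wants a rough idea of the approach without clicking through, here is a minimal sketch of that kind of timing harness — it is not the original script, and the build names, sample file, and run count are hypothetical placeholders.)

#!/usr/bin/env python
# Minimal sketch of an automated FFmpeg benchmark harness (not the original
# script). The binaries and sample file below are hypothetical placeholders.
import subprocess
import time

BUILDS = ["./ffmpeg-gcc", "./ffmpeg-icc"]  # same source revision, different compilers
SAMPLE = "sample.avi"                      # any local test media file
RUNS = 5                                   # repeat runs to smooth out noise

def time_decode(binary):
    """Decode SAMPLE to the null muxer and return elapsed wall-clock seconds."""
    devnull = open("/dev/null", "w")
    start = time.time()
    subprocess.call([binary, "-i", SAMPLE, "-f", "null", "-"],
                    stdout=devnull, stderr=devnull)
    return time.time() - start

for binary in BUILDS:
    times = [time_decode(binary) for _ in range(RUNS)]
    print("%s: best %.2fs, mean %.2fs" % (binary, min(times), sum(times) / RUNS))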

I should clarify that I am still interested in reorganizing FATE so that it will help us to systematically identify performance regressions in critical parts of the code. The performance comparison I care most about is whether today’s FFmpeg SVN copy is slower than yesterday’s SVN copy.
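(As a rough illustration of the kind of check I have in mind — not anything FATE actually does yet — a day-over-day comparison could be as simple as the sketch below; the 5% threshold and the timing numbers are made up for illustration.)

# Hypothetical day-over-day regression check; timings would come from a
# harness like the one above. The 5% threshold is arbitrary, not FATE policy.
THRESHOLD = 1.05  # flag anything more than 5% slower than yesterday's build

def check_regression(test_name, yesterday_secs, today_secs):
    if today_secs > yesterday_secs * THRESHOLD:
        print("possible regression in %s: %.2fs -> %.2fs" %
              (test_name, yesterday_secs, today_secs))

check_regression("h264-decode", 12.4, 13.6)  # made-up numbers for illustration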


Update: I have to state that I’m not especially hurt by any criticism of my methods (though the post may have made it seem that way). Mostly, I wanted to force myself to quit wasting time on these progressively more elaborate and time-consuming benchmarks when they’re really not terribly useful in the grand scheme of things. I found myself brainstorming some rather involved profiling projects and I had to smack myself. I have far more practical things I really should be using my free time for.

7 thoughts on “Benchmark Bitch No More”

  1. Short Circuit

    The easiest solution that still leaves room for productivity is to take what grains of useful info were there, and discard the personally-targeted slights and wanton disrespect.

    That’s the easiest solution, and I still haven’t gotten it down. I can name at least four personal slights in the last three years that I’m still having a hard time letting go of, and one of them still has a detrimental effect on how I run my site.

    People are assholes. Sucks, but it’s true. I hope you (at some point) do more of these benchmarks; they’re always interesting reads. (As is everything else you write, or I wouldn’t be following the RSS feed…)

  2. triton

    Some people are only able to criticize what others do… Don’t mind those people. I’m sure there are many more who find these benchmarks really interesting and useful, me being one of them. I hope you continue to make some of these benchmarks available. Thanks.

  3. Sam

    Really sorry to hear this, Mike; I really hope my comments are not part of your anguish. I think this is a great resource with its wide range of compilers and versions, and I would hate to see it go away. As for those talking about compiler switches, etc., as far as I understand, this is meant to be a test with -march=x and -O3, which is pretty much the closest to a fair comparison that can be made while still remaining practical. Adding things like profile-guided optimization (it’s not a hoax though, trust me :) ), link-time optimization, and other ‘exotic’ optimizations would make these benchmarks pointless since not all compilers support them, or they support them but not on all platforms, etc.

    Finally, the fact that you do not have a financial or emotional interest in the results is what makes you perfect for doing these (it’s your cross to bear ;) ). I really hope you continue, as there are so few compiler benchmarking resources around that provide a good range of data.

  4. Art Clarke

    For what it’s worth Mike, I really appreciate the work you did on benchmarking and think its guidance was very useful to the team. And I totally understand your perspective given the feedback.

    I believe benchmarks are just a tool to help direct where you spend engineering resources, and that the time spent tweaking compiler settings to optimize a benchmark for compiler (A) vs. compiler (B) is almost always better spent looking at code cleanup or algorithmic ways to obtain speed-ups.

    But thanks for the work so far anyway (oh yeah, and for FATE too).

    – Art

  5. Andrew Brampton

    I just wanted to say I have enjoyed reading the benchmarks posts as well as everything else you post on your blog. So not everyone is a critic, and I hope you continue to write interesting articles in the future.

    thanks

  6. Hellfred

    I can just repeat what was already written before: I always enjoy reading your posts, including the compiler smackdowns. Thanks for the hard work done here and for tuning the Theora decoder.
