Author Archives: Multimedia Mike

Systematic Benchmarking Adjunct to FATE

Pursuant to my rant on the futility of comparing, performance-wise, the output of various compilers, I wholly acknowledge the utility of systematically benchmarking FFmpeg. FATE is not an appropriate mechanism for doing so, at least not in its normal mode of operation. The “normal mode” would have each of every configuration (60 or so) running certain extended test specs during every cycle. Quite a waste.

Hypothesis: By tracking the performance of a single x86_64 configuration, we should be able to catch performance regressions in FFmpeg.

Proposed methodology: Create a new script that watches for SVN commits. For each and every commit (no skipping), check out the code, build it, and run a series of longer tests. Log the results and move on to the next revision.

What compiler to use? I’m thinking about using gcc 4.2.4 for this. In my (now abandoned) controlled benchmarks, it was the worst performer by a notable margin. I’m thinking that the low performance might help to accentuate performance regressions. Is this a plausible theory? 2 years of testing via FATE haven’t revealed any other major problems with this version.

What kind of samples to test? Thankfully, Big Buck Bunny is available in 4 common formats:

MP4/MPEG-4 part 2 video/AC3 audio
MP4/H.264 video/AAC audio
Ogg/Theora video/Vorbis audio
AVI/MS MPEG-4 video/MP3 audio

I have the 1080p versions of all those files, though I’m not sure if it’s necessary to decode all 10 minutes of each. It depends on what kind of hardware I select to run this on.

Further, I may wish to rip an entire audio CD as a single track, encode it with MP3, Vorbis, AAC, WMA, FLAC, and ALAC, and decode each of those.

What other common formats would be useful to track? Note that I only wish to benchmark decoding. My reasoning for this is that decoding should, on the whole, only ever get faster, never slower. Encoding might justifiably get slower as algorithmic trade-offs are made.

I’m torn on the matter of whether to validate the decoding output during the benchmarking test. The case against validation says that computing framecrc’s is going to impact the overall benchmarking process; further, validation is redundant since that’s FATE’s main job. The case for validation says that since this will always be run on the same configuration, there is no need to worry about off-by-1 rounding issues; further, if a validation fails, that data point can be scrapped (which will also happen if a build fails) and will not count towards the overall trend. An errant build could throw off the performance data. Back on the ‘against’ side, that’s exactly what statistical methods like weighted moving averages are supposed to help smooth out.

I’m hoping that graphing this idea for all to see will be made trivial thanks do Google’s Visualization API.

The script would run continuously, waiting for new SVN commits. When it’s not busy with new code, it would work backwards through FFmpeg’s history to backfill performance data.

So, does this whole idea hold water?

If I really want to run this on every single commit, I’m going to have to do a little analysis to determine a reasonable average number of FFmpeg SVN commits per day over the past year and perhaps what the rate of change is (I’m almost certain the rate of commits has been increasing). If anyone would like to take on that task, that would be a useful exercise (‘svn log’, some text manipulation tools, and a spreadsheet should do the trick; you could even put it in a Google Spreadsheet and post a comment with a link to the published document).

Benchmark Bitch No More

You know what? I’m tired of being the benchmark bitch.

I have come to realize that benchmarks are exclusively the domain of people who have a vested interest in the results. I’m not financially or emotionally invested in the results of these compiler output performance comparisons; it was purely a matter of academic curiosity. As such, it’s hard to keep the proper motivation. No matter how carefully I set up a test and regardless of how many variables I control, the feedback is always the same: “Try tweaking this parameter! I’m sure that will make my favorite compiler clean up over all the other options!” “Your statistical methods are all wrong!” “Your graphs are fraudulently misleading.” Okay, I wholeheartedly agree with that last one, but blame OpenOffice for creating completely inept graphs by default.

Look, people, compilers can’t work magic. Deal with it. “–tweak-optim-parm=62” isn’t going to net the 10% performance increase needed to make the ethically pure, free software compiler leapfrop over the evil, proprietary compiler (and don’t talk to me about profile-guided optimization; I’m pretty sure that’s an elaborate hoax and even if it’s not, the proprietary compilers also advertise the same feature). Don’t put your faith in the compiler to make your code suck less (don’t I know). Investigate some actual computer science instead. It’s especially foolhardy to count on compiler optimizations in open source software. Not necessarily because of gcc’s quality (as you know, I think gcc does remarkably well considering its charter), but because there are so many versions of compilers that are expected to compile Linux and open source software in general. The more pressing problem (addressed by FATE) is making sure that a key piece of free software continues to compile and function correctly on a spectacular diversity of build and computing environments.

If anyone else has a particular interest in controlled FFmpeg benchmarks, you may wish to start with my automated Python script in this blog post. It’s the only thing that kept me sane when running these benchmarks up to this point.

I should clarify that I am still interested in reorganizing FATE so that it will help us to systematically identify performance regressions in critical parts of the code. The performance comparison I care most about is whether today’s FFmpeg SVN copy is slower than yesterday’s SVN copy.

See Also:

The final entry in the compiler smackdown series

Update: I have to state that I’m not especially hurt by any criticism of my methods (though the post may have made it seem that way). Mostly, I wanted to force myself to quit wasting time on these progressively more elaborate and time-consuming benchmarks when they’re really not terribly useful in the grand scheme of things. I found myself brainstorming some rather involved profiling projects and I had to smack myself. I have far more practical things I really should be using my free time for.

Compiler Smackdown 2010-1, 64-bit

It’s time to do a new compiler smackdown for a few reasons:

It has been quite awhile since the last one.
I received a request to know how icc 11.1 measured up.
I wanted an excuse to post a picture of the GCC cheerleaders.

GCC Cheerleaders (from "Community" TV show)

For this round, I tested x86_64 on my Core 2 Duo 2.0 GHz. I compiled FFmpeg with 6 versions of gcc (including gcc 4.5, svn 156187), 3 versions of icc, and the latest (svn 94292) of LLVM. Then I used the resulting FFmpeg binaries to decode both a Theora/Vorbis video and an H.264/AAC video.

Ogg/Theora/Vorbis, 1920×1080 video, 48000 Hz stereo audio, nearly 10 minutes:

MP4/H.264/AAC: 1280×720 video, 48000 Hz stereo audio, 4.5 minutes:

Wow! Look at LLVM go. I take back all, or at least some, of the smack I’ve typed about it in previous posts. Out of the free compiler solutions, LLVM makes my Theora code suck the least.

Other relevant data about this round:

FFmpeg SVN 21390 used for this test
Flags: ‘–disable-debug –disable-amd3dnow –disable-amd3dnowext –disable-mmx –disable-mmx2 –disable-sse –disable-ssse3 –disable-yasm’ used for all configurations; also used ‘–disable-asm’ which might make a lot of those obsolete now.
gcc 4.3-4.5 used “-march=core2 -mtune=core2”; icc versions used “–cpu=core2 –parallel”

See Also:

Optimizing Google Spreadsheets

It happens on occasion that everyone gushes about a technology that leaves me utterly bewildered– not because I don’t understand what the tech does but because I can’t get it to work at all. It happened the first time I tried out xine and the first time I tried to use x264 for anything serious. But I eventually solved those situations.

Here’s a new problem that has been bugging me: Google Docs, specifically their spreadsheets. They’re so bold as to launch a daily-updated ad campaign of billboards declaring that it’s perfectly plausible to switch an entire office from Microsoft Office software to Google software. For the last few months, I have been using Google Spreadsheets to track a fairly meager amount of data. It hasn’t been going that swell. So whenever I see a fawning article about how Google is totally going to dominate Microsoft using these online apps, I’m left scratching my head and wondering if I’m missing something.

This is my tracking spreadsheet for all my video games. You’re likely to notice that it’s quite taxing on your browser just to load the DOS/Windows sheet which only consists of about 600 rows. This data started life on an OpenOffice Calc spreadsheet before eventually being imported to Google Docs. I should state that I have never been very proficient with spreadsheets. Obviously, all I’m doing here is organizing tables of information with a little coloring. So, no complicated (or even simple) formulas. Maybe it’s all the coloring that throws the system for a loop. Or something about its OOo origins.

For a current test, I downloaded the very latest versions of Firefox (3.6, as of today), Safari, and Chrome on my Atom-based nettop running Windows XP. This is a basic visualization describing how each handles opening my games spreadsheet in Google Docs:

Firefox

Safari

Chrome

Firefox takes awhile to load the spreadsheet (up to 30s) and pegs one CPU the entire time. But once the spreadsheet loads, the browser chills until there is some more interaction. Safari seems to load a bit quicker but never takes the load off that 1 CPU. The spreadsheet is still a little usable.

Most surprising, however, is that Google’s Chrome, for all intents and purposes, completely falls over on the games spreadsheet. This is using the very latest version on Windows, which I assume is the version that receives the lion’s share of Chrome’s dev resources. Chrome eventually loads the spreadsheet (older versions had trouble even doing that), but can’t seem to scroll through it.

Given all the hype — both within and outside of Google — that I see surrounding Google Docs, I can’t help but wonder if I’m doing something wrong. Am I throwing too much data its way? I’m anything but a typical spreadsheet user so it could very well be that most spreadsheets contain well fewer than 600 lines of data. Somehow, I think it has something to do with having imported the data for an OpenOffice Calc spreadsheet. My data point for this is that another Google spreadsheet that I have maintained from scratch but has grown to a similar size has much less trouble. Further, I generated a 13,000-line CSV file, imported that, and see little difficulty (relative to the games spreadsheet) navigating around. Pro tip: Don’t try to sort 13,000 rows of data in a Google spreadsheet:

Google Spreadsheet tells me where I can stick my data

Perhaps it’s an unreasonable request. I do know that OpenOffice is able to process the same request in about 2 seconds.

But I digress. I was wondering how Google could possibly claim this is ready for prime-time. Then I realized that Excel spreadsheets are more likely to be thrown at the system. I decided to try exporting the spreadsheet as an Excel spreadsheet (loading the Google spreadsheet in Firefox since that’s the most responsive), then uploading the new Excel spreadsheet.

Success! Firefox browses the spreadsheet much faster and Google Chrome is able to navigate it at all. It’s still a bit sluggish in Chrome but it’s at least a little usable. You know, pursuant to today’s Firefox 3.6 release, I have been reading comments that Chrome still beats Firefox in some artificial JavaScript benchmarks. After this episode, I think this would make a much more useful, real-world JS benchmark.

Through it all, I really want to be able to make use of Google Docs. So far, it has proven very useful as a means of coordination between myself and a bunch of other video game historians. But I was befuddled to no end that I couldn’t get my favorite spreadsheet to work in Google’s own browser.

For reference, here are the old and new spreadsheets for comparison in your browser:

Breaking Eggs And Making Omelettes

Topics On Multimedia Technology and Reverse Engineering

Author Archives: Multimedia Mike

Systematic Benchmarking Adjunct to FATE

Benchmark Bitch No More

Compiler Smackdown 2010-1, 64-bit

Optimizing Google Spreadsheets