Weighted Moving Averages | Breaking Eggs And Making Omelettes

Everyone would like to see FFmpeg performance graphs based on data collected from FATE. One day, I got down to analyzing the millisecond runtime data for one configuration regarding one of the longer tests in the suite. The numbers were all over the map and directly graphing them would prove confusing and meaningless.

Then I had an idea which draws on my limited experience with digital signal processing as it pertains to multimedia. What about plotting points that represent the average of point n along with, say, points (n-15)..(n+15)? Then I refined the idea a bit– how about multiplying each point by a factor such that point n gets the most consideration while points further away from n receive progressively less consideration?

And then I quickly covered up all evidence of the brainstorm for fear of catching a beatdown from a tough gang of statisticians for devising the most idiotic idea in the history of statistics. Imagine my surprise when I was recently reading up on a completely different topic and found the exact same techniques described in practical application. It turns out they even have formal names: Moving averages, in which various forms of weighted moving averages comprise a sub-category.

This is not the first time I have re-invented something that I would later learn is called a “wheel” (the first time I can recall was when I independently figured out a bubble sort algorithm). I’m sure a spreadsheet program can be coerced to massage the data in this manner, but I decided to go the Python route; I have to use Python to effectively extract the data from MySQL into a usable form anyway.

I created a Python script that takes a list of FATE configuration IDs and a test spec number. For each configuration ID, gather the history of millisecond run times for the specified test. Run a weighted moving average using the current run time and the 9 run times prior. The current run time is multiplied by 10, (n-1) is multiplied by 9, down through (n-9) which is multiplied by 1. Divide the average by 55 (10+9+…+1). For a more elegant mathematical explanation, see the Wikipedia entry. This script allows me to dump all the performance data for a series of configurations like, for example, all of the PowerPC configurations that run on the same machine.

Then, we turn the data over to OpenOffice Calc for graphing… <sigh…>

id RoQ video encode test, run on many successive revisions on PowerPC

Dear OpenOffice.org: I hate you so much. Sometimes you elect to plot data on a graph in a sane range and sometimes you opt to begin from 0. The latter is shown above, where the former would have been much more appropriate. As it stands, any useful data to be gathered from the visualizing the trend of the weighted moving averages is lost. And don’t even get me started (again) on your fragility. Or your atrocious software update mechanism.

Anyway, the above graph shows performance over time for the idroq-video-encode test run on various PowerPC configurations. The graph actually does reveal at least one oddity– the orange pulse wave that represents gcc 4.1.2’s performance over time. That might be worth looking into, particularly since it’s on the high part of the wave right now.

Below is the same thing, only with data collected from running h264-conformance-mv1_brcm_d which is the most computationally intensive H.264 conformance sample in the FATE suite. And it looks like I mislabeled ‘gcc 4.3.3’ as ‘gcc 4.3.2’ in both graphs. Oh well; not going through the pain of regenerating those graphs with OpenOffice now.

long H.264 conformance test, run on many successive revisions on PowerPC" title="long H.264 conformance test, run on many successive revisions on PowerPC

What to do about performance graphs going forward? I really don’t think it would be worthwhile to be able to pull graphs from the database relating to performance data over time for arbitrary tests. Most of the tests are simply too short to yield any useful trends. This would be better suited to the idea of running certain tasks less frequently. Various configurations should be made to run longevity tests on full movies encoded in various formats, once every day or 2. CPU runtime performance data (in contrast to wall clock time data) should be collected on those runs and graphed for analysis.

BTW, if anyone knows of better desktop graphing apps, do let me know. Any system, I guess. As long as it can read a CSV file and create a competent graph in a reasonable amount of time on a 2 GHz CPU, I should be able to tolerate it.

18 thoughts on “Weighted Moving Averages”

Kostya May 19, 2009 at 10:11 pm

err, man gnuplot?

Multimedia Mike Post authorMay 19, 2009 at 10:19 pm

Oh yeah, gnuplot. How could I forget the only program that can create uglier graphs than OpenOffice? :-)

Reimar May 19, 2009 at 11:33 pm

Tip for you: double-click on the graph, then click on the Y-Axis, select “Object Properties”, then the “Scale” tab, remove “Automatic” everywhere and enter whatever you’d like, or at least for Minimum/Maximum.

Reimar May 19, 2009 at 11:50 pm

Oh, I also forgot to say: you should be able to just change the label cells and the graph’s labels should be updated automatically, no “regenerating the graph” should be necessary to fix such mistakes…
Also what is it that is particularly ugly about gnuplot? You need an insane amount of options, e.g. the following still does less than your graph (plot1.dat contains the data one per line, revison number first, value next separated by comma):
set ylabel “time(ms)”
plot “plot1.dat” smooth csplines title “gcc …” with lines lw 4

Mans May 20, 2009 at 5:31 am

MATLAB can do some nice graphs, if you have it.

For the ultimate in beauty, you could probably cook up something with POVray. ;-)

Adam Ehlers Nyholm Thomsen May 20, 2009 at 12:22 pm

If you want a python library take a look at http://matplotlib.sourceforge.net/ it might suit your needs.

Multimedia Mike Post authorMay 20, 2009 at 12:46 pm

matplotlib: Now that’s more like it! Thanks for the pointer.

Multimedia Mike Post authorMay 20, 2009 at 12:59 pm

@Reimar: About “regenerating the graph”, it’s true that changes in the data are reflected on the graph immediately. However, OpenOffice offers no easy way that I can find to simply export that graph as an image. And it crashes if I try to copy it to another program, say, OpenOffice Draw. So I have to resort to using Mac OS X’s screen capture facility.

Tomer Gabel May 20, 2009 at 5:04 pm

An oddly inspiring post. Thank you :-)

Multimedia Mike Post authorMay 20, 2009 at 6:42 pm

I’m glad to be an inspiration. Which part was inspiring, though? The part where I demonstrate ignorance of basic statistical methods? Or the part with the impotent rage against frustrating open source software? :-)

Tomer Gabel May 21, 2009 at 1:42 am

A little of both, I guess. Whenever I find myself missing essential math tools (particularly statistical methods such as this) I feel a small pang of regret for not taking any formal studies, and when I manage to acquire those tools anyway I always feel a bit of exhilaration.

Your post triggered first the one, then the other, which makes it inspirational in my book… :-)

Mans May 21, 2009 at 1:43 am

For the future you could get more accurate numbers if you add the -benchmark flag to the ffmpeg command lines. Most importantly, this will give correct figures on the systems driver over ssh. These are currently reported with 0ms time since the tests take no time on the FATE host.

Multimedia Mike Post authorMay 21, 2009 at 1:01 pm

Thanks for the reminder regarding ‘-benchmark’, Mans. I remember hearing about that once but never gave it a try until today. I will remember that for the Great Refactoring.

astrange May 25, 2009 at 1:55 pm

Depending on the host, you might want to run benchmarks multiple times and choose the minimum.

Core 2 in particular is incredibly hard to benchmark on – I regularly get larger speed changes by copying the ffmpeg binary and running the two copies than I do from actual changes.

Tomer Gabel May 25, 2009 at 3:09 pm

astrange: The phenomenon you describe is very strange, can you elaborate a bit on that?

astrange May 25, 2009 at 8:20 pm

There’s some guesses about the problem here: http://x264dev.multimedia.cx/?p=51

I think it may be very affected by random alignment issues in physical memory. Also, OS X flushes the TLB on every context switch, which has a bad random effect on memory accesses; that doesn’t help.

Tomer Gabel May 26, 2009 at 1:08 am

OS X flushes the TLB… I’m not a kernel developer, but it actually makes sense as long as you don’t have shared/mapped memory areas. In other words it’s probably a reasonable optimization for a desktop OS; I wonder if Windows behaves similarly. It sounds like a server performance killer, though.

Tomer Gabel May 26, 2009 at 4:31 am

My bad, it appears this is an intrinsic property of x86 architecture, for details: http://www.stanford.edu/~stinson/paper_notes/fundamental/mem_mgmt/ia32_pts.txt

Still, as this affects all code equally, I fail to see why this would have a significant impact on benchmarking.

Comments are closed.