CPU Time Experiment

Science project: Measure how accurately Python measures the time a child process spends on the CPU.

FATE clients execute build and test programs by creating child processes. Python tracks how long a child process has been executing using one number from the 5-element tuple returned by os.times() (element 2, the accumulated user time of child processes). From the beginning, I observed that this number actually seems to represent the number of times the child process has been allowed to run on the CPU, multiplied by 10ms, at least on Linux.
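
For reference, a quick way to inspect the tuple in question; element 2 is the number examined throughout this post:

import os

# os.times() -> (user, system, children_user, children_system, elapsed);
# element 2 accumulates the user CPU time of terminated child processes
print(os.times())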

I am interested in performing some controlled tests to learn if this is also the case for Mac OS X. Then, I want to learn if this method can reliably report the same time even if the system is under heavy processing load and the process being profiled has low CPU priority. The reason I care is that I would like to set up periodic longevity testing that tracks performance and memory usage, but I want to run it at a lower priority so it doesn’t interfere with the more pressing build/test jobs. And on top of that, I want some assurance that the CPU time figures are meaningful. Too much to ask? That’s what this science project aims to uncover.

Methodology: My first impulse was to create a simple program that simulated harsh FFmpeg conditions by reading chunks from a large file and then busying the CPU with inane operations for a set period of time. Then I realized that there’s no substitute for the real deal and decided to just use FFmpeg.

ffmpeg -i sample.movie -y -f framecrc /dev/null

For loading down the CPU(s), one command line per CPU:

while [ 1 ]; do echo hey > /dev/null; done

I created a Python script that accepts a command line as an argument, sets the process nice level, and executes the command while taking the os.times() samples before and after.

Halfway through this science project, Mans reminded me of the existence of FFmpeg’s ‘-benchmark’ command line option. So the relevant command becomes:

time ./science-project-measure-time.py "ffmpeg -benchmark -i sample.movie -y -f framecrc /dev/null"

Here is the raw data, since I can’t think of a useful way to graph it. The 5 columns represent (all times in seconds):

  1. -benchmark time
  2. Python’s os.times()[2] (child user time)
  3. ‘time’ real time
  4. ‘time’ user time
  5. ‘time’ sys time

Linux, Atom CPU, 1.6 GHz
========================
unloaded, nice level 0
run 1: 26.378, 26.400, 36.108, 26.470, 9.065
run 2: 26.426, 26.460, 36.103, 26.506, 9.089
run 3: 26.410, 26.440, 36.099, 26.494, 9.357

unloaded, nice level 10
run 1: 26.734, 26.760, 37.222, 26.806, 9.393
run 2: 26.822, 26.860, 36.217, 26.902, 8.945
run 3: 26.566, 26.590, 36.221, 26.662, 9.125

loaded, nice level 10
run 1: 33.718, 33.750, 46.301, 33.810, 11.721
run 2: 33.838, 33.870, 47.349, 33.930, 11.413
run 3: 33.922, 33.950, 47.305, 34.022, 11.849


Mac OS X, Core 2 Duo, 2.0 GHz
=============================
unloaded, nice level 0
run 1: 13.301, 22.183, 21.139, 13.431, 5.798
run 2: 13.339, 22.250, 20.150, 13.469, 5.803
run 3: 13.252, 22.117, 20.139, 13.381, 5.728

unloaded, nice level 10
run 1: 13.365, 22.300, 20.142, 13.494, 5.851
run 2: 13.297, 22.183, 20.144, 13.427, 5.739
run 3: 13.247, 22.100, 20.142, 13.376, 5.678

loaded, nice level 10
run 1: 13.335, 22.250, 30.233, 13.466, 5.734
run 2: 13.220, 22.050, 30.247, 13.351, 5.762
run 3: 13.219, 22.050, 31.264, 13.350, 5.798

Experimental conclusion: Well, this isn’t what I was expecting at all. Loading the CPU altered the CPU time results. I thought -benchmark would be very consistent across runs regardless of CPU load. My experimental data indicates otherwise, at least for Linux, the platform that was to be in charge of this testing. This creates problems for my idea of an adjunct longevity tester on the main FATE machine.

The Python script, science-project-measure-time.py, follows.
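
The listing below is a minimal sketch, assuming Python’s subprocess module and an optional nice-level argument; details of the original interface are assumptions:

#!/usr/bin/env python
# science-project-measure-time.py
# usage: ./science-project-measure-time.py "<command line>" [nice level]
import os
import shlex
import subprocess
import sys

command = sys.argv[1]
# the optional nice-level argument is an assumed interface detail
niceness = int(sys.argv[2]) if len(sys.argv) > 2 else 0
if niceness:
    os.nice(niceness)  # lower our priority; the child process inherits it

before = os.times()
subprocess.call(shlex.split(command))  # run the command and wait for it
after = os.times()

# element 2 of os.times() accumulates the user CPU time of reaped children
print("child CPU time: %.3f seconds" % (after[2] - before[2]))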

7 thoughts on “CPU Time Experiment”

  1. mat

    Could it be cache related? In the “loaded, nice level 10” configuration you will do lots of context switches and ffmpeg will get very short timeslices (your loading program doesn’t block for long).

    Maybe this pollutes the CPU cache or other CPU context. But what’s strange is that x86 caches are really big and shouldn’t be impacted that badly by context switching.

    PS: I wonder what happens for an OS under virtualization; I expect some strange numbers there, too.

  2. Multimedia Mike Post author

    My first pass at the experiment actually occurred under a virtualized Linux session. One interesting thing was that the sys time numbers were fairly negligible. But all of the numbers were fairly consistent regardless of load conditions or priorities.

  3. Reimar

    Uh, my Atom CPU at least has HyperThreading, which means that under load the CPU time can as much as double (one process running on each virtual CPU while using the same CPU resources).

  4. Multimedia Mike Post author

    @Reimar: My Atom is an N330, which is a hyperthreaded dual-core CPU. So it presents 4 CPUs to the OS, all of which I loaded for the test. Do you think the hyperthreading is why I got such strange results? I really should run the experiment on my Core 2 Duo machine, the one that would be doing the testing anyway.

  5. Reimar

    @Mike:
    depends, how did you create the background load? This kind of thing should always happen when programs run on different virtual CPUs but the same physical CPU, because it then becomes impossible to distinguish how much of the CPU’s resources each program actually used.
    So if you ran a single other program, it should have run on the other core (assuming a sane scheduler), and thus the differences should only be due to e.g. cache effects, which with a dual-core Atom could still be a lot.
    With more than two programs running overall, HyperThreading would have been in use, and I don’t think you can get reliable benchmark numbers then in any way. HyperThreading means your CPU no longer has a fixed speed: depending on the _combination_ of programs running, the CPU might appear to each of the two programs sharing a real CPU to run at anything from normal to about half speed. Even if you could extract proper benchmark numbers out of that mess in theory, I’m sure you don’t want to.
    I think you should be able to disable HyperThreading somehow; Google tells me adding “noht” to the kernel options should work.

  6. Multimedia Mike Post author

    @Reimar: As indicated in the post, I ran “while [ 1 ]; do echo hey > /dev/null; done” in 4 different terminals to attempt to max out all 4 CPUs. Hopefully, the same experiment on a non-HT CPU will provide the kind of results I expect. Otherwise, I have another idea for implementing this periodic testing.
