Science project: Measure how accurately Python measures the time a child process spends on the CPU.
FATE clients execute build and test programs by creating child processes. Python tracks how long a child process spends executing via one element of the 5-element tuple returned by os.times(). I observed early on that, at least on Linux, this number appears to be the number of times the child process has been allowed to run on the CPU multiplied by 10 ms.
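For reference, here is the layout of that tuple (a quick sketch; the sample output values are made up):

import os

# os.times() returns a 5-element tuple of floats, in seconds:
#   index 0: user time of the current process
#   index 1: system time of the current process
#   index 2: user time of terminated child processes
#   index 3: system time of terminated child processes
#   index 4: elapsed real time since a fixed point in the past
t = os.times()
print(t)
# hypothetical output: (0.03, 0.01, 26.4, 9.05, 192086.21)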
I am interested in performing some controlled tests to learn if this is also the case for Mac OS X. Then, I want to learn if this method can reliably report the same time even if the system is under heavy processing load and the process being profiled has low CPU priority. The reason I care is that I would like to set up periodic longevity testing that tracks performance and memory usage, but I want to run it at a lower priority so it doesn’t interfere with the more pressing build/test jobs. And on top of that, I want some assurance that the CPU time figures are meaningful. Too much to ask? That’s what this science project aims to uncover.
Methodology: My first impulse was to create a simple program that simulated harsh FFmpeg conditions by reading chunks from a large file and then busying the CPU with inane operations for a set period of time. Then I realized that there's no substitute for the real deal and decided to just use FFmpeg:
ffmpeg -i sample.movie -y -f framecrc /dev/null
For loading down the CPU(s), one command line per CPU:
while [ 1 ]; do echo hey > /dev/null; done
I created a Python script that accepts a command line as an argument, sets the process nice level, and executes the command while taking the os.times() samples before and after.
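The full script appears at the end of this post; as a rough sketch (not the actual script), the core of the wrapper amounts to something like this, with the nice level hard-coded for brevity:

import os
import subprocess
import sys

# the command line to benchmark is passed as a single quoted argument
command = sys.argv[1]

# lower this process's priority; the child inherits the nice level
os.nice(10)

# element 2 of os.times() is the accumulated user CPU time of
# terminated child processes
before = os.times()[2]
subprocess.call(command, shell=True)
after = os.times()[2]

print("child user CPU time: %.3f seconds" % (after - before))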
Halfway through this science project, Mans reminded me of the existence of the ‘-benchmark’ command line option. So the relevant command becomes:
time ./science-project-measure-time.py "ffmpeg -benchmark -i sample.movie -y -f framecrc /dev/null"
Here is the raw data, since I can’t think of a useful way to graph it. The 5 columns represent:
- -benchmark time
- Python’s os.times()[2]
- ‘time’ real time
- ‘time’ user time
- ‘time’ sys time
Linux, Atom CPU, 1.6 GHz
========================
unloaded, nice level 0
  run 1: 26.378, 26.400, 36.108, 26.470, 9.065
  run 2: 26.426, 26.460, 36.103, 26.506, 9.089
  run 3: 26.410, 26.440, 36.099, 26.494, 9.357

unloaded, nice level 10
  run 1: 26.734, 26.760, 37.222, 26.806, 9.393
  run 2: 26.822, 26.860, 36.217, 26.902, 8.945
  run 3: 26.566, 26.590, 36.221, 26.662, 9.125

loaded, nice level 10
  run 1: 33.718, 33.750, 46.301, 33.810, 11.721
  run 2: 33.838, 33.870, 47.349, 33.930, 11.413
  run 3: 33.922, 33.950, 47.305, 34.022, 11.849

Mac OS X, Core 2 Duo, 2.0 GHz
=============================
unloaded, nice level 0
  run 1: 13.301, 22.183, 21.139, 13.431, 5.798
  run 2: 13.339, 22.250, 20.150, 13.469, 5.803
  run 3: 13.252, 22.117, 20.139, 13.381, 5.728

unloaded, nice level 10
  run 1: 13.365, 22.300, 20.142, 13.494, 5.851
  run 2: 13.297, 22.183, 20.144, 13.427, 5.739
  run 3: 13.247, 22.100, 20.142, 13.376, 5.678

loaded, nice level 10
  run 1: 13.335, 22.250, 30.233, 13.466, 5.734
  run 2: 13.220, 22.050, 30.247, 13.351, 5.762
  run 3: 13.219, 22.050, 31.264, 13.350, 5.798
Experimental conclusion: Well, this isn't what I was expecting at all. Loading the CPU changed the measured CPU time. I thought the -benchmark figure would be very consistent across runs regardless of CPU load, but my experimental data indicates otherwise, at least on Linux, which is the platform that was to host this project. This creates problems for my idea of an adjunct longevity tester on the main FATE machine.
The Python script — science-project-measure-time.py — follows: