
CPU Time Experiment

Science project: Measure how accurately Python measures the time a child process spends on the CPU.

FATE clients execute build and test programs by creating child processes. Python tracks how long a child process has been executing using one number from the 5-element tuple returned from os.times(). I observed from the beginning that this number actually seems to represent the number of times a child process has been allowed to run on the CPU, multiplied by 10ms, at least for Linux.
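For reference, os.times() returns a 5-element tuple of the form (user, system, children_user, children_system, elapsed); element 2, the accumulated user CPU time of child processes that have been waited on, is the number in question. A quick Python illustration (the sleep command is just a placeholder for a real child process):

    import os

    # os.times() -> (user, system, children_user, children_system, elapsed)
    # Element 2 accumulates the user CPU time of child processes that have
    # been waited on; on Linux it appears to advance in 10 ms increments.
    os.system("sleep 1")   # placeholder child process
    print(os.times())
    print("child user CPU time so far: %f" % os.times()[2])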

I am interested in performing some controlled tests to learn if this is also the case for Mac OS X. Then, I want to learn if this method can reliably report the same time even if the system is under heavy processing load and the process being profiled has low CPU priority. The reason I care is that I would like to set up periodic longevity testing that tracks performance and memory usage, but I want to run it at a lower priority so it doesn’t interfere with the more pressing build/test jobs. And on top of that, I want some assurance that the CPU time figures are meaningful. Too much to ask? That’s what this science project aims to uncover.

Methodology: My first impulse was to create a simple program that simulated harsh FFmpeg conditions by reading chunks from a large file and then busying the CPU with inane operations for a set period of time. Then I realized that there’s no substitute for the real deal and decided to just use FFmpeg.

ffmpeg -i sample.movie -y -f framecrc /dev/null

For loading down the CPU(s), one command line per CPU:

while [ 1 ]; do echo hey > /dev/null; done

I created a Python script that accepts a command line as an argument, sets the process nice level, and executes the command while taking the os.times() samples before and after.
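In outline, the script does something like the following (a simplified sketch rather than the exact script; the nice level is hard-coded here purely for illustration):

    import os
    import sys

    NICE_LEVEL = 10                # illustrative value

    os.nice(NICE_LEVEL)            # lower the priority of this process and its children
    before = os.times()
    os.system(sys.argv[1])         # run the command line passed as a single argument
    after = os.times()
    print("child CPU time: %f seconds" % (after[2] - before[2]))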

Halfway through this science project, Mans reminded me of the existence of the ‘-benchmark’ command line option. So the relevant command becomes:

time ./science-project-measure-time.py "ffmpeg -benchmark -i sample.movie -y -f framecrc /dev/null"

Here is the raw data, since I can’t think of a useful way to graph it. The 5 columns represent:

  1. -benchmark time
  2. Python’s os.times()[2]
  3. ‘time’ real time
  4. ‘time’ user time
  5. ‘time’ sys time

Linux, Atom CPU, 1.6 GHz
========================
unloaded, nice level 0
run 1: 26.378, 26.400, 36.108, 26.470, 9.065
run 2: 26.426, 26.460, 36.103, 26.506, 9.089
run 3: 26.410, 26.440, 36.099, 26.494, 9.357

unloaded, nice level 10
run 1: 26.734, 26.760, 37.222, 26.806, 9.393
run 2: 26.822, 26.860, 36.217, 26.902, 8.945
run 3: 26.566, 26.590, 36.221, 26.662, 9.125

loaded, nice level 10
run 1: 33.718, 33.750, 46.301, 33.810, 11.721
run 2: 33.838, 33.870, 47.349, 33.930, 11.413
run 3: 33.922, 33.950, 47.305, 34.022, 11.849


Mac OS X, Core 2 Duo, 2.0 GHz
=============================
unloaded, nice level 0
run 1: 13.301, 22.183, 21.139, 13.431, 5.798
run 2: 13.339, 22.250, 20.150, 13.469, 5.803
run 3: 13.252, 22.117, 20.139, 13.381, 5.728

unloaded, nice level 10
run 1: 13.365, 22.300, 20.142, 13.494, 5.851
run 2: 13.297, 22.183, 20.144, 13.427, 5.739
run 3: 13.247, 22.100, 20.142, 13.376, 5.678

loaded, nice level 10
run 1: 13.335, 22.250, 30.233, 13.466, 5.734
run 2: 13.220, 22.050, 30.247, 13.351, 5.762
run 3: 13.219, 22.050, 31.264, 13.350, 5.798

Experimental conclusion: Well, this isn’t what I was expecting at all. Loading the CPU altered the measured CPU time. I thought -benchmark would be very consistent across runs regardless of CPU load, but my data indicates otherwise, at least for Linux, which is the platform that was to host this testing. This creates problems for my idea of an adjunct longevity tester on the main FATE machine.


RAM Disk Experiment

Science project: Can FATE performance be improved — significantly or at all — by running as much of the operation as possible from RAM? My hypothesis is that it will speed up the overall build/test process, but I don’t know by how much.

Conclusion and spoiler: The RAM disk makes no appreciable performance difference. Linux’s default caching is more than adequate.

There are 4 items I am looking at storing in RAM: the FFmpeg source code, the built objects, the ccache files, and the suite of FATE samples. This experiment deals with placing the first 3 in RAM.

Method:

  • Clear ccache and compile FFmpeg on the disk. Do this thrice and collect “wall clock” numbers using the ‘time’ command line prefix, e.g.:

      time sh -c '../ffmpeg/configure --prefix=install-directory --cc="ccache gcc" &&
            make && make install'

    The second and third runs should be faster due to Linux’s usual file caching in memory.

  • Restart the machine.
  • Perform 3 more runs using the existing cache.
  • Restart the machine.
  • Set up a 1 GB RAM disk (formatted as ext2) following a tutorial I found online.
  • Copy the source tree onto the RAM disk and point ccache at a directory on the RAM disk (a rough sketch of this step appears after the list). Repeat the 3 timed runs and collect numbers.
  • Bonus: restart the machine again and compile the source without ccache in order to measure the performance hit incurred by ccache when there are no files cached.
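For the ccache-on-RAM-disk step, here is a rough sketch of one timed iteration, assuming the RAM disk is mounted at /mnt/ramdisk (the mount point and hard-coded commands are illustrative; ccache honors the CCACHE_DIR environment variable):

    import os
    import subprocess
    import time

    # Point ccache at a directory on the RAM disk (assumed mount point).
    env = dict(os.environ, CCACHE_DIR="/mnt/ramdisk/ccache")

    start = time.time()
    for cmd in ('../ffmpeg/configure --prefix=install-directory --cc="ccache gcc"',
                "make",
                "make install"):
        subprocess.check_call(cmd, shell=True, env=env)
    print("wall clock: %.1f seconds" % (time.time() - start))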

Hardware: MSI Wind Nettop with 1.6 GHz N330 Atom (dual-core, hyperthreaded); 2 GB of DDR2 533 RAM; 160 GB, 7200 RPM SATA HD with an ext3 filesystem. I don’t know a good way to graph this, so here are the raw numbers. The first number of each pair is wall clock time, the second is CPU time.

On disk:
run 1: 15:41, 14:32
run 2:  1:43,  1:12
run 3:  1:43,  1:12

On disk, after restart:
run 1:  1:50,  1:13
run 2:  1:42,  1:13
run 3:  1:43,  1:12

RAM disk (ext2):
run 1: 15:37, 14:35
run 2:  1:39,  1:12
run 3:  1:40,  1:13

From startup, no ccache:
run 1: 15:12, 14:12

Building from disk after a restart shows a penalty of about 8 wall-clock seconds, during which the relevant files are read back into the OS’s file cache. The run without ccache shows that using ccache with no prior cache incurs a penalty of nearly 30 seconds while the cache is populated.

And since I know you’re wondering, here’s what happens when I wipe the ccache and just let this thing rip with a multithreaded ‘make -j5’ build:

On disk, with ccache, multithreaded:
run 1: 6:51, 24:12
run 2: 1:05, 2:18
run 3: 0:54, 1:41
run 4: 0:54, 1:40

I did 4 runs this time because I wanted to see whether a 4th set of numbers would be consistent with the 3rd.

I know these results may elicit a big “duh!” from many readers, but I still wanted to prove it to myself.