Category Archives: Python
CPU Time Experiment
Science project: Measure how accurately Python measures the time a child process spends on the CPU.
FATE clients execute build and test programs by creating child processes. Python tracks how long a child process has been executing using one element of the 5-element tuple returned by os.times(). I observed from the beginning that this number actually seems to represent the number of times the child process has been allowed to run on the CPU, multiplied by 10 ms, at least on Linux.
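A quick way to see the figure in question, assuming a POSIX system (the stand-in child command below is illustrative, and the 10 ms granularity is my observation, not a documented guarantee):

```python
import os

# os.times() returns a 5-element tuple:
#   (user, system, children_user, children_system, elapsed)
# Index 2 accumulates the user CPU time of terminated child processes.
before = os.times()
os.system('python3 -c "sum(range(10**7))"')  # stand-in for a real child job
after = os.times()

child_user = after[2] - before[2]
# On Linux this difference appears to advance in 10 ms increments.
print("child user CPU time: %.3f s" % child_user)
```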
I am interested in performing some controlled tests to learn if this is also the case for Mac OS X. Then, I want to learn if this method can reliably report the same time even if the system is under heavy processing load and the process being profiled has low CPU priority. The reason I care is that I would like to set up periodic longevity testing that tracks performance and memory usage, but I want to run it at a lower priority so it doesn’t interfere with the more pressing build/test jobs. And on top of that, I want some assurance that the CPU time figures are meaningful. Too much to ask? That’s what this science project aims to uncover.
Methodology: My first impulse was to create a simple program that simulated harsh FFmpeg conditions by reading chunks from a large file and then busying the CPU with inane operations for a set period of time. Then I realized that there’s no substitute for the real deal and decided to just use FFmpeg.
ffmpeg -i sample.movie -y -f framecrc /dev/null
For loading down the CPU(s), one command line per CPU:
while [ 1 ]; do echo hey > /dev/null; done
I created a Python script that accepts a command line as an argument, sets the process nice level, and executes the command while taking the os.times() samples before and after.
Halfway through this science project, Mans reminded me of the existence of the ‘-benchmark’ command line option. So the relevant command becomes:
time ./science-project-measure-time.py "ffmpeg -benchmark -i sample.movie -y -f framecrc /dev/null"
Here is the raw data, since I can’t think of a useful way to graph it. The 5 columns represent:
- -benchmark time
- Python’s os.times()[2]
- ‘time’ real time
- ‘time’ user time
- ‘time’ sys time
Linux, Atom CPU, 1.6 GHz
========================
unloaded, nice level 0
  run 1: 26.378, 26.400, 36.108, 26.470,  9.065
  run 2: 26.426, 26.460, 36.103, 26.506,  9.089
  run 3: 26.410, 26.440, 36.099, 26.494,  9.357
unloaded, nice level 10
  run 1: 26.734, 26.760, 37.222, 26.806,  9.393
  run 2: 26.822, 26.860, 36.217, 26.902,  8.945
  run 3: 26.566, 26.590, 36.221, 26.662,  9.125
loaded, nice level 10
  run 1: 33.718, 33.750, 46.301, 33.810, 11.721
  run 2: 33.838, 33.870, 47.349, 33.930, 11.413
  run 3: 33.922, 33.950, 47.305, 34.022, 11.849

Mac OS X, Core 2 Duo, 2.0 GHz
=============================
unloaded, nice level 0
  run 1: 13.301, 22.183, 21.139, 13.431,  5.798
  run 2: 13.339, 22.250, 20.150, 13.469,  5.803
  run 3: 13.252, 22.117, 20.139, 13.381,  5.728
unloaded, nice level 10
  run 1: 13.365, 22.300, 20.142, 13.494,  5.851
  run 2: 13.297, 22.183, 20.144, 13.427,  5.739
  run 3: 13.247, 22.100, 20.142, 13.376,  5.678
loaded, nice level 10
  run 1: 13.335, 22.250, 30.233, 13.466,  5.734
  run 2: 13.220, 22.050, 30.247, 13.351,  5.762
  run 3: 13.219, 22.050, 31.264, 13.350,  5.798
Experimental conclusion: Well, this isn’t what I was expecting at all. Loading the CPU altered the CPU time results. I thought -benchmark would be very consistent across runs regardless of CPU load. My experimental data indicates otherwise, at least for Linux, which was to be the platform for this project. This creates problems for my idea of an adjunct longevity tester on the main FATE machine.
The Python script — science-project-measure-time.py — follows:
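A minimal sketch of such a script, assuming the behavior described earlier (set the nice level, run the command line passed as an argument, and diff os.times() around it); the nice level of 10 is an assumption:

```python
#!/usr/bin/env python
"""science-project-measure-time.py -- sketch of the measurement script.

Usage: science-project-measure-time.py "<command line>"
"""
import os
import sys

def measure(command, nice_level=10):
    """Run command in a shell; return (child user, child system) CPU seconds."""
    os.nice(nice_level)          # lower our priority; children inherit it
    before = os.times()
    os.system(command)           # run the command under test
    after = os.times()
    # indices 2 and 3 are the children's user and system CPU times
    return after[2] - before[2], after[3] - before[3]

if __name__ == "__main__":
    user, system = measure(sys.argv[1])
    print("child user time:   %.3f" % user)
    print("child system time: %.3f" % system)
```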
icc vs. gcc Smackdown, Round 3
How did I become the benchmark peon? Oh right, I actually dared to put forth some solid benchmarks and called for suggestions for possible improvements to the benchmark methodology. This is what I get.
Doing these benchmarks per all the suggestions I have received is time-consuming and error-prone. But if you know anything about me by now, you should know that I like automating time-consuming and error-prone tasks. This problem is looking more and more like a nail, so allow me to apply my new favorite hammer: Python!
Here’s the pitch: Write a Python script that iterates through a sequence of compiler configurations, each with its own path and unique cflags, and compiles FFmpeg. For each resulting build, decode a long movie twice, tracking the execution time in milliseconds. Also, for good measure, follow Reimar’s advice and validate that the builds are doing the right thing. To this end, transcode the first 10 seconds of the movie to a separate, unique file for later inspection. After each iteration, write the results to a CSV file for graphing.
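A sketch of what such a harness might look like; the compiler paths, flags, and file names below are placeholders for illustration, not the configurations actually benchmarked:

```python
import csv
import subprocess
import time

CONFIGS = [
    # (label, compiler path, extra cflags) -- illustrative values only
    ("gcc-4.3.2", "/usr/bin/gcc-4.3.2", "-march=core2 -mtune=core2"),
    ("icc",       "/opt/intel/bin/icc", ""),
]
MOVIE = "sample.avi"

def run(cmd):
    """Execute a shell command; return its wall-clock time in milliseconds."""
    start = time.time()
    subprocess.call(cmd, shell=True)
    return int((time.time() - start) * 1000)

results = []
for label, cc, cflags in CONFIGS:
    # configure and build FFmpeg with this compiler configuration
    subprocess.call("./configure --cc=%s --extra-cflags='%s'" % (cc, cflags),
                    shell=True)
    subprocess.call("make", shell=True)
    # decode the long movie twice, tracking execution time in ms
    times = [run("./ffmpeg -i %s -y -f framecrc /dev/null" % MOVIE)
             for _ in range(2)]
    # validation: transcode the first 10 seconds to a unique file
    subprocess.call("./ffmpeg -i %s -t 10 -y validate-%s.avi" % (MOVIE, label),
                    shell=True)
    results.append([label] + times)

# write the results to a CSV file for graphing
with open("benchmarks.csv", "w") as f:
    csv.writer(f).writerows(results)
```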
And here’s the graph:
Look at that! gcc 4.3.2 still isn’t a contender but gcc 4.4-svn is putting up a fight.
Here are the precise details of this run:
- Movie file is the same as before: 104-minute AVI; ISO MPEG-4 part 2 video (a.k.a. DivX/XviD) at 512×224, 24 fps; 32 kbps, 48 kHz MP3
- This experiment includes gcc 4.4.0-svn, revision 143046, built on 2009-01-03 (I’m a bit behind)
- All validations passed
- Machine is a Core 2 Duo, 2.13 GHz
- All 8 configurations are compiled with --disable-amd3dnow --disable-amd3dnowext --disable-mmx --disable-mmx2 --disable-sse --disable-ssse3 --disable-yasm
- icc configuration compiled with --cpu=core2 --parallel
- gcc 4.3.2 and 4.4.0-svn configurations compiled with -march=core2 -mtune=core2
- all other gcc versions compiled with no special options
What’s in store for round 4? It sure would be nice to get icc 11.0 series working on my machine for once to see if it can do any better. And since I have the benchmark framework, it would be nice to stuff LLVM in there to see how it stacks up. I would also like to see how the various builds perform when decoding H.264/AAC. The problem with that is the tremendous memory leak that slows execution to a crawl during a lengthy transcode. Of course I would be willing to entertain any suggestions you have for compiler options in the next round.
Better yet, perhaps you would like to try out the framework yourself. As is my custom, I like to publish my ad-hoc Python scripts here on my blog or else I might never be able to find them again.
Camp Luna
I remember when the Mono people first announced the Moonlight project for Linux that would interoperate with Microsoft’s Silverlight. They claimed that Microsoft would release a special binary codec pack that would allow Linux users to play back Microsoft’s proprietary media codecs. However, this codec pack would not be allowed for use in any other application, such as FFmpeg or GStreamer. How were they going to enforce that? Or so I wondered. Tonight I learned how.
I started investigating the API of the binary codec pack blobs a few weeks ago. I got as far as figuring out how Moonlight registers the codecs. Then I lost motivation, in no small part because there isn’t that much in the blob that I would deem interesting (perhaps one method for keeping people from sorting out the API). In the comments of the last post on the matter, people wondered if the codec pack included support for WMA Voice, which is still unknown. I can’t find any ‘voice’ strings in the blob. However, I do find references to lossless coding. This might pertain to Windows Lossless Audio, or it could just be a special coding mode for WMA3 Pro. Either way, I’m suddenly interested.
So I looked for interface points in the Moonlight source. Moonlight simply loads and invokes registration functions for WMA, WMV, and MP3. The registration functions don’t return any data that Moonlight stores. Moonlight doesn’t appear to load (via dlsym()) or invoke any other codec pack functions directly. So how can it possibly be interfacing? The only other way the interaction could flow is if the codec pack shared library was invoking functions in Moonlight…
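The loading mechanism itself is easy to mimic, even from Python with ctypes. In the sketch below, libc stands in for the codec-pack blob and getpid() for a registration entry point, since the real library and symbol names are not public:

```python
import ctypes
import ctypes.util

# dlopen() the shared library (libc stands in for the codec-pack blob)
lib = ctypes.CDLL(ctypes.util.find_library("c"))

# dlsym()-style lookup of an exported function (getpid stands in for a
# registration entry point like the WMA/WMV/MP3 registrars)
entry = lib.getpid
print("invoked library entry point, returned:", entry())
```

The puzzle is that this is a one-way street: the host calls into the library, and nothing the registration functions return is kept. For real interaction to happen, control must flow back the other way.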
Oh, no… they wouldn’t do that, would they?