Category Archives: FATE Server

New H.264 Tests In FATE

I have just activated 18 new test specs for FATE. One is a test for one variant of the newly supported DPX format. The other 17 are for various samples in the fidelity range extension suite of test vectors, an extension of H.264 that FFmpeg has supported for some time. It should be noted that more samples from this suite should be forthcoming as soon as I finish downloading the whole thing (something I thought I had done a long time ago).

Perceptual Video Testing

While I have thought a lot about the problem of testing perceptual audio decoders in FFmpeg via FATE, I have put little thought into testing perceptual video decoders. I surmise that my 1-off testing method for perceptual audio decoders is probably not tractable for video decoders. I suspect that, while rounding errors also cause off-by-1 errors in video decoders, compounded errors during multiple frames could stretch the threshold. There’s also the minor matter that it would require a much larger amount of space to store raw video data vs. the audio data required for the perceptual audio tests.

Testing video decoders isn’t such a big deal, though. The FFmpeg devs have had this problem solved via the ‘make test’ regression suite for a long time. There are a few flags to tack onto a command line which will invoke a bit exact inverse discrete cosine transform (IDCT). The trade-off is that optimized IDCT code paths are not exercised.

I’m thinking of a 2-part solution:

  1. As a first pass, enter video decoding tests that leverage the bit exact IDCT.
  2. As a second pass, craft special test cases that decode one frame of video and leverage the default, optimized IDCT. Test the result using the 1-off method.

See Also:

Madcow In FFmpeg

Suxel Drol continues his relentless quest to add all Electronics Arts multimedia formats to FFmpeg. The most recent fruit of this labor is the Madcow video decoder.


Need For Speed III: Hot Pursuit -- Madcow video in VLC

The test spec is ready to go, as is another test spec for a new demuxer named QCP.

Speaking of that QCP format, that comes courtesy of Dark Shikari, toiling away at his new gig over at Facebook. Facebook, he informs us, might just be the largest user of FFmpeg. They have quite a few members uploading random multimedia files which presents just as formidable a challenge as Picsearch. QCP is just one of the (formerly) unknown formats encountered.

Oh, and I also found a way to fix that pva-demux test that has been failing for over a week. We should have a greener baseline now.

CPU Time Experiment

Science project: Measure how accurately Python measures the time a child process spends on the CPU.

FATE clients execute build and test programs by creating child processes. Python tracks how long a child process has been executing using one number from the 5-element tuple returned from os.times(). I observed from the beginning that this number actually seems to represent the number of times a child process has been allowed to run on the CPU, multiplied by 10ms, at least for Linux.

I am interested in performing some controlled tests to learn if this is also the case for Mac OS X. Then, I want to learn if this method can reliably report the same time even if the system is under heavy processing load and the process being profiled has low CPU priority. The reason I care is that I would like to set up periodic longevity testing that tracks performance and memory usage, but I want to run it at a lower priority so it doesn’t interfere with the more pressing build/test jobs. And on top of that, I want some assurance that the CPU time figures are meaningful. Too much to ask? That’s what this science project aims to uncover.

Methodology: My first impulse was to create a simple program that simulated harsh FFmpeg conditions by reading chunks from a large file and then busying the CPU with inane operations for a set period of time. Then I realized that there’s no substitute for the real deal and decided to just use FFmpeg.

ffmpeg -i sample.movie -y -f framecrc /dev/null

For loading down the CPU(s), one command line per CPU:

while [ 1 ]; do echo hey > /dev/null; done

I created a Python script that accepts a command line as an argument, sets the process nice level, and executes the command while taking the os.times() samples before and after.

Halfway through this science project, Mans reminded me of the existence of the ‘-benchmark’ command line option. So the relevant command becomes:

time ./science-project-measure-time.py "ffmpeg -benchmark -i sample.movie -y -f framecrc /dev/null"

Here is the raw data, since I can’t think of a useful way to graph it. The 5 columns represent:

  1. -benchmark time
  2. Python’s os.times()[2]
  3. ‘time’ real time
  4. ‘time’ user time
  5. ‘time’ sys time
Linux, Atom CPU, 1.6 GHz
========================
unloaded, nice level 0
run 1: 26.378, 26.400, 36.108, 26.470, 9.065
run 2: 26.426, 26.460, 36.103, 26.506, 9.089
run 3: 26.410, 26.440, 36.099, 26.494, 9.357

unloaded, nice level 10
run 1: 26.734, 26.760, 37.222, 26.806, 9.393
run 2: 26.822, 26.860, 36.217, 26.902, 8.945
run 3: 26.566, 26.590, 36.221, 26.662, 9.125

loaded, nice level 10
run 1: 33.718, 33.750, 46.301, 33.810, 11.721
run 2: 33.838, 33.870, 47.349, 33.930, 11.413
run 3: 33.922, 33.950, 47.305, 34.022, 11.849


Mac OS X, Core 2 Duo, 2.0 GHz
=============================
unloaded, nice level 0
run 1: 13.301, 22.183, 21.139, 13.431, 5.798
run 2: 13.339, 22.250, 20.150, 13.469, 5.803
run 3: 13.252, 22.117, 20.139, 13.381, 5.728

unloaded, nice level 10
run 1: 13.365, 22.300, 20.142, 13.494, 5.851
run 2: 13.297, 22.183, 20.144, 13.427, 5.739
run 3: 13.247, 22.100, 20.142, 13.376, 5.678

loaded, nice level 10
run 1: 13.335, 22.250, 30.233, 13.466, 5.734
run 2: 13.220, 22.050, 30.247, 13.351, 5.762
run 3: 13.219, 22.050, 31.264, 13.350, 5.798

Experimental conclusion: Well this isn’t what I was expecting at all. Loading the CPU altered the CPU time results. I thought -benchmark would be very consistent across runs despite the CPU load. My experimental data indicates otherwise, at least for Linux, which was to be in charge of this project. This creates problems for my idea of an adjunct longevity tester on the main FATE machine.

The Python script — science-project-measure-time.py — follows:

Continue reading