Compiler Performance Profiling With FFmpeg

A Slashdot story titled High Performance Linux Kernel Project discusses LinuxDNA, an effort to make the Linux kernel compilable with Intel’s C compiler (icc) in the hope of producing a higher-performing kernel. The ensuing comments went back and forth about whether icc actually offers any performance gains over recent (or even ancient) gcc versions (“ICC really no longer has the performance lead that it once did over gcc.”). I was also curious about the claim that “it’s well known that gcc 2.95.3 generates much better code on a lot of platforms”; is that why we continually test 2.95.3 via FATE?

Yet another comment stated, “We tried ICC on our simulator. The result: 8% slower than GCC. On intel chips.” That’s when I realized that I’m in a position to offer some controlled testing using a CPU-intensive application: FFmpeg. At any given time, I have access to the latest builds compiled for 20 different configurations. This includes a copy built with icc 10.1.017.

So I ran some tests. Executive summary: icc finishes neck and neck with gcc 4.1.2 (a tiny bit ahead of gcc in my test), while both put most of the rest of the compilers in the test to shame, especially the latest gcc compilers. I have a chart to back up my claims, so there:

Followup: Be sure to see the results of this same exercise run without any manual ASM/SIMD optimizations.

Compiler performance when decoding MPEG-4 video and MP3 audio with FFmpeg

Small aside: I hope you appreciate that chart. You wouldn’t believe how long it took me to coerce OpenOffice.org to create it, nor how grotesquely volatile OOo 3 is on Mac OS X. In the end, the program didn’t play ball and I had to use Mac’s screenshot feature to capture the goods for publishing.

Methodology: I took a 104-minute movie that has been encoded with ISO MPEG-4 part 2 (a.k.a. DivX/XviD) video and MP3 audio and fed it through the following command:

$ time ffmpeg -i file.avi -f framecrc - > /dev/null

I used the ‘user’ output from the time prefix (out of the real, user, and sys times) which counts the approximate seconds that the process spent on the CPU. This should exclude I/O access and, really, probably just counts the number of 10ms time slices that the OS allocates to the process. I ran the test once for each compiler configuration, then ran through the configurations a second time and graphed the minimum time between the pair of runs for each configuration.
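For anyone who wants to script the same measurement, here is a minimal sketch of the "user time, best of N runs" idea in Python. It is not what I actually ran (that was just the shell's time prefix); the function names user_cpu_seconds and best_of are made up for illustration, and the resource module it relies on is Unix-only.

```python
import resource
import subprocess


def user_cpu_seconds(argv):
    """Run argv to completion and return the user-mode CPU seconds it used.

    RUSAGE_CHILDREN accumulates the usage of children that have been
    waited for, so the difference before/after isolates this one run.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,  # same effect as "> /dev/null"
        stderr=subprocess.DEVNULL,
        check=True,
    )
    after = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    return after - before


def best_of(argv, runs=2):
    """Return the minimum user time over several runs of the same command."""
    return min(user_cpu_seconds(argv) for _ in range(runs))
```

Something like best_of(["ffmpeg", "-i", "file.avi", "-f", "framecrc", "-"]) would then stand in for the pair of timed runs described above. Taking the minimum rather than the mean is a deliberate choice: noise from other processes can only make a run slower, never faster.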

One day, I will have graphing working in FATE so that we can obtain continuous and historical performance data that will help us analyze trends, both in FFmpeg and in the compilers that build it.

Another comment from the Slashdot thread asserted that “it is simply healthy for the kernel to be compilable across more compilers,” which another commenter challenged with “Prove it.” Again, I think I’m in a position to help here. While it may be more common for a test to break on all PowerPC configurations due to endian considerations, or for the build to break on the icc or gcc 2.95.3 configurations for reasons related to C99 arcana, there have been a few instances where FATE tests have inexplicably broken on very specific configurations. The latest example is a recent code change in FFmpeg that randomly caused the wc3movie-xan test spec to fail, but only on the Linux / x86_32 / gcc 4.2.4 configuration. Huh? Well, thanks go to Vitor, who promptly went to work with valgrind and found that the subsystem was doing some bad things in the first place, in a way that finally manifested on one configuration. (Incidentally, I’m pretty sure that the WC3 playback system was the first bit of code I ever contributed to FFmpeg.)

Adventures In MIPS Tools

I’m trying to compile a program that will run on my new MIPS-based subnotebook. I finally got a cross-compiling toolchain built and building a super-simple C program. But the program failed to run. When I tried to run my sample program, the shell complained about not knowing what to do with a ‘(‘ character. Puzzling.

This is the ‘file’ type of my compiled program:

ELF 32-bit MSB executable, MIPS, MIPS-I version 1 (SYSV), statically linked, not stripped

I finally thought to extract a binary from the MIPS machine and check its file type:

ELF 32-bit LSB executable, MIPS, version 1 (SYSV), dynamically linked (uses shared libs), stripped

Okay, I think I’m getting wise to the discrepancy already. It turns out that I want a target called “mipsel” rather than just plain “mips”, since the former specifies little endian. A MIPS CPU can be wired to run in either endianness (that’s how simple and reduced this reduced instruction set is).
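The MSB/LSB distinction that ‘file’ reports comes straight out of the first few bytes of the ELF header, so it is easy to check by hand. A minimal sketch in Python, with hypothetical function names of my own choosing; it only looks at the class and byte-order fields and ignores everything else ‘file’ prints:

```python
ELF_MAGIC = b"\x7fELF"


def describe_elf_ident(ident):
    """Summarize the class and byte order from the start of an ELF header.

    Byte 4 (EI_CLASS) is 1 for 32-bit, 2 for 64-bit.
    Byte 5 (EI_DATA) is 1 for little endian (LSB), 2 for big endian (MSB).
    """
    if len(ident) < 6 or ident[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    elf_class = {1: "32-bit", 2: "64-bit"}.get(ident[4], "unknown class")
    byte_order = {1: "LSB", 2: "MSB"}.get(ident[5], "unknown byte order")
    return f"ELF {elf_class} {byte_order}"


def describe_elf_file(path):
    """Read the 16-byte identification block of a file and describe it."""
    with open(path, "rb") as f:
        return describe_elf_ident(f.read(16))
```

Run against the two binaries above, this should report MSB for my first build and LSB for the binary pulled off the machine, which is the mismatch in a nutshell.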

So I rebuilt the toolchain using the ‘mipsel’ target (building the toolchain is surprisingly quick when you know how). Now the test program segfaults when I try to run it. That’s unfortunate, though I still perceive it to be a step up from the last position. This is the new type reported by ‘file’:

ELF 32-bit LSB executable, MIPS, MIPS-I version 1 (SYSV), statically linked, not stripped

It’s MIPS-I version 1, vs. simply MIPS version 1, which is what the existing binaries are. I wonder if that’s the problem? I’m also struggling with a linker warning about the start location. That’s more likely to be the issue.

BTW, this is the C code I am testing with:

int main()
{
  return 77;
}

My thinking here is that I should be able to run the program followed by “echo $?” to get the last command’s exit status: 77 in this case.
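The same check can be scripted instead of typed. This is only a sketch, and exit_status is a name I made up; the one detail worth knowing is that on POSIX systems Python reports death-by-signal as a negative return code, so a segfault shows up as -11 rather than as an ordinary exit status.

```python
import subprocess


def exit_status(argv):
    """Run a command and return its exit status.

    On POSIX, a negative value -N means the process was killed by
    signal N (for example, -11 for SIGSEGV).
    """
    return subprocess.run(argv).returncode
```

Calling exit_status(["./test"]) on the target machine, where ./test is a placeholder for the cross-compiled binary, should return 77 once the toolchain is producing working executables.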

Parsing In Python

I wanted to see if the video frames inside these newly discovered ACDV-AVI files were just regular JPEG frames stuffed inside an AVI file. JPEG is a picky matter and many companies have derived their own custom bastardizations of the format. So I just wanted to separate out the data frames into individual JPEG files and see if they could be decoded with other picture viewers. Maybe FFmpeg can already do it using the right combination of command line options. Or maybe it’s trivial to hook up the ‘ACDV’ FourCC to the JPEG decoder in the source code. What can I say? FFmpeg intimidates me just as much as it does any of you mere mortals.

Plus, I’m getting a big kick out of writing little tools in Python. For a long time, I had a fear of processing binary data in very high level languages like Perl, believing that they should be left to text processing tasks. This needn’t be the case. pack() and unpack() make binary data manipulation quite simple in Perl and Python (in Python they live in the struct module). Here’s a naive utility that loads an AVI file in one go, digs through it until it finds a video frame marker (either ’00dc’ or, and I have never seen this marker before, ’00AC’) and writes the frame to its own file.

acdv.py:
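The original listing is not reproduced here. What follows is a rough reconstruction of the approach described above, not the actual script: it assumes the standard AVI/RIFF chunk layout (a 4-byte marker, a 4-byte little-endian size, then the data, padded to an even length), and the names extract_frames and write_frames are my own placeholders.

```python
import struct

# Video chunk markers to look for: the usual '00dc' plus the unfamiliar '00AC'.
FRAME_MARKERS = (b"00dc", b"00AC")


def next_marker(data, pos):
    """Return the offset of the nearest frame marker at or after pos, or -1."""
    hits = [i for i in (data.find(m, pos) for m in FRAME_MARKERS) if i != -1]
    return min(hits) if hits else -1


def extract_frames(data):
    """Naively scan a whole AVI file's bytes and return the frame payloads."""
    frames = []
    pos = 0
    while True:
        hit = next_marker(data, pos)
        if hit == -1 or hit + 8 > len(data):
            break
        # The 32-bit little-endian chunk size follows the 4-byte marker.
        (size,) = struct.unpack_from("<I", data, hit + 4)
        start = hit + 8
        if start + size > len(data):
            # Implausible size: probably a false match, so keep scanning.
            pos = hit + 1
            continue
        frames.append(data[start:start + size])
        # Chunks are padded to an even number of bytes.
        pos = start + size + (size & 1)
    return frames


def write_frames(avi_path, prefix="frame-"):
    """Load the AVI in one go and write each frame to its own .jpg file."""
    with open(avi_path, "rb") as f:
        data = f.read()
    for i, frame in enumerate(extract_frames(data)):
        with open(f"{prefix}{i:05d}.jpg", "wb") as out:
            out.write(frame)
```

Being a blind byte scan, this can pick up false matches wherever the marker bytes happen to appear outside a real frame chunk (the index at the end of the file is a likely culprit), so treat any tiny or undecodable output files with suspicion.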

BTW, the experiment revealed that, indeed, the ACDV video frames can each stand alone as separate JPEG files.

The Downside Of Contributions

The prolific Jeff Atwood has a blog post entitled Don’t Go Dark which describes the issue of programmers retreating into their chambers for months on end to create the perfect feature; at the conclusion, said programmers drop the feature on the community at large hoping for its immediate and wholesale incorporation into the project’s mainline codebase. As you can imagine, FFmpeg lends itself well to this style of lone-wolf development. Unfortunately, that style also conflicts with FFmpeg’s level of code maturity, which necessitates that every line of code be carefully scrutinized before it is allowed possible immortality in the mainline tree. This leads to a tremendous number of orphaned patches. Should FFmpeg maintain such a strict policy? Personally, I agree with the project leader in his position that, if the changes are not made before inclusion, the changes will likely never be made.

There’s another angle that I don’t think was addressed by Jeff’s post. It’s a problem we saw repeatedly on the xine project. Companies that were doing things like set-top media boxes were understandably eager to incorporate xine’s superior (and fee-free) media playback architecture. Naturally, it took some… tweaking and customizations (read: ad-hoc hacks) in order to get the stuff to work just right with a specific setup, and within a deadline. When the project was complete, an engineer would drop a mega-patch with all of their changes to the xine codebase, as mandated by the GNU GPL. And it was quite useless to us.

I’m not sure what to do about the latter case. With the former, it is useful to at least anticipate developing your module in somewhat bite-sized phases that can perhaps be incorporated in separate patches.