Author Archives: Multimedia Mike

Compiler Performance Profiling With FFmpeg

A Slashdot story named High Performance Linux Kernel Project discusses an effort — called LinuxDNA — to make the Linux kernel compilable with Intel’s C compiler (icc) in an effort to hopefully create a higher performing Linux kernel. The ensuing comments showcased a lot of back and forth about whether icc actually offers any performance gains over recent (or even ancient) gcc versions (“ICC really no longer has the performance lead that it once did over gcc.”). I was also curious about the claim that “it’s well known that gcc 2.95.3 generates much better code on a lot of platforms”; is that why we continually test 2.95.3 via FATE?

Yet another comment stated that, “We tried ICC on our simulator. The result: 8% slower than GCC. On intel chips.”. That’s when I realized that I’m in a position to offer some controlled testing using a CPU-intensive application: FFmpeg. At any given time, I have access to the latest builds compiled for 20 different configurations. This includes a copy built with icc 10.1.017.

So I ran some tests. Executive summary: icc finishes neck and neck with gcc 4.1.2 (a tiny bit ahead of gcc in my test), while both put most of the rest of the compilers in the test to shame, especially the latest gcc compilers. I have a chart to back up my claims, so there:

Followup: Be sure to see the results of this same exercise run without any manual ASM/SIMD optimizations.

Compiler performance when decoding MPEG-4 video and MP3 audio with FFmpeg

Small aside: I hope you appreciate that chart. You wouldn’t believe how long it took me to coerce OpenOffice.org to create it, nor how grotesquely volatile OOo 3 is on Mac OS X. In the end, the program didn’t play ball and I had to use Mac’s screenshot feature to capture the goods for publishing.

Methodology: I took a 104-minute movie that has been encoded with ISO MPEG-4 part 2 (a.k.a. DivX/XviD) video and MP3 audio and fed it through the following command:

$ time ffmpeg -i file.avi -f framecrc – > /dev/null

I used the ‘user’ output from the time prefix (out of the real, user, and sys times) which counts the approximate seconds that the process spent on the CPU. This should exclude I/O access and, really, probably just counts the number of 10ms time slices that the OS allocates to the process. I ran the test once for each compiler configuration, then ran through the configurations a second time and graphed the minimum time between the pair of runs for each configuration.

One day, I will have graphing working in FATE so that we can obtain continuous and historical performance data that will help us analyze trends, both in FFmpeg and in the compilers that build it.

Another comment from the Slashdot thread asserted that “it is simply healthy for the kernel to be compilable across more compilers,” to which another commenter challenged, “Prove it.” Again, I think I’m in a position to help here. While it may be more common for a test to break on all PowerPC configurations due to endian considerations, or for the build to break on the icc or gcc 2.95.3 configurations for reasons related to C99 arcana, there have been a few instances where FATE tests have inexplicably broken on very specific configurations. The latest example of this is when a recent code change in FFmpeg randomly caused the wc3movie-xan test spec to fail, but only on the Linux / x86_32 / gcc 4.2.4 configuration. Huh? Well, thanks to Vitor who promptly went to work with valgrind and found that the subsystem was doing some bad things in the first place and in a way that finally manifested on one configuration. (Incidentally, I’m pretty sure that the WC3 playback system was the first bit of code I ever contributed to FFmpeg.)

See Also:

The Women Of Webhosting

Some of you may have noticed that the various websites hosted here on multimedia.cx were having a tad bit of difficulty recently. Long story short: My previous web host was having some serious problems and I decided it was time to ditch them and move on to a better one. Fortunately, I had (and continue to maintain) consistent, automated backups of everything hosted on multimedia.cx. But I really wasn’t looking forward to the task of finding a new provider. Whenever I have studied web host providers in the past, they all seem pretty much the same– offering UNLIMITED EVERYTHING!! along with perfect uptime and reliability for next to nothing(*** see details below in 5-point font). And most of their websites boast a design style reminiscent of the worst e-marketing sites and guaranteed to annoy the utilitarian, tech-savvy geek.

When it boils right down to it, I think I was being asked to make a decision regarding a new web host based on the female smiling at me on the front page. Honestly, these photos were generally the only distinguishing feature among the various services:


Miss 1&1
Miss 1&1 Hosting
Miss midPhase
Miss midPhase
Miss Fast Domain
Miss FastDomain

Continue reading

Camp Luna

I remember when the Mono people first announced the Moonlight project for Linux that would interoperate with Microsoft’s Silverlight. They claimed that Microsoft would release a special binary codec pack that would allow Linux users to play back their proprietary media codecs. However, this codec pack would not be allowed for use in any other application, like FFmpeg or GStreamer. How are they going to enforce that? Or so I wondered. Tonight I learned how.

I started investigating the API of the binary codec pack blobs a few weeks ago. I got as far as figuring out how Moonlight registers the codecs. Then I lost motivation, in no small part because there isn’t that much in the blob that I would deem interesting (perhaps one method for keeping people from sorting out the API). In the comments of the last post on the matter, people wondered if the codec pack included support for WMA Voice, which is still unknown. I can’t find any ‘voice’ strings in the blob. However, I do find references to lossless coding. This might pertain to Windows Lossless Audio, or it could just be a special coding mode for WMA3 Pro. Either way, I’m suddenly interested.

So I looked for interface points in the Moonlight source. Moonlight simply loads and invokes registration functions for WMA, WMV, and MP3. The registration functions don’t return any data that Moonlight stores. Moonlight doesn’t appear to load (via dlsym()) or invoke any other codec pack functions directly. So how can it possibly be interfacing? The only other way the interaction could flow is if the codec pack shared library was invoking functions in Moonlight…

Oh, no… they wouldn’t do that, would they?

Continue reading

ARM On FATE Is A Reality

Thanks to Måns for modifying the FATE script in a way that supports automatically cross compiling FFmpeg for a different target CPU on a faster host machine, transferring the binary to a machine specimen that runs the target CPU in question, and remotely asking the target CPU to run the battery of FATE test specs. The upshot of all of this is that FATE is effectively running on an ARM-equipped Beagle Board and contributing results back that anyone can view via the main FATE page.

I hope to get his changes rolled into the main script soon. It’s great work, and I’m hard-pressed to name another continuous integration system that can operate on such diverse platforms, environments, and circumstances.

I dusted off my old Sega Dreamcast this evening — the one I used to do homebrew programming on — and enjoyed some games. As I was playing, I realized that the next evolution of FATE would be to get it to continuously run automatic cross-compile and test cycles on the Dreamcast’s SH-4 via a custom serial protocol, similar to what John Koleszar described in this comment.

But I have a few more FFmpeg code paths to cover before I can even think about that.