FFmpeg is quite an amazing program. There’s a certain smugness that comes with being involved with it. That can lead to a bit of complacency followed by shock when realizing that you’re not as good as you thought you were.
That happened to me recently when I realized the official libtheora decoder is significantly more performant than FFmpeg’s Theora decoder. I suddenly wondered if this was true in any other departments, i.e., if FFmpeg is slower than other open source libraries that are dedicated to a single purpose. Why do I care? Because I started to wonder if FFmpeg would simply come to be known as the gcc of multimedia processing.
Is it good or bad to be compared to gcc in this way? Depends; gcc has its pros and cons. A colleague once succinctly summarized these trade-offs thusly: “You can generally count on gcc to generate adequate code for just about any platform.” Some free software fans grow indignant when repeated benchmarks unequivocally show, e.g., Intel’s proprietary compilers slaughtering gcc’s various versions. But what do you expect? gcc spreads the effort around to all kinds of chips while Intel largely focuses on generating code for chips that they invented and know better than anyone else. Frankly, I’ve always admired gcc for being able to do as well as it does.
But does it have to be that way with FFmpeg? “You can generally count on FFmpeg to be able to decode a particular format fairly quickly and encode to a wide variety of formats with reasonable quality.” That’s certainly the case currently regarding Theora (it can decode the format, just not as efficiently as libtheora). What about some other notable formats? I think some tests are in order.
Methodology: Take a long audio file (I had a 10m44s audio file within reach when I brainstormed this test) and encode it as MP3, AAC, Vorbis, and FLAC. Profile FFmpeg’s performance vs. other open source packages by measuring best wall-clock (real) time out of 3 consecutive runs. Run these tests on an EeePC 701 which has an Intel Celeron M running at 630 MHz. This is somewhere that performance matters. Ask all programs to decode to 16-bit PCM and dump the output directly to /dev/null while squelching as much console output as the program will allow (I don’t know how to make FFmpeg shut up entirely but ‘-v 0’ makes it not update its status along the way).
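The best-of-3 wall-clock measurement described above can be sketched as a small bash helper. `best_of_3` is an invented name for illustration, not a tool used in these tests:

```shell
#!/bin/bash
# Sketch of the timing methodology: run a command three times and report
# the best wall-clock ("real") time. best_of_3 is an invented helper name.
best_of_3() {
    local best="" t i
    for i in 1 2 3; do
        # bash's 'time' keyword; TIMEFORMAT=%R prints only the real time in seconds
        t=$( { TIMEFORMAT=%R; time "$@" > /dev/null 2>&1; } 2>&1 )
        if [ -z "$best" ] || awk -v a="$t" -v b="$best" 'BEGIN { exit !(a < b) }'; then
            best=$t
        fi
    done
    echo "$best"
}

# e.g.: best_of_3 ffmpeg -v 0 -i file.flac -f s16le -
```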
FFmpeg SVN revision 20667 was used for these tests.
FLAC:
Using FLAC 1.2.1. Command lines:
ffmpeg -v 0 -i file.flac -f s16le - > /dev/null
flac --silent --stdout --decode file.flac > /dev/null
Results:
FFmpeg: 8.4s
FLAC: 12.2s
Vorbis:
Using whatever is currently installed in Ubuntu 9.10. I saw some security updates were downloaded recently so I suspect the Ogg and Vorbis libraries are quite up to date. Command lines:
ffmpeg -v 0 -i file.ogg -f s16le - > /dev/null
oggdec --quiet --raw --output /dev/null file.ogg
Results:
FFmpeg: 7.2s
libvorbis: 12.3s
AAC:
Using FAAD v2.6. Command lines:
ffmpeg -v 0 -i file.m4a -f s16le - > /dev/null
faad -q -f 2 file.m4a -o /dev/null
Results:
FFmpeg: 7.4s
libfaad: 17.5s
MP3:
Using mpg123 v1.9.2 and madplay v0.15.1 (beta). Command lines:
ffmpeg -v 0 -i file.mp3 -f s16le - > /dev/null
madplay --very-quiet --output=raw:/dev/null file.mp3
mpg123 -q -s file.mp3 > /dev/null
Results:
FFmpeg: 22.3s
libmad: 24.2s
mpg123: 9.5s
So FFmpeg fared quite well except in the case of MP3 decoding. I’m a bit surprised that FFmpeg is relatively so slow to decode MP3. As Mans discovered, sometimes FFmpeg’s deficiencies are actually gcc’s fault. However, that blog post was primarily concerned with PowerPC. I wonder if x86_32 has the same problems.
I also wanted to test XviD vs. FFmpeg’s decoder but I didn’t know how to set up that test. And if you want to know how various H.264 decoders stack up, both performance-wise and feature-wise, corner Dark Shikari and he will be pleased to regale you with the benchmarks.
So I guess that leaves FFmpeg’s Theora decoder as the biggest embarrassment. The good news is that I have some substantial optimizations planned for the very near future.
Looks like you don't need to optimise it so much as fully rewrite get_vlc(), maybe basing it on a different decoding principle.
Doesn’t ffmpeg use integer-based MP3 decoding like libmad? If so, it’s not surprising that a floating-point decoder can beat it. Faking the reals with ints is a lot slower than faking them with dedicated, fully pipelined floating-point hardware.
@Kostya: Do you mean rewrite the core get_vlc() function from FFmpeg? Or VP3’s unpack_vlcs() function? If the latter, I don’t think a total rewrite is necessary. See next post.
> I also wanted to test XviD vs. FFmpeg’s decoder but I didn’t know how to set up that test.
“mplayer -benchmark” and forcing the decoder will help you do that.
Having done so for a long time while I maintained XviD, I can tell you that FFmpeg was a lot faster as soon as B-frames were used, because the IDCT stage was merged with the compensation stage. EDGE_EMU was helping too, as XviD does real edging, losing a lot of cycles for each frame and consuming a lot of bandwidth.
Basically my analysis around 2005 was that XviD was memory bandwidth bound whereas FFmpeg could hit the memory bandwidth limit at much higher bitrate.
I’m still curious to see if FFmpeg has had some regression against XviD, but I doubt it.
Everything MDCT (cook, aac, wma, vorbis, nellymoser, ac3, atrac, etc) and FFmpeg should beat or be comparable in performance to the alternatives. And regarding the MP3 case, our decoder is fixed-point resulting in lower performance.
> I also wanted to test XviD vs. FFmpeg’s decoder but I didn’t know how to set up that test.
> “mplayer -benchmark” and forcing the decoder will help you do that.
mplayer -benchmark -vc xvid (requires xvidcore in mplayer)
The DV video decoder in FFmpeg is notably slower than some of the binary alternatives. I don’t remember offhand how it compares to libdv, but I think it is slower. Just play around with the samples available in our samples collection.
Well, there’s no need to go that far.
The H.264 decoder in regular ffmpeg still isn’t threaded (except for multiple-slice video), and in clock-for-clock (single-thread) performance it isn’t a champion either.
I know things don’t come without work, but I would expect it to be a priority to get multithread support for a codec that is quite CPU-intensive and very popular at the same time (imho it’s not a stretch to say it’s the one that really matters in the first place, esp. when it comes to decoder performance).
Good luck with your work ;)
Another one I forgot to mention: WMV3/VC-1. The binary is considerably faster than our decoder, at least on my trusty old K6-III…
The h264 decoder is already multithreaded in ffmpeg-mt (which will be merged, one day).
ffmpeg’s wmv3 is faster on a P4. Maybe the plain C version needs some speedups? Or more asm?
ffwmv3:
BENCHMARKs: VC: 6.128s VO: 7.416s A: 0.000s Sys: 0.432s = 13.976s
wmvdmo:
BENCHMARKs: VC: 13.187s VO: 7.251s A: 0.000s Sys: 0.234s = 20.672s
libavcodec/x86/vc1dsp_mmx.c shows a wide variety of MMX and MMX2 functions for the intensive parts of VC-1/WMV3 decoding. Perhaps in compn’s test the binary was not detecting the presence of SIMD, while in Diego’s test it did? Or perhaps Diego’s test used a different binary?
Do note that wmvdmo might decide to apply postprocessing, which will cut its speed in half or even more. It decides whether to apply postprocessing based on the ratio of video size vs. CPU MHz (so, for good-quality videos, a faster CPU will decrease quality, since the filter is too strong and will mud the video to hell). There are some registry keys to force it to certain values; I thought mplayer set them to control it explicitly, but I might be wrong.
Last time I tested it (a long while ago) wmvdmo was somewhat faster. Also I got the impression wmvdmo is quite optimized for P4, at the expense of other CPUs.
I was just thinking about “Is FFmpeg Doomed To Be The GCC of Multimedia?” with the variation: “You can generally count on ffmpeg to generate adequate media in just about any format.”
@Sabin: Yeah, that works too. Then the issue becomes, will FFmpeg do it quickly and with tolerable quality?