No sooner did I press “Publish” on my last post pertaining to multithreading FFmpeg’s Theora decoder, than did I receive an email from the theora-dev list regarding the release of Theora 1.1.0. It took them many, many years to release the first official version and about 10 months to get the second version out, so congratulations on that matter. This release includes the much-vaunted Thusnelda encoder which is supposed to offer substantial encoding improvements vs. the original 1.0 encoder.
So, fair warning: Be prepared for a new round of “Theora Bests H.264 / HTML5 Video Poised To Conquer Internet” type of stories.
Since I have been doing a bunch of optimizations to the FFmpeg Theora decoder this past week (a.k.a. the Theora decoder that most people actually use), I thought this would be the perfect opportunity to benchmark Theora 1.1 alongside FFmpeg’s decoder. Fortunately, libtheora has an example tool called dump_video that decodes video directly to YUV4MPEG2 format, the same way I was testing FFmpeg’s decoder.
FFmpeg command line:
ffmpeg -threads n -i big_buck_bunny_1080p_stereo.ogg -f yuv4mpegpipe -an -y /dev/null
Libtheora command line:
dump_video big_buck_bunny_1080p_stereo.ogg > /dev/null
The results (on my Core 2 Duo Mac Mini) were thus:
6:44 - FFmpeg, 1 thread 6:09 - FFmpeg, 2 threads * 4:51 - libtheora 1.1
* multithreaded version isn’t complete yet
Mind you, libtheora’s decoder is singly-threaded and only has basic MMX SIMD optimizations. After seeing libtheora’s relative performance, I think I blacked out. Or maybe I just went to bed since it was so late; it’s sort of a blur. I awoke in a confused stupor wondering what I’m doing wrong in the FFmpeg Theora decoder. Why is it so slow? Actually, I know why– unpack_vlcs(), which continues to dominate profiling statistics. Perhaps the question I should start with is, how does libtheora unpack VLCs so quickly? That’s a good jumping-off point for a future investigation.