Category Archives: VP3/Theora

Google Funding VP3 On ARM

Some news is making the rounds that Google is funding ARM improvements for the Theora video decoder. It gives the free software faithful renewed hope. However, reading this news makes me wonder: Doesn’t FFmpeg already have ARM optimizations for Theora? In fact, it does, as indicated by the existence of the file libavcodec/arm/vp3dsp_neon.S. This has optimized IDCT transform/get/put and loop filter functions for NEON instruction sets. I know there are several different types of SIMD for ARM chips and I don’t know if NEON is the most common variety.

The most pressing reason for funding this effort is, of course, license purity.

One Gluttonous if Statement

I’m still haunted by the slow VP3/Theora decoder in FFmpeg. While profiling reveals that most of the decoding time is spent in a function named unpack_vlcs(), I have been informed that finer-grained profiling indicates that the vast majority of that time is spent evaluating the following if statement:

  if (coeff_counts[fragment_num] > coeff_index)

I put counters before and after that statement and ran the good old Big Buck Bunny 1080p movie through it. These numbers indicate, per frame, the number of times the if statement is evaluated and the number of times it evaluates to true:

[theora @ 0x1004000]3133440, 50643
[theora @ 0x1004000]15360, 722
[theora @ 0x1004000]15360, 720
[theora @ 0x1004000]0, 0
[theora @ 0x1004000]13888, 434
[theora @ 0x1004000]631424, 10711
[theora @ 0x1004000]2001344, 36922
[theora @ 0x1004000]1298752, 22897

Thus, while decoding the first frame, the if statement is evaluated over 3 million times but further action (i.e., decoding a coefficient) is only performed 50,000 times. Based on the above sample, each frame sees about 2 orders of magnitude more evaluations than are necessary.

Clearly, that if statement needs to go.

Continue reading