When I was banging my head hard on FFmpeg’s VP3/Theora VLC decoder, I kept wondering if there’s a better way to do it. Libtheora shows us the way in that there obviously must be, since it can decode so much faster than FFmpeg.
In the course of brainstorming better VLC approaches, I wondered about the concept of parallelizing the decoding operation somehow. This sounds awfully crazy, I know, and naturally, it’s intractable to directly parallelize VLC decoding since you can’t know where item (n+1) begins in the bitstream until you have decoded item (n). I thought it might be plausible to make a pass through the bitstream decoding all of the tokens and ancillary information into fixed lenth arrays and then make a second pass in order to sort out the information in some kind of parallelized manner. Due to the somewhat context-adaptive nature of Theora’s VLCs, I don’t know if I can make a go of that.
Why can’t VLC decoding be done in parallel, either through SIMD or thread-level parallelization? I think the answer is clearly, “because the codecs weren’t designed for it”. Here’s the pitch: Create a codec bitstream format where the frame header contains indices pointing into the bitstream. For example, Sorenson Video 1 encodes the Y, U, and V planes in order and could be effectively decoded in parallel if only the frame header encoded offsets to the start of the U and V planes. Similarly, if a Theora frame header encoded offsets to certain coefficient groups in the header, that could facilitate multithreaded coefficient VLC decoding. Some other modifications would need to occur in the encoder, e.g., guaranteeing that no end-of-block runs cross coefficient group boundaries as they often do now.
Taking this a step farther, think about arithmetic/range coding. Decoding a bitstream encoded with such an algorithm notoriously requires a multiplication per value decoded. How about designing the bitstream in such a way that 2 or 4 multiplications can be done in parallel through any number of SIMD instruction sets?
Do these ideas have any plausibility? I have this weird feeling that if they do, they’re probably already patented. If they aren’t, well, maybe this post can serve as prior art.