VQ Case Study: RoQ

RoQ was first developed for the FMV-based adventure game The 11th hour and was later adopted by Id for the Quake III engine and derivative games.

RoQ operates in a YUV 4:2:0 space. However, it was developed for a game released in the late 1994/early 1995 timeframe. Back then, cutting edge video was 640×480 at 256 colors or maybe 64K colors. and it was not feasible to take a large video frame and convert the entire thing from YUV -> RGB 30, 24, or even 15 times per second. However, RoQ’s design solved some of these problems.

The first RoQ frame contains a full vector codebook. The codebook contains 2×2 pixel vectors and 4×4 pixel vectors. The remainder of the encoded stream contains codes for tiling these vectors in order to reconstruct an image. Further, interframes can take advantage of motion compensation. Frames also have the option of transmitting new vectors to be used in the codebook.

An important design decision is that the vectors are decoded into the codebook in advance of decoding and can be converted to any format necessary. I.e., if the graphic hardware demanded RGB555 or RGB565, as was probably the original assumption for the codec, the decoder can convert all the vectors from YUV 4:2:0 to the appropriate colorspace during the file initialization. Then it can simply tile using these vectors without concerning itself with further colorspace conversion unless replacement vectors show up in successive frames.

However, since it is a proper YUV colorspace, FFmpeg just decodes and outputs the raw YUV 4:2:0 data.

Wired ran a portion of the codec’s creator’s diary (one Graeme Devine) that describes how imperative it was to squeeze performance out of the encoder. When you have a lot of video to encode and you wish to meet a shipping deadline, AND you are encoding using VQ, performance tweaks count:

A technical Daily Report. Sorry, it’s the way I’m thinking right now.

Well, I didn’t think that I could meet my goal of knocking a second out of drawing 100 dense quad structures, but I did. Twice. Tonight it’s down to 5.7 seconds. A total of two seconds faster.

What does this mean? A dense quad is around 10 times larger than a typical frame chunk quad – it contains essentially lossless data. These dense quads make good profiling data, since they overload the overhead in the decoder nicely.

Most of the speedups came by fixing 486 CPU stalls. The 486 will “stall” and wait for a cycle if the previous machine instruction is somehow required for the next. Fixing stalls is tricky. Writing stall-free code is almost impossible. You have to think below the assembly-code level and think about how the CPU is dealing with data.

An observation regarding the inversely proportional complexities of the encoder and the decoder:

Verrry, verry tricky right now. The encoder is now getting as complex as a MPEG encoder, which is making decoding simpler (as opposed to MPEG). Basically cutting down drastically on memory usage per frame for the same rez frame.

VQ houses at the time were obsessive about getting ahold of the fastest hardware on the market at the time (things like high-end 486 computers or very early Pentiums):

Encoding is going slowly because if we add machines it crashes. Those super-fast HP systems would make a real difference right now – we need some number crunchers online asap!

…and…

Encoding video is one of the few problems you can throw fast hardware at and get results.

Today we got in 3 HP 735/125 systems. They run fast, not just fast in fact, but stinkingly disgustingly fast. Around three times the speed of a 90-MHz Pentium, probably faster if they were not network and disk i/o bound. It’s the fastest piece of hardware I’ve ever used. Between the Envoy and the 735, this has been a great week for new toys. The 735 machines will allow 11h to ship much sooner.

And here’s the part where he must have added the code to convert the YUV 4:2:0 vectors to RGB24 when unpacking, along with the appropriate copying code:

Tonight I added true 24-bit support for 640-by-320-by-30fps at 24-bit from a 150K/second drive. Of course, you’ll need a Pentium system for this to work, but it looks unbelievable. 24-bit support gives us better-than-TV color depth and a picture that finally looks non-blocky, non-pixelly, and about as good as it gets. This makes The 11th Hour the first game with true full-motion video software playback at 30fps and the first 24-bit fmv on a home-class computer.

After all the R&D, these were the final specs decided for the shipping version of The 11th Hour:

At first we supported all the way from 32-bit color down to 8-bit color, but, after surveying the video cards sold, installed bases, and how the game looks, we’re only going to support those with a 16-bit or 24-bit DAC at 640-by-320 or 320-by-160.