Quake II Cinematics Uncovered

This article was originally presented by the QDevels project on the Planet Quake web site.

Introduction

To strengthen the single-player story side of Quake, Id Software has included cut-scene cinematics in Quake II. Several people have since become interested in creating their own cinematics, and have discovered a program for doing so that Id Software released in its public source dump.

This document attempts to describe the format of a Quake II cinematic sequence (a .cin file) and includes some source code for encoding (taken from Id Software's source) and decoding the cinematic sequences. I will try to keep it simple enough for non-technical people to follow.

In essence, a Quake II cinematic is an interleaved audio/video sequence in which the audio is stored in a raw PCM format, and the 8-bit colour lookup table based video is coded using a two-pass lossless static Huffman coder. I will go into more detail in the following sections.

The supplied `bin_nt/qdata.exe' by Id Software

The program supplied by Id Software in their public source code dump allows you to easily create .cin cinematic files. Information has been supplied by Jeff Garstecki (stecki@frag.com and http://www.frag.com/deconstruct), who also made a user-made .cin sequence, and by Paul Steed (psteed@idsoftware.com). I will briefly recap their documentation and go into a little more detail.

The cinematic sequences are stored in the `quake2/baseq2/video' directory, and can be played from the console using the map command (try typing `map end.cin' from the console).

To create your own sequences, generate a series of individual frames of your animation sequence and save them as 8-bit colour PCX files. The file names should be numbered sequentially as [base name]000.pcx or [base name]0000.pcx (for example: hell000.pcx, hell001.pcx, ... hell120.pcx), although qdata can start at any frame. These files need to be located in the `/bin_nt/video/[base name]' directory (in the example: /bin_nt/video/hell/).

Although you can have different colour palettes for your sequences, there will be an improvement in video quality if the frames share a common colour palette, or if the colour palette is only changed during a black frame. This is due to slow palette switching times. A suggestion is to fade to black, switch palettes, and fade to the new palette. This palette switching can be seen in the ntro.cin sequence where it is used several times. When adding PCX images, qdata checks to see if the palette has changed and adds a change palette command to the sequence.

Technically, the frames can be of any resolution; however, the standard resolution used is 320x240. I have tried sequence resolutions of 336x240, 176x144 and 360x288 and found that frames which are too large or too small are scaled to fill the screen. Animations are played at 14 frames per second, and Quake II will skip frames to maintain this playback rate on slow or heavily loaded machines (see the section on `Audio Coding' for the 14 fps derivation). Frame skipping is used to prevent the sound from becoming choppy.

An optional sound file can be included in the animation sequence. The source sound must be in .wav format, can be mono or stereo, can use any multiple of 8 bits per sample (usually 8 or 16-bit) and can technically have an arbitrary sampling rate (typically 22050Hz or 11025Hz). The file must be placed in the same directory as the PCX files, and have the same [base name] as the PCX files (in our example: `/bin_nt/video/hell/hell.wav').

Finally, a QDT script file (.qdt) needs to be created with the following information in it:


   $video [base name]  [no. of digits (3 or 4)] [start frame (optional)]

In our example, we create the file hell.qdt with:


   $video hell 3

The .qdt file is placed in the /bin_nt directory, and qdata is run using the .qdt file as its only argument (for example: qdata hell.qdt). After a few passes, a .cin file will be created in the `/bin_nt/video/' directory, which can be viewed using Quake II.
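The naming rules above can be sketched in a few lines of Python. This is only an illustration: the helper names `make_qdt_line' and `frame_name' are hypothetical and are not part of qdata; only the $video syntax and the PCX numbering scheme come from the description above.

```python
def make_qdt_line(base_name, digits=3, start_frame=None):
    """Build the single $video line of a .qdt script (hypothetical helper)."""
    if digits not in (3, 4):
        raise ValueError("frame numbers use 3 or 4 digits")
    parts = ["$video", base_name, str(digits)]
    if start_frame is not None:
        parts.append(str(start_frame))  # the start frame is optional
    return " ".join(parts)


def frame_name(base_name, index, digits=3):
    """File name of one PCX frame, zero padded to the given number of digits."""
    return f"{base_name}{index:0{digits}d}.pcx"


if __name__ == "__main__":
    print(make_qdt_line("hell"))     # contents of hell.qdt
    print(frame_name("hell", 120))   # last frame of the example sequence
```

Writing the returned line to hell.qdt and placing the numbered PCX files in the video directory reproduces the set-up from the example.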

Video Coding

If you venture into the source code of `qdata.exe' distributed in Id Software's public source dump, you will find the file `utils3/qdata/video.c'. In this file, it can be seen that Id tried several techniques to code their video (including a few Huffman techniques and an LZ technique) before settling on a two-pass static lossless Huffman coder.

In the area of image, video and audio storage, there are three techniques for reducing file sizes: lossless coding, lossy coding and sub-sampling. Lossless techniques compress data without any loss of audio or visual quality, but obtain relatively low compression ratios, resulting in large files. Lossy techniques sacrifice some audio or video quality, ideally in ways not perceivable by humans, in return for significantly higher compression ratios. The third way of reducing storage requirements, sub-sampling, is in the same vein as lossy compression. For video, this includes pixel, spatial and temporal sub-sampling: quantising the pixel colours to produce a smaller colour palette (eg: 256 colours rather than 16.7 million colours), lowering the screen resolution, and lowering the video frame rate (15 frames per second (fps) rather than 25 or 30 fps) respectively.

Id Software's cinematic video sequences use two of the three forms of compression: sub-sampling and lossless coding. Video sequences are first sub-sampled to 8 bits per pixel (256 colours), 320x240 pixel frames at 14 frames per second. The resulting sequence is then losslessly coded using the Huffman algorithm to achieve an approximately 3:1 reduction from the sub-sampled sequence. This format was probably chosen because of the minimum platform specification on which the video has to play: a PC with a 256 colour display, a relatively slow (P90) processor, and a cheap mass storage device (CD-ROM).

If, and most probably when (point release maybe??), Id increase their minimum platform to 24-bit colour and a slightly faster processor, they could use a lossy technique at 16.7 million colours, over twice the frame rate and a significant improvement in compression. The improvement in colour, spatial and temporal resolution would greatly outweigh the loss through coding. An example of this is a sequence converted from .cin format to MPEG. The file idlog.cin plays with 8-bit colour at 14 fps and is compressed at 2.3:1. The same file encoded using MPEG ( ftp://ftp.cdrom.com:/pub/idgames2/quake2/graphics/movies/idlog_avi.zip) is played with 24-bit colour at 25 fps, and is compressed to approximately 13:1. The MPEG, as expected, takes significantly more processing power to play back in real time than the .cin format. Other, less processor demanding forms of lossy compression that I have not experimented with include Quicktime and AVI incorporating codecs such as CinePak and Indeo Video. See the results section for more .cin sequence compression results.

Huffman Coding

As stated by Peter Gutmann in the comp.compression FAQ:

`Huffman compression is a statistical data compression technique which gives a reduction in the average code length used to represent the symbols of an alphabet.'

In Huffman's coding technique, stored pixel data is assigned variable length codes (VLC) based on each pixel value's probability of occurrence. Input pixels that occur more often are assigned shorter codes (fewer bits), while infrequent input pixels are assigned longer codes (more bits). A static Huffman coder achieves this by performing two passes over the video sequence. The first pass creates a frequency histogram of the pixels in the video sequence, and uses it to generate the dictionary of VLCs. The histogram is stored so that the decoder can reconstruct the VLC dictionary. The second pass over the sequence pixels stores the VLC that corresponds to each input pixel. You may need to look elsewhere for a more in-depth discussion of Huffman coding.
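The two passes can be illustrated with a small Python sketch. It is a generic textbook Huffman construction, not a port of Id Software's coder: the function name `huffman_code_lengths' and the tie-breaking order are my own assumptions, so the codes it produces are not guaranteed to match those in a real .cin file.

```python
import heapq
from collections import Counter


def huffman_code_lengths(histogram):
    """Return {symbol: code length in bits} for symbols with non-zero counts.

    Assumes the symbols are 8-bit pixel values (0..255).
    """
    # Heap entries are (count, unique id, symbols below this node).
    heap = [(count, sym, [sym]) for sym, count in histogram.items() if count > 0]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2][0]: 1}   # a lone symbol still needs one bit
    lengths = {entry[2][0]: 0 for entry in heap}
    next_id = 256                   # ids for internal nodes, used only to break ties
    while len(heap) > 1:
        count1, _, syms1 = heapq.heappop(heap)   # two least frequent nodes
        count2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1       # every merge adds one bit to the symbols below it
        heapq.heappush(heap, (count1 + count2, next_id, syms1 + syms2))
        next_id += 1
    return lengths


if __name__ == "__main__":
    pixels = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
    histogram = Counter(pixels)                    # pass 1: frequency histogram
    lengths = huffman_code_lengths(histogram)
    coded_bits = sum(lengths[p] for p in pixels)   # pass 2: cost of coding each pixel
    print(lengths, coded_bits, "bits instead of", 8 * len(pixels))
```

In this toy input the most frequent pixel value receives the shortest code, which is where the saving over fixed 8-bit pixels comes from.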

Typically, a histogram of 256 elements is used when constructing the VLC dictionary, one histogram entry per pixel value. However, video sequence images contain a high inter-pixel correlation in the spatial domain (pixels next to one another are very similar or the same in colour), and a significant improvement in compression performance can be achieved if both the previous pixel and the current pixel are used when generating the frequency histogram. This is the case with Id Software's .cin video format. The result is 256 histograms of 256 elements producing a 256 * 256 table. The rows of the histogram are referenced by the previous pixel, and the columns of the histogram are referenced by the current pixel. Since there is a high probability of the previous pixel being the same as, or very similar to the current pixel, a diagonal line from the top-left corner to the bottom-right corner is formed in the histogram indicating areas of high probability. See the included image of the two dimensional histogram.


[Images: idlog.cin histogram and ntro.cin histogram]
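Gathering the two dimensional histogram can be sketched as follows. This is a minimal illustration that assumes each frame is available as a bytes object of 8-bit palette indices; the function name `context_histogram' is hypothetical, and the way qdata reduces the counts to the byte values stored in the file is not covered here.

```python
def context_histogram(frames):
    """Count (previous pixel, current pixel) pairs over a list of frames.

    Returns a 256 x 256 list of lists: rows are indexed by the previous
    pixel, columns by the current pixel.
    """
    hist = [[0] * 256 for _ in range(256)]
    for frame in frames:
        prev = 0                  # the previous pixel is zero at the start of each frame
        for cur in frame:
            hist[prev][cur] += 1
            prev = cur
    return hist


if __name__ == "__main__":
    frames = [bytes([5, 5, 5, 6]), bytes([5, 5])]
    hist = context_histogram(frames)
    print(hist[5][5], hist[5][6], hist[0][5])
```

Runs of identical pixels add to the entries where the row and column index are equal, which is what produces the diagonal line described above.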

When decoding a sequence, the previously decoded pixel is used to reference a row of the VLC dictionary, while the stored variable length code is used to find the pixel value. This new pixel value then becomes the previous pixel, and the process is repeated. The initial `previous pixel' value is set to zero for the start of each frame.
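The decoding loop can be sketched in Python as below. To keep the example short it makes several assumptions: the VLC dictionary is given as one plain Python dict per previous pixel value, mapping a code string such as "10" to a pixel, and the input is an iterable of 0/1 bits. How the dictionary is rebuilt from the stored table, and how bits are packed into bytes in a real .cin file, are deliberately left out.

```python
def decode_frame(bits, codebooks, pixel_count):
    """Decode pixel_count pixels from an iterable of 0/1 bits.

    codebooks[prev] is the dictionary row selected by the previous pixel;
    it maps a prefix-free code string to the decoded pixel value.
    """
    bits = iter(bits)
    out = bytearray()
    prev = 0                          # initial previous pixel for every frame
    for _ in range(pixel_count):
        row = codebooks[prev]
        code = ""
        while code not in row:        # read bits until a complete code is seen
            code += str(next(bits))
        prev = row[code]              # the new pixel becomes the previous pixel
        out.append(prev)
    return bytes(out)


if __name__ == "__main__":
    # Toy dictionary with three rows (previous pixel 0, 7 and 9).
    codebooks = {
        0: {"0": 7, "1": 0},
        7: {"0": 7, "10": 9, "11": 0},
        9: {"0": 9, "1": 7},
    }
    print(list(decode_frame([0, 0, 1, 0, 1, 1, 1], codebooks, 5)))
```

Note that the same bit pattern can decode to different pixels depending on which row is active, which is why the decoder must track the previous pixel exactly as the encoder did.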

If you are interested in the video coding of .cin files, most of what has been said should be clearer if you look at the supplied source code.

Audio Coding

Audio data in the .cin cinematic sequences is stored in a raw PCM format (uncompressed). From the sequence header, it appears that any sampling rate, sample size and number of channels can be used; in practice this will depend on which combinations of parameters the game can play back. From the results section below, it can be seen that sequences have used sampling rates of 22050 and 11025 Hz, sample widths of 8 or 16 bits and either mono (1 channel) or stereo (2 channels). Acoustically demanding sequences (speech, sound effects and music) such as the intro and end sequences have used higher quality stereo audio, while the less demanding (just speech and simple sound effects) cut scenes have used lower quality mono audio.

When audio is coded into the cinematic sequence, each one second clip of audio data (sample rate * sample width * number of channels, in bytes) is divided into 14 chunks. Each of these chunks is assigned to one frame of the Huffman coded video. This audio segmentation is found in the source code supplied by Id Software, and results in a 14 frames per second video playback rate to synchronise with the audio. More information on the sequence format is found below.
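The arithmetic can be checked with a short Python sketch. The function names are hypothetical. The second function shows one plausible way of spreading the remainder when the sample rate does not divide evenly by 14 (differences of cumulative sample positions); treat that part as an assumption rather than a statement of what qdata does.

```python
FPS = 14  # one second of audio is split into 14 chunks


def audio_bytes_per_frame(rate, width_bytes, channels):
    """Average audio payload per video frame, in bytes (may be fractional)."""
    return rate * width_bytes * channels / FPS


def samples_in_frame(frame, rate):
    """Whole samples (per channel) carried by the given frame number.

    Assumption: each chunk is the difference between two cumulative sample
    positions, so any 14 consecutive chunks still add up to `rate` samples.
    """
    return (frame + 1) * rate // FPS - frame * rate // FPS


if __name__ == "__main__":
    print(audio_bytes_per_frame(22050, 2, 2))   # 16-bit stereo at 22050 Hz
    print(audio_bytes_per_frame(11025, 1, 1))   # 8-bit mono at 11025 Hz
    print([samples_in_frame(f, 11025) for f in range(4)])
```

For 22050 Hz the split is exact (1575 samples per chunk), while 11025 Hz gives 787.5 samples per chunk on average, so individual chunks cannot all be the same size.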

Coding Results

Some results taken from both the included cinematic sequences, and a user made sequence (cave.cin by Jeff Garstecki) are as follows:



+-----------+---------+-------+-----+------+--------+-----------+-------+
| sequence  | vid res | rate  | wid | chan | frames | file size | compr |
+-----------+---------+-------+-----+------+--------+-----------+-------+
| ntro.cin  | 320x240 | 22050 | 16  |  2   | 2945   | 82836235  | 3.5:1 |
| end.cin   | 320x240 | 22050 | 16  |  2   |  726   | 19311290  | 3.7:1 |
| idlog.cin | 320x240 | 22050 | 16  |  2   |   81   |  3159828  | 2.3:1 |
| eou#_.cin | 320x240 | 11025 | 8   |  1   |    -   |        -  |     - |
| cave.cin  | 320x240 | 22050 | 16  |  2   |  200   |  5453415  | 3.7:1 |
+-----------+---------+-------+-----+------+--------+-----------+-------+

Where `rate', `wid' and `chan' are the audio sampling rate (in Hz), sample width (in bits) and number of channels respectively. The `compr' is the compression obtained in the video only. From these results it can be seen that sequences with smoothly coloured areas (ntro.cin, end.cin and cave.cin) result in compression ratios of around 3.6:1. However, highly textured sequences such as idlog.cin (because of its background) result in lower compression.

Coded Cinematic Stream

This section describes the very simple and application specific .cin file structure. The .cin file begins with a header in little-endian format as follows:



        31 ...... 2 1 0      Field Name                    Type
       +---------------+
    0  |               |     Video width                   Unsigned long
       +---------------+
    4  |               |     Video height                  Unsigned long
       +---------------+
    8  |               |     Audio sample rate             Unsigned long
       +---------------+
   12  |               |     Audio sample width (in bytes) Unsigned long
       +---------------+
   16  |               |     Audio channels (1 or 2)       Unsigned long
       +---------------+
   20  |               |
       +-             -+
   24  |               |
       +-   . . . .   -+
       |               |     Huffman table                 Unsigned Byte
       +-             -+
65552  |               |
       +---------------+

This header contains information on the video and audio resolution, as well as the Huffman table used to code the video data. The Huffman table is a 256 * 256 table of byte values (65536 bytes total), so the complete header is 20 + 65536 = 65556 bytes long.
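Reading this header can be sketched in Python with the standard struct module. The function name `parse_cin_header' and the dictionary keys are my own choices; the field order, sizes and byte order follow the layout above.

```python
import struct

HEADER_FIELDS = struct.Struct("<5I")    # five little-endian unsigned 32-bit values
HUFFMAN_TABLE_SIZE = 256 * 256          # one byte per (previous, current) pixel pair
HEADER_SIZE = HEADER_FIELDS.size + HUFFMAN_TABLE_SIZE


def parse_cin_header(data):
    """Split the start of a .cin file into its header fields (hypothetical helper)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("truncated .cin header")
    width, height, rate, sample_width, channels = HEADER_FIELDS.unpack_from(data, 0)
    table = data[HEADER_FIELDS.size:HEADER_SIZE]
    # One 256 byte row per previous pixel value.
    rows = [table[r * 256:(r + 1) * 256] for r in range(256)]
    return {
        "width": width,
        "height": height,
        "rate": rate,
        "sample_width": sample_width,   # in bytes
        "channels": channels,
        "huffman_rows": rows,
    }


if __name__ == "__main__":
    fake = struct.pack("<5I", 320, 240, 22050, 2, 2) + bytes(HUFFMAN_TABLE_SIZE)
    header = parse_cin_header(fake)
    print(header["width"], header["height"], header["rate"], HEADER_SIZE)
```

With a real file, passing the first 65556 bytes read in binary mode should give the same fields; the first frame then starts immediately after the header.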

Following the header, and for each frame of the video, the following is stored in the .cin sequence:



        31 ...... 2 1 0      Field Name                    Type
       +---------------+
    0  |               |     Sequence command              Unsigned long
       +---------------+
    4  |               |
       +-             -+
    8  |               |
       +-   . . . .   -+
       |               |     OPTIONAL colour palette       Unsigned Byte
       +-             -+  (768 bytes)
  768  |               |
       +---------------+
  772  |               |     Huffman count (H)             Unsigned long
       +---------------+
  776  |               |     Decode count                  Unsigned long
       +---------------+
  780  |               |
       +-             -+
  784  |               |
       +-   . . . .   -+     Encoded Huffman video data    Unsigned Byte
       |               |  (contains Huffman count - 4 bytes)
       +-             -+
H+772  |               |
       +---------------+
H+776  |               |
       +-             -+
H+780  |               |     Raw audio data                Unsigned Byte
       +-   . . . .   -+  (contains
       |               |    audio width * audio channels * audio rate/14
       +-             -+      bytes)
       |               |
       +---------------+

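As can be seen, the sequence stores one frame of video and one chunk of audio per frame. The offsets shown assume that the optional colour palette is present; when it is absent, everything after the sequence command starts 768 bytes earlier. The sequence command takes one of three possible values: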

0x0002 - Indicates end of file and no other data follows it.
0x0001 - Indicates the optional colour palette is included, followed by the video and audio data.
0x0000 - No colour palette is included, just the video and audio data.

The Huffman count indicates the number of coded bytes to follow (including the decode count), and as expected, the decode count is video width * video height.
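Walking the per-frame records can be sketched in Python as follows. The function name `iter_frames' is hypothetical, and the sketch assumes a fixed number of audio bytes per frame, which only holds when the audio rate divides evenly by 14 (for example 22050 Hz); it does not decode the video data.

```python
import struct

PALETTE_SIZE = 768   # 256 colours * 3 bytes, present only when command == 1


def iter_frames(data, offset, width, height, audio_bytes):
    """Yield (palette or None, encoded video, raw audio) for each frame.

    `offset` is where the first frame starts (just after the file header).
    """
    while True:
        (command,) = struct.unpack_from("<I", data, offset)
        offset += 4
        if command == 2:                      # end of file marker
            return
        palette = None
        if command == 1:                      # optional colour palette
            palette = data[offset:offset + PALETTE_SIZE]
            offset += PALETTE_SIZE
        huff_count, decode_count = struct.unpack_from("<2I", data, offset)
        offset += 8
        if decode_count != width * height:
            raise ValueError("decode count does not match the frame size")
        video_len = huff_count - 4            # Huffman count includes the decode count
        video = data[offset:offset + video_len]
        offset += video_len
        audio = data[offset:offset + audio_bytes]
        offset += audio_bytes
        yield palette, video, audio


if __name__ == "__main__":
    # Tiny synthetic stream: a 2x2 frame with palette, one without, then end.
    stream = (struct.pack("<I", 1) + bytes(PALETTE_SIZE)
              + struct.pack("<2I", 7, 4) + b"abc" + b"AUDIO1"
              + struct.pack("<I", 0)
              + struct.pack("<2I", 6, 4) + b"xy" + b"AUDIO2"
              + struct.pack("<I", 2))
    for palette, video, audio in iter_frames(stream, 0, 2, 2, 6):
        print(palette is not None, video, audio)
```

For a real file, `offset' would be the header size, and `audio_bytes' would be audio width * audio channels * audio rate / 14 taken from the header fields.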

Source Code

I have put together a small program for playing .cin files under X11. I have tested the compilation under Linux and SunOS. Most other X based operating systems should work. Click HERE for the source archive.

Article by Tim Ferguson.