Cinepak (CVID) stream format for AVI and QT ------------------------------------------- Dr. Tim Ferguson, 2001. The Cinepak codec is a relatively old coding technique that is still infrequently used today. Its advantage comes from computational simplicity at the decoder, rather than bit rate versus quality performance. This codec is basically a vector quantiser with adaptive vector density. Each frame is segmented into 4x4 pixel blocks, and each block is coded using either 1 or 4 vectors. We label these coding types as follows: V1 - one vector per block, V4 - four vectors per block. Each of the V1 and V4 coding types reference separate vector codebooks, which we label V1 codebook and V4 codebook respectively. These codebooks contain a maximum of 256 entries each. A frame is also segmented into variable sized strips. A strip defines an area of the frame defined with dimensions less than or equal to those of the frame. Each strip defines its own pair of unique vector codebooks. A frame can be coded using either 8 bits per pixel (bpp), or 12 bpp. In 12 bpp mode, each codebook vector contains four eight bit luminance values and two sub-sampled eight bit chrominance values: +----+----+ +---+ +---+ | y0 | y1 | | u | | v | +----+----+ +---+ +---+ | y2 | y3 | +----+----+ In 8 bpp mode, the codebooks only contain the four luminance values. Conversion from the RGB colour space to the Cinepak colour space is achieved using the following simple matrix multiplication: | r | | 1.0 0.0 2.0 | | y | | g | = | 1.0 -0.5 -1.0 | | u | | b | | 1.0 2.0 0.0 | | v | Taking the inverse of the 3x3 matrix gives us: | y | | 0.2857 0.5714 0.1429 | | r | | u | = | -0.1429 -0.2857 0.4286 | | g | | v | | 0.3571 -0.2857 -0.0714 | | b | This is clearly different from the standard techniques for colour space conversion and was probably chosen due to its mathematical simplicity rather than its perceptual performance. As stated earlier, a 4x4 pixel block may be coded using one eight bit vector labeled V1, or four eight bit vectors labeled V4. These vectors reference the V1 and V4 codebooks respectively. In the case of a V1 coded block, the single codebook vector is used to code the block as follows: +----+----+----+----+ +---+---+ +---+---+ | y0 | y0 | y1 | y1 | | u | u | | v | v | +----+----+----+----+ +---+---+ +---+---+ | y0 | y0 | y1 | y1 | | u | u | | v | v | +----+----+----+----+ +---+---+ +---+---+ | y2 | y2 | y3 | y3 | +----+----+----+----+ | y2 | y2 | y3 | y3 | +----+----+----+----+ For a V4 coded block, four codebook table entries are used to code the block. These are four vector references (r0, r1, r2, r3) are applied to the block as follows: +------+------+------+------+ +-----+-----+ +-----+-----+ | r0y0 | r0y1 | r1y0 | r1y1 | | r0u | r1u | | r0v | r1v | +------+------+------+------+ +-----+-----+ +-----+-----+ | r0y2 | r0y3 | r1y2 | r1y3 | | r2u | r3u | | r2v | r3v | +------+------+------+------+ +-----+-----+ +-----+-----+ | r2y0 | r2y1 | r3y0 | r3y1 | +------+------+------+------+ | r2y2 | r2y3 | r3y2 | r3y3 | +------+------+------+------+ A typical frame of a Cinepak video sequence is made up of the following parts: +-----------------------+ | Frame Header | +-----------------------+ | Strip 1 Header | +-----------------------+ | Strip 1 Codebooks | +-----------------------+ | Strip 1 Frame Vectors | +-----------------------+ | Strip 2 Header | +-----------------------+ | Strip 2 Codebooks | +-----------------------+ | Strip 2 Frame Vectors | +-----------------------+ | Strip 3 Header | +-----------------------+ | . . . | . . . | . . . | +-----------------------+ Each of these parts are described in more detail. All multi-byte values are in most significant byte (MSB) ordering (ie: Motorola order). Therefore, byte swapping is required on Intel based machines. --------------- Frame Header --------------- Each frame of the Cinepak video sequence starts with a header, defined as follows: 7 6 5 4 3 2 1 0 Field Name Type +---------------+ 0 | | | Flags Byte +---------------+ 1 | | Length of CVID data Unsigned +- -+ 2 | | +- -+ 3 | | +---------------+ 4 | | Width of coded frame Unsigned +- -+ 5 | | +---------------+ 6 | | Height of coded frame Unsigned +- -+ 7 | | +---------------+ 8 | | Number of coded strips Unsigned +- -+ 9 | | +---------------+ Flags - Bit 0 of the flags field specifies weather or not the codebooks for each of the strips uses the codebook defined in the previous strip. For the first strip of a frame, the previous strip would be found in the previous frame. Length - This field specifies the total number of bytes in the frame. Width - The pixel width of the frame. Height - The pixel height of the frame. Number of Strips - The total number of strips used to code the frame. --------------- Strip Header --------------- The total number of strips for a frame is defined in the frame header. Each of these strips starts with a strip header which is defined as follows: 7 6 5 4 3 2 1 0 Field Name Type +---------------+ 0 | | Strip CVID ID Unsigned +- -+ 1 | | +---------------+ 2 | | Size of strip data Unsigned +- -+ 3 | | +---------------+ 4 | | Strips top Y position Unsigned +- -+ 5 | | +---------------+ 6 | | Strips top X position Unsigned +- -+ 7 | | +---------------+ 8 | | Strips bottom Y position Unsigned +- -+ 9 | | +---------------+ 10 | | Strips bottom X position Unsigned +- -+ 11 | | +---------------+ Strip ID - This ID takes on one of two values: 0x1000 - Intra-coded strip. 0x1100 - Inter-coded strip. Size - The total number of bytes used to code the strip. This includes bytes for the codebook definitions and the code vectors. Strips X and Y positions - These four values define the area of the frame for which the strip is defined. --------------- CVID Chunk --------------- Following the strip header, each strip is made up of a sequence of chunks, as show: 7 6 5 4 3 2 1 0 Field Name Type +---------------+ 0 | | CVID Chunk ID Unsigned +- -+ 1 | | +---------------+ 2 | | Size of chunk data (N) Unsigned +- -+ 3 | | +---------------+ 4 | | +- -+ 5 | | +- . . . . -+ | | Chunk data (N - 4 bytes) Byte +- -+ N | | +---------------+ A chunk starts with an identification number, followed by the number of bytes in the chunk. There are several chunk types, which are listed as follows: CVID Chunk ID - Intra-coded frames: 0x2000 - List of blocks in 12 bit V4 codebook 0x2200 - List of blocks in 12 bit V1 codebook 0x2400 - List of blocks in 8 bit V4 codebook 0x2600 - List of blocks in 8 bit V1 codebook 0x3000 - Vectors used to encode a frame 0x3200 - List of blocks from only the V1 codebook Inter-coded frames: 0x2100 - Selective list of blocks to update 12 bit V4 codebook 0x2300 - Selective list of blocks to update 12 bit V1 codebook 0x2500 - Selective list of blocks to update 8 bit V4 codebook 0x2700 - Selective list of blocks to update 8 bit V1 codebook 0x3100 - Selective set of vectors used to encode a frame Following the chunk ID and size is the chunk data. The format of this data depends on the chunk ID. These are described in the following sections. --------------------------------------------------------------------- Intra list of codebook blocks (IDs 0x2000, 0x2200, 0x2400, 0x2600) --------------------------------------------------------------------- This chunk contains a list of codebook entries. Each byte represents one colour component value. In the 12 bpp mode (0x2000 and 0x2200) each six bytes represents one codebook entry, starting at vector zero: 7 6 5 4 3 2 1 0 Field Name Type +---------------+ 0 | | Luminance value 0 Byte +---------------+ 1 | | Luminance value 1 Byte +---------------+ 2 | | Luminance value 2 Byte +---------------+ 3 | | Luminance value 3 Byte +---------------+ 4 | | U Chrominance value Byte +---------------+ 5 | | V Chrominance value Byte +---------------+ 6 | . | . . In 8 bpp mode (0x2400 and 0x2600), four bytes (luminance values) define each codebook entry. The total number of codebook entries defined in the chunk depends on the chunk size (size/6 or size/4 for 12 bpp and 8 bpp respectively). --------------------------------------------------------------------- Inter selective list of library blocks (IDs 0x2100, 0x2300, 0x2500, 0x2700) --------------------------------------------------------------------- In inter-frames (or non key-frames) vectors from the previous frame may be reused for the current frame. Vectors which cannot be reused may be modified using this chunk as follows: 7 6 5 4 3 2 1 0 Field Name Type +---------------+ 0 | | Update Flags Unsigned +- -+ 1 | | +- -+ 2 | | +- -+ 3 | | +---------------+ 4 | | Luminance value 0 Byte +---------------+ 5 | | Luminance value 1 Byte +---------------+ 6 | | Luminance value 2 Byte +---------------+ 7 | | Luminance value 3 Byte +---------------+ 8 | | U Chrominance value Byte +---------------+ 9 | | V Chrominance value Byte +---------------+ 10 | . | . | . | +---------------+ | . | Update Flags Unsigned +- . -+ . Update Flags - Each bit indicates whether a codebook entry is updated or not. If the bit is one, the codebook entry is replaced by the next 6 or 4 bytes (depending on the mode), otherwise the entry position is left unchanged. --------------------------------------------------------------------- Vectors used to encode a frame (ID 0x3000) --------------------------------------------------------------------- Initially, four bytes are read from the chunk which define a set of flags. One set of flags encodes 32 blocks using one bit for each block. Each of the 32 one bit flags (starting with the most significant bit in the flags variable) define which coding technique the block is represented by. If the flag is one, then the block is coded as a V4 (four vectors = four bytes), otherwise the block is coded as a V1 (one vector = one byte). After 32 blocks have been parsed, another four bytes must be read from the chunk and used as the next set of flags. --------------------------------------------------------------------- Selective set of vectors used to encode a frame (ID 0x3100) --------------------------------------------------------------------- In inter-frame coding, not all of the blocks in a frame require updating. As in the previous chunk, four bytes represent a set of flags, however in this case one of three coding choices is made for each block. Given the flag bits, the block will be: 0 = the block is skipped, 10 = V1 coded block, 11 = V4 coded block. That is, if the current flag bit is zero, the block will be skipped. If however it is one, then the following bit will determine which of the two coding types is used (simple form of variable length coding (VLC)). --------------------------------------------------------------------- List of blocks from only the V1 codebook (ID 0x3200) --------------------------------------------------------------------- All blocks coded by this chunk type are represented by V1 vectors. That is, each byte in this chunk represents one vector per block. --------------------------------------------------------------------- This document was written by Dr. Tim Ferguson, 2001. For more details, on this and other codecs, including source code, visit: http://www.csse.monash.edu.au/~timf/videocodec.html To contact me, email: timf@csse.monash.edu.au ---------------------------------------------------------------------