Simple YUV Coding Formats
by Mike Melanson (mike at multimedia.cx)
v1.1: December 3, 2004

=======================================================================
NOTE: The information in this document is now maintained in Wiki format
at:
  http://wiki.multimedia.cx/index.php?title=ATI_VCR1
  http://wiki.multimedia.cx/index.php?title=Cirrus_Logic_AccuPak
  http://wiki.multimedia.cx/index.php?title=Creative_YUV
  http://wiki.multimedia.cx/index.php?title=Video_XL
=======================================================================

Copyright (c) 2004 Mike Melanson
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled "GNU
Free Documentation License".


Contents
--------
 * Introduction
 * ATI VCR1
 * Cirrus Logic AccuPak (CLJR)
 * Creative YUV (CYUV)
 * Miro/Pinnacle Video XL (VIXL/PIXL)
 * References
 * ChangeLog
 * GNU Free Documentation License


Introduction
------------
There are many ways to code, store and transport YUV video data. This
file documents various simple methods used for coding such data.

A working knowledge of YUV colorspaces is assumed in this document. For
more information about YUV basics, see the references at the end of this
document.

About terminology: YUV is the same as YCbCr for the purposes of this
document. Y represents luminance values. U = Cb represents blue
chrominance values. V = Cr represents red chrominance values. This
document will also sometimes refer to the U and V samples collectively
as C samples.


ATI VCR1
--------
The ATI VCR1 codec, identified by the fourcc VCR1, uses differential
coding to pack Y samples. C samples are left alone.

VCR1 is based on a YUV 4:1:0 colorspace.
This means that for each block of 4x4 pixels, each pixel has a Y sample
and the entire block shares both C samples.

The format of a VCR1-encoded video chunk is as follows:

  bytes 0-31    16 16-bit, signed, little-endian deltas used in this
                frame
  bytes 32..    encoded YUV data

The deltas are apparently 16 bits in width, which is somewhat irrelevant
since the Y samples to which they are applied are only 8-bit numbers.

The YUV data is coded after the initial deltas. The data is coded as:

  luminance/chrominance line
  luminance line
  luminance line
  luminance line
  [...]

Every fourth line, starting with line 0, contains both luminance (Y) and
chrominance (C) data. The other lines only contain Y data. Each Y/C line
begins with 4 offsets to be used when decoding the Y data for the next 4
lines:

  byte 0        offset for this line's Y data
  byte 1        offset for second line's Y data
  byte 2        offset for third line's Y data
  byte 3        offset for fourth line's Y data
  bytes 4..     Y/C data

For the remainder of the data on a Y/C line, these 6 pieces of data:

  Y0 Y1 Y2 Y3 U V

are encoded within groups of 4 bytes of the bytestream. Y0..Y3 are the
next 4 Y samples in the line while U and V are the C samples for the 4 Y
samples as well as the 4 Y samples on each of the next 3 lines (since
this is a YUV 4:1:0 colorspace). The 4 bytes in the group have the
following meaning:

  byte0     byte1  byte2     byte3
  Y3i Y2i   V      Y1i Y0i   U

Bytes 1 and 3 correspond to the V and U samples, respectively. Bytes 0
and 2 break down into four 4-bit nibbles which do not actually represent
the Y samples. Instead, they index into the delta table from the start
of the frame. The indexed signed delta is applied to this line's Y
offset.
For example,

  Y0 = offset + delta_table[byte2 & 0x0F]
  Y3 = offset + delta_table[byte0 >> 4]

For the other lines that only contain Y data, each group of 4 bytes
decodes to 8 Y samples in a similar manner as on the Y/C lines:

  byte0     byte1     byte2     byte3
  Y5i Y4i   Y7i Y6i   Y1i Y0i   Y3i Y2i


Cirrus Logic AccuPak (CLJR)
---------------------------
The Cirrus Logic AccuPak codec, identified by the fourcc CLJR, packs 4 Y
samples and 2 C samples into 32 bits by representing each Y sample with
5 bits and each C sample with 6 bits. It is essentially a scaled-down
method of coding YUV 4:1:1, where each pixel in a group of 4 on a line
has its own luminance sample but the group shares one pair of C samples.

Each set of 32 bits represents 4 pixels on a line:

  p0 p1 p2 p3

For each set of 32 bits, read left -> right:

  p3.Y = next 5 bits
  p2.Y = next 5 bits
  p1.Y = next 5 bits
  p0.Y = next 5 bits
  Cb/U = next 6 bits
  Cr/V = next 6 bits
         -------
         32 bits

Thus, the first 5 bits represent the Y sample for the last pixel in the
group of 4 pixels.


Creative YUV (CYUV)
-------------------
Creative YUV, identified by the fourcc CYUV, uses differential coding to
effectively compress each Y, U, and V sample to 4 bits with some
overhead at the start of each line. The codec operates on a YUV 4:1:1
colorspace, which means that each group of 4 pixels on a line has 1 Y
sample per pixel, but only 1 of each C sample for the entire group.

A chunk of CYUV-encoded data is laid out as:

  bytes 0-15    signed Y predictor byte values
  bytes 16-31   signed U predictor byte values
  bytes 32-47   signed V predictor byte values
  bytes 48..    lines of CYUV-encoded data

The format of each line is as follows:

  byte 0      bits 7-4  initial U sample and predictor for line
              bits 3-0  initial Y sample and predictor for line
  byte 1      bits 7-4  initial V sample and predictor for line
              bits 3-0  next Y predictor index
  byte 2      bits 7-4  next Y predictor index
              bits 3-0  next Y predictor index
  bytes 3..   remaining predictor indices for line

The first 3 bytes contain the setup information for the line. Each
initial sample (Y, U, and V) actually represents the top 4 bits of the
initial 8-bit sample. The initial sample also serves as the initial
predictor.

For each of the 3 Y predictor indices, use the 4-bit value to index into
the table of 16 Y predictors, encoded at the start of the frame. Apply
each predictor to the previous Y value. At this point, the first group
of 4 pixels will be decoded.

Each group of 4 pixels remaining on the line is coded in 3 bytes:

  byte 0      bits 7-4  next U predictor index
              bits 3-0  next Y predictor index
  byte 1      bits 7-4  next V predictor index
              bits 3-0  next Y predictor index
  byte 2      bits 7-4  next Y predictor index
              bits 3-0  next Y predictor index

For each predictor index, use the 4 bits to index into the appropriate
predictor table and apply the predictor to the previous sample of the
same type (Y, U, or V) and output the sample.


Miro/Pinnacle Video XL (VIXL/PIXL)
----------------------------------
The Miro Video XL codec, identified by the fourcc VIXL, uses
differential coding on a reduced-precision YUV 4:1:1 colorspace image.
Each Y, U, or V component is only 7 bits (where 8 is more typical). Each
group of 32 bits in the bitstream represents six 5-bit delta table
indices (with 2 unused bits). There is one index for each of the next 4
Y samples on the line and one index for each of the color samples.

The Pinnacle Video XL codec, identified by the fourcc PIXL, is
apparently the same algorithm as the Miro codec except that the frames
are 8 bytes longer. However, the same decoding process applies.

For each block of 4 pixels on a line, fetch the next 32 bits as a
little-endian number and then swap the 16-bit words to achieve the
correct bit orientation for decoding.
To illustrate more clearly, this is the arrangement of the next four
8-bit bytes (A, B, C, and D) on disk:

  aaaaaaaa bbbbbbbb cccccccc dddddddd

Load the 4 bytes into a program variable so that the bytes are in this
order:

  dddddddd cccccccc bbbbbbbb aaaaaaaa

Then, swap the upper and lower 16-bit words to achieve this order:

  31                                0
  bbbbbbbb aaaaaaaa dddddddd cccccccc

Further, the 32-bit blocks are stored in reverse order. So, for example,
if an image is 16 pixels wide, it would have 4 pixel groups per line.
Each pixel group would be represented by a 32-bit doubleword, swapped as
described previously. The doublewords would be stored in the bytestream
as:

  D3 D2 D1 D0

D0 represents the first 4 pixels on the line and D3 represents the final
4 pixels on the line. Thus, a decoder must jump forward to the end of a
line's data and read doublewords backwards through the bytestream while
writing pixels in the forward direction on that line, then jump forward
again in the bytestream when decoding the next line.

The 32 bits of the doubleword represent the following values:

  bit 31:       unused
  bits 30-26:   V delta index
  bits 25-21:   U delta index
  bits 20-16:   Y3 delta index
  bit 15:       unused
  bits 14-10:   Y2 delta index
  bits 9-5:     Y1 delta index
  bits 4-0:     Y0 delta index

Each delta index value is used to index into this table and the
referenced value is added to the previous element on the same plane,
either Y, U, or V:

  const int xl_delta_table[32] = {
      0,   1,   2,   3,   4,   5,   6,   7,
      8,   9,  12,  15,  20,  25,  34,  46,
     64,  82,  94, 103, 108, 113, 116, 119,
    120, 121, 122, 123, 124, 125, 126, 127
  };

Remember that the YUV components only have 7 bits of precision. Thus,
the second half of the table values all count as negative values (in
7-bit arithmetic, adding 127 is the same as subtracting 1).

At the beginning of a line, the Y0, U, and V delta indices actually
represent the top 5 bits of the absolute 7-bit component value.

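The doubleword handling described above can be sketched in C as follows.
This is only an illustration under stated assumptions, not a reference
decoder: the names xl_indices, xl_load_dword, xl_unpack and
xl_apply_delta are hypothetical, and the 7-bit wraparound (masking with
0x7F) is one assumed way of making the upper half of the table act as
negative deltas.

```c
#include <assert.h>
#include <stdint.h>

/* Delta table as listed in the text. */
static const int xl_delta_table[32] = {
      0,   1,   2,   3,   4,   5,   6,   7,
      8,   9,  12,  15,  20,  25,  34,  46,
     64,  82,  94, 103, 108, 113, 116, 119,
    120, 121, 122, 123, 124, 125, 126, 127
};

/* Hypothetical container for the six 5-bit indices of one pixel group. */
typedef struct {
    unsigned y[4];   /* Y0..Y3 delta indices */
    unsigned u, v;   /* U and V delta indices */
} xl_indices;

/* Read 4 on-disk bytes (a, b, c, d) as a little-endian number, then
 * swap the upper and lower 16-bit words, giving b a d c from bit 31
 * down to bit 0. */
static uint32_t xl_load_dword(const uint8_t *p)
{
    uint32_t le = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                  ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (le << 16) | (le >> 16);
}

/* Split a swapped doubleword into its fields; bits 31 and 15 are
 * unused and are skipped. */
static xl_indices xl_unpack(uint32_t dw)
{
    xl_indices ix;
    ix.y[0] = dw & 0x1F;
    ix.y[1] = (dw >> 5) & 0x1F;
    ix.y[2] = (dw >> 10) & 0x1F;
    ix.y[3] = (dw >> 16) & 0x1F;
    ix.u    = (dw >> 21) & 0x1F;
    ix.v    = (dw >> 26) & 0x1F;
    return ix;
}

/* Add a table delta to a 7-bit component. ASSUMPTION: results wrap
 * modulo 128, so that e.g. table value 127 behaves like -1. */
static unsigned xl_apply_delta(unsigned prev, unsigned index)
{
    assert(index < 32);
    return (prev + (unsigned)xl_delta_table[index]) & 0x7F;
}
```

For the first pixel group on a line, the Y0, U, and V fields returned by
xl_unpack would be used as absolute values (index << 2) rather than
being passed to xl_apply_delta.
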
The final, concise decoding algorithm operates as follows:

  foreach line in image
    foreach 32-bit doubleword, working from right -> left in bytestream
      load doubleword as little-endian number, swap 16-bit words
      if this is the first pixel group in line
        next Y value = (Y0 delta index) << 2
        next U value = (U delta index) << 2
        next V value = (V delta index) << 2
      else
        next Y value = last Y value + xl_delta_table[Y0 delta index]
        next U value = last U value + xl_delta_table[U delta index]
        next V value = last V value + xl_delta_table[V delta index]
      next Y value = last Y value + xl_delta_table[Y1 delta index]
      next Y value = last Y value + xl_delta_table[Y2 delta index]
      next Y value = last Y value + xl_delta_table[Y3 delta index]

Since the components only have 7 bits of meaningful precision, it will
likely be necessary to shift each of the components left once more to
achieve 8 bits of output precision.


References
----------
Multimedia Technology Basics (with an introduction to YUV)
http://www.multimedia.cx/mmbasics.txt

ffmpeg project
http://ffmpeg.sourceforge.net/

Creative YUV Format
http://www.csse.monash.edu.au/~timf/videocodec/cyuv.txt


ChangeLog
---------
v1.1: December 3, 2004
- Miro/Pinnacle Video XL (VIXL/PIXL)

v1.0: September 23, 2004
- initial release


GNU Free Documentation License
------------------------------
see http://www.gnu.org/licenses/fdl.html