Simple YUV Coding Formats
by Mike Melanson (mike at multimedia.cx)
v1.1: December 3, 2004
=======================================================================
NOTE: The information in this document is now maintained in Wiki format
at:
http://wiki.multimedia.cx/index.php?title=ATI_VCR1
http://wiki.multimedia.cx/index.php?title=Cirrus_Logic_AccuPak
http://wiki.multimedia.cx/index.php?title=Creative_YUV
http://wiki.multimedia.cx/index.php?title=Video_XL
=======================================================================
Copyright (c) 2004 Mike Melanson
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled "GNU
Free Documentation License".
Contents
--------
* Introduction
* ATI VCR1
* Cirrus Logic AccuPak (CLJR)
* Creative YUV (CYUV)
* Miro/Pinnacle Video XL (VIXL/PIXL)
* References
* ChangeLog
* GNU Free Documentation License
Introduction
------------
There are many ways to code, store and transport YUV video data. This file
documents various simple methods used for coding such data.
A working knowledge of YUV colorspaces is assumed in this document. For
more information about YUV basics, see the references at the end of this
document.
About terminology: YUV is the same as YCbCr for the purposes of this
document. Y represents luminance values. U = Cb represents blue
chrominance values. V = Cr represents red chrominance values. This
document will also sometimes refer to the U and V samples collectively as
C samples.
ATI VCR1
--------
The ATI VCR1 codec, identified by the fourcc VCR1, uses differential
coding to pack Y samples. C samples are left alone. VCR1 is based on a YUV
4:1:0 colorspace. This means that, for each block of 4x4 pixels, each pixel
has its own Y sample and the entire block shares one U and one V sample.
The format of a VCR1-encoded video chunk is as follows:
bytes 0-31   16 signed, little-endian, 16-bit deltas used in this frame
bytes 32..   encoded YUV data
The deltas are apparently stored as 16-bit values, although that width is
more than necessary since the Y samples to which they are applied are only
8-bit numbers.
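As an illustration of the chunk header described above, the following C sketch reads the 16 deltas from the first 32 bytes. The function name vcr1_read_deltas is hypothetical, and the caller is assumed to have already checked that the chunk holds at least 32 bytes.

```c
#include <stdint.h>

/* Read the 16 signed, little-endian, 16-bit deltas stored in bytes 0-31
 * of a VCR1 chunk. The function name is a hypothetical choice for this
 * sketch; the caller must guarantee that 32 bytes are available. */
static void vcr1_read_deltas(const uint8_t *chunk, int16_t deltas[16])
{
    for (int i = 0; i < 16; i++) {
        /* Assemble the unsigned 16-bit value, low byte first. */
        unsigned raw = (unsigned)chunk[2 * i] |
                       ((unsigned)chunk[2 * i + 1] << 8);
        /* Sign-extend portably: map 0x8000..0xFFFF to -32768..-1. */
        deltas[i] = (int16_t)((int)(raw ^ 0x8000u) - 0x8000);
    }
}
```

The XOR-and-subtract step avoids relying on implementation-defined behavior when converting an out-of-range unsigned value to a signed type.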
The YUV data is coded after the initial deltas. The data is coded as:
luminance/chrominance line
luminance line
luminance line
luminance line
[...]
Every fourth line, starting with line 0, contains both luminance (Y) and
chrominance (C) data. The other lines only contain Y data.
Each Y/C line begins with 4 offsets to be used when decoding the Y data
for the next 4 lines:
byte 0      offset for this line's Y data
byte 1      offset for second line's Y data
byte 2      offset for third line's Y data
byte 3      offset for fourth line's Y data
bytes 4..   Y/C data
For the remainder of the data on a Y/C line, these 6 pieces of data:
Y0 Y1 Y2 Y3 U V
are encoded within groups of 4 bytes of the bytestream. Y0..Y3 are the
next 4 Y samples in the line while U and V are the C samples for the 4 Y
samples as well as the 4 Y samples on each of the next 3 lines (since
this is a YUV 4:1:0 colorspace). The 4 bytes in the group have the
following meaning:
byte0     byte1   byte2     byte3
Y3i Y2i   V       Y1i Y0i   U
Bytes 1 and 3 hold the V and U samples, respectively. Bytes 0 and 2 each
break down into two 4-bit nibbles (the upper nibble is listed first in the
diagram above) which do not actually represent the Y samples. Instead, they
index into the delta table from the start of the frame. The indexed signed
delta is applied to this line's Y offset. For example,
Y0 = offset + delta_table [ byte2 & 0x0F ]
Y3 = offset + delta_table [ byte0 >> 4 ]
For the other lines that only contain Y data, each group of 4 bytes
decodes to 8 Y samples in a similar manner as on the Y/C lines:
byte0     byte1     byte2     byte3
Y5i Y4i   Y7i Y6i   Y1i Y0i   Y3i Y2i
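The two 4-byte group layouts above can be sketched in C as follows. This is only an illustration: the function names are hypothetical, and the sketch stops at extracting the delta table indices and the C samples. Combining the looked-up deltas with the line offset is left to the formulas given earlier.

```c
#include <stdint.h>

/* Unpack one 4-byte group from a Y/C line into 4 delta-table indices
 * (idx[0] belongs to Y0) plus the U and V samples. */
static void vcr1_unpack_yc_group(const uint8_t g[4], uint8_t idx[4],
                                 uint8_t *u, uint8_t *v)
{
    idx[0] = g[2] & 0x0F;   /* Y0i: low nibble of byte 2  */
    idx[1] = g[2] >> 4;     /* Y1i: high nibble of byte 2 */
    idx[2] = g[0] & 0x0F;   /* Y2i: low nibble of byte 0  */
    idx[3] = g[0] >> 4;     /* Y3i: high nibble of byte 0 */
    *v = g[1];
    *u = g[3];
}

/* Unpack one 4-byte group from a Y-only line into 8 delta-table indices
 * (idx[0] belongs to Y0). */
static void vcr1_unpack_y_group(const uint8_t g[4], uint8_t idx[8])
{
    idx[0] = g[2] & 0x0F;  idx[1] = g[2] >> 4;
    idx[2] = g[3] & 0x0F;  idx[3] = g[3] >> 4;
    idx[4] = g[0] & 0x0F;  idx[5] = g[0] >> 4;
    idx[6] = g[1] & 0x0F;  idx[7] = g[1] >> 4;
}
```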
Cirrus Logic AccuPak (CLJR)
---------------------------
The Cirrus Logic AccuPak codec, identified by the fourcc CLJR, packs 4 Y
samples and 2 C samples into 32 bits by representing each Y sample with 5
bits and each C sample with 6 bits. It is essentially a scaled-down method
of coding YUV 4:1:1, where each pixel in a group of 4 on a line has its own
luminance sample but the whole group shares one pair of C samples.
Each set of 32 bits represents 4 pixels on a line:
p0 p1 p2 p3
For each set of 32 bits, read left -> right:
p3.Y = next 5 bits
p2.Y = next 5 bits
p1.Y = next 5 bits
p0.Y = next 5 bits
Cb/U = next 6 bits
Cr/V = next 6 bits
-------
32 bits
Thus, the first 5 bits represent the Y sample for the last pixel in the
group of 4 pixels.
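The bit layout above can be sketched in C as follows. The struct and function names are hypothetical. The sketch assumes the 32 bits have already been loaded into an integer whose bit 31 is the leftmost (first) bit; how the four bytes on disk map onto that integer is not stated here and is left to the caller. The returned values are the raw 5-bit and 6-bit fields, without any scaling to 8 bits.

```c
#include <stdint.h>

/* Raw fields of one CLJR pixel group (hypothetical type for this sketch). */
typedef struct {
    uint8_t y[4];   /* y[0]..y[3] belong to pixels p0..p3, 5 bits each */
    uint8_t u, v;   /* shared chrominance, 6 bits each */
} cljr_group;

/* Split one 32-bit word, read from the most significant bit downward. */
static cljr_group cljr_unpack(uint32_t w)
{
    cljr_group g;
    g.y[3] = (w >> 27) & 0x1F;  /* first 5 bits: last pixel in the group */
    g.y[2] = (w >> 22) & 0x1F;
    g.y[1] = (w >> 17) & 0x1F;
    g.y[0] = (w >> 12) & 0x1F;
    g.u    = (w >>  6) & 0x3F;
    g.v    =  w        & 0x3F;
    return g;
}
```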
Creative YUV (CYUV)
-------------------
Creative YUV, identified by the fourcc CYUV, uses differential coding to
effectively compress each Y, U, and V sample to 4 bits with some overhead
at the start of each line. The codec operates on a YUV 4:1:1 colorspace
which means that each group of 4 pixels on a line has 1 Y sample per
pixel, but only 1 of each C sample for the entire group.
A chunk of CYUV-encoded data is laid out as:
bytes 0-15    signed Y predictor byte values
bytes 16-31   signed U predictor byte values
bytes 32-47   signed V predictor byte values
bytes 48..    lines of CYUV-encoded data
The format of each line is as follows:
byte 0
  bits 7-4   initial U sample and predictor for line
  bits 3-0   initial Y sample and predictor for line
byte 1
  bits 7-4   initial V sample and predictor for line
  bits 3-0   next Y predictor index
byte 2
  bits 7-4   next Y predictor index
  bits 3-0   next Y predictor index
bytes 3..    remaining predictor indices for line
The first 3 bytes contain the setup information for the line. Each initial
sample (Y, U, and V) actually represents the top 4 bits of the initial
8-bit sample. The initial sample also serves as the initial predictor. For
each of the 3 Y predictor indices, use the 4-bit value to index into the
table of 16 Y predictors, encoded at the start of the frame. Apply each
predictor to the previous Y value.
At this point, the first group of 4 pixels will have been decoded. Each
remaining group of 4 pixels on the line is coded in 3 bytes:
byte 0
  bits 7-4   next U predictor index
  bits 3-0   next Y predictor index
byte 1
  bits 7-4   next V predictor index
  bits 3-0   next Y predictor index
byte 2
  bits 7-4   next Y predictor index
  bits 3-0   next Y predictor index
For each predictor index, use the 4 bits to index into the appropriate
predictor table and apply the predictor to the previous sample of the same
type (Y, U, or V) and output the sample.
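The per-line procedure above can be sketched in C as follows. The function name and signature are hypothetical. Two points are assumptions rather than facts from this document: within byte 2 of each group, the low nibble is applied before the high nibble (the text does not state the order), and sample arithmetic wraps at 8 bits. The width is assumed to be a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one CYUV line of `width` pixels into separate Y, U and V rows.
 * Returns the number of input bytes consumed (3 per group of 4 pixels).
 * Assumptions of this sketch: width is a multiple of 4, byte 2 is read
 * low nibble first, and samples wrap modulo 256. */
static size_t cyuv_decode_line(const uint8_t *in, int width,
                               const int8_t y_tab[16],
                               const int8_t u_tab[16],
                               const int8_t v_tab[16],
                               uint8_t *y, uint8_t *u, uint8_t *v)
{
    const uint8_t *p = in;
    uint8_t yp, up, vp;

    /* First group: the 4-bit fields are the top 4 bits of each sample. */
    up = p[0] & 0xF0;
    yp = (uint8_t)((p[0] & 0x0F) << 4);
    vp = p[1] & 0xF0;
    *u++ = up;
    *v++ = vp;
    *y++ = yp;
    yp = (uint8_t)(yp + y_tab[p[1] & 0x0F]); *y++ = yp;
    yp = (uint8_t)(yp + y_tab[p[2] & 0x0F]); *y++ = yp;
    yp = (uint8_t)(yp + y_tab[p[2] >> 4]);   *y++ = yp;
    p += 3;

    /* Remaining groups: every nibble is a predictor table index. */
    for (int x = 4; x < width; x += 4, p += 3) {
        up = (uint8_t)(up + u_tab[p[0] >> 4]);   *u++ = up;
        yp = (uint8_t)(yp + y_tab[p[0] & 0x0F]); *y++ = yp;
        vp = (uint8_t)(vp + v_tab[p[1] >> 4]);   *v++ = vp;
        yp = (uint8_t)(yp + y_tab[p[1] & 0x0F]); *y++ = yp;
        yp = (uint8_t)(yp + y_tab[p[2] & 0x0F]); *y++ = yp;
        yp = (uint8_t)(yp + y_tab[p[2] >> 4]);   *y++ = yp;
    }
    return (size_t)(p - in);
}
```

The three tables are the 16-entry predictor tables from bytes 0-47 of the chunk, already converted to signed values by the caller.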
Miro/Pinnacle Video XL (VIXL/PIXL)
----------------------------------
The Miro Video XL codec, identified by the fourcc VIXL, uses
differential coding on a reduced-precision YUV 4:1:1 colorspace image.
Each Y, U, or V component is only 7 bits (where 8 is more typical). Each
group of 32 bits in the bitstream represents 6 5-bit delta table indices
(with 2 unused bits). There is one index for each of the next 4 Y
samples on the line and one index for each of the color samples.
The Pinnacle Video XL codec, identified by the fourcc PIXL, is
apparently the same algorithm as the Miro codec except that the frames
are 8 bytes longer. However, the same decoding process applies.
For each block of 4 pixels on a line, fetch the next 32 bits as a
little-endian number and then swap the 16-bit words to achieve the correct
bit orientation for decoding. To illustrate more clearly, this is the
arrangement of the next 4 8-bit bytes (A, B, C, and D) on disk:
aaaaaaaa bbbbbbbb cccccccc dddddddd
Load the 4 bytes into a program variable so that the bytes are in this
order:
dddddddd cccccccc bbbbbbbb aaaaaaaa
Then, swap the upper and lower 16-bit words to achieve this order:
31                                0
bbbbbbbb aaaaaaaa dddddddd cccccccc
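The load-and-swap step above can be sketched in C as follows. The function name xl_load_dword is hypothetical. Reading the bytes individually keeps the sketch independent of the host's byte order.

```c
#include <stdint.h>

/* Load 4 bytes (A, B, C, D on disk) as a little-endian 32-bit number,
 * then swap the upper and lower 16-bit words, giving B A D C from the
 * most significant byte down. */
static uint32_t xl_load_dword(const uint8_t *p)
{
    uint32_t le = (uint32_t)p[0] |
                  ((uint32_t)p[1] << 8) |
                  ((uint32_t)p[2] << 16) |
                  ((uint32_t)p[3] << 24);
    return (le << 16) | (le >> 16);
}
```

For example, the on-disk bytes AA BB CC DD (hexadecimal) load as 0xDDCCBBAA and become 0xBBAADDCC after the swap.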
Further, the 32-bit blocks are stored in reverse order. So, for example,
if an image is 16 pixels wide, it would have 4 pixel groups per line.
Each pixel group would be represented by a 32-bit doubleword, swapped
and mangled as described previously. The doublewords would be stored in
the bytestream as:
D3 D2 D1 D0
D0 represents the first 4 pixels on the line and D3 represents the final
4 pixels on the line. Thus, a decoder must jump forward in the
bytestream and work backwards through the bytestream while decoding in
the forward direction on a particular line, then jump forward again in
the bytestream when decoding the next line.
The 32 bits of the doubleword represent the following values:
bit 31: unused
bits 30-26: V delta index
bits 25-21: U delta index
bits 20-16: Y3 delta index
bit 15: unused
bits 14-10: Y2 delta index
bits 9-5: Y1 delta index
bits 4-0: Y0 delta index
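The field layout above can be sketched in C as follows. The struct and function names are hypothetical, and the input is assumed to be a doubleword that has already been loaded and word-swapped as described earlier.

```c
#include <stdint.h>

/* The six 5-bit delta indices held in one doubleword
 * (hypothetical type for this sketch). */
typedef struct {
    uint8_t y[4];   /* Y0..Y3 delta indices */
    uint8_t u, v;   /* U and V delta indices */
} xl_indices;

/* Split a loaded and word-swapped doubleword into its index fields.
 * Bits 31 and 15 are unused and ignored. */
static xl_indices xl_unpack(uint32_t w)
{
    xl_indices r;
    r.v    = (w >> 26) & 0x1F;
    r.u    = (w >> 21) & 0x1F;
    r.y[3] = (w >> 16) & 0x1F;
    r.y[2] = (w >> 10) & 0x1F;
    r.y[1] = (w >>  5) & 0x1F;
    r.y[0] =  w        & 0x1F;
    return r;
}
```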
Each delta index value is used to index into this table and the
referenced value is added to the previous element on the same plane,
either Y, U, or V:
const int xl_delta_table[32] = {
0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 12, 15, 20, 25, 34, 46,
64, 82, 94, 103, 108, 113, 116, 119,
120, 121, 122, 123, 124, 125, 126, 127
};
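Remember that the YUV components only have 7 bits of precision. Thus,
the second half of the table values all count as negative values: sums
wrap modulo 128, so adding 127 is equivalent to subtracting 1 and adding
120 is equivalent to subtracting 8.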
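A minimal C sketch of the 7-bit wraparound, assuming the sum is simply masked to 7 bits after each addition. The function name xl_add7 is hypothetical, and the delta argument is expected to be a table value in the range 0..127.

```c
#include <stdint.h>

/* Add a delta table value (0..127) to a 7-bit component and keep only
 * the low 7 bits, so that large table values act as negative steps. */
static uint8_t xl_add7(uint8_t prev, int delta)
{
    return (uint8_t)((prev + delta) & 0x7F);
}
```

With this helper, xl_add7(10, 127) yields 9, the same as subtracting 1, while xl_add7(10, 1) yields 11.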
At the beginning of a line, the Y0, U, and V delta indices actually
represent the top 5 bits of the absolute 7-bit component value.
The final, concise decoding algorithm operates as follows:
foreach line in image
  foreach 32-bit doubleword, working from right -> left in bytestream
    load doubleword as little-endian number, swap 16-bit words
    if this is the first pixel group in line
      next Y value = (Y0 delta index) << 2
      next U value = (U delta index) << 2
      next V value = (V delta index) << 2
    else
      next Y value = last Y value + xl_delta_table[Y0 delta index]
      next U value = last U value + xl_delta_table[U delta index]
      next V value = last V value + xl_delta_table[V delta index]
    next Y value = last Y value + xl_delta_table[Y1 delta index]
    next Y value = last Y value + xl_delta_table[Y2 delta index]
    next Y value = last Y value + xl_delta_table[Y3 delta index]
Since the components only have 7 bits of meaningful precision, it will
likely be necessary to shift each of the components left once more to
achieve 8 bits of output precision.
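The decoding algorithm above can be sketched in C for a single line as follows. The function name and signature are hypothetical. The sketch assumes that the width is a multiple of 4, that each stored line occupies exactly `width` bytes (one doubleword per 4 pixels), and that sums are masked to 7 bits after every addition. It outputs 7-bit values, and the caller may shift them left by one for 8-bit output.

```c
#include <stdint.h>

static const int xl_delta_table[32] = {
      0,   1,   2,   3,   4,   5,   6,   7,
      8,   9,  12,  15,  20,  25,  34,  46,
     64,  82,  94, 103, 108, 113, 116, 119,
    120, 121, 122, 123, 124, 125, 126, 127
};

/* Decode one Video XL line. `line` points at the first stored byte of
 * the line; y receives `width` samples, u and v receive width/4 each. */
static void xl_decode_line(const uint8_t *line, int width,
                           uint8_t *y, uint8_t *u, uint8_t *v)
{
    int groups = width / 4;
    unsigned yv = 0, uv = 0, vv = 0;

    for (int g = 0; g < groups; g++) {
        /* Doublewords are stored in reverse order within the line. */
        const uint8_t *p = line + 4 * (groups - 1 - g);
        uint32_t le = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                      ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
        uint32_t w = (le << 16) | (le >> 16);   /* swap 16-bit words */

        unsigned y0 = w & 0x1F, y1 = (w >> 5) & 0x1F;
        unsigned y2 = (w >> 10) & 0x1F, y3 = (w >> 16) & 0x1F;
        unsigned ui = (w >> 21) & 0x1F, vi = (w >> 26) & 0x1F;

        if (g == 0) {
            /* First group: indices are the top 5 bits of each value. */
            yv = y0 << 2;
            uv = ui << 2;
            vv = vi << 2;
        } else {
            yv = (yv + xl_delta_table[y0]) & 0x7F;
            uv = (uv + xl_delta_table[ui]) & 0x7F;
            vv = (vv + xl_delta_table[vi]) & 0x7F;
        }
        *y++ = (uint8_t)yv;
        yv = (yv + xl_delta_table[y1]) & 0x7F; *y++ = (uint8_t)yv;
        yv = (yv + xl_delta_table[y2]) & 0x7F; *y++ = (uint8_t)yv;
        yv = (yv + xl_delta_table[y3]) & 0x7F; *y++ = (uint8_t)yv;
        *u++ = (uint8_t)uv;
        *v++ = (uint8_t)vv;
    }
}
```

To decode a whole frame under the same assumptions, call the function once per line and advance the input pointer by `width` bytes each time.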
References
----------
Multimedia Technology Basics (with an introduction to YUV)
http://www.multimedia.cx/mmbasics.txt
ffmpeg project
http://ffmpeg.sourceforge.net/
Creative YUV Format
http://www.csse.monash.edu.au/~timf/videocodec/cyuv.txt
ChangeLog
---------
v1.1: December 3, 2004
- Miro/Pinnacle Video XL (VIXL/PIXL)
v1.0: September 23, 2004
- initial release
GNU Free Documentation License
------------------------------
see http://www.gnu.org/licenses/fdl.html