Description of the HuffYUV (HFYU) Codec by Roberto Togni (rtogni at bresciaonline dot it) v0.8: March 24, 2003 ======================================================================= NOTE: The information in this document is now maintained in Wiki format at: http://wiki.multimedia.cx/index.php?title=HuffYUV ======================================================================= Copyright (c) 2002-2003 Roberto Togni Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License". Huffyuv v2.1.1, and previous version are written by Ben Rudiak-Gould. http://www.math.berkeley.edu/~benrg/huffyuv.html Introduction ============ HuffYUV is a lossless video compressor for AVI files written by Ben Rudiak-Gould. It's distributed in form of a vfw (Video for Windows) dll. Source code is available and distributed (mostly) under GPL. This codec was created to temporary store video data coming from a capture card for later editing, so its main features are: - speed: uses simple predictors and original version is heavily optimized. - quality: it's lossless, so image is not degraded if read/written many times. - editable: every frame is a key frame, there is no temporal prediction. Tech notes from author's page (referring to v 2.1.1) ==================================================== Huffyuv's algorithm is roughly the same as lossless JPEG: it predicts each sample and Huffman-encodes the error. The predictor functions are: "left" (predicts the previous sample from the same channel) "gradient" (predicts Left+Above-AboveLeft) "median" (predicts the median of Left, Above, and the gradient predictor). Channels are compressed separately, but in RGB mode the channels used are actually R-G, G, and B-G. This yields much better compression than R, G, B. The error signal in each channel is encoded with its own Huffman table. On compression Huffyuv picks appropriate tables from its built-in collection. These tables are then stored in the output file and used when decompressing. This way future versions of Huffyuv can decompress old files without my having to explicitly support old tables. A Huffyuv-savvy application can also specify the Huffman tables to be used for compression instead of accepting the defaults. File Header =========== All informations about the image format and predictor used are stored into the BITMAPINFOHEADER structure of avi header. BITMAPINFOHEADER layout (only interesting fields are shown) ----------------------------------------------------------- HUFFYUV_BITMAPINFOHEADER { int biSize [...] short biBitCount int biCompression [...] Extradata { BYTE method BYTE bpp_override BYTE unused[2] } BYTE Compressed_Huffmantables[] } Extradata and Compressed_Huffmantables fields are not always there. biCompression contains "HFYU" fourcc. biSize is the size of the whole structure, and it's used to find out if extra fields are there, since size of standard BITMAPINFOHEADER is known. To find out image format you need to get true bit count: it is stored in bpp_override if biSize > sizeof(BITMAPINFOHEADER), else (or if bpp_override is 0) it's stored in biBitCount. Then you can find out image format with this table +-----+-------+ | bit | image | +-----+-------+ | 16 | YUY2 | | 24 | RGB | | 32 | RGBA | +-----+-------+ In the rest of this document YUV will indicate image of type YUY2, and RGB will be used for both RGB and RGBA. To find out the prediction method you have to look at the 3 LSbit of biBitCount (biBitCount & 0x07) and use this table +-----+-------------+ | bit | method | +-----+-------------+ | 000 | see below | | 001 | left | | 010 | left decorr | | 011 | gradient * | | 100 | median | +-----+-------------+ *gradient if YUV, gradient with decorrelation if RGB. If (biBitCount & 0x07) is 0, method is stored in extradata if biSize > sizeof(BITMAPINFOHEADER), else method is predict_old (the method used in version 1.x, it's the same as predict left). To find out method from extradata method byte use this table +------+-----------------+ | byte | method | +------+-----------------+ | -2 | old | | 0 | left | | 1 | gradiend | | 2 | median | | 64 | left decorr | | 65 | gradient decorr | +------+-----------------+ Huffman tables can be stored in file (v2.x) or not (v1.x). If biSize = sizeof(BITMAPINFOHEADER) tables are not stored into file, and to decode data you have to use classic tables (known by the decoder) choosing RGB or YUV as appropriate. Else tables are stored in file: if (biBitCount & 0x07) = 0 tables start at the end of BITMAPINFOHEADER, else they start after Extradata. Huffman tables -------------- Huffman tables are stored in file compressed with a run-length algorithm. This is what Ben Rudiak-Gould says about it in source code (tables.cpp) // The Huffman tables are run-length encoded. (I could have Huffman- // encoded them, but the madness has to end somewhere.) The // decompression algorithm is as follows: Read a byte. The lower five // bits are the value to repeat. If the upper three bits are zero, the // repeat count is in the next byte; otherwise, the upper three bits are // the repeat count. The tables are zero-terminated so that I can use // string functions on them (a zero byte can never appear in the RLE // data). // Each array actually contains three Huffman tables, one for each sample // type (e.g. Y,U,V, or R-G,G,B-G). Each table expands to 256 bytes, and // each byte is a code length. The codes themselves are determined at // run time and are not stored here--except for the "classic" tables, // which don't fit the code-choosing algorithm. When the 3 tables are available (either extracted and decompressed, or known because the file was compressed with v1.x of the codec) you can extract data from the encoded byte stream. Tables are stored in file in this order: YUV: Y_table, U_table, V_table RGB: B_table, G_table, R_table RGB with decorrelation: B-G_table, G_table, R-G_table Frame decompression =================== General info ------------ Frames are stored in avi file one per chunk. All frames in file are marked as key frame. You only need data from current chunk and file header to decompress a frame (no knowledge of previous frames or frame position in file is needed). Operating algorithm ------------------- HuffYUV codec compress image by predicting the value of a pixel from it's neighbourgs, and computes an error value (delta) by subtracting it from the effective pixel value. The result of this operation is then compressed with a Huffman algorithm, using a different table for each channel. To decompress a frame you need to reverse this process: at first extract the error value from the compressed Huffman stream, then you can reconstruct the pixel value by computing the predictor value and adding it to the error value. To make this process possible you need a start value (the first pixel in frame) and a predictor that uses only values from past pixel (already decompressed). Obviously all predictors used in HuffYUV satisfy this requirement. All pixel component values must be treated as unsigned bytes. In all computations overflow or underflow causes a wrap-around (no saturation). If decorrelation is used (RGB only), instead of compressing R, G, B components the codec compresses R-G, G, B-G (this gives a better compression). To reconstruct B and R values you have to add them the value of G taken from the same pixel. Bit stream ---------- The components of the image are compressed with different Huffman tables, and then the bit streams are interleaved. As an example suppose to have 3 components (the values represent the difference between the pixel value and the predictor output). The byte sequence [ABC][DEF][GHI][LMN] is compressed in this way: - Compress each byte of the first triplet [ABC] using the Huffan table associated with its component. You will get 3 codes of variable length. Suppose you get aaaa by compression of A, bbbbbbb from B and ccc from C. The output bit stream will be aaaabbbbbbccc. - Repeat it for every triplet, adding the new bits at the end of the stream. If compression of [DEF] will give dddd eee fffff, the output stream after second pixel will be aaaabbbbbbcccddddeeefffff. To decompress this sequence you keep extracting bits from the stream until you get a valid code (the table contains all valid codes), and then you decode it using the table associated with that component. In the example above you will get the 4 bit aaaa and decode them to A. After decoding the current component you need to remove decoded bits from the stream (it will become bbbbbbcccddddeeefffff), and start decoding the next component. When the first triplet [ABC] si decoded the stream will be ddddeeefffff, ant you're ready to start again with the first component of the next pixel. Data order ---------- This is how data is stored into encoded stream. The meaning of the stream format line is: a capital letter is used to indicate an uncompressed value 1 byte long, small letters represent Huffman encoded values and their length in bit depend on the size of the code word (after decompression all delta values are 1byte long). YUV case Image is stored left to right, top to bottom. Each pair of pixel is represented by 4 bytes, using YUY2 format. Frame width is always even. Stream format: Y U Y V y u y v y u y v .... where YUYV are the first two pixels, and y, u, v are the error values. y is decompressed with Y_table, u with U_table and v with V_table. RGB case Image is stored left to right, bottom to top. Each pixel is represented by 3 bytes, using BGR24 format. Stream format: X B G R b g r b g r .... where X is unused, BGR is the first pixel, and b, g, r are the error values. b is decompressed with B_table, g with G_table and r with R_table. RGB with decorrelation case Image is stored left to right, bottom to top. Each pixel is represented by 3 bytes, using BGR24 format. Stream format: X B G R g bg rg g bg rg .... where X is unused, BGR is the first pixel, and g, bg, rg are the error values. g is decompressed with G_table, bg with B-G_table and rg with R-G_table. RGBA case Image is stored left to right, bottom to top. Each pixel is represented by 4 bytes, using BGR24 format plus alpha channel. Stream format: B G R A b g r a b g r a .... where BGRA is the first pixel, and b, g, r, a are the error values. b is decompressed with B_table, g with G_table, r with R_table and a with R_table. RGBA with decorrelation case Image is stored left to right, bottom to top. Each pixel is represented by 4 bytes, using BGR24 format plus alpha channel. Stream format: B G R A g bg rg a g bg rg a .... where BGRA is the first pixel, and g, bg, rg, a are the error values. g is decompressed with G_table, bg with B-G_table, br with R-G_table and a with R-G_table. Predictors ========== Predictors are used to guess the value of the next pixel, so the codec needs to store only the error between the real pixel value and the prediction. If predictor is good the error will be small, and will compress better than pixel values. Valid predictors are: YUV: old, left, gradient, median. RGB: old, left, left with decorrelation, gradient with decorrelation. Other combinations (es. RGB gradient without decorrelation), while technically feasible, are not used. While describing predictors, the following symbols are used: [A B C]: components of a pixel Ayz: component A of pixel at row y and column z a, b, c: indexes used for columns n, m: indexes used for rows x: index used to indicate the last column of a frame Predict left, predict left with decorrelation and predict old ------------------------------------------------------------- Predict old is the same as predict left (without decorrelation for RGB case). The predictor for a pixel component is the same component from previous pixel (previous pair for U, V in YUV case). Some examples: - random pixel inside a row, RGB ...... [Ban Gan Ran] [Bam Gam Ram] ...... The predictor for Gam is Gan - random pixel inside a row, YUV .......[Y1an Uan Y2an Van] [Y1am Uam Y2am Vam] ....... The predictor for Uam is Uan, for Y1am is Y2an, for Y2am is Y1am -first pixel in a row (not for first row), RGB [Ba1 Ga1 Ra1] ........ [Bax Gax Rax] [Bb1 Gb1 Rb1] ...... The predictor for Rb1 is Rax -first pixel in a row (not for first row), YUV [Y1a1 Ua1 Y2a1 Va1] ........ [Y1ax Uax Y2ax Vax] [Y1b1 Ub1 Y2b1 Vb1] ...... The predictor for Ub1 is Uax, for Y2b1 is Y1b1, for Y1b1 is Y2ax The first pixel of the picture is stored uncompressed, the rest of the frame is compressed with predict left algorithm. Predict gradient and predict gradient with decorrelation -------------------------------------------------------- The predictor for a pixel is given by the sum of the previous pixel (like predict left) and the above pixel (the pixel from the row above at the same column), minus the above left pixel (the pixel from the row above at the previous column). Some examples: - random pixel inside a row, RGB ...... [Ban Gan Ran] [Bam Gam Ram] ...... .......[Bbn Gbn Rbn] [Bbm Gbm Rbm] The predictor for Gbm is (Gbn + Gam - Gan) - random pixel inside a row, YUV .......[Y1an Uan Y2an Van] [Y1am Uam Y2am Vam] ....... .......[Y1bn Ubn Y2bn Vbn] [Y1bm Ubm Y2bm Vbm] ....... The predictor for Ubm is (Ubn + Uam - Uan) The predictor for Y1bm is (Y2bn + Y1am - Y2an) The predictor for Y2bm is (Y1bm + Y2am - Y1am) -first pixel in a row (not for first row), RGB [Ba1 Ga1 R11] ........ [Bax Gax Rax] [Bb1 Gb1 Rb1] ........ [Bbx Gbx Rbx] [Bc1 Gc1 Rc1] ...... The predictor for Rc1 is (Rbx + Rb1 - Rax) -first pixel in a row (not for first row), YUV [Y1a1 Ua1 Y2a1 Va1] ........ [Y1ax Uax Y2ax Vax] [Y1b1 Ub1 Y2b1 Vb1] ........ [Y1bx Ubx Y2bx Vbx] [Y1c1 Uc1 Y2c1 Vc1] ...... The predictor for Uc1 is (Ubx + Ub1 - Uax) The predictor for Y2c1 is (Y1c1 + Y2b1 - Y1b1) The predictor for Y1c1 is (Y2bx + Y1b1 - Y2ax) The first pixel of the picture is stored uncompressed, the remaining of the first row is compressed with predict left algorithm. Other rows are compressed with predict gradient. The above left value for the first pixel of the second row is 0. Maybe other pixels of the second row should ignore above left values, but I'm not sure about it. Predict median (YUV only) -------------------------------------------------------- The predictor for a pixel is given by the median value among the left pixel, the pixel above and the gradient predictor (sum of the previous pixel and the above pixel, minus the above left pixel). The median is obtained by sorting the three values and taking the one in the middle, and have nothing to do with the mean value of the three numbers. As an example if the left pixel is 0x12, the above one is 0x4e and the gradient predictor is 0xaf, the median is 0x4e. As Ben Rudiak-Gould says on his page, there is no technical reason for this method to be YUV only, he simply didn't implement it for RGB case. Some examples: - random pixel inside a row, YUV .......[Y1an Uan Y2an Van] [Y1am Uam Y2am Vam] ....... .......[Y1bn Ubn Y2bn Vbn] [Y1bm Ubm Y2bm Vbm] ....... The predictor for Ubm is the median value of Ubn, Uam and (Ubn + Uam - Uan) The predictor for Y1bm is the median value of Y2bn, Y1am and (Y2bn + Y1am - Y2an) The predictor for Y2bm is the median value of Y1bm, Y2am and (Y1bm + Y2am - Y1am) -first pixel in a row (not for first row), YUV [Y1a1 Ua1 Y2a1 Va1] ........ [Y1ax Uax Y2ax Vax] [Y1b1 Ub1 Y2b1 Vb1] ........ [Y1bx Ubx Y2bx Vbx] [Y1c1 Uc1 Y2c1 Vc1] ...... The predictor for Uc1 is the median value of Ubx, Ub1 and (Ubx + Ub1 - Uax) The predictor for Y2c1 is the median value of Y1c1, Y2b1 and (Y1c1 + Y2b1 - Y1b1) The predictor for Y1c1 is the median value of Y2bx, Y1b1 and (Y2bx + Y1b1 - Y2ax) The first pixel of the picture is stored uncompressed. The remaining of the first row, and the first 4 pixel (2 pairs, 8 bytes) of the second row are compressed with predict left algorithm. The rest of the second row and other rows are compressed with predict median. Technically the second pixel of the second row could be compressed using median predictor; it's compressed with left predictor because original code uses MMX instructions and so handles 8 bytes a time. Interlaced images ================= If image height is greater than 288 pixel, image is treated as interlaced. You have to build an image with double width and half height, obtaindeb by placing the two fields side by side, with odd field on the left. The frame size reported in the file header is unchanged, the only way to find out interlacing is looking at image height and compare with 288. As an example, suppose your original frame is 1111111111111111 2222222222222222 3333333333333333 4444444444444444 5555555555555555 6666666666666666 7777777777777777 8888888888888888 then the frame to be compressed will be 11111111111111112222222222222222 33333333333333334444444444444444 55555555555555556666666666666666 77777777777777778888888888888888 In this way every line on a field gets the values for the predictors from the previous line of the same field, instead of using the previous line of the frame. Please note that it makes ad ifference only for prediction methods that uses values from the previous line (median and gradient algorithms): if a method refers only to the value of the previous pixel (left algorithms) interlacing can be ignored. Final notes =========== Frame width must be divisible by 4 (at least for the original codec). Probably header parsing logic can be simplified a lot, since not every combination of options is valid. This is what I was able to create using dll v2.1.1 and 1.2.3/1.3.1 and VirtualDub under Windows: - Old format (v1.x only) is only used with classical tables (not stored in file) - Other methods (v2.1.1) always store tables in file, and uses extradata to store method. The first two(?) pixels (or 4 bytes) of the second row in predict gradient could be stored in a different way, using only left and above pixel as predictor. None of my samples require this. References ========== HuffYUV home page Huffyuv v2.1.1, by Ben Rudiak-Gould. http://www.math.berkeley.edu/~benrg/huffyuv.html ChangeLog ========= v0.8: March 24, 2003 - licensed under GNU Free Documentation License v0.7: June 16, 2002 - Added a section about interlaced images v0.6: March 27, 2002 - Fixed typos, added some clarification on symbol used - Added a small explanation of Huffman decompression v0.5: March 25, 2002 - initial release GNU Free Documentation License ============================== see http://www.gnu.org/licenses/fdl.html EOF