Description of the Gremlin Digital Video (GDV) Format
by Mike Melanson (mike at multimedia.cx)
and Vladimir "VAG" Gneushev (vagsoft at mail.ru)
v0.3: November 27, 2005

=======================================================================
NOTE: The information in this document is now maintained in Wiki format
at:
  http://wiki.multimedia.cx/index.php?title=Gremlin_Digital_Video
=======================================================================

Copyright (c) 2005 Mike Melanson & Vladimir Gneushev
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts. A copy of the license is included in the section entitled
"GNU Free Documentation License".


Contents
--------
 * Introduction
 * File Format
 * Video Coding Method 2
 * Video Coding Method 5
 * Bit Reading Procedure For Video Coding Methods 6 and 8
 * Video Coding Method 6
 * Video Coding Method 8
 * Audio Format
 * Appendix A: Games Using GDV
 * References
 * ChangeLog
 * GNU Free Documentation License


Introduction
------------
GDV is the file extension of a multimedia file format used in a number
of CD-ROM computer games developed by a company named Gremlin
Interactive Ltd. The extension stands for Gremlin Digital Video. The
format is most notable for its use in the title "Realms of the Haunting"
(see references). See Appendix A for a partial list of games that use
the GDV format.

The file format is capable of transporting palettized 8-bit video, or
15-, 16-, or 24-bit data. The audio format is 8- or 16-bit PCM or DPCM.


File Format
-----------
All multi-byte numbers are stored in little endian format. The general
file format is laid out as follows:

  GDV header
  initial palette (only for 8-bit video data)
  frame 0
  frame 1
  ...
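
Because every multi-byte field is little endian, a decoder that is meant
to run on any host can assemble values bytewise instead of casting
pointers. The following is a minimal sketch in C; the helper names
gdv_read_le16 and gdv_read_le32 are hypothetical and are not part of the
format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read an unsigned 16-bit little endian value
 * from the 2 bytes starting at p. */
static uint16_t gdv_read_le16(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: read an unsigned 32-bit little endian value
 * from the 4 bytes starting at p. */
static uint32_t gdv_read_le32(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}
```

For instance, the two bytes 0x34 0x12 read as the 16-bit value 0x1234.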

Each frame has the following structure:

  sound samples (if sound is present)
  frame header
  video data

The GDV header has the following structure:

  bytes 0-3    magic number/file signature
               (should be 0x94 0x19 0x11 0x29)
  bytes 4-5    size ID
  bytes 6-7    number of frames in file
  bytes 8-9    framerate (frames/second)
  bytes 10-11  sound flags
                 bit 3  packed data (1 = DPCM, 0 = PCM)
                 bit 2  sample width (1 = 16-bit, 0 = 8-bit)
                 bit 1  channels (1 = stereo, 0 = mono)
                 bit 0  audio present (1 = file has audio, 0 = silence)
  bytes 12-13  sound playback frequency
  bytes 14-15  image type
                 bits 2-0  video depth:
                             1 = 8 bits/pixel (palettized)
                             2 = 16 bits/pixel
                             3 = 24 bits/pixel
  bytes 16-17  frame size (maximum compressed frame size)
  byte 18      unknown
  byte 19      lossiness
  bytes 20-21  frame width
  bytes 22-23  frame height

A frame header has the following structure:

  bytes 0-1    magic number/frame signature (should be 0x05 0x13)
  bytes 2-3    total size of frame
  bytes 4-7    frame type and flags
                 bits 31-8  number of bytes to skip before encoded
                            video data (applies to coding methods 5, 6,
                            and 8)
                 bit 6      keyframe (1 = intraframe)
                 bits 3-0   frame coding method

Thus far, only details of the 8-bit compression format have been
determined. A frame's header indicates the coding method used. These are
the known coding methods for 8-bit data:

  0: uncompressed frame
  1: new palette
  2: basic LZ-like unpacking
  3: frame unchanged from the previous frame
  5: advanced version - mixed with RLE
  6: almost complicated version, tag/length/offset bits packed together
  8: most complicated version, mix of everything possible


Video Coding Method 2
---------------------
Coding method 2 embodies a basic LZ-like scheme. To decode the encoded
bytestream, begin by reading the first byte. This byte is a set of 4
2-bit instruction tags laid out as:

  bits  76 54 32 10
        aa bb cc dd

 * If aa is 0 then paint a single pixel by copying the next byte from
   the encoded bytestream into the decoded image.

 * If aa is 1 then copy a run of pixels from the area of the image that
   has already been painted. First, the source offset and run length
   must be decoded from the bytestream. For the next 2 bytes in the
   bytestream, byte_a followed by byte_b:

      byte_a    byte_b
     76543210  76543210

   The length of the run to be copied is defined as 3 more than bits 3-0
   of byte_a. In C notation, this is expressed as:

     length = (byte_a & 0x0F) + 3;

   The beginning run offset is defined as the 12-bit quantity specified
   by the top 4 bits of byte_a combined with byte_b. In C notation, this
   is expressed as:

     offset = ((byte_a & 0xF0) << 4) | byte_b;

   The starting offset from which to copy is defined as the current
   offset in the output image minus the quantity (4096 - offset).

 * If aa is 2 then the next pixels in the decoded frame are unchanged
   from the previous frame. The length of the unchanged pixel run is
   defined by the next byte in the encoded bytestream, plus 2. This
   gives the range of 2..257 unchanged pixels.

 * If aa is 3 then the frame decode operation is finished. Presumably, a
   decoder should also stop decoding when it runs out of bytes in the
   encoded bytestream buffer.

After tag aa is decoded, decode tag bb using the same process as tag aa,
then tag cc, followed by tag dd. After decoding tag dd, fetch the next
byte from the encoded bytestream as the next tag byte and repeat the
decoding process until a tag of 3 is encountered or until the encoded
bytestream buffer is exhausted.


Video Coding Method 5
---------------------
Coding method 5 is similar to method 2 but mixes in some run length
encoding. To reach the start of the encoded bytestream, first skip n
bytes after the frame header, where n is defined in the frame header. To
decode the encoded bytestream, begin by reading the first byte. This
byte is a set of 4 2-bit instruction tags laid out as:

  bits  76 54 32 10
        aa bb cc dd

 * If aa is 0 then paint a single pixel by copying the next byte from
   the encoded bytestream into the decoded image.

 * If aa is 1 then either copy a run of pixels from the area of the
   image that has already been painted, or fill a run of pixels with a
   constant pixel. First, the source offset and run length must be
   decoded from the bytestream. For the next 2 bytes in the bytestream,
   byte_a followed by byte_b:

      byte_a    byte_b
     76543210  76543210

   The length of the run to be copied is defined as 3 more than bits 3-0
   of byte_a. In C notation, this is expressed as:

     length = (byte_a & 0x0F) + 3;

   The beginning run offset is defined as the 12-bit quantity specified
   by the top 4 bits of byte_a combined with byte_b. In C notation, this
   is expressed as:

     offset = ((byte_a & 0xF0) << 4) | byte_b;

   If the decoded offset is 0xFFF then take the last decoded pixel in
   the output frame and copy it into the next (length) pixels in the
   output frame. If the decoded offset is not 0xFFF then copy (length)
   pixels into the current offset from the output image starting from
   the current offset minus the quantity (4096 - offset).

 * If aa is 2 then either the next pixels in the decoded frame are
   unchanged from the previous frame, or the frame decode is finished.
   Decode the next byte from the bytestream as the length. If the length
   is 0 then the frame decode is finished. If the length is 0xFF then
   decode the next 16-bit value from the bytestream as the length. This
   length indicates the number of pixels from the current offset in the
   decoded frame that remain unchanged from the previous frame.

 * If aa is 3 then either copy a run of pixels from the area of the
   image that has already been painted, or fill a run of pixels with a
   constant pixel. First, the source offset and run length must be
   decoded from the bytestream. For the next byte in the bytestream:

       byte
     76543210

   Bits 7-2 define the 6-bit offset. Bits 1-0 plus 2 define the length.
   In C notation, this is expressed as:

     offset = byte >> 2;
     length = (byte & 0x03) + 2;

   If the decoded offset is 0 then take the last decoded pixel in the
   output frame and copy it into the next (length) pixels in the output
   frame. If the decoded offset is not 0 then copy (length) pixels into
   the current offset from the output image starting from the current
   offset minus the quantity (offset - 1).

After tag aa is decoded, decode tag bb using the same process as tag aa,
then tag cc, followed by tag dd. After decoding tag dd, fetch the next
byte from the encoded bytestream as the next tag byte and repeat the
decoding process until a tag of 2 is encountered with an associated
length of 0, or until the encoded bytestream buffer is exhausted.


Bit Reading Procedure For Video Coding Methods 6 and 8
------------------------------------------------------
Video coding methods 6 and 8 treat the encoded bytestream as a sequence
of packed bits and bytes. The best way to illustrate the method is to
jump in with an example bytestream:

  0x2D 0xAA 0x5A 0x7F 0x26 0x53 0xB1 ...

The bit reader maintains a 32-bit bit queue and a queue size (qsize).
Initialize the queue with the first 4 bytes in the bytestream
interpreted as a little endian 32-bit number, and initialize the queue
size to 16:

  queue = 0x7F5AAA2D
  qsize = 16

Reading bits entails reading the least significant bits from the queue.
Reading 4 bits in this example will yield 0xD. Afterwards, the bit
reading state variables will be:

  queue = 0x07F5AAA2
  qsize = 12

As an example, assume the coding mode dictates that the next 3 bits
shall be read (010 = 2) followed by the next 1 bit (0) and these codes
indicate that the decoder should read the next byte from the encoded
bytestream. This next byte is 0x26 in this example. The state variables
after reading the next (3 + 1 = 4) bits will be:

  queue = 0x007F5AAA
  qsize = 8

Assume the next decode operation is to read 16 bits from the stream
(0x5AAA).
The state variables are now:

  queue = 0x0000007F
  qsize = -8

Since qsize is less than or equal to 0, fetch the next 16-bit value from
the encoded bytestream (0xB153 in this example) and logically OR it to
the left of the remaining bits in the queue. Then add 16 to the qsize:

  queue = 0x00B1537F
  qsize = 8

The descriptions of video coding methods 6 and 8 will use the phrase
"read the next n bits from the bit queue." This indicates that the next
n bits should be shifted off of the rightmost part of the bit queue, the
qsize should be decreased by n, and if qsize is less than or equal to 0,
the bit queue should be refreshed and qsize increased as described
previously.


Video Coding Method 6
---------------------
Coding method 6 uses techniques similar to coding methods 2 and 5. The
most significant difference is that the bytestream is decoded as
described in the previous section. To reach the start of the encoded
bitstream, first skip n bytes after the frame header, where n is defined
in the frame header. Initialize the bit queue at that point.

To decode the frame, read the next 2 bits from the bit queue as the
instruction tag.

 * If the tag is 0 then read the next bit from the bit queue. If the bit
   is 0 then copy the next byte in the bytestream into the output frame
   as the next pixel. If the bit is 1 then copy a series of pixels from
   the encoded bytestream into the output frame. The length of the pixel
   run to copy is obtained by the following process:

     length = 2
     count = 0
     do
       count++
       step = read (count) bits from bit queue
       length = length + step
     while (step == ((1 << count) - 1))

 * If the tag is 1 then the next series of pixels in the output frame
   are unchanged from the previous frame. To determine precisely how
   many pixels are unchanged, read the next bit from the bit queue. If
   the bit is 0 then read the next 4 bits from the bit queue. These 4
   bits plus 2 represent the number of pixels that are unchanged, which
   is in the range of 2..17. If the bit is 1 then read the next byte
   from the bytestream as the length.
   If the top bit of the length is 0 then the actual length of the
   unchanged pixel run is length + 18 which yields a range of 18..145
   pixels. If the top bit of the decoded length byte is 1 then read the
   next byte from the bytestream and perform the following calculation:

     length = (((length & 0x7F) << 8) | next_byte) + 146;

 * If the tag is 2 read the next 2 bits from the bit queue as the
   sub-tag.

   If the sub-tag is 3 then either copy a run of pixels from the portion
   of the image that has already been decoded into the current offset,
   or fill a run of pixels with a constant pixel value. Read the next
   byte from the bytestream as the offset. If the most significant bit
   of the offset byte (bit 7) is set then the length of the next pixel
   operation is 3; otherwise, the length is 2. Next clear bit 7 of the
   offset. If offset is 0 then take the most recent pixel in the decoded
   frame and fill the next (length) pixels with that value. If offset is
   non-zero then copy (length) pixels from the current offset -
   (offset - 1) from the decoded image to the current offset.

   If the sub-tag is not 3 then read the next 4 bits from the bit queue.
   These bits comprise bits 11-8 of a 12-bit offset quantity. The bottom
   8 bits of the offset quantity come from the next byte read from the
   bytestream.

   If the sub-tag is 0 and the offset is 0xFFF then the frame decode
   operation is complete.

   If the sub-tag is 0 and the offset is greater than 0xF80 then take a
   pair of pixels from the portion of the output image already decoded
   and place the pair into the output frame a specified number of times.
   The length of the pixel run is 2 more than the bottom 4 bits of the
   12-bit offset quantity. The actual offset is defined as bits 6-4 of
   the 12-bit offset quantity. In C notation length and offset are
   computed from offset as:

     length = (offset & 0x00F) + 2;
     offset = (offset >> 4) & 7;

   The pair of pixels is retrieved from the decoded image at the current
   offset - (offset - 1).

   If the sub-tag is not 0 or the offset is less than or equal to 0xF80
   then add 3 to the length. If the offset is 0xFFF, take the last pixel
   output into the decoded image and copy it (length) times into the
   decoded image at the current offset. If the offset is not equal to
   0xFFF then copy a run of pixels from the area of the image that has
   already been painted. The starting offset from which to copy is
   defined as the current offset in the output image minus the quantity
   (4096 - offset).

 * If the tag is 3 then either copy a run of pixels from the area of the
   image that has already been painted, or fill a run of pixels with a
   constant pixel. Read the next byte in the bytestream as the offset.
   The length is the top 4 bits of this byte (bits 7-4). If the length
   is 15 then read the next byte from the bytestream and add it to the
   length. Add 6 more to the length. Read the next byte from the
   bytestream and make it the bottom 8 bits of the offset quantity. If
   the offset is 0xFFF then take the previous pixel from the decoded
   image and copy it into the decoded image for (length) iterations. If
   offset is other than 0xFFF then move length pixels into the decoded
   image at the current offset from the current offset +
   (offset - 4096).


Video Coding Method 8
---------------------
Video coding method 8 is precisely the same as video coding method 6
except for the procedure for decoding tag 3.

To decode tag 3, decode the next byte in the bytestream (first_byte). If
the top 2 bits of first_byte are set then the required operation is to
copy a run of pixels (from the previous frame?) to the current offset of
the current frame. The length of the run is denoted by the bottom 6 bits
of first_byte plus 8. The 12-bit offset quantity is denoted by the next
4 bits in the bit queue (top 4 bits of the quantity) combined with the
next byte from the bytestream (bottom 8 bits). Move (length) bytes from
the current offset plus the offset quantity + 1 into the current offset.
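
The field arithmetic for the case where the top 2 bits of first_byte are
set can be sketched in C as follows. This only illustrates the
calculation described above and is not a complete decoder; the struct
and function names are hypothetical, and the caller is assumed to have
already read the 4 bits from the bit queue and the following byte from
the bytestream.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for a decoded run. */
typedef struct {
    unsigned length;  /* number of pixels to move */
    unsigned offset;  /* 12-bit offset quantity */
} gdv_run;

/* Method 8, tag 3, when the top 2 bits of first_byte are both set.
 * queue_bits: the next 4 bits from the bit queue (assumed already read)
 * next_byte:  the next byte from the bytestream (assumed already read) */
static gdv_run gdv_m8_tag3_top_bits_set(uint8_t first_byte,
                                        unsigned queue_bits,
                                        uint8_t next_byte)
{
    gdv_run run;

    assert((first_byte & 0xC0) == 0xC0);
    assert(queue_bits <= 0x0F);

    /* bottom 6 bits of first_byte plus 8 */
    run.length = (first_byte & 0x3Fu) + 8;
    /* queue bits form the top 4 bits, next_byte the bottom 8 bits */
    run.offset = (queue_bits << 8) | next_byte;
    return run;
}
```

Since the length field is 6 bits wide, this case covers run lengths of
8..71. Per the description above, the source of the move is the current
offset plus run.offset + 1.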

If the top 2 bits of first_byte are not both set then the required
operation is to either copy a run of the previous pixels to the current
position or repeat the previous pixel for a number of pixels. If the top
bit of first_byte is 0 then the length of the pixel run is defined as
the quantity 6 plus bits 6-4 of first_byte (this yields a range of
6..13). The offset is defined as the 12-bit quantity formed by combining
bits 3-0 of first_byte (top 4 bits of quantity) and the next byte in the
bytestream (bottom 8 bits). If the top bit of first_byte is set then the
length of the pixel run is defined as the quantity 14 plus bits 5-0 of
first_byte (this yields a range of 14..77). The offset is defined as the
12-bit quantity formed by combining the next 4 bits from the bit queue
(top 4 bits of quantity) and the next byte from the bytestream (bottom 8
bits).

If the offset is 0xFFF then take the previous pixel from the decoded
image and copy it into the decoded image for (length) iterations. If
offset is other than 0xFFF then move length pixels into the decoded
image at the current offset from the current offset + (offset - 4096).


Audio Format
------------
[document DPCM format]


Appendix A: Games Using GDV
---------------------------
These games are known to use the GDV file format for their FMV:

 * Hardwar
 * Normality
 * Realms of the Haunting


References
----------
Realms of the Haunting:
  http://www.mobygames.com/game/dos/realms-of-the-haunting
Unofficial fansite:
  http://www.realmsofthehaunting.com/


ChangeLog
---------
v0.3: November 27, 2005
- documented the remaining video coding formats
- held in pre-1.0 state until a decoder is successfully implemented
  based on this information

v0.2: November 22, 2005
- documented 2.5/4 video coding formats

v0.1: November 9, 2005
- initial release
- describes general file format


GNU Free Documentation License
------------------------------
see http://www.gnu.org/licenses/fdl.html