Description of the Gremlin Digital Video (GDV) Format
by Mike Melanson (mike at multimedia.cx)
and Vladimir "VAG" Gneushev (vagsoft at mail.ru)
v0.3: November 27, 2005


=======================================================================
NOTE: The information in this document is now maintained in Wiki format
at:
  http://wiki.multimedia.cx/index.php?title=Gremlin_Digital_Video
=======================================================================


  Copyright (c) 2005 Mike Melanson & Vladimir Gneushev
  Permission is granted to copy, distribute and/or modify this document
  under the terms of the GNU Free Documentation License, Version 1.2
  or any later version published by the Free Software Foundation;
  with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
  A copy of the license is included in the section entitled "GNU
  Free Documentation License".


Contents
--------
 * Introduction
 * File Format
 * Video Coding Method 2
 * Video Coding Method 5
 * Bit Reading Procedure For Video Coding Methods 6 and 8
 * Video Coding Method 6
 * Video Coding Method 8
 * Audio Format
 * Appendix A: Games Using GDV
 * References
 * ChangeLog
 * GNU Free Documentation License


Introduction
------------
GDV is the file extension of a multimedia file format used in a number
of CD-ROM computer games developed by Gremlin Interactive Ltd. The
extension stands for Gremlin Digital Video. The format is most notable
for its use in the title "Realms of the Haunting" (see References).
See Appendix A for a partial list of games that use the GDV format.

The file format can carry palettized 8-bit video or 15-, 16-, or 24-bit
video. The audio format is 8- or 16-bit PCM or DPCM.


File Format
-----------
All multi-byte numbers are stored in little endian format.

The general file format is laid out as follows:

  GDV header
  initial palette (only for 8-bit video data)
  frame 0
  frame 1
  ...

Each frame has the following structure:

  sound samples (if sound is present)
  frame header
  video data

The GDV header has the following structure:

  bytes 0-3    magic number/file signature (should be 0x94 0x19 0x11 0x29)
  bytes 4-5    size ID
  bytes 6-7    number of frames in file
  bytes 8-9    framerate (frames/second)
  bytes 10-11  sound flags
    bit 3      packed data (1 = DPCM, 0 = PCM)
    bit 2      sample width (1 = 16-bit, 0 = 8-bit)
    bit 1      channels (1 = stereo, 0 = mono)
    bit 0      audio present (1 = file has audio, 0 = silence)
  bytes 12-13  sound playback frequency
  bytes 14-15  image type
    bits 2-0   video depth:
               1 = 8 bits/pixel (palettized)
               2 = 16 bits/pixel
               3 = 24 bits/pixel
  bytes 16-17  frame size (maximum compressed frame size)
  byte 18      unknown
  byte 19      lossiness
  bytes 20-21  frame width
  bytes 22-23  frame height
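The header layout above can be read with a handful of little endian loads. The following C sketch is illustrative only. The GdvHeader struct, gdv_parse_header(), and the flag accessors are hypothetical names. Byte 18, whose meaning is unknown, is simply stored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of the 24-byte GDV header. */
typedef struct {
    uint16_t size_id, frames, fps, sound_flags, sample_rate, image_type;
    uint16_t max_frame_size, width, height;
    uint8_t unknown, lossiness;
} GdvHeader;

/* All multi-byte numbers are little endian. */
static uint16_t rl16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

/* Returns 0 on success, -1 if the buffer is short or the signature is wrong. */
int gdv_parse_header(const uint8_t *p, size_t size, GdvHeader *h)
{
    static const uint8_t magic[4] = { 0x94, 0x19, 0x11, 0x29 };
    if (size < 24)
        return -1;
    for (int i = 0; i < 4; i++)
        if (p[i] != magic[i])
            return -1;
    h->size_id        = rl16(p + 4);
    h->frames         = rl16(p + 6);
    h->fps            = rl16(p + 8);
    h->sound_flags    = rl16(p + 10);
    h->sample_rate    = rl16(p + 12);
    h->image_type     = rl16(p + 14);
    h->max_frame_size = rl16(p + 16);
    h->unknown        = p[18];
    h->lossiness      = p[19];
    h->width          = rl16(p + 20);
    h->height         = rl16(p + 22);
    return 0;
}

/* Accessors for the documented flag bits. */
static inline int gdv_has_audio(const GdvHeader *h)   { return h->sound_flags & 1; }
static inline int gdv_is_stereo(const GdvHeader *h)   { return (h->sound_flags >> 1) & 1; }
static inline int gdv_is_16bit(const GdvHeader *h)    { return (h->sound_flags >> 2) & 1; }
static inline int gdv_is_dpcm(const GdvHeader *h)     { return (h->sound_flags >> 3) & 1; }
static inline int gdv_video_depth(const GdvHeader *h) { return h->image_type & 7; }
```

For example, a header with bytes 20-23 set to 0x40 0x01 0xC8 0x00 describes a 320x200 frame.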

A frame header has the following structure:

  bytes 0-1    magic number/frame signature (should be 0x05 0x13)
  bytes 2-3    total size of frame
  bytes 4-7    frame type and flags
    bits 31-8  number of bytes to skip before encoded video data
               (applies to coding methods 5, 6, and 8)
    bit 6      keyframe (1 = intraframe)
    bits 3-0   frame coding method

Thus far, only details of the 8-bit compression format have been
determined. A frame's header indicates the coding method used. These are
the known coding methods for 8-bit data:

  0: uncompressed frame
  1: new palette
  2: basic LZ-like unpacking
  3: frame unchanged from the previous frame
  5: advanced version of method 2, mixed with RLE
  6: more complicated version; tag/length/offset bits packed together
  8: most complicated version; same as method 6 except for tag 3 handling
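A minimal sketch of parsing the 8-byte frame header, assuming it has already been read into a buffer. The names (GdvFrameHeader, gdv_parse_frame_header(), and the GDV_METHOD_* constants) are hypothetical; the constants simply mirror the list of known coding methods above.

```c
#include <stddef.h>
#include <stdint.h>

/* Known 8-bit coding methods (hypothetical constant names). */
enum {
    GDV_METHOD_RAW      = 0,  /* uncompressed frame */
    GDV_METHOD_PALETTE  = 1,  /* new palette */
    GDV_METHOD_LZ       = 2,  /* basic LZ-like unpacking */
    GDV_METHOD_SAME     = 3,  /* unchanged from the previous frame */
    GDV_METHOD_LZ_RLE   = 5,
    GDV_METHOD_BITS     = 6,
    GDV_METHOD_BITS_EXT = 8
};

typedef struct {
    uint16_t size;     /* bytes 2-3: total size of frame */
    uint32_t skip;     /* bits 31-8: bytes to skip (methods 5, 6, 8) */
    int keyframe;      /* bit 6 */
    int method;        /* bits 3-0 */
} GdvFrameHeader;

/* Returns 0 on success, -1 on a short buffer or bad signature. */
int gdv_parse_frame_header(const uint8_t *p, size_t size, GdvFrameHeader *fh)
{
    if (size < 8 || p[0] != 0x05 || p[1] != 0x13)
        return -1;
    uint32_t flags = (uint32_t)p[4] | ((uint32_t)p[5] << 8) |
                     ((uint32_t)p[6] << 16) | ((uint32_t)p[7] << 24);
    fh->size     = (uint16_t)(p[2] | (p[3] << 8));
    fh->skip     = flags >> 8;
    fh->keyframe = (flags >> 6) & 1;
    fh->method   = (int)(flags & 0x0F);
    return 0;
}
```

For instance, flag bytes 0x46 0x02 0x00 0x00 describe a keyframe coded with method 6 whose encoded data starts 2 bytes after the frame header.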


Video Coding Method 2
---------------------
Coding method 2 embodies a basic LZ-like scheme. To decode the encoded
bytestream, begin by reading the first byte. This byte is a set of four
2-bit instruction tags laid out as:

  bits 76 54 32 10
       aa bb cc dd

* If aa is 0 then paint a single pixel by copying the next byte from the
encoded bytestream into the decoded image.

* If aa is 1 then copy a run of pixels from the area of the image that 
has already been painted. First, the source offset and run length must be
decoded from the bytestream. For the next 2 bytes in the bytestream,
byte_a followed by byte_b:

   byte_a    byte_b
  76543210  76543210

The length of the run to be copied is defined as 3 more than bits 3-0 of
byte_a. In C notation, this is expressed as:

  length = (byte_a & 0x0F) + 3;

The beginning run offset is defined as the 12-bit quantity specified by
the top 4 bits of byte_a combined with byte_b. In C notation, this is
expressed as:

  offset = ((byte_a & 0xF0) << 4) | byte_b;
  
The starting offset from which to copy is defined as the current offset in
the output image minus the quantity (4096 - offset).

* If aa is 2 then the next pixels in the decoded frame are unchanged from
the previous frame. The length of the unchanged pixel run is defined by
the next byte in the encoded bytestream, plus 2. This gives the range of
2..257 unchanged pixels.

* If aa is 3 then the frame decode operation is finished. Presumably, a
decoder should also stop decoding when it runs out of bytes in the encoded
bytestream buffer.

After tag aa is decoded, decode tag bb using the same process as tag aa,
then tag cc, followed by tag dd. After decoding tag dd, fetch the next
byte from the encoded bytestream as the next tag byte and repeat the
decoding process until a tag of 3 is encountered or until the encoded 
bytestream buffer is exhausted.
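The following C sketch decodes a method 2 bytestream using the formulas given above. It makes two assumptions that this document does not state. First, the output buffer already holds the previous frame, so an unchanged run only advances the output position. Second, a copy that would reach before the start of the frame is treated as an error. The function name and return codes are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical method 2 decoder. `frame` must already hold the previous
 * frame. Returns 0 on the end tag, -1 on a back-reference before the
 * start of the frame, and 1 if the input runs out first. */
int gdv_decode_method2(const uint8_t *src, size_t src_size,
                       uint8_t *frame, size_t frame_size)
{
    size_t in = 0, out = 0;

    while (in < src_size) {
        uint8_t tags = src[in++];
        for (int shift = 6; shift >= 0; shift -= 2) {   /* aa, bb, cc, dd */
            int tag = (tags >> shift) & 3;
            if (tag == 3)
                return 0;                               /* frame finished */
            if (in >= src_size)
                return 1;                               /* tags 0-2 need data */
            if (tag == 0) {                             /* one literal pixel */
                if (out < frame_size)
                    frame[out++] = src[in];
                in++;
            } else if (tag == 1) {                      /* LZ-style copy */
                if (in + 2 > src_size)
                    return 1;
                uint8_t byte_a = src[in++];
                uint8_t byte_b = src[in++];
                size_t length = (byte_a & 0x0F) + 3;
                size_t offset = ((size_t)(byte_a & 0xF0) << 4) | byte_b;
                size_t back = 4096 - offset;            /* 1..4096 pixels back */
                if (back > out)
                    return -1;
                /* Copy forward one pixel at a time so overlapping runs work. */
                for (; length > 0 && out < frame_size; length--, out++)
                    frame[out] = frame[out - back];
            } else {                                    /* unchanged run */
                size_t length = (size_t)src[in++] + 2;  /* 2..257 pixels */
                out = (out + length < frame_size) ? out + length : frame_size;
            }
        }
    }
    return 1;
}
```

Copying one pixel at a time lets a run overlap the pixels it is producing. For example, the bytes 0x07 0xAA 0xBB 0xF1 0xFE decode to the pixels AA BB AA BB AA BB. The offset 0xFFE gives a distance of 2, so the 4-pixel copy repeats the two literal pixels.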


Video Coding Method 5
---------------------
Coding method 5 is similar to method 2 but mixes in some run length
encoding. 

To reach the start of the encoded bytestream, first skip n bytes after
the frame header, where n is defined in the frame header.

To decode the encoded bytestream, begin by reading the first byte. This
byte is a set of four 2-bit instruction tags laid out as:

  bits 76 54 32 10
       aa bb cc dd

* If aa is 0 then paint a single pixel by copying the next byte from the
encoded bytestream into the decoded image.

* If aa is 1 then either copy a run of pixels from the area of the image
that has already been painted, or fill a run of pixels with a constant
pixel. First, the source offset and run length must be decoded from the
bytestream. For the next 2 bytes in the bytestream, byte_a followed by
byte_b:

   byte_a    byte_b
  76543210  76543210

The length of the run to be copied is defined as 3 more than bits 3-0 of
byte_a. In C notation, this is expressed as:

  length = (byte_a & 0x0F) + 3;

The beginning run offset is defined as the 12-bit quantity specified by
the top 4 bits of byte_a combined with byte_b. In C notation, this is
expressed as:

  offset = ((byte_a & 0xF0) << 4) | byte_b;

If the decoded offset is 0xFFF then take the last decoded pixel in the
output frame and copy it into the next (length) pixels in the output
frame.

If the decoded offset is not 0xFFF then copy (length) pixels to the
current offset, reading from the output image starting at the current
offset minus the quantity (4096 - offset).

* If aa is 2 then either the next pixels in the decoded frame are unchanged
from the previous frame, or the frame decode is finished. Decode the next
byte from the bytestream as the length. If the length is 0 then the frame
decode is finished. If the length is 0xFF then decode the next 16-bit value
from the bytestream as the length. This length indicates the number of
pixels from the current offset in the decoded frame that remain unchanged
from the previous frame.

* If aa is 3 then either copy a run of pixels from the area of the image
that has already been painted, or fill a run of pixels with a constant
pixel. First, the source offset and run length must be decoded from the
bytestream. For the next byte in the bytestream:

    byte
  76543210

Bits 7-2 define the 6-bit offset. Bits 1-0 plus 2 define the length. In C
notation, this is expressed as:

  offset = byte >> 2;
  length = (byte & 0x03) + 2;

If the decoded offset is 0 then take the last decoded pixel in the output
frame and copy it into the next (length) pixels in the output frame. 

If the decoded offset is not 0 then copy (length) pixels to the current
offset, reading from the output image starting at the current offset
minus the quantity (offset + 1).

After tag aa is decoded, decode tag bb using the same process as tag aa,
then tag cc, followed by tag dd. After decoding tag dd, fetch the next
byte from the encoded bytestream as the next tag byte and repeat the
decoding process until a tag of 2 is encountered with an associated length
of 0, or until the encoded bytestream buffer is exhausted.
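A sketch of a method 5 decoder in C follows. It assumes that `src` points just past the n skipped bytes and that the output buffer already holds the previous frame, so an unchanged run only advances the output position. The names and return codes are hypothetical. Both fill cases are handled by an ordinary pixel-by-pixel back copy. A 12-bit offset of 0xFFF gives a distance of 4096 - 0xFFF = 1. For tag 3, the sketch treats the 6-bit offset as a distance of (offset + 1), so an offset of 0 likewise repeats the last decoded pixel.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy len pixels from dist pixels back, one at a time, so that an
 * overlapping run (dist 1) repeats the last pixel. */
static int copy_back(uint8_t *frame, size_t *out, size_t frame_size,
                     size_t dist, size_t len)
{
    if (dist == 0 || dist > *out)
        return -1;                        /* would read before the frame */
    for (; len > 0 && *out < frame_size; len--, (*out)++)
        frame[*out] = frame[*out - dist];
    return 0;
}

/* Hypothetical method 5 decoder. Returns 0 on the end code, -1 on a bad
 * back-reference, 1 if the input runs out first. */
int gdv_decode_method5(const uint8_t *src, size_t src_size,
                       uint8_t *frame, size_t frame_size)
{
    size_t in = 0, out = 0;

    while (in < src_size) {
        uint8_t tags = src[in++];
        for (int shift = 6; shift >= 0; shift -= 2) {   /* aa, bb, cc, dd */
            int tag = (tags >> shift) & 3;
            if (in >= src_size)
                return 1;                 /* every tag needs at least 1 byte */
            if (tag == 0) {               /* one literal pixel */
                if (out < frame_size)
                    frame[out++] = src[in];
                in++;
            } else if (tag == 1) {        /* 12-bit offset copy or fill */
                if (in + 2 > src_size)
                    return 1;
                uint8_t byte_a = src[in++];
                uint8_t byte_b = src[in++];
                size_t length = (byte_a & 0x0F) + 3;
                size_t offset = ((size_t)(byte_a & 0xF0) << 4) | byte_b;
                /* offset 0xFFF gives distance 1: repeat the last pixel */
                if (copy_back(frame, &out, frame_size, 4096 - offset, length))
                    return -1;
            } else if (tag == 2) {        /* unchanged run or end of frame */
                size_t length = src[in++];
                if (length == 0)
                    return 0;
                if (length == 0xFF) {     /* 16-bit little endian length */
                    if (in + 2 > src_size)
                        return 1;
                    length = src[in] | ((size_t)src[in + 1] << 8);
                    in += 2;
                }
                out = (out + length < frame_size) ? out + length : frame_size;
            } else {                      /* 6-bit offset copy or fill */
                uint8_t byte = src[in++];
                size_t offset = byte >> 2;
                size_t length = (byte & 0x03) + 2;
                /* distance offset + 1, so offset 0 repeats the last pixel */
                if (copy_back(frame, &out, frame_size, offset + 1, length))
                    return -1;
            }
        }
    }
    return 1;
}
```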


Bit Reading Procedure For Video Coding Methods 6 and 8
------------------------------------------------------
Video coding methods 6 and 8 treat the encoded bytestream as a sequence of
packed bits and bytes. The best way to illustrate the method is to jump in
with an example bytestream:

  0x2D 0xAA 0x5A 0x7F 0x26 0x53 0xB1 ...

The bit reader maintains a 32-bit bit queue and a queue size (qsize).
Initialize the queue with the first 4 bytes in the bytestream interpreted 
as a little endian 32-bit number, and initialize the queue size to 16:

  queue = 0x7F5AAA2D
  qsize = 16

Reading bits entails reading the least significant bits from the queue.
Reading 4 bits in this example will yield 0xD. Afterwards, the bit reading
state variables will be:

  queue = 0x07F5AAA2
  qsize = 12

As an example, assume the coding mode dictates that the next 3 bits shall
be read (010 = 2) followed by the next 1 bit (0), and that these codes
indicate that the decoder should read the next byte from the encoded
bytestream. Direct byte reads and bit queue refills share a single read
position, so this next byte is 0x26, the first byte after the 4 bytes
already loaded into the queue. The state variables after reading the
next (3 + 1 = 4) bits will be:

  queue = 0x007F5AAA
  qsize = 8

Assume the next decode operation is to read 16 bits from the stream 
(0x5AAA). The state variables are now:

  queue = 0x0000007F
  qsize = -8

Since qsize is less than or equal to 0, fetch the next 16-bit (little
endian) value from the encoded bytestream, shift it left by (qsize + 16)
bits (the number of bits still held in the queue, 8 here), and logically
OR it into the queue. Then add 16 to qsize:

  queue = 0x00B1537F
  qsize = 8

The descriptions of video coding methods 6 and 8 will use the phrase "read
the next n bits from the bit queue." This indicates that the next n bits
should be shifted off of the rightmost part of the bit queue, qsize
should be decreased by n, and, if qsize is then less than or equal to 0,
the bit queue should be refilled and qsize increased as described
previously.
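The bit reading procedure can be sketched in C as follows. GdvBits, gdv_bits_init(), gdv_read_bits(), and gdv_read_byte() are hypothetical names. Returning 0 past the end of the buffer is an assumption. Bit queue refills and direct byte reads advance the same read position.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit reader state following the procedure above. */
typedef struct {
    const uint8_t *buf;  /* encoded bytestream */
    size_t size;         /* number of bytes in buf */
    size_t pos;          /* next unread byte, shared by bit and byte reads */
    uint32_t queue;      /* bit queue, consumed from the least significant end */
    int qsize;           /* the document's qsize (bits held in queue minus 16) */
} GdvBits;

/* Next byte of the bytestream; 0 past the end (an assumption). */
uint8_t gdv_read_byte(GdvBits *b)
{
    return b->pos < b->size ? b->buf[b->pos++] : 0;
}

static uint32_t gdv_read_le16(GdvBits *b)
{
    uint32_t lo = gdv_read_byte(b);
    uint32_t hi = gdv_read_byte(b);
    return lo | (hi << 8);
}

/* Load the first 4 bytes as a little endian 32-bit queue; qsize = 16. */
void gdv_bits_init(GdvBits *b, const uint8_t *buf, size_t size)
{
    b->buf = buf;
    b->size = size;
    b->pos = 0;
    b->queue = gdv_read_le16(b);
    b->queue |= gdv_read_le16(b) << 16;
    b->qsize = 16;
}

/* Read the next n bits (1..16) from the bit queue. */
uint32_t gdv_read_bits(GdvBits *b, int n)
{
    uint32_t value = b->queue & ((1u << n) - 1);
    b->queue >>= n;
    b->qsize -= n;
    if (b->qsize <= 0) {
        /* qsize + 16 bits remain; place 16 new bits just above them */
        b->queue |= gdv_read_le16(b) << (b->qsize + 16);
        b->qsize += 16;
    }
    return value;
}
```

When run against the example bytestream 0x2D 0xAA 0x5A 0x7F 0x26 0x53 0xB1, with the same sequence of reads (4 bits, 3 bits, 1 bit, one byte, 16 bits), this sketch reproduces the queue and qsize values listed in this section.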


Video Coding Method 6
---------------------
Coding method 6 uses techniques similar to those of coding methods 2 and
5. The most significant difference is that the encoded data is read
using the bit reading procedure described in the previous section.

To reach the start of the encoded bitstream, first skip n bytes after the 
frame header, where n is defined in the frame header. Initialize the bit
queue at that point.

To decode the frame, read the next 2 bits from the bit queue as the
instruction tag.

* If the tag is 0 then read the next bit from the bit queue. If the bit is
0 then copy the next byte in the bytestream into the output frame as the
next pixel. If the bit is 1 then copy a series of pixels from the encoded
bytestream into the output frame. The length of the pixel run to copy is
obtained by the following process:

  length = 2
  count = 0
  do
    count++
    step = read (count) bits from bit queue
    length = length + step
  while (step == ((1 << count) - 1))

* If the tag is 1 then the next series of pixels in the output frame are
unchanged from the previous frame. To determine precisely how many pixels
are unchanged, read the next bit from the bit queue. If the bit is 0 then
read the next 4 bits from the bit queue. These 4 bits plus 2 represent the
number of pixels that are unchanged, which is in the range of 2..17.

If the bit is 1 then read the next byte from the bytestream as the length.
If the top bit of the length is 0 then the actual length of the unchanged
pixel run is length + 18 which yields a range of 18..145 pixels. If the
top bit of the decoded length byte is 1 then read the next byte from the
bytestream and perform the following calculation:

  length = (((length & 0x7F) << 8) | next_byte) + 146;

* If the tag is 2 then read the next 2 bits from the bit queue as the sub-tag.

If the sub-tag is 3 then either copy a run of pixels from the portion of
the image that has already been decoded into the current offset, or fill
a run of pixels with a constant pixel value. Read the next byte from the
bytestream as the offset. If the most significant bit of the offset byte
(bit 7) is set then the length of the next pixel operation is 3;
otherwise, the length is 2. Next clear bit 7 of the offset. If offset is 0
then take the most recent pixel in the decoded frame and fill the next
(length) pixels with that value. If offset is non-zero then copy (length)
pixels from the current offset - (offset + 1) in the decoded image to the
current offset.

If the sub-tag is not 3 then read the next 4 bits from the bit queue. These
bits comprise bits 11-8 of a 12-bit offset quantity. The bottom 8 bits of
the offset quantity come from the next byte read from the bytestream.

If the sub-tag is 0 and the offset is 0xFFF then the frame decode operation
is complete. If the sub-tag is 0 and the offset is greater than 0xF80 then
copy a pair of pixels from the portion of the output image already decoded
and place the pair into the output frame a specified number of times. The
repeat count is 2 more than the bottom 4 bits of the 12-bit offset
quantity. The actual offset is defined as bits 6-4 of the 12-bit offset
quantity. In C notation, length (the repeat count) and offset are computed
from offset as:

  length = (offset & 0x00F) + 2;
  offset = (offset >> 4) & 7;

The pair of pixels is retrieved from the decoded image at the current
offset - (offset + 1).

If the sub-tag is not 0 or the offset is less than or equal to 0xF80 then
the length of the run is the sub-tag plus 3. If the offset is 0xFFF, take
the last pixel output into the decoded image and copy it (length) times
into the decoded image at the current offset. If the offset is not equal
to 0xFFF then copy a run of pixels from the area of the image that has
already been painted. The starting offset from which to copy is defined
as the current offset in the output image minus the quantity
(4096 - offset).

* If the tag is 3 then either copy a run of pixels from the area of the
image that has already been painted, or fill a run of pixels with a
constant pixel. Read the next byte in the bytestream as the offset. The
length is the top 4 bits of this byte (bits 7-4), and the bottom 4 bits
(bits 3-0) become bits 11-8 of the 12-bit offset quantity. If the length
is 15 then read the next byte from the bytestream and add it to the
length. Add 6 more to the length. Read the next byte from the bytestream
and make it the bottom 8 bits of the offset quantity. If the offset is
0xFFF then take the previous pixel from the decoded image and copy it into
the decoded image for (length) iterations. If offset is other than 0xFFF
then move (length) pixels into the decoded image at the current offset
from the current offset + (offset - 4096).
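The following C sketch puts these pieces together into a method 6 decoder. So that it stands alone, it includes a compact version of the bit reader from the previous section. Several details are assumptions rather than statements from this document:

- the output buffer already holds the previous frame;
- reads past the end of the input yield zero bytes;
- sub-tag 3 offsets and pair-copy offsets are treated as a distance of (offset + 1), so an offset of 0 refers to the most recent pixel;
- the pixel pair is written (repeat count) times.

All names are hypothetical. The 4 offset bits are read from the bit queue before the low offset byte, because a queue refill can consume bytes from the same stream.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit reader from the previous section; past the end of the input it
 * yields zero bytes (an assumption that keeps the sketch short). */
typedef struct { const uint8_t *buf; size_t size, pos; uint32_t queue; int qsize; } Bits;

static unsigned get_byte(Bits *b) { return b->pos < b->size ? b->buf[b->pos++] : 0; }
static unsigned get_le16(Bits *b) { unsigned lo = get_byte(b); return lo | (get_byte(b) << 8); }

static unsigned get_bits(Bits *b, int n)
{
    unsigned v = b->queue & ((1u << n) - 1);
    b->queue >>= n;
    if ((b->qsize -= n) <= 0) {          /* qsize + 16 valid bits remain */
        b->queue |= (uint32_t)get_le16(b) << (b->qsize + 16);
        b->qsize += 16;
    }
    return v;
}

/* Copy len pixels from dist pixels back; dist 1 repeats the last pixel. */
static int copy_back(uint8_t *f, size_t *out, size_t size, size_t dist, size_t len)
{
    if (dist == 0 || dist > *out) return -1;
    for (; len > 0 && *out < size; len--, (*out)++) f[*out] = f[*out - dist];
    return 0;
}

/* Returns 0 on the end code, -1 on a bad back-reference, 1 if the frame
 * fills up first. `frame` must already hold the previous frame. */
int gdv_decode_method6(const uint8_t *src, size_t src_size, size_t skip,
                       uint8_t *frame, size_t frame_size)
{
    Bits b = { src, src_size, skip < src_size ? skip : src_size, 0, 16 };
    size_t out = 0;
    b.queue = get_le16(&b);
    b.queue |= (uint32_t)get_le16(&b) << 16;

    while (out < frame_size) {
        unsigned tag = get_bits(&b, 2);
        if (tag == 0) {                                  /* literal pixel(s) */
            size_t len = 1;
            if (get_bits(&b, 1)) {
                unsigned step;
                int count = 0;
                len = 2;
                do {
                    step = get_bits(&b, ++count);
                    len += step;
                } while (step == (1u << count) - 1 && count < 16);
            }
            for (; len > 0 && out < frame_size; len--)
                frame[out++] = (uint8_t)get_byte(&b);
        } else if (tag == 1) {                           /* unchanged run */
            size_t len;
            if (get_bits(&b, 1) == 0) {
                len = get_bits(&b, 4) + 2;
            } else {
                unsigned l = get_byte(&b);
                len = (l & 0x80) ? ((((l & 0x7F) << 8) | get_byte(&b)) + 146) : l + 18;
            }
            out = (out + len < frame_size) ? out + len : frame_size;
        } else if (tag == 2) {
            unsigned sub = get_bits(&b, 2);
            if (sub == 3) {                              /* short copy or fill */
                unsigned o = get_byte(&b);
                if (copy_back(frame, &out, frame_size, (o & 0x7F) + 1, (o & 0x80) ? 3 : 2))
                    return -1;
                continue;
            }
            unsigned off = get_bits(&b, 4) << 8;         /* bits 11-8 first... */
            off |= get_byte(&b);                         /* ...then bits 7-0 */
            if (sub == 0 && off == 0xFFF)
                return 0;                                /* end of frame */
            if (sub == 0 && off > 0xF80) {               /* repeated pixel pair */
                size_t n = (off & 0xF) + 2, dist = ((off >> 4) & 7) + 1;
                if (dist > out) return -1;
                uint8_t c1 = frame[out - dist], c2 = frame[out - dist + 1];
                for (; n > 0 && out < frame_size; n--) {
                    frame[out++] = c1;
                    if (out < frame_size) frame[out++] = c2;
                }
            } else if (copy_back(frame, &out, frame_size, 4096 - off, sub + 3)) {
                return -1;
            }
        } else {                                         /* tag 3 */
            unsigned first = get_byte(&b);
            size_t len = first >> 4;
            if (len == 15) len += get_byte(&b);
            unsigned off = ((first & 0x0F) << 8) | get_byte(&b);
            if (copy_back(frame, &out, frame_size, 4096 - off, len + 6))
                return -1;
        }
    }
    return 1;
}
```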


Video Coding Method 8
---------------------
Video coding method 8 is precisely the same as video coding method 6
except for the procedure for decoding tag 3. To decode tag 3, read the
next byte in the bytestream (first_byte).

If the top 2 bits of first_byte are set then the required operation is to
copy a run of pixels (from the previous frame?) to the current offset of
the current frame. The length of the run is denoted by the bottom 6 bits
of first_byte plus 8. The 12-bit offset quantity is denoted by the next 4
bits in the bit queue (top 4 bits of the quantity) combined with the next
byte from the bytestream (bottom 8 bits). Move (length) bytes from the
current offset plus the offset quantity + 1 into the current offset.

If the top 2 bits of first_byte are not both set then the required
operation is to either copy a run of previously decoded pixels to the
current position or repeat the previous pixel for a number of pixels. If
the top bit of first_byte is 0 then the length of the pixel run is defined
as the quantity 6 plus bits 6-4 of first_byte (this yields a range of
6..13). The offset is defined as the 12-bit quantity formed by combining
bits 3-0 (top 4 bits of quantity) and the next byte in the bytestream
(bottom 8 bits). If the top bit of first_byte is set (and therefore bit 6
is clear) then the length of the pixel run is defined as the quantity 14
plus bits 5-0 of first_byte (this yields a range of 14..77). The offset is
defined as the 12-bit quantity by combining the next 4 bits from the bit
queue (top 4 bits of quantity) and the next byte from the bytestream
(bottom 8 bits). If the offset is 0xFFF then take the previous pixel from
the decoded image and copy it into the decoded image for (length)
iterations. If offset is other than 0xFFF then move length pixels into the
decoded image at the current offset from the current offset + 
(offset - 4096).
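Method 8 differs from method 6 only in tag 3, so the sketch below covers just the field decoding for that tag. The helper names and the Gdv8Op enum are hypothetical. Whether 4 extra bits are needed from the bit queue depends only on bit 7 of first_byte. When they are needed, the caller should read them before the low offset byte, in the order given above, because a bit queue refill may consume bytes from the same stream. As in the text, this sketch does not resolve where the copy-ahead case takes its pixels from.

```c
#include <stdint.h>

/* Hypothetical result kinds for method 8 tag 3. */
typedef enum {
    GDV8_COPY_AHEAD,  /* top 2 bits set: source is current offset + offset + 1 */
    GDV8_COPY_BACK    /* otherwise: source is current offset - (4096 - offset) */
} Gdv8Op;

/* Returns 1 when the top 4 offset bits come from the bit queue, i.e. when
 * bit 7 of first_byte is set (both the 0xC0 case and the 0x80 case). */
int gdv8_tag3_uses_bit_queue(uint8_t first_byte)
{
    return (first_byte & 0x80) != 0;
}

/* nibble: next 4 bits from the bit queue (ignored if not needed);
 * low_byte: the following byte from the bytestream. */
Gdv8Op gdv8_tag3_decode(uint8_t first_byte, unsigned nibble, uint8_t low_byte,
                        unsigned *length, unsigned *offset)
{
    if ((first_byte & 0xC0) == 0xC0) {
        *length = (first_byte & 0x3F) + 8;              /* 8..71 */
        *offset = ((nibble & 0x0F) << 8) | low_byte;
        return GDV8_COPY_AHEAD;
    }
    if ((first_byte & 0x80) == 0) {
        *length = ((first_byte >> 4) & 0x07) + 6;       /* 6..13 */
        *offset = ((unsigned)(first_byte & 0x0F) << 8) | low_byte;
    } else {
        *length = (first_byte & 0x3F) + 14;             /* 14..77 */
        *offset = ((nibble & 0x0F) << 8) | low_byte;
    }
    return GDV8_COPY_BACK;
}
```

For GDV8_COPY_BACK, an offset of 0xFFF gives a distance of 4096 - 0xFFF = 1, which is the repeat-the-previous-pixel case described above.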


Audio Format
------------
[document DPCM format]


Appendix A: Games Using GDV
---------------------------
These games are known to use the GDV file format for their FMV:

 * Hardwar
 * Normality
 * Realms of the Haunting


References
----------
Realms of the Haunting:
http://www.mobygames.com/game/dos/realms-of-the-haunting
Unofficial fansite:
http://www.realmsofthehaunting.com/


ChangeLog
---------
v0.3: November 27, 2005
- documented the remaining video coding formats
- held in pre-1.0 state until a decoder is successfully implemented based
on this information

v0.2: November 22, 2005
- documented 2.5/4 video coding formats

v0.1: November 9, 2005
- initial release
- describes general file format


GNU Free Documentation License
------------------------------
see http://www.gnu.org/licenses/fdl.html