NOTE: This document is no longer maintained

Instead, this information is now maintained in Wiki format at the MultimediaWiki.

Please be advised that there is incorrect information in this document that will not be corrected. Please look up the corrected information in the MultimediaWiki. -- February 4, 2006

Simple Time Domain Audio Coding

by Mike Melanson

Abstract

This document presents the underlying principles and on-disk data formats of comparatively simple audio coding formats that operate in the time domain such as pulse code modulation (PCM), differential PCM (DPCM), and adaptive DPCM (ADPCM).
v1.1: December 3, 2003
Copyright (c) 2003 Mike Melanson
Permission is granted to copy, distribute, and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".

Contents

1  Introduction
2  Pulse Code Modulation (PCM)
    2.1  Overview
    2.2  Linear PCM
    2.3  Logarithmic PCM
3  Differential Pulse Code Modulation (DPCM)
    3.1  Overview
    3.2  Id RoQ DPCM
    3.3  Interplay DPCM
    3.4  Xan DPCM
4  Adaptive Differential Pulse Code Modulation (ADPCM)
    4.1  Overview
    4.2  IMA ADPCM
        4.2.1  Overview
        4.2.2  Decoding IMA
        4.2.3  Quicktime IMA
        4.2.4  Microsoft IMA
        4.2.5  DVI
        4.2.6  Duck DK4 IMA
        4.2.7  Duck DK3 Joint Stereo IMA
        4.2.8  Westwood Studios IMA
        4.2.9  SDL Motion JPEG IMA
        4.2.10  Dialogic Modified IMA
        4.2.11  4X IMA
    4.3  Microsoft ADPCM
    4.4  CRI ADX
5  Other Simple Time Domain Formats
    5.1  SPC-700 Bit Rate Reduced (BRR)
6  Appendix A: Codec Tables
    6.1  mu-law - linear PCM conversion
    6.2  A-law - linear PCM conversion
    6.3  Interplay DPCM delta table
    6.4  Standard IMA tables
    6.5  Dialogic modified IMA tables
    6.6  MS ADPCM tables
7  References
8  Acknowledgements
9  Changelog
10  GNU Free Documentation License

1  Introduction

A time domain audio coding method operates on samples in the time domain as opposed to the frequency domain. This document describes the algorithms and specific on-disk data formats used to encode a variety of simple time domain audio standards. This discussion is primarily focused on algorithms used in entertainment multimedia applications.
This document began life as "The Skinny on ADPCM Data Formats" which, as the name implied, only covered ADPCM coding algorithms and helped many developers navigate the sea of sparsely-documented ADPCM formats. This document has now been expanded to cover a larger family of coding algorithms.

2  Pulse Code Modulation (PCM)

2.1  Overview

This is arguably the simplest time domain audio format. PCM audio data is a sequence of samples in which each sample represents an audio wave's amplitude at a discrete point in time.

2.2  Linear PCM

Linear PCM comes in a wide variety of flavors. It is useful to break it down into its parameters:
The most common PCM formats revolve around the most commonly available hardware: little-endian Intel CPUs. Microsoft multimedia files (WAV/AVI/ASF) designate PCM audio with format 0x01. If the WAVEFORMAT header indicates 8 bits/sample, the data will be unsigned. If the header indicates 16 bits/sample, the data will be signed and little-endian. Stereo data will be interleaved, LRLRLR. Frequency will be specified in the WAVEFORMAT header.
Apple Quicktime files with audio will have an audio stsd atom that specifies the audio fourcc, frequency, bits/sample, and number of channels. The signed-ness of PCM data is indicated by the audio fourcc. 'raw ' (note the space needed to complete the fourcc) means unsigned data. If 'raw ' data is 16 bits/sample, it will always be little-endian. 'twos' indicates 2's complement, big-endian data. 'sowt' ('twos' backwards) indicates 2's complement, little-endian data. Stereo data is always interleaved.
As another common example of PCM audio coding, compact discs use interleaved, 16-bit, little-endian stereo audio with a sample rate of 44100 Hz. 44100 Hz is also known as the CD sample rate. Two other common sample rates, 22050 Hz and 11025 Hz, simply divide the CD sample rate by 2 and 4, respectively.
A quick note about some sample rate oddities you may encounter in certain vintage multimedia files (from the early-, mid-, and even late-1990s): You may see sample rates of 11127 and 22254. Where did these numbers come from? The original Sound Blaster was the first commodity PC sound card with a digital-analog converter (DAC) for digital audio playback. When programming the Sound Blaster DAC, it is necessary to program the sample rate by sending a 2's complement signed byte according to this formula:

$$\text{sample rate} = 256 - \frac{1000000}{\text{divisor}}$$

If 22050 Hz is the desired sample rate:

$$22050 = 256 - \frac{1000000}{\text{divisor}}$$

$$\text{divisor} = -\frac{1000000}{22050 - 256} \approx -45.9$$

Thus, -46 (0xD2 as a 2's complement byte) is the closest frequency divisor, and it yields an inexact frequency.

2.3  Logarithmic PCM

Rather than representing sample amplitudes on a linear scale as linear PCM coding does, logarithmic PCM coding plots the amplitudes on a logarithmic scale. Log PCM is more often used in telephony and communications applications than in entertainment multimedia applications.
There are two major variants of log PCM: mu-law (μ-law) and A-law. Mu-law coding uses the format number 0x07 in Microsoft multimedia files (WAV/AVI/ASF) and the fourcc 'ulaw' in Apple Quicktime files. A-law coding uses the format number 0x06 in Microsoft multimedia files and the fourcc 'alaw' in Apple Quicktime files.
Every byte of a log PCM data chunk maps to a signed 16-bit linear PCM sample. See Appendix A for the mu-/A-law - PCM conversion code.

3  Differential Pulse Code Modulation (DPCM)

3.1  Overview

Differential, or delta, pulse code modulation algorithms encode the differences between successive PCM samples, rather than storing the actual samples. For example, instead of storing the following sequence of samples:
80 100  50  70  90 130 120 
The first sample is followed by a series of deltas:
80  20 -50  20  20  40 -10
The resulting encoding is a series of smaller numbers which each require less information to encode.
DPCM algorithms typically encode a chunk of audio with the first sample in some preamble represented with the maximum number of bits, followed by a series of indices into a delta table.
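The differencing idea above can be sketched in a few lines of C. This is only an illustration of plain delta coding with no quantization or delta table; the function names `delta_encode` and `delta_decode` are hypothetical and do not correspond to any format described in this document.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode: out[0] holds the first sample; every later entry holds the
 * difference between a sample and the one before it. */
void delta_encode(const int16_t *pcm, int32_t *out, size_t n)
{
    if (n == 0)
        return;
    out[0] = pcm[0];
    for (size_t i = 1; i < n; i++)
        out[i] = (int32_t)pcm[i] - (int32_t)pcm[i - 1];
}

/* Decode: a running sum of the deltas rebuilds the original samples. */
void delta_decode(const int32_t *in, int16_t *pcm, size_t n)
{
    int32_t predictor = 0;
    for (size_t i = 0; i < n; i++) {
        predictor += in[i];
        pcm[i] = (int16_t)predictor;
    }
}
```

Feeding the example sequence 80 100 50 70 90 130 120 through `delta_encode` should yield the delta sequence shown above, and `delta_decode` reverses it. Real DPCM formats go one step further and store an index into a table of allowed deltas instead of the delta itself.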

3.2  Id RoQ DPCM

RoQ multimedia files are found in the Quake III PC game as well as in games such as Return to Castle Wolfenstein and Jedi Knight 2 that are based on the Quake III engine. The audio in a RoQ file is encoded with a DPCM format.
A chunk of RoQ DPCM is laid out as (all multi-byte numbers are little-endian):
bytes 0-1
chunk ID: 0x1020 for mono data, 0x1021 for stereo data
bytes 2-5
chunk size, not including 8-byte preamble
bytes 6-7
initial predictor(s)
bytes 8..n
DPCM bytes
If the block is mono data, each byte represents the square root of the difference between the last PCM sample and the current PCM sample. To decode the audio, follow this process:
if (current DPCM byte < 128)
  next PCM sample = last PCM sample + (current DPCM byte) * (current DPCM byte)
else
  next PCM sample = last PCM sample - ((current DPCM byte - 128) * (current DPCM byte - 128))
Trivially, this process can be optimized by precalculating the squares of all 256 possible DPCM bytes.
If the audio data is stereo, the 16-bit predictor encodes both the initial right and left predictors. After decoding the little-endian 16-bit predictor number, the upper 8 bits (bits 15-8) are the upper 8 bits of the initial left channel predictor. The lower 8 bits are the upper 8 bits of the initial right channel predictor. The DPCM bytes are decoded in the same manner as for mono data except that left and right DPCM bytes are interleaved.
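A minimal C sketch of the mono decoding loop with a precalculated table follows. It assumes that DPCM bytes of 128 and above encode the negated square of (byte - 128), which is how widely used open source decoders treat this format, and it saturates the result to the signed 16-bit range, which the text above does not state. The names `roq_build_delta_table` and `roq_dpcm_decode_mono` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill table[i] with the signed delta for DPCM byte i.
 * Assumption: bytes 128..255 carry the negated square of (byte - 128). */
void roq_build_delta_table(int16_t table[256])
{
    for (int i = 0; i < 128; i++) {
        table[i] = (int16_t)(i * i);
        table[i + 128] = (int16_t)(-(i * i));
    }
}

/* Decode n mono DPCM bytes into n PCM samples, starting from the
 * initial predictor taken from the chunk preamble. */
void roq_dpcm_decode_mono(const uint8_t *dpcm, size_t n,
                          int16_t initial_predictor, int16_t *out)
{
    int16_t table[256];
    int predictor = initial_predictor;

    roq_build_delta_table(table);
    for (size_t i = 0; i < n; i++) {
        predictor += table[dpcm[i]];
        /* Assumption: saturate to the signed 16-bit range. */
        if (predictor > 32767)
            predictor = 32767;
        if (predictor < -32768)
            predictor = -32768;
        out[i] = (int16_t)predictor;
    }
}
```

For stereo data the same loop would keep two predictors and alternate between them for each byte.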

3.3  Interplay DPCM

Computer games published by Interplay and its subsidiary companies often use the custom Interplay MVE format to transport multimedia. Audio can be stored in these files using PCM or a custom DPCM format.
Interplay DPCM requires a 256-element delta table for encoding and decoding. The delta table is listed in Appendix A. For each chunk of DPCM data in an Interplay MVE file, the first 2 bytes comprise an initial predictor stored in a signed, 16-bit, little-endian format. If the file is stereo, that predictor is treated as the left channel initial predictor and the next 2 bytes comprise the right channel initial predictor. The remainder of the bytes are indices into the delta table. For each byte, fetch a signed delta and apply it to the appropriate predictor (stereo data is interleaved LRLR...). Saturate the predictor to a signed 16-bit range after each delta is applied.

3.4  Xan DPCM

Origin's Wing Commander IV computer game transports multimedia cutscenes in standard Microsoft AVI files. The files use a custom video codec named Xan (fourcc: 'Xxan'). The audio is transported with a custom format that this document takes the liberty of naming Xan DPCM.
Note that an AVI file demuxer will probably need to be modified to support the algorithm. The WAVEFORMAT headers in the Xan AVI files report the audio coding as format 0x01: PCM. However, the file's 'auds' chunk begins with the fourcc 'Axan'. A program can either check for this or assume that the file uses Xan DPCM if it uses Xan video.
Classifying the Xan audio coding method as a DPCM algorithm is a little shaky. It actually resembles a cross between a DPCM algorithm and an ADPCM algorithm. Perhaps the designers could not decide between the two algorithm families and decided to split the difference. The algorithm encodes 16-bit PCM samples as 8-bit bytes by packing a 6-bit delta value along with a 2-bit delta modifier into a byte.
For each chunk of Xan DPCM data, the first 2 or 4 bytes are the initial predictors for that chunk, depending on mono or stereo data, and are encoded as signed, 16-bit, little-endian numbers. A shifter value for each channel is initialized to 4. For each byte in the stream (assuming mono data):
byte = next byte in stream
diff = (byte & 0xFC) << 8
if (byte & 0x03) == 3 (bottom 2 bits of byte are both 1)
    shifter++
else
    shifter -= (2 * (byte & 0x03))
if shifter < 0 (the shift value may not go below 0)
    shifter = 0
shift diff right by shifter value
apply diff to the current predictor
saturate predictor to signed, 16-bit range
Note that diff must be treated as a signed 16-bit number. For stereo data, the bytes represent interleaved samples in LRLR order.
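The mono procedure above might be written in C roughly as follows. This is a sketch rather than a reference implementation: the function names are hypothetical, and the upper clamp of 31 on the shifter is an added assumption to keep the shift well defined on 32-bit integers.

```c
#include <stddef.h>
#include <stdint.h>

/* Arithmetic (sign-preserving) right shift that does not rely on
 * implementation-defined behavior for negative values. */
static int shift_right_signed(int value, int shift)
{
    return value >= 0 ? value >> shift : ~((~value) >> shift);
}

/* Decode n bytes of mono Xan DPCM into n PCM samples. */
void xan_dpcm_decode_mono(const uint8_t *data, size_t n,
                          int16_t initial_predictor, int16_t *out)
{
    int predictor = initial_predictor;
    int shifter = 4;

    for (size_t i = 0; i < n; i++) {
        int byte = data[i];
        int modifier = byte & 0x03;

        /* Place the 6-bit delta in the top of a 16-bit word and
         * sign-extend it. */
        int diff = (byte & 0xFC) << 8;
        if (diff & 0x8000)
            diff -= 0x10000;

        if (modifier == 3)
            shifter++;
        else
            shifter -= 2 * modifier;
        if (shifter < 0)
            shifter = 0;
        if (shifter > 31)       /* assumption: not stated in the text */
            shifter = 31;

        predictor += shift_right_signed(diff, shifter);
        if (predictor > 32767)
            predictor = 32767;
        if (predictor < -32768)
            predictor = -32768;
        out[i] = (int16_t)predictor;
    }
}
```

A stereo decoder would keep a separate predictor and shifter per channel and alternate between them for each byte.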

4  Adaptive Differential Pulse Code Modulation (ADPCM)

4.1  Overview

ADPCM is similar to DPCM in that it encodes the differences between successive samples. However, the word "adaptive" in the name means that the algorithm can adapt the current predictor according to the PCM data at a discrete point in time, thus minimizing prediction error.
There is a variety of ADPCM algorithms for different applications. This document primarily focuses on the algorithms used for entertainment multimedia applications. These applications generally compress to a 4:1 ratio: 4 bits of ADPCM are converted into a signed 16-bit PCM sample. Other algorithms, such as those used in telecommunications, are more complicated and offer different compression ratios.

4.2  IMA ADPCM

4.2.1  Overview

The Interactive Multimedia Association (IMA) developed an ADPCM algorithm designed to be used in entertainment multimedia applications. It is particularly fast to encode and decode and does not strictly require any multiplications or floating point operations.
While the encoding and decoding algorithms remain more or less constant across different IMA implementations, the specific on-disk data formats vary. The following sections will describe the IMA codec algorithm and the various methods used to store the coded data.

4.2.2  Decoding IMA

To decode IMA ADPCM, initialize 3 variables:
predictor:
This is either initialized from the data chunk preamble specified in the format or is initialized to 0 at the start of the decoding process
step index:
Similar to the initial predictor, this variable is initialized from the data chunk preamble or set to 0 at the start of the decoding process
step:
This variable is initialized to ima_step_table[step_index]
The encoded IMA bitstream is comprised of a series of 4-bit nibbles. This means that each byte represents 2 IMA nibbles. The specific data format will dictate whether the stream is decoded top nibble first or bottom nibble first, and whether there is stereo interleaving within the IMA nibbles. For this discussion, imagine the IMA bitstream as a series of nibbles representing a single audio channel:
n0 n1 n2 n3 n4 n5 ...
Where each nibble represents both a table index and a sign/magnitude number during the decoding process. Transform each nibble in the stream into a signed, 16-bit PCM sample using the following process:

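$$\text{step index} = \text{step index} + \text{ima\_index\_table}[(\text{unsigned})\,\text{nibble}]$$

$$\text{diff} = \frac{((\text{signed})\,\text{nibble} + 0.5) \times \text{step}}{4}$$

$$\text{predictor} = \text{predictor} + \text{diff}$$

$$\text{step} = \text{ima\_step\_table}[\text{step index}]$$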
See Appendix A for the relevant IMA decoding tables.
Regarding the step index and predictor calculations: Be sure to clamp the computed step index between 0 and 88 (table limits) and the predictor between -32768 and 32767 (signed 16-bit number range). These values can go out of range, which could cause undesirable program behavior if left unchecked.
A note about the following calculation:

$$\text{diff} = \frac{((\text{sign/mag.})\,\text{nibble} + 0.5) \times \text{step}}{4}$$
At first glance, it appears that this calculation requires floating point operations and an arbitrary (not power-of-2) multiplication. However, some numerical manipulations reveal some useful simplifications:

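$$\text{diff} = \frac{\text{step} \times \text{nibble} + \frac{\text{step}}{2}}{4}$$

$$\text{diff} = \frac{\text{step} \times \text{nibble}}{4} + \frac{\text{step}}{8}$$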
The step / 8 calculation can be expressed as a bit shift right by 3 (step SHR 3). The first part of the equation can also be simplified. Since a nibble only carries 4 bits, and those 4 bits are a sign/magnitude number, there are only 3 bits of magnitude information. If all 3 magnitude bits are set to 1:

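$$\text{nibble} = 4 + 2 + 1$$

$$\frac{\text{step} \times \text{nibble}}{4} = \frac{4 \times \text{step}}{4} + \frac{2 \times \text{step}}{4} + \frac{1 \times \text{step}}{4} = \text{step} + \frac{\text{step}}{2} + \frac{\text{step}}{4}$$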
Thus, if bit 2 of the nibble is set, add step to diff. If bit 1 is set, add (step / 2 = step SHR 1) to diff. If bit 0 is set, add (step / 4 = step SHR 2) to diff. Finally, if the sign bit is set, subtract the final diff value from the predictor value; otherwise, add the final diff value to the predictor value. The usual algorithm is as follows:
sign = nibble & 8
delta = nibble & 7
diff = step >> 3
if (delta & 4) diff += step
if (delta & 2) diff += (step >> 1)
if (delta & 1) diff += (step >> 2)
if (sign) predictor -= diff
else predictor += diff
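The shift-and-add algorithm above, combined with the index update and clamping described earlier, could be sketched in C as follows. The `ImaState` structure and the `ima_decode_nibble` name are hypothetical. The two tables are the standard IMA tables as commonly published; they are reproduced here so the sketch is self-contained and should be checked against Appendix A.

```c
#include <stdint.h>

/* Step index adjustment for each 4-bit code (standard IMA table). */
static const int8_t ima_index_table[16] = {
    -1, -1, -1, -1, 2, 4, 6, 8,
    -1, -1, -1, -1, 2, 4, 6, 8
};

/* Step sizes, indexed by the step index 0..88 (standard IMA table). */
static const int16_t ima_step_table[89] = {
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
};

typedef struct {
    int predictor;   /* last decoded sample, -32768..32767 */
    int step_index;  /* index into ima_step_table, 0..88 */
} ImaState;

/* Decode one 4-bit IMA code and return the next PCM sample. */
int16_t ima_decode_nibble(ImaState *s, uint8_t nibble)
{
    int step = ima_step_table[s->step_index];
    int diff = step >> 3;

    if (nibble & 4) diff += step;
    if (nibble & 2) diff += step >> 1;
    if (nibble & 1) diff += step >> 2;

    if (nibble & 8)
        s->predictor -= diff;
    else
        s->predictor += diff;
    if (s->predictor > 32767)  s->predictor = 32767;
    if (s->predictor < -32768) s->predictor = -32768;

    /* Adapt the step index for the next nibble and clamp it. */
    s->step_index += ima_index_table[nibble & 0x0F];
    if (s->step_index < 0)  s->step_index = 0;
    if (s->step_index > 88) s->step_index = 88;

    return (int16_t)s->predictor;
}
```

Starting from a predictor and step index of 0, the nibble 0x7 should decode to 11 and move the step index to 8. The container-specific sections that follow differ mainly in how the initial state is obtained and in which order nibbles are pulled from each byte.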
This method was particularly useful back when IMA was implemented on commodity CPUs which were relatively slow at multiplication. One multiplication per audio sample had a notable impact on program performance, as opposed to the series of branches, additions and logical bit operations. If multiplication performance is not an issue, it is possible to carry out the diff calculation with only one non-power-of-2 multiplication and no floating point numbers:

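$$\text{diff} = \frac{((\text{signed})\,\text{nibble} + 0.5) \times \text{step}}{4} \times \frac{2}{2}$$

$$\text{diff} = \frac{(\text{nibble} + 0.5) \times 2 \times \text{step}}{8}$$

$$\text{diff} = \frac{(2 \times \text{nibble} + 1) \times \text{step}}{8}$$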

4.2.3  Quicktime IMA

Quicktime files can store either mono or stereo IMA data. Files with IMA data contain the codec fourcc "ima4" in the audio stsd atom. The files store the data in blocks of nibbles. The individual IMA samples are never interleaved; one block of IMA nibbles represents either all left or all right PCM samples.
In any given IMA-encoded Quicktime file, the size of an individual block of IMA nibbles is stored in the bytes/packet field present in the extended audio information portion in an audio stsd atom (see the Quicktime documentation for more information). However, this size always seems to be 34 bytes/block. Sometimes, IMA-encoded Quicktime files are missing the extended wave information header. In this case, assume that each IMA block is 34 bytes.
The first 2 bytes of a block specify a preamble with the initial predictor and step index. The 2 bytes are read from the stream as a big-endian 16-bit number which has the following breakdown:
pppppppp piiiiiii
Bits 15-7 of the preamble are the top 9 bits of the initial signed predictor; bits 6-0 of the initial predictor are always 0. Bits 6-0 of the preamble specify the initial step index. Note that this gives a range of 0..127 which should be clamped to 0..88 for good measure.
The remaining bytes in the IMA block (of which there are usually 32) are the ADPCM nibbles. In Quicktime IMA data, the bottom nibble of a byte is decoded first, then the top nibble:
byte0 byte1 byte2 byte3 ...
 n1n0  n3n2  n5n4  n7n6 ...
If a file is encoded as mono IMA, all of the blocks encode that one channel. However, if the file is encoded as stereo IMA, the first block is left audio data, the second block is right audio data, and the stereo interleaving continues on the block level for the duration of the file.
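The 2-byte block preamble described above can be unpacked with a few bit operations. The following C sketch uses a hypothetical function name and assumes the caller passes a pointer to the first 2 bytes of a block.

```c
#include <stdint.h>

/* Split a Quicktime IMA block preamble into its initial predictor
 * and initial step index. */
void qt_ima_parse_preamble(const uint8_t block[2],
                           int *predictor, int *step_index)
{
    /* The preamble is a big-endian 16-bit number: pppppppp piiiiiii */
    unsigned preamble = ((unsigned)block[0] << 8) | block[1];

    /* Top 9 bits are the top of the signed predictor; the low 7
     * predictor bits are zero. Sign-extend from 16 bits. */
    int p = (int)(preamble & 0xFF80u);
    if (p & 0x8000)
        p -= 0x10000;

    /* Low 7 bits are the step index; clamp to the table limit. */
    int index = (int)(preamble & 0x7Fu);
    if (index > 88)
        index = 88;

    *predictor = p;
    *step_index = index;
}
```

For example, the preamble bytes 0xFF 0x85 should give a predictor of -128 and a step index of 5.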

4.2.4  Microsoft IMA

A Microsoft media file (this includes AVI, ASF, and WAV) that is encoded with IMA ADPCM data has an audio format number of 0x11. The file will have a WAVEFORMAT structure in its header which contains a field named nBlockAlign. This field reveals the size of a block of IMA-encoded data.
Note that Microsoft IMA data can also occur in Apple Quicktime files using the fourcc 'ms\x00\x11' (the characters 'm' and 's' followed by the bytes 0x00 and 0x11). In this case, a MS WAVEFORMAT header will be attached to the Quicktime file's audio stsd atom.
If the IMA data is monaural, an individual chunk of data begins with the following preamble:
bytes 0-1:
initial predictor (in little-endian format)
byte 2:
initial index
byte 3:
unknown, usually 0 and is probably reserved
The remaining bytes in the chunk are the IMA nibbles. Each byte is decoded bottom nibble first, then top nibble as follows:
byte0 byte1 byte2 byte3 ...
 n1n0  n3n2  n5n4  n7n6 ...
If the IMA data is stereo, a chunk begins with two preambles, one for the left audio channel and one for the right channel:
bytes 0-1:
initial predictor (in little-endian format) for left channel
byte 2:
initial index for left channel
byte 3:
unknown, usually 0 and is probably reserved
bytes 4-5:
initial predictor (in little-endian format) for right channel
byte 6:
initial index (for right channel)
byte 7:
unknown, usually 0 and is probably reserved
The remaining bytes in the chunk are the IMA nibbles. The first 4 bytes, or 8 nibbles, belong to the left channel and the next 4 bytes belong to the right channel. This interleaving continues until the end of the chunk:
byte0 byte1 byte2 byte3 ...
 n1n0  n3n2  n5n4  n7n6 ...(left channel)
byte4 byte5 byte6 byte7 ...
 n1n0  n3n2  n5n4  n7n6 ...(right channel)

4.2.5  DVI

According to the XAnim multimedia application, there are two variants of DVI ADPCM, both of which are encoded and decoded with the IMA ADPCM algorithm. The original variant encodes the top nibble of a byte first, then the bottom nibble. The revised variant has the opposite encoding order.

4.2.6  Duck DK4 IMA

Some Sega Saturn game CDs contain AVI files which store audio using the Duck DK4 ADPCM algorithm. These AVI files report format 0x61 as their audio codec. DK4 data can be decoded using the same algorithm and tables as are used to decode IMA ADPCM data. The name apparently comes from the fact that 4 ADPCM nibbles decode to 4 16-bit PCM samples, in contrast to Duck's DK3 ADPCM algorithm, in which 3 ADPCM nibbles decode to 4 16-bit PCM samples.
It is important to note that WAVE format 0x61 is not officially registered to the Duck Corporation. Official registries of WAVE formats typically list this number as being registered to ESS Technology.
The length of a single block of DK4 data is encoded in the nBlockAlign field of an AVI file's WAVEFORMAT header. The chunk encoding format is very similar to MS IMA. If the DK4 data is monaural, an individual chunk of data begins with the following preamble:
bytes 0-1:
initial predictor (in little-endian format)
byte 2:
initial index
byte 3:
unknown, usually 0 and is probably reserved
The initial predictor is placed directly into the output as the first PCM sample. The remaining bytes in the chunk are the IMA nibbles. Each byte is decoded top nibble first (bits 7-4), then bottom nibble as follows:
byte0 byte1 byte2 byte3 ...
 n0n1  n2n3  n4n5  n6n7 ...
If the DK4 data is stereo, a chunk begins with two preambles, one for the left audio channel and one for the right channel:
bytes 0-1:
initial predictor (in little-endian format) for left channel
byte 2:
initial index for left channel
byte 3:
unknown, usually 0 and is probably reserved
bytes 4-5:
initial predictor (in little-endian format) for right channel
byte 6:
initial index (for right channel)
byte 7:
unknown, usually 0 and is probably reserved
The initial left and right channel predictors are placed directly into the output as the first PCM samples for each channel. The remaining bytes in the chunk are the IMA nibbles. For each byte, the top nibble (bits 7-4) corresponds to the left channel and the bottom nibble corresponds to the right channel:
byte0 byte1 byte2 byte3 ...
 L0R0  L1R1  L2R2  L3R3 ...

4.2.7  Duck DK3 Joint Stereo IMA

Some Sega Saturn game CDs contain AVI files which store audio using the Duck DK3 ADPCM algorithm. These AVI files report format 0x62 as their audio codec. DK3 ADPCM data can be decoded using the same tables as are used to decode IMA ADPCM data while using a slightly modified variant of the IMA ADPCM algorithm. The name DK3 apparently comes from the fact that 3 ADPCM nibbles decode to 4 16-bit PCM samples, in contrast to Duck's DK4 ADPCM algorithm, in which 4 ADPCM nibbles decode to 4 16-bit PCM samples.
It is important to note that WAVE format 0x62 is not officially registered to the Duck Corporation. Depending on which version of the audio codec registry is examined, this format will appear as being registered to either Quanta Computer or VoxWare.
All multi-byte values are encoded in little-endian format. The length of a single block of DK3 data is encoded in the nBlockAlign field of an AVI file's WAVEFORMAT header.
The DK3 algorithm encodes a sum channel and a difference channel, rather than left and right channels, using the standard IMA ADPCM algorithm and tables. Note that the encoding implies that the format only supports stereo data. A block of DK3 has a 16-byte preamble with the following information:
bytes 0-1
unknown
bytes 2-3
sample rate
bytes 4-9
unknown
bytes 10-11
initial sum channel predictor
bytes 12-13
initial diff channel predictor
byte 14
initial sum channel index
byte 15
initial diff channel index
After processing the block preamble, a stream of DK3 data is decoded nibble by nibble, just like any ADPCM data. The low nibble is decoded first (bits 3-0), then the high nibble. When decoding the stream, it is useful to conceptualize it as a stream of nibbles:
n0 n1 n2 n3 n4 n5 n6 n7 ...
where the nibbles were arranged in the original bytestream as:
byte0 byte1 byte2 byte3
 n1n0  n3n2  n5n4  n7n6 ...
Each set of 3 nibbles decodes to 4 16-bit PCM samples using this process (note that the diff value is initialized to the same value as the diff predictor):

4.2.8  Westwood Studios IMA

Many games published by Westwood Studios use VQA files to transport movie animations and AUD files to transport audio clips. Such titles include the Command & Conquer and Lands of Lore series. Westwood Studios multimedia files store audio using the standard IMA ADPCM algorithm.
VQA is a tagged format with different chunks marked by fourccs. A 'SND2' chunk contains IMA ADPCM nibbles. There is no chunk preamble that specifies initial predictor and index. The predictor and index variables are both initialized to 0 when file playback is started and maintained across chunks. This makes random seeking through Westwood Studios multimedia files quite difficult.
If the audio is mono data, the low nibble is decoded first (bits 3-0) then the high nibble:
byte0 byte1 byte2 byte3 ...
 n1n0  n3n2  n5n4  n7n6 ...
If the audio is stereo data, left and right bytes are interleaved. Each byte represents 2 samples for either the left or the right channel:
byte0 byte1 byte2 byte3 ...
 L1L0  R1R0  L3L2  R3R2 ...

4.2.9  SDL Motion JPEG IMA

SMJPEG stands for SDL Motion JPEG. It is an animation format used by Loki Games for porting computer games (and their full motion video) to Linux. SMJPEG is a chunked file format which uses FOURCCs to identify blocks in the file as well as audio and video codecs. The only known video FOURCC used is 'JFIF' for JPEG. The only known audio FOURCC is 'APCM' for ADPCM.
The ADPCM algorithm is standard IMA ADPCM. Compressed audio data comes packaged in 'sndD' chunks. Each chunk is stamped with a millisecond presentation timestamp and a data length, which is usually 0x104 bytes. The first 4 bytes are the initial conditions for decoding the ADPCM block:
bytes 0-1
initial predictor, big endian format
byte 2
initial index
byte 3
unused
The remainder of the data bytes in the chunk are ADPCM nibbles to be decoded with the standard ADPCM algorithm. The low nibble is decoded first (bits 3-0), then the high nibble.
Note that the SMJPEG format description apparently supports stereo. No stereo samples have been encountered at the time of this writing. It is unknown how the format would store stereo data.

4.2.10  Dialogic Modified IMA

Dialogic ADPCM is a variation of the standard IMA ADPCM algorithm that is optimized for monaural voice data. The encoder operates on 12-bit input samples and outputs 4-bit encoding for each sample. This yields a 3:1 compression ratio.
Dialogic ADPCM data is transported in raw files bearing the extension VOX. For each byte in the file, the high nibble (bits 7-4) is decoded first, then the low nibble.
The decoding algorithm is the same as the standard IMA ADPCM algorithm with the following modifications:
See Appendix A for the modified IMA tables.

4.2.11  4X IMA

Some computer and console games use 4xm multimedia files that are encoded with 4X Technologies' proprietary video codec and either linear PCM audio or a modified IMA ADPCM. The file format also supports multiple audio tracks intended for multi-lingual multimedia files.
4xm is a chunked file format where each chunk is marked with a fourcc. A 4xm file header contains a 'strk' audio header for each audio track in the file. Byte 12 of the 'strk' audio chunk apparently is set to 1 if the track is encoded as ADPCM.
All multi-byte numbers are encoded in little-endian format. Each chunk is encoded with a preamble to describe the initial predictors and step indices. If the audio data is monaural, the preamble is laid out as:
bytes 0-1:
initial predictor
bytes 2-3:
initial index
The rest of the bytes in the chunk are IMA nibbles. Each byte is decoded bottom nibble first (bits 3-0), then top nibble:
byte0 byte1 byte2 byte3 ...
 n1n0  n3n2  n5n4  n7n6 ...
If the audio data is stereo, the initial predictors and step indices are interleaved in the chunk preamble:
bytes 0-1:
initial predictor for left channel
bytes 2-3:
initial predictor for right channel
bytes 4-5:
initial index for left channel
bytes 6-7:
initial index for right channel
The first half of the remaining bytes of the chunk are the left IMA nibbles, and the second half are the right IMA nibbles. For example, if an entire chunk of stereo 4xm audio is 108 bytes (decimal), the first 8 bytes are the preamble, the next 50 bytes are left channel nibbles and the final 50 bytes are right channel nibbles. As in monaural data, bytes are decoded bottom nibble first (bits 3-0), then top nibble.
The 4X IMA algorithm is not exactly the same as the standard IMA algorithm. The key difference is in the diff calculation step. In the standard IMA algorithm, diff is calculated as:

$$\text{diff} = \frac{(2 \times \text{nibble} + 1) \times \text{step}}{8} = \frac{\text{delta} \times \text{step}}{4} + \frac{\text{step}}{8}$$
In the 4X IMA algorithm, diff is calculated as:

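$$\text{diff} = \frac{\text{nibble} \times \text{step} + \frac{\text{step}}{2}}{8} = \frac{\text{delta} \times \text{step}}{8} + \frac{\text{step}}{16}$$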

4.3  Microsoft ADPCM

In a Microsoft media file (WAV, AVI, or ASF), this audio format is denoted by audio format 0x02. All multi-byte numbers are stored in little endian format.
MS ADPCM is organized in blocks. Each block has a preamble and a series of coded ADPCM nibbles. The total number of bytes in an individual ADPCM block is obtained through the nBlockAlign field of a media file's WAV header.
Note that Microsoft ADPCM data can also occur in Apple Quicktime files using the fourcc 'ms\x00\x02' (the characters 'm' and 's' followed by the bytes 0x00 and 0x02). In this case, a MS WAVEFORMAT header will be attached to the Quicktime file's audio stsd atom.
A MS mono ADPCM block begins with the following preamble:
byte 0
block predictor (should be in the range [0..6])
bytes 1-2
initial idelta
bytes 3-4
sample 1
bytes 5-6
sample 2
The initial idelta and both samples are signed numbers (so take sign extension into account). The block predictor value is used as an index into two adaptation coefficient tables in order to initialize two coefficients, coeff1 and coeff2.
The initial 2 samples from the block preamble are sent directly to the output. Sample 2 is first, then sample 1. The remaining samples are decoded from the ADPCM nibbles, which comprise the rest of the bytes in the block. The bytes are decoded from the upper nibble (bits 7-4) first, then the lower nibble. For each nibble:
See Appendix A for MS ADPCM reference tables.
For stereo data, the block preamble stores interleaved initialization values for the left and right channels:
byte 0
left channel block predictor (should be [0..6])
byte 1
right channel block predictor (should be [0..6])
bytes 2-3
left channel initial idelta
bytes 4-5
right channel initial idelta
bytes 6-7
left channel sample 1
bytes 8-9
right channel sample 1
bytes 10-11
left channel sample 2
bytes 12-13
right channel sample 2
Following the preamble, the left and right ADPCM samples are interleaved within each byte. The upper nibble (bits 7-4) contains the left channel ADPCM code and the lower nibble contains the right channel ADPCM code.
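The text above does not spell out the per-nibble decoding step, so the following C sketch fills it in from the commonly published MS ADPCM procedure. Treat it as an assumption rather than this document's own description. The structure and function names are hypothetical, and the three tables are the standard MS ADPCM tables as commonly published; check them against Appendix A.

```c
#include <stdint.h>

/* Standard MS ADPCM tables (reproduced for self-containment). */
static const int ms_adaptation_table[16] = {
    230, 230, 230, 230, 307, 409, 512, 614,
    768, 614, 512, 409, 307, 230, 230, 230
};
static const int ms_coeff1[7] = {256, 512, 0, 192, 240, 460, 392};
static const int ms_coeff2[7] = {0, -256, 0, 64, 0, -208, -232};

typedef struct {
    int coeff1, coeff2;   /* selected by the block predictor */
    int idelta;           /* current quantizer scale */
    int sample1, sample2; /* most recent and second most recent sample */
} MsAdpcmState;

/* Initialize one channel's state from the block preamble fields. */
void ms_adpcm_init(MsAdpcmState *s, int block_predictor,
                   int idelta, int sample1, int sample2)
{
    /* Assumption: fall back to entry 0 on an out-of-range predictor. */
    if (block_predictor < 0 || block_predictor > 6)
        block_predictor = 0;
    s->coeff1 = ms_coeff1[block_predictor];
    s->coeff2 = ms_coeff2[block_predictor];
    s->idelta = idelta;
    s->sample1 = sample1;
    s->sample2 = sample2;
}

/* Decode one 4-bit code and return the next PCM sample. */
int16_t ms_adpcm_decode_nibble(MsAdpcmState *s, uint8_t nibble)
{
    int code = nibble & 0x0F;
    /* The nibble is a 2's complement signed number in -8..7. */
    int signed_code = (code & 0x08) ? code - 16 : code;

    int predictor = (s->sample1 * s->coeff1 + s->sample2 * s->coeff2) / 256;
    predictor += signed_code * s->idelta;
    if (predictor > 32767)  predictor = 32767;
    if (predictor < -32768) predictor = -32768;

    s->sample2 = s->sample1;
    s->sample1 = predictor;

    /* Adapt the quantizer scale; it is not allowed to drop below 16. */
    s->idelta = (ms_adaptation_table[code] * s->idelta) / 256;
    if (s->idelta < 16)
        s->idelta = 16;

    return (int16_t)predictor;
}
```

For stereo blocks, one `MsAdpcmState` per channel would be initialized from the interleaved preamble, with the upper nibble of each byte fed to the left state and the lower nibble to the right state.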

4.4  CRI ADX

CRI ADX is an ADPCM format primarily used in Sega Dreamcast games. Sometimes it is packaged in custom MPEG-like files along with MPEG video data. Sometimes it is packaged inside audio-only files. The container format specifies the playback frequency of the audio data and whether the audio is monaural or stereo.
ADX is organized in blocks of 18 bytes:
bytes 0-1
scale (encoded as little endian)
bytes 2-17
ADPCM nibbles
If the audio data is stereo, left blocks and right blocks with the above format are interleaved.
Each coded ADX channel has two state variables, sample1 and sample2, which are both initialized to 0 at the start of playback. The 16 data bytes in each ADX block are decoded top nibble (bits 7-4) first, then bottom nibble. For each nibble:

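$$\text{sample0} = \frac{\text{BaseVolume} \times (\text{signed})\,\text{nibble} \times \text{scale} + \text{0x7298} \times \text{sample1} - \text{0x3350} \times \text{sample2}}{16384}$$

$$\text{sample0} = \text{SaturateS16}(\text{sample0})$$

$$\text{next PCM sample} = \text{sample0}$$

$$\text{sample2} = \text{sample1}$$

$$\text{sample1} = \text{sample0}$$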
Notes:

5  Other Simple Time Domain Formats

5.1  SPC-700 Bit Rate Reduced (BRR)

The SPC-700 is a custom Sony audio coprocessor used inside the Super Nintendo Entertainment System. The SPC-700 has its own instruction set and memory space and runs programs uploaded by the main SNES CPU. The SPC-700 manages 8 independent audio channels that play samples from somewhere in the memory space. These channels only play samples that are encoded in a format called bit rate reduced.
BRR coding offers close to 4:1 compression. Technically, the ratio is 32:9 as blocks of 16 16-bit samples are packed into 16 4-bit nibbles with a 1-byte preamble for the whole block. Thus, each block is 9 bytes long.
The preamble byte has the following bit definitions:
bits 7-4:
range bits
bits 3-2:
filter definition
bit 1:
loop bit
bit 0:
end bit
If the end bit is set, this block is the last of the series of blocks. The loop bit is set in each constituent block of a sample that loops. The filter definition bits allow the sample to be filtered in a variety of ways. For more information on the filter types, consult "The Bit Rate Reduction Sound Encoding Scheme" listed in the references.
Each byte in the block at offsets 1..8 is decoded top nibble first (bits 7-4), then bottom nibble. To expand a nibble, shift it left by the number of bits specified by the range. Note that the nibble is encoded as a two's complement signed number and must be sign-extended before the shift. Also note that range values from 13..15 would shift some or all of the coded nibble bits out of the final 16-bit sample and are thus invalid.
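A minimal C sketch of the preamble layout and the nibble expansion described above is given below. It is illustrative only: the names brr_header, brr_parse_header, brr_expand_nibble and brr_expand_block are invented for this example, the filter selected by the filter definition bits is deliberately not applied (see the reference mentioned above for the filter types), and no check for invalid range values is made.

```c
#include <stdint.h>

/* Fields of the 1-byte BRR block preamble (hypothetical struct). */
typedef struct {
    int range;   /* bits 7-4: left-shift amount for each nibble */
    int filter;  /* bits 3-2: filter definition */
    int loop;    /* bit 1: sample loops */
    int end;     /* bit 0: last block of the sample */
} brr_header;

static brr_header brr_parse_header(uint8_t preamble)
{
    brr_header h;
    h.range  = (preamble >> 4) & 0x0F;
    h.filter = (preamble >> 2) & 0x03;
    h.loop   = (preamble >> 1) & 0x01;
    h.end    = preamble & 0x01;
    return h;
}

/* Sign-extend a 4-bit two's complement code, then scale it by 2^range.
 * Multiplication is used instead of '<<' because left-shifting a
 * negative int is undefined behavior in C. */
static int brr_expand_nibble(int nibble, int range)
{
    int value = nibble & 0x0F;
    if (value & 0x08)
        value -= 16;
    return value * (1 << range);
}

/* Expand the 8 data bytes of a 9-byte block into 16 unfiltered values.
 * The filter step is NOT applied; range validation is left to the caller. */
static brr_header brr_expand_block(const uint8_t block[9], int out[16])
{
    brr_header h = brr_parse_header(block[0]);
    int i;
    for (i = 0; i < 16; i++) {
        uint8_t byte = block[1 + i / 2];
        /* top nibble (bits 7-4) first, then bottom nibble */
        int nibble = (i & 1) ? (byte & 0x0F) : (byte >> 4);
        out[i] = brr_expand_nibble(nibble, h.range);
    }
    return h;
}
```

The returned header lets the caller inspect the end and loop bits after each block, for example to stop or restart playback.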

6  Appendix A: Codec Tables

This section lists the tables necessary to encode and decode to and from various audio formats.

6.1  mu-law - linear PCM conversion

The following function converts a mu-law byte to a signed 16-bit PCM sample. It can be used to build a 256-entry table of PCM samples for fast table-based mu-law decoding. The function comes from (http://www.speech.cs.cmu.edu/comp.speech/Section2/Q2.7.html) and is credited to Craig Reese of the IDA/Supercomputing Research Center.
static int mulaw2linear(unsigned char mulawbyte) {
  static int exp_lut[8] = {0,132,396,924,1980,4092,8316,16764};
  int sign, exponent, mantissa, sample;
  mulawbyte = ~mulawbyte;  /* mu-law bytes are stored bit-inverted */
  sign = (mulawbyte & 0x80);
  exponent = (mulawbyte >> 4) & 0x07;
  mantissa = mulawbyte & 0x0F;
  sample = exp_lut[exponent] + (mantissa << (exponent + 3));
  if (sign != 0) sample = -sample;
  return(sample);
}
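As the text notes, the conversion can be precomputed once into a 256-entry table. The sketch below shows one way to do that in C. It carries its own copy of the decoding logic so that it stands alone; the names mulaw_decode, mulaw_table and build_mulaw_table are illustrative, and the sketch assumes the usual convention that mu-law bytes are stored with all bits inverted.

```c
#include <stdint.h>

static int16_t mulaw_table[256];  /* hypothetical lookup table */

/* Decode one mu-law byte; mirrors the conversion function in the text,
 * with the byte complemented before its fields are extracted. */
static int mulaw_decode(uint8_t mulawbyte)
{
    static const int exp_lut[8] = {0, 132, 396, 924, 1980, 4092, 8316, 16764};
    int sign, exponent, mantissa, sample;

    mulawbyte = (uint8_t)~mulawbyte;
    sign = mulawbyte & 0x80;
    exponent = (mulawbyte >> 4) & 0x07;
    mantissa = mulawbyte & 0x0F;
    sample = exp_lut[exponent] + (mantissa << (exponent + 3));
    return sign ? -sample : sample;
}

/* Fill the table once; every result fits in a signed 16-bit value. */
static void build_mulaw_table(void)
{
    int i;
    for (i = 0; i < 256; i++)
        mulaw_table[i] = (int16_t)mulaw_decode((uint8_t)i);
}
```

After build_mulaw_table() has run once, decoding a byte is a single lookup: mulaw_table[byte].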

6.2  A-law - linear PCM conversion

The following A-law - PCM conversion function came from SoX Sound Exchange which in turn came from Sun Microsystems. It can be used to build a 256-entry table of PCM samples for fast A-law decoding.
#define SIGN_BIT (0x80) /* Sign bit for an A-law byte. */
#define QUANT_MASK (0xf) /* Quantization field mask. */
#define SEG_SHIFT (4) /* Left shift for segment number. */
#define SEG_MASK (0x70) /* Segment field mask. */
static int alaw2linear(unsigned char a_val) {
  int t;
  int seg;
  a_val ^= 0x55;  /* undo the even-bit inversion applied by A-law encoding */
  t = (a_val & QUANT_MASK) << 4;
  seg = ((unsigned)a_val & SEG_MASK) >> SEG_SHIFT;
  switch (seg) {
  case 0:
    t += 8;
    break;
  case 1:
    t += 0x108;
    break;
  
  default:
    t += 0x108;
    t <<= seg - 1;
  }
  return ((a_val & SIGN_BIT) ? t : -t);
}

6.3  Interplay DPCM delta table

This is the table of 256 deltas to use in decoding Interplay DPCM data:
int interplay_delta_table[] = {
0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 47, 51, 56, 61,
66, 72, 79, 86, 94, 102, 112, 122,
133, 145, 158, 173, 189, 206, 225, 245,
267, 292, 318, 348, 379, 414, 452, 493,
538, 587, 640, 699, 763, 832, 908, 991,
1081, 1180, 1288, 1405, 1534, 1673, 1826, 1993,
2175, 2373, 2590, 2826, 3084, 3365, 3672, 4008,
4373, 4772, 5208, 5683, 6202, 6767, 7385, 8059,
8794, 9597, 10472, 11428, 12471, 13609, 14851, 16206,
17685, 19298, 21060, 22981, 25078, 27367, 29864, 32589,
-29973, -26728, -23186, -19322, -15105, -10503, -5481, -1,
1, 1, 5481, 10503, 15105, 19322, 23186, 26728,
29973, -32589, -29864, -27367, -25078, -22981, -21060, -19298,
-17685, -16206, -14851, -13609, -12471, -11428, -10472, -9597,
-8794, -8059, -7385, -6767, -6202, -5683, -5208, -4772,
-4373, -4008, -3672, -3365, -3084, -2826, -2590, -2373,
-2175, -1993, -1826, -1673, -1534, -1405, -1288, -1180,
-1081, -991, -908, -832, -763, -699, -640, -587,
-538, -493, -452, -414, -379, -348, -318, -292,
-267, -245, -225, -206, -189, -173, -158, -145,
-133, -122, -112, -102, -94, -86, -79, -72,
-66, -61, -56, -51, -47, -43, -42, -41,
-40, -39, -38, -37, -36, -35, -34, -33,
-32, -31, -30, -29, -28, -27, -26, -25,
-24, -23, -22, -21, -20, -19, -18, -17,
-16, -15, -14, -13, -12, -11, -10, -9,
-8, -7, -6, -5, -4, -3, -2, -1
};

6.4  Standard IMA tables

The tables step_table[] and index_table[] are from the ADPCM reference source listed in the references. They are used to decode most variants of IMA ADPCM data.
int index_table[16] = {
  -1, -1, -1, -1, 2, 4, 6, 8,
  -1, -1, -1, -1, 2, 4, 6, 8
};
Note that many programs use slight deviations from the following table, but such deviations are negligible:
int step_table[89] = { 
  7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 
  19, 21, 23, 25, 28, 31, 34, 37, 41, 45, 
  50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 
  130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
  337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
  876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066, 
  2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
  5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899, 
  15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767 
};

6.5  Dialogic modified IMA tables

The following table is the modified (and abbreviated) step table used to decode Dialogic ADPCM data. This table comes from the Dialogic ADPCM document listed in the references:
int dialogic_ima_step[49] = { 
  16, 17, 19, 21, 23, 25, 28, 31, 34, 37, 41, 45, 
  50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143,
  157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 
  494, 544, 598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552 
};

6.6  MS ADPCM tables

The following tables come from libsndfile. They are used to decode Microsoft ADPCM data:
int AdaptationTable [] = { 
  230, 230, 230, 230, 307, 409, 512, 614, 
  768, 614, 512, 409, 307, 230, 230, 230 
} ;
int AdaptCoeff1 [] = { 256, 512, 0, 192, 240, 460, 392 } ;
int AdaptCoeff2 [] = { 0, -256, 0, 64, 0, -208, -232 } ;

7  References

These are some of the sources examined during the creation of this document:

8  Acknowledgements

9  Changelog

10  GNU Free Documentation License

Please see gnu.org's GFDL page: http://www.gnu.org/licenses/fdl.html


File translated from TEX by TTH, version 3.40.
On 3 Dec 2003, 23:35.