#LyX 1.3 created this file. For more info see http://www.lyx.org/ \lyxformat 221 \textclass article \language english \inputencoding auto \fontscheme default \graphics default \paperfontsize default \papersize Default \paperpackage a4 \use_geometry 0 \use_amsmath 0 \use_natbib 0 \use_numerical_citations 0 \paperorientation portrait \secnumdepth 3 \tocdepth 3 \paragraph_separation indent \defskip medskip \quotes_language english \quotes_times 2 \papercolumns 1 \papersides 1 \paperpagestyle default \layout Title Simple Time Domain Audio Coding \layout Author by Mike Melanson \layout Abstract This document presents the underlying principles and on-disk data formats of comparatively simple audio coding formats that operate in the time domain such as pulse code modulation (PCM), differential PCM (DPCM), and adaptive DPCM (ADPCM). \layout Verse v1.1: December 3, 2003 \layout Verse Copyright (c) 2003 Mike Melanson \layout Verse Permission is granted to copy, distribute, and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled \begin_inset Quotes eld \end_inset GNU Free Documentation License \begin_inset Quotes erd \end_inset . \layout Verse \begin_inset LatexCommand \tableofcontents{} \end_inset \layout Section Introduction \layout Standard A time domain audio coding method operates on samples in the time domain as opposed to the frequency domain. This document describes the algorithms and specific on-disk data formats used to encode a variety of simple time domain audio standards. This discussion is primarily focused on algorithms used in entertainment multimedia applications. 
\layout Standard This document began life as \begin_inset Quotes eld \end_inset The Skinny on ADPCM Data Formats \begin_inset Quotes erd \end_inset which, as the name implied, only covered ADPCM coding algorithms and helped many developers navigate the sea of sparsely-documented ADPCM formats. This document has now been expanded to cover a larger family of coding algorithms. \layout Section Pulse Code Modulation (PCM) \layout Subsection Overview \layout Standard This is arguably the simplest time domain audio format. PCM audio data is a sequence of samples in which each sample represents an audio wave's amplitude at a discrete point in time. \layout Subsection Linear PCM \layout Standard Linear PCM comes in a wide variety of flavors. It is useful to break it down into its parameters: \layout Itemize Resolution/sample size: This parameter specifies the amount of data used to represent each discrete amplitude sample. The most common values are 8 bits (1 byte), which gives a range of 256 amplitude steps, or 16 bits (2 bytes), which gives a range of 65536 amplitude steps. Other sizes, such as 12 and 24 bits, are occasionally seen. \layout Itemize Endian-ness: When more than one byte is used to represent a PCM sample, the endian-ness (big or little) must be known. Due to the widespread use of little-endian Intel CPUs, little-endian PCM tends to be the most common byte orientation. \layout Itemize Signed-ness: It is not enough to know that a PCM sample is, for example, 8 bits wide. Whether the sample is signed or unsigned is needed to understand the range. If the sample is unsigned, the sample range is 0..255 with a centerpoint of 128. If the sample is signed, the sample range is -128..127 with a centerpoint of 0. If a PCM type is signed, the sign encoding is almost always 2's complement. In very rare cases (most notably on the Sega CD video game console), signed PCM audio is represented as a series of sign/magnitude coded numbers. 
\layout Itemize Channels/Interleaving: If the PCM type is monaural, each sample will belong to that one channel. If there is more than one channel, the channels will almost always be interleaved: Left sample, right sample, left, right, etc., in the case of stereo interleaved data. In some rare cases, usually when optimized for special playback hardware, chunks of audio destined for different channels will not be interleaved. \layout Itemize Frequency/sample rate: This parameter measures how many samples/channel are played each second. Frequency is measured in samples/second (Hz). Common frequency values include 8000, 11025, 16000, 22050, 32000, 44100, and 48000 Hz. \layout Standard The most common PCM formats revolve around the most commonly available hardware: Little-endian Intel CPUs. Microsoft multimedia files (WAV/AVI/ASF) designate PCM audio with format 0x01. If the WAVEFORMAT header indicates 8 bits/sample, the data will be unsigned. If the header indicates 16 bits/sample, the data will be signed and little-endian. Stereo data will be interleaved, LRLRLR. Frequency will be specified in the WAVEFORMAT header. \layout Standard Apple Quicktime files with audio will have an audio stsd atom that will specify audio fourcc, frequency, bits/sample, and number of channels. The signed-ness of PCM data is indicated by the audio fourcc. 'raw ' (note the space needed to complete the fourcc) means unsigned data. If 'raw ' data is 16 bits/sample, it will always be little-endian. 'twos' indicates 2's complement, big-endian data. 'sowt' ('twos' backwards) indicates 2's complement, little-endian data. Stereo data is always interleaved. \layout Standard As another common example of PCM audio coding, compact discs use interleaved stereo, 16-bit, little-endian samples with a sample rate of 44100 Hz. 44100 Hz is also known as the CD sample rate. Two other common sample rates, 22050 Hz and 11025 Hz, simply divide the CD sample rate by 2 and 4, respectively. 
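\layout Standard The linear PCM parameters described above (signed-ness and endian-ness in particular) can be normalized with a few small helpers. The following is a minimal sketch in C, not taken from any particular player; the function names are hypothetical and chosen only for illustration. It converts unsigned 8-bit samples to signed 16-bit samples and reads 16-bit samples of either byte order.

```c
#include <stdint.h>

/* Unsigned 8-bit PCM (range 0..255, centerpoint 128) to signed 16-bit PCM
 * (centerpoint 0). The helper names in this sketch are hypothetical. */
int16_t pcm_u8_to_s16(uint8_t sample)
{
    /* (sample - 128) is in -128..127; scale it up to the 16-bit range */
    return (int16_t)((sample - 128) * 256);
}

/* Combine a low and a high byte into a 2's complement 16-bit sample. */
static int16_t pcm_s16_from_bytes(uint8_t lo, uint8_t hi)
{
    int value = lo | (hi << 8);
    if (value & 0x8000)      /* sign bit set: extend the sign by hand */
        value -= 0x10000;
    return (int16_t)value;
}

/* Little-endian: low byte is stored first (e.g. WAV, 'sowt'). */
int16_t pcm_read_s16le(const uint8_t *p)
{
    return pcm_s16_from_bytes(p[0], p[1]);
}

/* Big-endian: high byte is stored first (e.g. 'twos'). */
int16_t pcm_read_s16be(const uint8_t *p)
{
    return pcm_s16_from_bytes(p[1], p[0]);
}
```

\layout Standard Extending the sign by hand avoids relying on how a particular compiler converts out-of-range values to a signed 16-bit type.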
\layout Standard A quick note about some sample rate oddities you may encounter in certain vintage multimedia files (from the early-, mid-, and even late-1990s): You may see sample rates of 11127 and 22254. Where did these numbers come from? The original Sound Blaster was the first commodity PC sound card with a digital-analog converter (DAC) for digital audio playback. When programming the Sound Blaster DAC, it is necessary to program the sample rate by sending a 2's complement signed byte according to this formula: \layout Standard \begin_inset Formula \[ sample\, rate=256-\frac{1000000}{divisor}\] \end_inset \layout Standard If 22050 Hz is the desired sample rate: \layout Standard \begin_inset Formula \[ 22050=256-\frac{1000000}{divisor}\] \end_inset \layout Standard \begin_inset Formula \[ divisor=-\frac{1000000}{22050-256}=-45.8\] \end_inset \layout Standard Thus, -46 = 0xD2 is the closest frequency divisor, which yields an inexact frequency. \layout Subsection Logarithmic PCM \layout Standard Rather than representing sample amplitudes on a linear scale as linear PCM coding does, logarithmic PCM coding plots the amplitudes on a logarithmic scale. Log PCM is more often used in telephony and communications applications than in entertainment multimedia applications. \layout Standard There are two major variants of log PCM: mu-law ( \begin_inset Formula $\mu$ \end_inset -law) and A-law. Mu-law coding uses the format number 0x07 in Microsoft multimedia files (WAV/AVI/ASF) and the fourcc 'ulaw' in Apple Quicktime files. A-law coding uses the format number 0x06 in Microsoft multimedia files and the fourcc 'alaw' in Apple Quicktime files. \layout Standard Every byte of a log PCM data chunk maps to a signed 16-bit linear PCM sample. See Appendix A for the mu-/A-law -> PCM conversion code. 
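\layout Standard To illustrate the byte-to-sample mapping, here is a minimal sketch in C of one widely used formulation of the mu-law and A-law expansions (the arithmetic form commonly associated with G.711). It is not necessarily the same code as in Appendix A, and the function names are hypothetical. Because each byte maps to exactly one 16-bit sample, a decoder could equally precompute a 256-entry lookup table from these functions.

```c
#include <stdint.h>

/* Expand one mu-law byte to a signed 16-bit linear PCM sample.
 * Sketch of the common arithmetic expansion; names are hypothetical. */
int16_t ulaw_to_linear(uint8_t u)
{
    u = (uint8_t)~u;                 /* mu-law bytes are stored inverted */
    int sign     = u & 0x80;
    int exponent = (u >> 4) & 0x07;  /* 3-bit segment number */
    int mantissa = u & 0x0F;         /* 4-bit position within the segment */
    int sample   = (((mantissa << 3) + 0x84) << exponent) - 0x84;
    return (int16_t)(sign ? -sample : sample);
}

/* Expand one A-law byte to a signed 16-bit linear PCM sample. */
int16_t alaw_to_linear(uint8_t a)
{
    a ^= 0x55;                       /* A-law toggles every other bit */
    int sign     = a & 0x80;         /* in A-law, a set sign bit is positive */
    int exponent = (a >> 4) & 0x07;
    int mantissa = a & 0x0F;
    int sample   = (mantissa << 4) + 8;
    if (exponent != 0)
        sample = (sample + 0x100) << (exponent - 1);
    return (int16_t)(sign ? sample : -sample);
}
```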
\layout Section Differential Pulse Code Modulation (DPCM) \layout Subsection Overview \layout Standard Differential, or delta, pulse code modulation algorithms encode the differences between successive PCM samples, rather than storing the actual samples. For example, instead of storing the following sequence of samples: \layout LyX-Code 80 100 50 70 90 130 120 \layout Standard The first sample is followed by a series of deltas: \layout LyX-Code 80 20 -50 20 20 40 -10 \layout Standard The resulting encoding is a series of smaller numbers which each require less information to encode. \layout Standard DPCM algorithms typically encode a chunk of audio with the first sample in some preamble represented with the maximum number of bits, followed by a series of indices into a delta table. \layout Subsection Id RoQ DPCM \layout Standard RoQ multimedia files are found in the Quake III PC game as well as games such as Return to Castle Wolfenstein and Jedi Knight 2 that are based on the Quake III engine. The audio in a RoQ file is encoded with a DPCM format. \layout Standard A chunk of RoQ DPCM is laid out as (all multi-byte numbers are little-endian): \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 0-1 chunk ID: 0x1020 for mono data, 0x1021 for stereo data \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 2-5 chunk size, not including 8-byte preamble \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 6-7 initial predictor(s) \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 8..n DPCM bytes \layout Standard If the block is mono data, each byte represents the square root of the difference between the last PCM sample and the current PCM sample. 
To decode the audio, follow this process: \layout LyX-Code if (current DPCM byte < 128) \layout LyX-Code next PCM sample = last PCM sample + (current DPCM byte) * (current DPCM byte) \layout LyX-Code else \layout LyX-Code next PCM sample = last PCM sample - ((current DPCM byte - 128) * (current DPCM byte - 128)) \layout Standard Trivially, this process can be optimized by precalculating the signed squares for all 256 possible DPCM bytes. \layout Standard If the audio data is stereo, the 16-bit predictor encodes both the initial right and left predictors. After decoding the little-endian 16-bit predictor number, the upper 8 bits (bits 15-8) are the upper 8 bits of the initial left channel predictor. The lower 8 bits are the upper 8 bits of the initial right channel predictor. The DPCM bytes are decoded in the same manner as for mono data except that left and right DPCM bytes are interleaved. \layout Subsection Interplay DPCM \layout Standard Computer games published by Interplay and its subsidiary companies often use the custom Interplay MVE format to transport multimedia. Audio can be stored in these files using PCM or a custom DPCM format. \layout Standard Interplay DPCM requires a 256-element delta table for encoding and decoding. The delta table is listed in Appendix A. For each chunk of DPCM data in an Interplay MVE file, the first 2 bytes comprise an initial predictor stored in a signed, 16-bit, little-endian format. If the file is stereo, that predictor is treated as the left channel initial predictor and the next 2 bytes comprise the right channel initial predictor. The remainder of the bytes are indices into the delta table. For each byte, fetch a signed delta and apply it to the appropriate predictor (stereo data is interleaved LRLR...). Saturate the predictor to a signed 16-bit range after each delta is applied. \layout Subsection Xan DPCM \layout Standard Origin's Wing Commander IV computer game transports multimedia cutscenes in standard Microsoft AVI files. 
The files use a custom video codec named Xan (fourcc: 'Xxan'). The audio is transported with a custom format that this document takes the liberty of naming Xan DPCM. \layout Standard Note that an AVI file demuxer will probably need to be modified to support the algorithm. The WAVEFORMAT headers in the Xan AVI files report the audio coding as format 0x01: PCM. However, the file's 'auds' chunk begins with the fourcc 'Axan'. A program can either check for this or assume that the file uses Xan DPCM if it uses Xan video. \layout Standard Classifying the Xan audio coding method as a DPCM algorithm is a little shaky. It actually resembles a cross between a DPCM algorithm and an ADPCM algorithm. Perhaps the designers could not decide between the two algorithm families and decided to split the difference. The algorithm encodes 16-bit PCM samples as 8-bit bytes by packing a 6-bit delta value along with a 2-bit delta modifier into a byte. \layout Standard For each chunk of Xan DPCM data, the first 2 or 4 bytes are the initial predictors for that chunk, depending on mono or stereo data, and are encoded as signed, 16-bit, little-endian numbers. A shifter value for each channel is initialized to 4. For each byte in the stream (assuming mono data): \layout LyX-Code byte = next byte in stream \layout LyX-Code diff = (byte & 0xFC) << 8 \layout LyX-Code if bottom 2 bits of byte are both 1 ((byte & 0x03) == 3) \layout LyX-Code shifter++ \layout LyX-Code else \layout LyX-Code shifter -= (2 * (byte & 3)) \layout LyX-Code \layout LyX-Code note that the shift value may not go below 0 and must be clamped here \layout LyX-Code \layout LyX-Code shift diff right by shifter value \layout LyX-Code apply diff to the current predictor \layout LyX-Code saturate predictor to signed, 16-bit range \layout Standard Note that diff must be treated as a signed 16-bit number. For stereo data, the bytes represent interleaved samples in LRLR order. 
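\layout Standard The mono Xan DPCM process above can be sketched in C as follows. This is an illustrative sketch rather than a reference implementation: the function name is hypothetical, the upper clamp of 31 on the shifter is an assumption added here to keep the C shift well defined (the description above only requires the lower bound of 0), and the code assumes that right-shifting a negative int is an arithmetic shift, as it is on common compilers.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one mono chunk of Xan DPCM (hypothetical helper name).
 * 'out' must have room for (in_len - 2) samples.
 * Returns the number of samples written. */
size_t xan_dpcm_decode_mono(const uint8_t *in, size_t in_len, int16_t *out)
{
    if (in_len < 2)
        return 0;

    /* initial predictor: signed, 16-bit, little-endian */
    int predictor = in[0] | (in[1] << 8);
    if (predictor & 0x8000)
        predictor -= 0x10000;

    int shifter = 4;
    size_t count = 0;

    for (size_t i = 2; i < in_len; i++) {
        int byte = in[i];

        /* top 6 bits form the delta, treated as a signed 16-bit number */
        int diff = (byte & 0xFC) << 8;
        if (diff & 0x8000)
            diff -= 0x10000;

        /* bottom 2 bits modify the shifter */
        if ((byte & 0x03) == 3)
            shifter++;
        else
            shifter -= 2 * (byte & 0x03);
        if (shifter < 0)
            shifter = 0;
        if (shifter > 31)   /* assumption: upper bound avoids undefined shifts */
            shifter = 31;

        diff >>= shifter;   /* assumes arithmetic shift for negative values */
        predictor += diff;

        /* saturate to the signed 16-bit range */
        if (predictor > 32767)
            predictor = 32767;
        if (predictor < -32768)
            predictor = -32768;

        out[count++] = (int16_t)predictor;
    }
    return count;
}
```

\layout Standard A stereo decoder would keep a separate predictor and shifter for each channel and alternate between them on every byte.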
\layout Section Adaptive Differential Pulse Code Modulation (ADPCM) \layout Subsection Overview \layout Standard ADPCM is similar to DPCM in that it encodes the differences between successive samples. However, the word \begin_inset Quotes eld \end_inset adaptive \begin_inset Quotes erd \end_inset in the name means that the algorithm can adapt the current predictor according to the PCM data at a discrete point in time, thus minimizing prediction error. \layout Standard There is a variety of ADPCM algorithms for different applications. This document primarily focuses on the algorithms used for entertainment multimedia applications. These applications generally compress to a 4:1 ratio: 4 bits of ADPCM are converted into a signed 16-bit PCM sample. Algorithms used in other fields, such as telecommunications, are more complicated and offer different compression ratios. \layout Subsection IMA ADPCM \layout Subsubsection Overview \layout Standard The Interactive Multimedia Association (IMA) developed an ADPCM algorithm designed to be used in entertainment multimedia applications. It is particularly fast to encode and decode and does not strictly require any multiplications or floating point operations. \layout Standard While the encoding and decoding algorithms remain more or less constant across different IMA implementations, the specific on-disk data formats vary. The following sections will describe the IMA codec algorithm and the various methods used to store the coded data. 
\layout Subsubsection Decoding IMA \layout Standard To decode IMA ADPCM, initialize 3 variables: \layout List \labelwidthstring 00.00.0000 predictor: This is either initialized from the data chunk preamble specified in the format or is initialized to 0 at the start of the decoding process \layout List \labelwidthstring 00.00.0000 step\SpecialChar ~ index: Similar to the initial predictor, this variable is initialized from the data chunk preamble or set to 0 at the start of the decoding process \layout List \labelwidthstring 00.00.0000 step: This variable is initialized to ima_step_table[step_index] \layout Standard The encoded IMA bitstream consists of a series of 4-bit nibbles. This means that each byte represents 2 IMA nibbles. The specific data format will dictate whether the stream is decoded top nibble first or bottom nibble first, and whether there is stereo interleaving within the IMA nibbles. For this discussion, imagine the IMA bitstream as a series of nibbles representing a single audio channel: \layout LyX-Code n0 n1 n2 n3 n4 n5 ... \layout Standard Each nibble represents both a table index and a sign/magnitude number during the decoding process. Transform each nibble in the stream into a signed, 16-bit PCM sample using the following process: \layout Standard \begin_inset Formula \[ step\, index=step\, index+ima\, index\, table[(unsigned)nibble]\] \end_inset \layout Standard \begin_inset Formula \[ diff=\frac{((signed)nibble+0.5)*step}{4}\] \end_inset \layout Standard \begin_inset Formula \[ predictor=predictor+diff\] \end_inset \layout Standard \begin_inset Formula \[ step=ima\, step\, table[step\, index]\] \end_inset \layout Standard See Appendix A for the relevant IMA decoding tables. \layout Standard Regarding the step index and predictor calculations: Be sure to clamp the computed step index between 0 and 88 (table limits) and the predictor between -32768 and 32767 (signed 16-bit number range). 
It is possible for these values to go out of range, which could cause undesirable program behavior if unchecked. \layout Standard A note about the following calculation: \layout Standard \begin_inset Formula \[ diff=\frac{((sign/mag.)nibble+0.5)*step}{4}\] \end_inset \layout Standard At first glance, it appears that this calculation requires floating point operations and an arbitrary (not power-of-2) multiplication. However, some numerical manipulations reveal some useful simplifications: \layout Standard \begin_inset Formula \[ diff=\frac{step*nibble+\frac{step}{2}}{4}\] \end_inset \layout Standard \begin_inset Formula \[ diff=\frac{step*nibble}{4}+\frac{step}{8}\] \end_inset \layout Standard The step / 8 calculation can be expressed as a bit shift right by 3 (step SHR 3). The first part of the equation can also be simplified. Since a nibble only carries 4 bits, and those 4 bits are a sign/magnitude number, there are only 3 bits of magnitude information. If all 3 magnitude bits are set to 1: \layout Standard \begin_inset Formula \[ nibble=4+2+1\] \end_inset \layout Standard \begin_inset Formula \[ \frac{step*nibble}{4}=\frac{4*step}{4}+\frac{2*step}{4}+\frac{1*step}{4}=step+\frac{step}{2}+\frac{step}{4}\] \end_inset \layout Standard Thus, if bit 2 of the nibble is set, add step to diff. If bit 1 is set, add (step / 2 = step SHR 1) to diff. If bit 0 is set, add (step / 4 = step SHR 2) to diff. Finally, if the sign bit is set, subtract the final diff value from the predictor value; otherwise, add the final diff value to the predictor value. 
The usual algorithm is as follows: \layout LyX-Code sign = nibble & 8 \layout LyX-Code delta = nibble & 7 \layout LyX-Code diff = step >> 3 \layout LyX-Code if (delta & 4) diff += step \layout LyX-Code if (delta & 2) diff += (step >> 1) \layout LyX-Code if (delta & 1) diff += (step >> 2) \layout LyX-Code if (sign) predictor -= diff \layout LyX-Code else predictor += diff \layout Standard This method was particularly useful back when IMA was implemented on commodity CPUs which were relatively slow at multiplication. One multiplication per audio sample had a notable impact on program performance, as opposed to the series of branches, additions and logical bit operations. If multiplication performance is not an issue, it is possible to carry out the diff calculation with only one non-power-of-2 multiplication and no floating point numbers: \layout Standard \begin_inset Formula \[ diff=\frac{((signed)nibble+0.5)*step}{4}*\frac{2}{2}\] \end_inset \layout Standard \begin_inset Formula \[ diff=\frac{(nibble+0.5)*2*step}{8}\] \end_inset \layout Standard \begin_inset Formula \[ diff=\frac{(2*nibble+1)*step}{8}\] \end_inset \layout Subsubsection Quicktime IMA \layout Standard Quicktime files can store either mono or stereo IMA data. Files with IMA data contain the codec fourcc \begin_inset Quotes eld \end_inset ima4 \begin_inset Quotes erd \end_inset in the audio stsd atom. The files store the data in blocks of nibbles. The individual IMA samples are never interleaved; one block of IMA nibbles represents either all left or all right PCM samples. \layout Standard In any given IMA-encoded Quicktime file, the size of an individual block of IMA nibbles is stored in the bytes/packet field present in the extended audio information portion in an audio stsd atom (see the Quicktime documentation for more information). However, this size always seems to be 34 bytes/block. Sometimes, IMA-encoded Quicktime files are missing the extended wave information header. 
In this case, assume that each IMA block is 34 bytes. \layout Standard The first 2 bytes of a block specify a preamble with the initial predictor and step index. The 2 bytes are read from the stream as a big-endian 16-bit number which has the following breakdown: \layout LyX-Code pppppppp piiiiiii \layout Standard Bits 15-7 of the preamble are the top 9 bits of the initial signed predictor; bits 6-0 of the initial predictor are always 0. Bits 6-0 of the preamble specify the initial step index. Note that this gives a range of 0..127 which should be clamped to 0..88 for good measure. \layout Standard The remaining bytes in the IMA block (of which there are usually 32) are the ADPCM nibbles. In Quicktime IMA data, the bottom nibble of a byte is decoded first, then the top nibble: \layout LyX-Code byte0 byte1 byte2 byte3 ... \layout LyX-Code n1n0 n3n2 n5n4 n7n6 ... \layout Standard If a file is encoded as mono IMA, all of the blocks encode that one channel. However, if the file is encoded as stereo IMA, the first block is left audio data, the second block is right audio data, and the stereo interleaving continues on the block level for the duration of the file. \layout Subsubsection Microsoft IMA \layout Standard A Microsoft media file (this includes AVI, ASF, and WAV) that is encoded with IMA ADPCM data has an audio format number of 0x11. The file will have a WAVEFORMAT structure in its header which contains a field named nBlockAlign. This field reveals the size of a block of IMA-encoded data. \layout Standard Note that Microsoft IMA data can also occur in Apple Quicktime files using the fourcc 'ms \backslash x0 \backslash x11'. In this case, a MS WAVEFORMAT header will be attached to the Quicktime file's audio stsd atom. 
\layout Standard If the IMA data is monaural, an individual chunk of data begins with the following preamble: \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 0-1: initial predictor (in little-endian format) \layout List \labelwidthstring 00.00.0000 byte\SpecialChar ~ 2: initial index \layout List \labelwidthstring 00.00.0000 byte\SpecialChar ~ 3: unknown, usually 0 and is probably reserved \layout Standard The remaining bytes in the chunk are the IMA nibbles. Each byte is decoded bottom nibble first, then top nibble as follows: \layout LyX-Code byte0 byte1 byte2 byte3 ... \layout LyX-Code n1n0 n3n2 n5n4 n7n6 ... \layout Standard If the IMA data is stereo, a chunk begins with two preambles, one for the left audio channel and one for the right channel: \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 0-1: initial predictor (in little-endian format) for left channel \layout List \labelwidthstring 00.00.0000 byte\SpecialChar ~ 2: initial index for left channel \layout List \labelwidthstring 00.00.0000 byte\SpecialChar ~ 3: unknown, usually 0 and is probably reserved \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 4-5: initial predictor (in little-endian format) for right channel \layout List \labelwidthstring 00.00.0000 byte\SpecialChar ~ 6: initial index (for right channel) \layout List \labelwidthstring 00.00.0000 byte\SpecialChar ~ 7: unknown, usually 0 and is probably reserved \layout Standard The remaining bytes in the chunk are the IMA nibbles. The first 4 bytes, or 8 nibbles, belong to the left channel and the next 4 bytes belong to the right channel. This interleaving continues until the end of the chunk: \layout LyX-Code byte0 byte1 byte2 byte3 ... \layout LyX-Code n1n0 n3n2 n5n4 n7n6 ...(left channel) \layout LyX-Code byte4 byte5 byte6 byte7 ... 
\layout LyX-Code n1n0 n3n2 n5n4 n7n6 ...(right channel) \layout Subsubsection DVI \layout Standard According to the XAnim multimedia application, there are two variants of DVI ADPCM, both of which are encoded and decoded with the IMA ADPCM algorithm. The original variant encodes the top nibble of a byte first, then the bottom nibble. The revised variant has the opposite encoding order. \layout Subsubsection Duck DK4 IMA \layout Standard Some Sega Saturn game CDs contain AVI files which store audio using the Duck DK4 ADPCM algorithm. These AVI files report format 0x61 as their audio codec. DK4 data can be decoded using the same algorithm and tables as are used to decode IMA ADPCM data. The name apparently comes from the fact that 4 ADPCM nibbles decode to 4 16-bit PCM samples, in contrast to Duck's DK3 ADPCM algorithm, in which 3 ADPCM nibbles decode to 4 16-bit PCM samples. \layout Standard It is important to note that WAVE format 0x61 is not officially registered to the Duck Corporation. Official registries of WAVE formats typically list this number as being registered to ESS Technology. \layout Standard The length of a single block of DK4 data is encoded in the nBlockAlign field of an AVI file's WAVEFORMAT header. The chunk encoding format is very similar to MS IMA. If the DK4 data is monaural, an individual chunk of data begins with the following preamble: \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 0-1: initial predictor (in little-endian format) \layout List \labelwidthstring 00.00.0000 byte\SpecialChar ~ 2: initial index \layout List \labelwidthstring 00.00.0000 byte\SpecialChar ~ 3: unknown, usually 0 and is probably reserved \layout Standard The initial predictor is placed directly into the output as the first PCM sample. The remaining bytes in the chunk are the IMA nibbles. Each byte is decoded top nibble first (bits 7-4), then bottom nibble as follows: \layout LyX-Code byte0 byte1 byte2 byte3 ... \layout LyX-Code n0n1 n2n3 n4n5 n6n7 ... 
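\layout Standard The monaural DK4 chunk layout just described, combined with the shift-and-add IMA nibble decoding step from the Decoding IMA section, can be sketched in C as follows. This is a minimal sketch under stated assumptions: the function and table names are hypothetical, and the two tables are the commonly published IMA index and step tables, which should be checked against Appendix A.

```c
#include <stddef.h>
#include <stdint.h>

/* Commonly published IMA tables (assumed to match Appendix A). */
static const int ima_index_table[16] = {
    -1, -1, -1, -1, 2, 4, 6, 8, -1, -1, -1, -1, 2, 4, 6, 8
};
static const int ima_step_table[89] = {
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
};

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Decode a single IMA nibble, updating predictor and step index in place. */
static int16_t ima_decode_nibble(int nibble, int *predictor, int *step_index)
{
    int step = ima_step_table[*step_index];
    int diff = step >> 3;
    if (nibble & 4) diff += step;
    if (nibble & 2) diff += step >> 1;
    if (nibble & 1) diff += step >> 2;
    if (nibble & 8) *predictor -= diff;   /* sign bit set: subtract */
    else            *predictor += diff;
    *predictor  = clamp_int(*predictor, -32768, 32767);
    *step_index = clamp_int(*step_index + ima_index_table[nibble], 0, 88);
    return (int16_t)*predictor;
}

/* Decode one mono DK4 chunk (hypothetical helper name).
 * 'out' must hold 1 + 2 * (chunk_len - 4) samples.
 * Returns the number of samples written. */
size_t dk4_decode_mono(const uint8_t *chunk, size_t chunk_len, int16_t *out)
{
    if (chunk_len < 4)
        return 0;

    int predictor = chunk[0] | (chunk[1] << 8);      /* little-endian */
    if (predictor & 0x8000)
        predictor -= 0x10000;
    int step_index = clamp_int(chunk[2], 0, 88);     /* byte 3 is skipped */

    size_t count = 0;
    out[count++] = (int16_t)predictor;   /* predictor is the first sample */

    for (size_t i = 4; i < chunk_len; i++) {
        /* top nibble (bits 7-4) first, then bottom nibble */
        out[count++] = ima_decode_nibble(chunk[i] >> 4, &predictor, &step_index);
        out[count++] = ima_decode_nibble(chunk[i] & 0x0F, &predictor, &step_index);
    }
    return count;
}
```

\layout Standard Using the step for the current index and updating the index afterwards gives the same results as the formula ordering in the Decoding IMA section, where the index is updated first and the step is refreshed last.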
\layout Standard If the DK4 data is stereo, a chunk begins with two preambles, one for the left audio channel and one for the right channel: \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 0-1: initial predictor (in little-endian format) for left channel \layout List \labelwidthstring 00.00.0000 byte\SpecialChar ~ 2: initial index for left channel \layout List \labelwidthstring 00.00.0000 byte\SpecialChar ~ 3: unknown, usually 0 and is probably reserved \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 4-5: initial predictor (in little-endian format) for right channel \layout List \labelwidthstring 00.00.0000 byte\SpecialChar ~ 6: initial index (for right channel) \layout List \labelwidthstring 00.00.0000 byte\SpecialChar ~ 7: unknown, usually 0 and is probably reserved \layout Standard The initial left and right channel predictors are placed directly into the output as the first PCM samples for each channel. The remaining bytes in the chunk are the IMA nibbles. For each byte, the top nibble (bits 7-4) corresponds to the left channel and the bottom nibble corresponds to the right channel: \layout LyX-Code byte0 byte1 byte2 byte3 ... \layout LyX-Code L0R0 L1R1 L2R2 L3R3 ... \layout Subsubsection Duck DK3 Joint Stereo IMA \layout Standard Some Sega Saturn game CDs contain AVI files which store audio using the Duck DK3 ADPCM algorithm. These AVI files report format 0x62 as their audio codec. DK3 ADPCM data can be decoded using the same tables as are used to decode IMA ADPCM data while using a slightly modified variant of the IMA ADPCM algorithm. The name DK3 apparently comes from the fact that 3 ADPCM nibbles decode to 4 16-bit PCM samples, in contrast to Duck's DK4 ADPCM algorithm, in which 4 ADPCM nibbles decode to 4 16-bit PCM samples. \layout Standard It is important to note that WAVE format 0x62 is not officially registered to the Duck Corporation. 
Depending on which version of the audio codec registry is examined, this format will appear as being registered to either Quanta Computer or VoxWare. \layout Standard All multi-byte values are encoded in little-endian format. The length of a single block of DK3 data is encoded in the nBlockAlign field of an AVI file's WAVEFORMAT header. \layout Standard The DK3 algorithm encodes a sum channel and a difference channel, rather than left and right channels, using the standard IMA ADPCM algorithm and tables. Note that the encoding implies that the format only supports stereo data. A block of DK3 has a 16-byte preamble with the following information: \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 0-1 unknown \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 2-3 sample rate \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 4-9 unknown \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 10-11 initial sum channel predictor \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 12-13 initial diff channel predictor \layout List \labelwidthstring 00.00.0000 byte\SpecialChar ~ 14 initial sum channel index \layout List \labelwidthstring 00.00.0000 byte\SpecialChar ~ 15 initial diff channel index \layout Standard After processing the block preamble, a stream of DK3 data is decoded nibble by nibble, just like any ADPCM data. The low nibble is decoded first (bits 3-0), then the high nibble. When decoding the stream, it is useful to conceptualize it as a stream of nibbles: \layout LyX-Code n0 n1 n2 n3 n4 n5 n6 n7 ... \layout Standard where the nibbles were arranged in the original bytestream as: \layout LyX-Code byte0 byte1 byte2 byte3 \layout LyX-Code n1n0 n3n2 n5n4 n7n6 ... 
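\layout Standard The low-nibble-first ordering above can be made explicit by first unpacking the bytestream into a flat array of nibbles, which a decoder can then consume one at a time. The following C sketch shows that unpacking step only; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Unpack bytes into nibbles, low nibble (bits 3-0) first, then high nibble.
 * 'nibbles' must have room for 2 * len entries.
 * Returns the number of nibbles written. Hypothetical helper name. */
size_t unpack_nibbles_low_first(const uint8_t *bytes, size_t len,
                                uint8_t *nibbles)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        nibbles[count++] = bytes[i] & 0x0F;   /* n0, n2, n4, ... */
        nibbles[count++] = bytes[i] >> 4;     /* n1, n3, n5, ... */
    }
    return count;
}
```

\layout Standard A production decoder would more likely extract nibbles on the fly instead of allocating a second buffer, but the flat array makes the consumption order easy to follow.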
\layout Standard Each set of 3 nibbles decodes to 4 16-bit PCM samples using this process (note that the diff value is initialized to the same value as the diff predictor): \layout Itemize get next ADPCM nibble in stream \layout Itemize update sum channel predictor and index using nibble \layout Itemize get next ADPCM nibble in stream \layout Itemize update diff channel predictor and index using nibble \layout Itemize diff value = (diff value + diff predictor) / 2 \layout Itemize next left channel PCM sample = sum channel + diff value \layout Itemize next right channel PCM sample = sum channel - diff value \layout Itemize get next ADPCM nibble in stream \layout Itemize update sum channel predictor and index using nibble \layout Itemize next left channel PCM sample = sum channel + diff value \layout Itemize next right channel PCM sample = sum channel - diff value \layout Subsubsection Westwood Studios IMA \layout Standard Many games published by Westwood Studios use VQA files to transport movie animations and AUD files to transport audio clips. Such titles include the \emph on Command & Conquer \emph default and \emph on Lands of Lore \emph default series. Westwood Studios multimedia files store audio using the standard IMA ADPCM algorithm. \layout Standard VQA is a tagged format with different chunks marked by fourccs. A 'SND2' chunk contains IMA ADPCM nibbles. There is no chunk preamble that specifies initial predictor and index. The predictor and index variables are both initialized to 0 when file playback is started and maintained across chunks. This makes random seeking through Westwood Studios multimedia files quite difficult. \layout Standard If the audio is mono data, the low nibble is decoded first (bits 3-0) then the high nibble: \layout LyX-Code byte0 byte1 byte2 byte3 ... \layout LyX-Code n1n0 n3n2 n5n4 n7n6 ... \layout Standard If the audio is stereo data, left and right bytes are interleaved. 
Each byte represents 2 samples for either the left or right channel: \layout LyX-Code byte0 byte1 byte2 byte3 ... \layout LyX-Code L1L0 R1R0 L3L2 R3R2 ... \layout Subsubsection SDL Motion JPEG IMA \layout Standard SMJPEG stands for SDL Motion JPEG. It is an animation format used by Loki Games for porting computer games (and their full motion video) to Linux. SMJPEG is a chunked file format which uses FOURCCs to identify blocks in the file as well as audio and video codecs. The only known video FOURCC used is 'JFIF' for JPEG. The only known audio FOURCC is 'APCM' for ADPCM. \layout Standard The ADPCM algorithm is standard IMA ADPCM. Compressed audio data comes packaged in 'sndD' chunks. Each chunk is stamped with a millisecond presentation timestamp and a data length, which is usually 0x104 bytes. The first 4 bytes are the initial conditions for decoding the ADPCM block: \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 0-1 initial predictor, big endian format \layout List \labelwidthstring 00.00.0000 byte\SpecialChar ~ 2 initial index \layout List \labelwidthstring 00.00.0000 byte\SpecialChar ~ 3 unused \layout Standard The remainder of the data bytes in the chunk are ADPCM nibbles to be decoded with the standard ADPCM algorithm. The low nibble is decoded first (bits 3-0), then the high nibble. \layout Standard Note that the SMJPEG format description apparently supports stereo. No stereo samples have been encountered at the time of this writing. It is unknown how the format would store stereo data. \layout Subsubsection Dialogic Modified IMA \layout Standard Dialogic ADPCM is a variation of the standard IMA ADPCM algorithm that is optimized for monaural voice data. The encoder operates on 12-bit input samples and outputs 4-bit encoding for each sample. This yields a 3:1 compression ratio. \layout Standard Dialogic ADPCM data is transported in raw files bearing the extension VOX. 
For each byte in the file, the high nibble (bits 7-4) is decoded first, then the low nibble. \layout Standard The decoding algorithm is the same as the standard IMA ADPCM algorithm with the following modifications: \layout Itemize A different, smaller step table is used (refer to Appendix A for the table). The table contains 49 values ranging from indices 0..48. \layout Itemize The predictor is always initialized to 0 at the start of decoding. \layout Itemize The index is always initialized to 16 at the start of decoding. \layout Itemize When the index is modified by the ADPCM delta, it should be clamped within the 0..48 range, rather than 0..88. \layout Itemize When the diff is applied to the predictor, it should be clamped within a signed 12-bit range (-2048..2047) rather than a signed 16-bit range. \layout Itemize The output samples are 12 bits in resolution and should be scaled as necessary. \layout Standard See Appendix A for the modified IMA tables. \layout Subsubsection 4X IMA \layout Standard Some computer and console games use 4xm multimedia files that are encoded with 4X Technologies' proprietary video codec and either linear PCM audio or a modified IMA ADPCM. The file format also supports multiple audio tracks intended for multi-lingual multimedia files. \layout Standard 4xm is a chunked file format where each chunk is marked with a fourcc. A 4xm file header contains a 'strk' audio header for each audio track in the file. Byte 12 of the 'strk' audio chunk apparently is set to 1 if the track is encoded as ADPCM. \layout Standard All multi-byte numbers are encoded in little-endian format. Each chunk is encoded with a preamble to describe the initial predictors and step indices. 
If the audio data is monaural, the preamble is laid out as: \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 0-1: initial predictor \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 2-3: initial index \layout Standard The rest of the bytes in the chunk are IMA nibbles. Each byte is decoded bottom nibble first (bits 3-0), then top nibble: \layout LyX-Code byte0 byte1 byte2 byte3 ... \layout LyX-Code n1n0 n3n2 n5n4 n7n6 ... \layout Standard If the audio data is stereo, the initial predictors and step indices are interleaved in the chunk preamble: \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 0-1: initial predictor for left channel \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 2-3: initial predictor for right channel \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 4-5: initial index for left channel \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 6-7: initial index for right channel \layout Standard The first half of the remaining bytes of the chunk are the left channel IMA nibbles and the second half are the right channel IMA nibbles. For example, if an entire chunk of stereo 4xm audio is 108 bytes (decimal), the first 8 bytes are the preamble, the next 50 bytes are left channel nibbles and the final 50 bytes are right channel nibbles. As in monaural data, bytes are decoded bottom nibble first (bits 3-0), then top nibble. \layout Standard The 4X IMA algorithm is not exactly the same as the standard IMA algorithm. The key difference is in the diff calculation step.
In the standard IMA algorithm, diff is calculated as: \layout Standard \begin_inset Formula \[ diff=\frac{(2*nibble+1)*step}{8}=\frac{nibble*step}{4}+\frac{step}{8}\] \end_inset \layout Standard In the 4X IMA algorithm, diff is calculated as: \layout Standard \begin_inset Formula \[ diff=\frac{nibble*step+(step/2)}{8}=\frac{nibble*step}{8}+\frac{step}{16}\] \end_inset \layout Subsection Microsoft ADPCM \layout Standard In a Microsoft media file (WAV, AVI, or ASF), this audio format is denoted by audio format 0x02. All multi-byte numbers are stored in little endian format. \layout Standard MS ADPCM is organized in blocks. Each block has a preamble and a series of coded ADPCM nibbles. The total number of bytes in an individual ADPCM block is obtained through the nBlockAlign field of a media file's WAV header. \layout Standard Note that Microsoft ADPCM data can also occur in Apple QuickTime files using the fourcc 'ms \backslash x0 \backslash x2'. In this case, an MS WAVEFORMAT header will be attached to the QuickTime file's audio stsd atom. \layout Standard An MS mono ADPCM block begins with the following preamble: \layout List \labelwidthstring 00.00.0000 byte\SpecialChar ~ 0 block predictor (should be in the range [0..6]) \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 1-2 initial idelta \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 3-4 sample 1 \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 5-6 sample 2 \layout Standard The initial idelta and both samples are signed numbers (so take sign extension into account). The block predictor value is used as an index into two adaptation coefficient tables in order to initialize two coefficients, coeff1 and coeff2. \layout Standard The initial 2 samples from the block preamble are sent directly to the output. Sample 2 is first, then sample 1. The remaining samples are decoded from the ADPCM nibbles, which comprise the rest of the bytes in the block.
The bytes are decoded from the upper nibble (bits 7-4) first, then the lower nibble. For each nibble: \layout Itemize predictor = ((sample1 * coeff1) + (sample2 * coeff2)) / 256 \layout Itemize predictor += (signed)nibble * idelta (note that nibble is 2's complement) \layout Itemize clamp predictor within signed 16-bit range \layout Itemize PCM sample = predictor; send PCM sample to the output \layout Itemize shuffle samples: sample 2 = sample 1, sample 1 = calculated PCM sample \layout Itemize compute next adaptive scale factor: idelta = (AdaptationTable[nibble] * idelta) / 256 (here nibble is the unsigned 4-bit value, 0..15) \layout Itemize clamp idelta to lower bound of 16 \layout Standard See Appendix A for MS ADPCM reference tables. \layout Standard For stereo data, the block preamble stores interleaved initialization values for the left and right channels: \layout List \labelwidthstring 00.00.0000 byte\SpecialChar ~ 0 left channel block predictor (should be [0..6]) \layout List \labelwidthstring 00.00.0000 byte\SpecialChar ~ 1 right channel block predictor (should be [0..6]) \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 2-3 left channel initial idelta \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 4-5 right channel initial idelta \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 6-7 left channel sample 1 \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 8-9 right channel sample 1 \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 10-11 left channel sample 2 \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 12-13 right channel sample 2 \layout Standard Following the preamble, the left and right ADPCM samples are interleaved within each byte. The upper nibble (bits 7-4) contains the left channel ADPCM code and the lower nibble contains the right channel ADPCM code. \layout Subsection CRI ADX \layout Standard CRI ADX is an ADPCM format primarily used in Sega Dreamcast games.
Sometimes it is packaged in custom MPEG-like files along with MPEG video data. Sometimes it is packaged inside audio-only files. The container format specifies the playback frequency of the audio data and whether the audio is monaural or stereo. \layout Standard ADX is organized in blocks of 18 bytes: \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 0-1 scale (encoded as little endian) \layout List \labelwidthstring 00.00.0000 bytes\SpecialChar ~ 2-17 ADPCM nibbles \layout Standard If the audio data is stereo, left blocks and right blocks with the above format are interleaved. \layout Standard Each coded ADX channel has two state variables, sample1 and sample2, which are both initialized to 0 at the start of playback. The 16 data bytes in each ADX block are decoded top nibble first (bits 7-4), then bottom nibble. For each nibble: \layout Standard \begin_inset Formula \[ sample0=\frac{(BaseVolume*(signed)nibble*scale+0x7298*sample1-0x3350*sample2)}{16384}\] \end_inset \layout Standard \begin_inset Formula \[ sample0=SaturateS16(sample0)\] \end_inset \layout Standard \begin_inset Formula \[ next\, PCM\, sample=sample0\] \end_inset \layout Standard \begin_inset Formula \[ sample2=sample1\] \end_inset \layout Standard \begin_inset Formula \[ sample1=sample0\] \end_inset \layout Standard Notes: \layout Itemize SaturateS16() saturates the sample to a signed 16-bit range (-32768..32767) \layout Itemize BaseVolume = 0x4000 = 16384 \layout Itemize nibble is treated as a 2's complement signed number \layout Itemize 16384 = \begin_inset Formula $2^{14}$ \end_inset , which means that the division operation can be replaced with a right shift by 14 bits \layout Itemize the states of sample1 and sample2 for a particular channel are maintained across blocks; i.e., the values of sample1 and sample2 after decoding the last nibble of block n are used to decode the first nibble of block n+1 \layout Section Other Simple Time Domain Formats \layout Subsection SPC-700 Bit
Rate Reduced (BRR) \layout Standard The SPC-700 is a custom Sony audio coprocessor used inside the Super Nintendo Entertainment System. The SPC-700 has its own instruction set and memory space and runs programs uploaded by the main SNES CPU. The SPC-700 manages 8 independent audio channels that play samples from somewhere in the memory space. These channels only play samples that are encoded in a format called bit rate reduced. \layout Standard BRR coding offers close to 4:1 compression. Technically, the ratio is 32:9 as blocks of 16 16-bit samples are packed into 16 4-bit nibbles with a 1-byte preamble for the whole block. Thus, each block is 9 bytes long. \layout Standard The preamble byte has the following bit definitions: \layout List \labelwidthstring 00.00.0000 bits\SpecialChar ~ 7-4: range bits \layout List \labelwidthstring 00.00.0000 bits\SpecialChar ~ 3-2: filter definition \layout List \labelwidthstring 00.00.0000 bit\SpecialChar ~ 1: loop bit \layout List \labelwidthstring 00.00.0000 bit\SpecialChar ~ 0: end bit \layout Standard If the end bit is set, this block is the last of the series of blocks. The loop bit is set in each constituent block of a sample that loops. The filter definition bits allow the sample to be filtered in a variety of ways. For more information on the filter types, consult \begin_inset Quotes eld \end_inset The Bit Rate Reduction Sound Encoding Scheme \begin_inset Quotes erd \end_inset listed in the references. \layout Standard Each of the bytes in the block at offsets 1..8 is decoded top nibble first (bits 7-4), then bottom nibble. In order to expand a nibble, shift it left by the number of bits specified by the range. Note that the nibble is encoded as a two's complement signed number and must be sign-extended before the shift. Also note that range values from 12..15 would shift some or all of the coded nibble bits out of the final sample and are thus invalid.
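\layout Standard The preamble parsing and nibble expansion described above can be sketched in C as follows. This is only an illustrative sketch: the names brr_header, brr_sign_extend and brr_expand_block are hypothetical and do not come from any reference decoder, the filter bits are extracted but no filtering is applied, and range values 12..15 are rejected on the assumption that they are invalid, as stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the fields of the 1-byte BRR preamble. */
struct brr_header {
    int range;   /* bits 7-4: shift amount */
    int filter;  /* bits 3-2: filter definition (not applied in this sketch) */
    int loop;    /* bit 1 */
    int end;     /* bit 0 */
};

/* Sign-extend a 4-bit two's complement nibble to an int in -8..7. */
static int brr_sign_extend(unsigned nibble)
{
    nibble &= 0x0F;
    return (nibble & 0x08) ? (int)nibble - 16 : (int)nibble;
}

/*
 * Expand one 9-byte BRR block into 16 unfiltered values.
 * Returns 0 on success, or -1 if the range is 12..15 (assumed invalid).
 */
static int brr_expand_block(const uint8_t block[9], struct brr_header *hdr,
                            int out[16])
{
    int i, scale;

    assert(block && hdr && out);

    hdr->range  = (block[0] >> 4) & 0x0F;
    hdr->filter = (block[0] >> 2) & 0x03;
    hdr->loop   = (block[0] >> 1) & 0x01;
    hdr->end    = block[0] & 0x01;

    if (hdr->range > 11)
        return -1;

    /* Multiply rather than left-shift, since shifting a negative
     * value left is undefined behavior in C. */
    scale = 1 << hdr->range;

    for (i = 0; i < 8; i++) {
        /* top nibble (bits 7-4) first, then bottom nibble */
        out[2 * i]     = brr_sign_extend(block[1 + i] >> 4) * scale;
        out[2 * i + 1] = brr_sign_extend(block[1 + i] & 0x0F) * scale;
    }
    return 0;
}
```

\layout Standard A real decoder would additionally run each expanded value through the filter selected by the filter definition bits, which is outside the scope of this sketch.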
\layout Section Appendix A: Codec Tables \layout Standard This section lists the tables necessary to encode and decode to and from various audio formats. \layout Subsection mu-law -> linear PCM conversion \layout Standard The following function converts a mu-law byte to a signed 16-bit PCM sample. It can be used to build a 256-entry table of PCM samples for fast table-based mu-law decoding. The function comes from ( \begin_inset LatexCommand \htmlurl{http://www.speech.cs.cmu.edu/comp.speech/Section2/Q2.7.html} \end_inset ) and is credited to Craig Reese of the IDA/Supercomputing Research Center. \layout LyX-Code static int mulaw2linear(unsigned char mulawbyte) { \layout LyX-Code static int exp_lut[8] = {0,132,396,924,1980,4092,8316,16764}; \layout LyX-Code int sign, exponent, mantissa, sample; \layout LyX-Code \layout LyX-Code mulawbyte = ~mulawbyte; \layout LyX-Code sign = (mulawbyte & 0x80); \layout LyX-Code exponent = (mulawbyte >> 4) & 0x07; \layout LyX-Code mantissa = mulawbyte & 0x0F; \layout LyX-Code sample = exp_lut[exponent] + (mantissa << (exponent + 3)); \layout LyX-Code if (sign != 0) sample = -sample; \layout LyX-Code return(sample); \layout LyX-Code } \layout Subsection A-law -> linear PCM conversion \layout Standard The following A-law -> PCM conversion function came from SoX Sound Exchange which in turn came from Sun Microsystems. It can be used to build a 256-entry table of PCM samples for fast A-law decoding. \layout LyX-Code #define SIGN_BIT (0x80) /* Sign bit for a A-law byte. */ \layout LyX-Code #define QUANT_MASK (0xf) /* Quantization field mask. */ \layout LyX-Code #define SEG_SHIFT (4) /* Left shift for segment number. */ \layout LyX-Code #define SEG_MASK (0x70) /* Segment field mask. 
*/ \layout LyX-Code \layout LyX-Code static int alaw2linear(unsigned char a_val) { \layout LyX-Code int t; \layout LyX-Code int seg; \layout LyX-Code \layout LyX-Code a_val ^= 0x55; \layout LyX-Code t = (a_val & QUANT_MASK) << 4; \layout LyX-Code seg = ((unsigned)a_val & SEG_MASK) >> SEG_SHIFT; \layout LyX-Code switch (seg) { \layout LyX-Code case 0: \layout LyX-Code t += 8; \layout LyX-Code break; \layout LyX-Code \layout LyX-Code case 1: \layout LyX-Code t += 0x108; \layout LyX-Code break; \layout LyX-Code \layout LyX-Code default: \layout LyX-Code t += 0x108; \layout LyX-Code t <<= seg - 1; \layout LyX-Code } \layout LyX-Code \layout LyX-Code return ((a_val & SIGN_BIT) ? t : -t); \layout LyX-Code } \layout Subsection Interplay DPCM delta table \layout Standard This is the table of 256 deltas to use in decoding Interplay DPCM data: \layout LyX-Code int interplay_delta_table[] = { \layout LyX-Code 0, 1, 2, 3, 4, 5, 6, 7, \layout LyX-Code 8, 9, 10, 11, 12, 13, 14, 15, \layout LyX-Code 16, 17, 18, 19, 20, 21, 22, 23, \layout LyX-Code 24, 25, 26, 27, 28, 29, 30, 31, \layout LyX-Code 32, 33, 34, 35, 36, 37, 38, 39, \layout LyX-Code 40, 41, 42, 43, 47, 51, 56, 61, \layout LyX-Code 66, 72, 79, 86, 94, 102, 112, 122, \layout LyX-Code 133, 145, 158, 173, 189, 206, 225, 245, \layout LyX-Code 267, 292, 318, 348, 379, 414, 452, 493, \layout LyX-Code 538, 587, 640, 699, 763, 832, 908, 991, \layout LyX-Code 1081, 1180, 1288, 1405, 1534, 1673, 1826, 1993, \layout LyX-Code 2175, 2373, 2590, 2826, 3084, 3365, 3672, 4008, \layout LyX-Code 4373, 4772, 5208, 5683, 6202, 6767, 7385, 8059, \layout LyX-Code 8794, 9597, 10472, 11428, 12471, 13609, 14851, 16206, \layout LyX-Code 17685, 19298, 21060, 22981, 25078, 27367, 29864, 32589, \layout LyX-Code -29973, -26728, -23186, -19322, -15105, -10503, -5481, -1, \layout LyX-Code 1, 1, 5481, 10503, 15105, 19322, 23186, 26728, \layout LyX-Code 29973, -32589, -29864, -27367, -25078, -22981, -21060, -19298, \layout LyX-Code -17685, -16206, 
-14851, -13609, -12471, -11428, -10472, -9597, \layout LyX-Code -8794, -8059, -7385, -6767, -6202, -5683, -5208, -4772, \layout LyX-Code -4373, -4008, -3672, -3365, -3084, -2826, -2590, -2373, \layout LyX-Code -2175, -1993, -1826, -1673, -1534, -1405, -1288, -1180, \layout LyX-Code -1081, -991, -908, -832, -763, -699, -640, -587, \layout LyX-Code -538, -493, -452, -414, -379, -348, -318, -292, \layout LyX-Code -267, -245, -225, -206, -189, -173, -158, -145, \layout LyX-Code -133, -122, -112, -102, -94, -86, -79, -72, \layout LyX-Code -66, -61, -56, -51, -47, -43, -42, -41, \layout LyX-Code -40, -39, -38, -37, -36, -35, -34, -33, \layout LyX-Code -32, -31, -30, -29, -28, -27, -26, -25, \layout LyX-Code -24, -23, -22, -21, -20, -19, -18, -17, \layout LyX-Code -16, -15, -14, -13, -12, -11, -10, -9, \layout LyX-Code -8, -7, -6, -5, -4, -3, -2, -1 \layout LyX-Code }; \layout Subsection Standard IMA tables \layout Standard The tables step_table[] and index_table[] are from the ADPCM reference source listed in the references. They are used to decode most variants of IMA ADPCM data. 
\layout LyX-Code int index_table[16] = { \layout LyX-Code -1, -1, -1, -1, 2, 4, 6, 8, \layout LyX-Code -1, -1, -1, -1, 2, 4, 6, 8 \layout LyX-Code }; \layout Standard Note that many programs use slight deviations from the following table, but such deviations are negligible: \layout LyX-Code int step_table[89] = { \layout LyX-Code 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, \layout LyX-Code 19, 21, 23, 25, 28, 31, 34, 37, 41, 45, \layout LyX-Code 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, \layout LyX-Code 130, 143, 157, 173, 190, 209, 230, 253, 279, 307, \layout LyX-Code 337, 371, 408, 449, 494, 544, 598, 658, 724, 796, \layout LyX-Code 876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066, \layout LyX-Code 2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358, \layout LyX-Code 5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899, \layout LyX-Code 15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767 \layout LyX-Code }; \layout Subsection Dialogic modified IMA tables \layout Standard The following table is the modified (and abbreviated) step table used to decode Dialogic ADPCM data. This table comes from the Dialogic ADPCM document listed in the references: \layout LyX-Code int dialogic_ima_step[49] = { \layout LyX-Code 16, 17, 19, 21, 23, 25, 28, 31, 34, 37, 41, 45, \layout LyX-Code 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143, \layout LyX-Code 157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449, \layout LyX-Code 494, 544, 598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552 \layout LyX-Code }; \layout Subsection MS ADPCM tables \layout Standard The following tables come from libsndfile. 
They are used to decode Microsoft ADPCM data: \layout LyX-Code int AdaptationTable [] = { \layout LyX-Code 230, 230, 230, 230, 307, 409, 512, 614, \layout LyX-Code 768, 614, 512, 409, 307, 230, 230, 230 \layout LyX-Code } ; \layout LyX-Code int AdaptCoeff1 [] = { 256, 512, 0, 192, 240, 460, 392 } ; \layout LyX-Code int AdaptCoeff2 [] = { 0, -256, 0, 64, 0, -208, -232 } ; \layout Section References \layout Standard These are some of the sources examined during the creation of this document: \layout Itemize ADPCM reference implementation: \begin_inset LatexCommand \htmlurl{ftp://ftp.cwi.nl/pub/audio/adpcm.tar.gz} \end_inset \layout Itemize XAnim: \begin_inset LatexCommand \htmlurl{http://smurfland.cit.buffalo.edu/xanim/home.html} \end_inset \layout Itemize QuickTime4Linux: \begin_inset LatexCommand \htmlurl{http://heroinewarrior.com} \end_inset \layout Itemize libsndfile: \begin_inset LatexCommand \htmlurl{http://www.zip.com.au/~erikd/libsndfile/} \end_inset \layout Itemize duck.exe Truemotion (and ADPCM) player: \begin_inset LatexCommand \htmlurl{http://www.din.or.jp/~ch3/index_e.html} \end_inset \layout Itemize Apple Developer Connection Technical Note TN1081: Understanding The Differences Between Apple And Windows IMA-ADPCM Compressed Sound Files: \begin_inset LatexCommand \url{http://developer.apple.com/technotes/tn/tn1081.html} \end_inset \layout Itemize Command & Conquer Technical Page (for Westwood Studios ADPCM): \begin_inset LatexCommand \htmlurl{http://www.geocities.com/SiliconValley/8682/cnc.html} \end_inset \layout Itemize SMJPEG Library: \begin_inset LatexCommand \htmlurl{http://www.lokigames.com/development/smjpeg.php3} \end_inset \layout Itemize Dialogic ADPCM Algorithm: \begin_inset LatexCommand \htmlurl{http://resource.intel.com/telecom/support/appnotes/adpcm.pdf} \end_inset \layout Itemize The Bit Rate Reduction Sound Encoding Scheme, as interpreted by Butcha: \begin_inset LatexCommand \htmlurl{http://sgc.jandar.net/files/docs/brr.txt} \end_inset 
\layout Itemize Id Software's .RoQ Video File Format: \begin_inset LatexCommand \htmlurl{http://www.csse.monash.edu.au/~timf/videocodec/idroq.txt} \end_inset \layout Itemize Interplay MVE File Format: \begin_inset LatexCommand \htmlurl{http://www.pcisys.net/~melanson/codecs/interplay-mve.txt} \end_inset \layout Itemize CRI ADX tools: \begin_inset LatexCommand \htmlurl{http://www.geocities.co.jp/Playtown/2004/dcdev/} \end_inset \layout Itemize How do I convert to/from mu-law format? \begin_inset LatexCommand \htmlurl{http://www.speech.cs.cmu.edu/comp.speech/Section2/Q2.7.html} \end_inset \layout Itemize SoX Sound eXchange: \begin_inset LatexCommand \htmlurl{http://sox.sourceforge.net} \end_inset \layout Itemize xine Multimedia Player: \begin_inset LatexCommand \htmlurl{http://www.xinehq.de} \end_inset \layout Section Acknowledgements \layout Itemize Keiki Satoh (kki at wakusei.ne.jp), for finding the bug relating to outputting the initial samples in an MS ADPCM block. \layout Itemize Michael Niedermayer (michaelni at gmx.at), for the format layout of 4X IMA ADPCM. \layout Itemize Mario Brito (mbrito at student.dei.uc.pt), for the Xan DPCM algorithm. \layout Section Changelog \layout Itemize v1.1: December 3, 2003 \newline - fixed code fragments that had a bit shift operator that vanished when LyX was converted to HTML \newline - added Interplay DPCM algorithm \newline - added Xan DPCM algorithm \newline - fixed description of 4X IMA ADPCM \newline - specified BaseVolume in CRI ADX algorithm \layout Itemize v1.0: June 8, 2003 \newline - initial release \layout Section GNU Free Documentation License \layout Standard Please see gnu.org's GFDL page: \begin_inset LatexCommand \htmlurl{http://www.gnu.org/licenses/fdl.html} \end_inset . \the_end