Multimedia APIs
by Mike Melanson (mike at multimedia.cx)
v1.1: July 10, 2005

=======================================================================
NOTE: The information in this document is now maintained in Wiki format
at:
  http://wiki.multimedia.cx/index.php?title=Category:Multimedia_APIs
=======================================================================

Copyright (c) 2004-2005 Mike Melanson
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts. A copy of the license is included in the section entitled
"GNU Free Documentation License".


Contents
--------
 * Introduction
 * Rad Game Tools Smacker API
 * Rad Game Tools Bink API
 * Linspire/Microsoft Video Binary API
 * Linspire/Microsoft Audio Binary API
 * References
 * Changelog
 * GNU Free Documentation License


Introduction
------------
Many multimedia applications boast modular extensibility as a core
feature in order to add support for new file formats, video and audio
codecs, etc. Such an application needs an application programming
interface, or API, which allows a third party to create a binary module
that the primary application can load and use.

This document explains the details of various multimedia application
APIs, with a primary focus on the steps necessary to decode data.


Rad Game Tools Smacker API
--------------------------
Some games that use Rad Game Tools Smacker files for their multimedia
files distribute a handy Win32 DLL called smackw32.dll. According to the
Media Player Classic (MPC) application (see references), the various
Smacker functions pass around a structure with the following layout (all
multi-byte numbers are native endian):

  bytes 0-3      version
  bytes 4-7      video width
  bytes 8-11     video height
  bytes 12-15    frame count
  bytes 16-19    mspf (?)
  bytes 20-883   unknown (probably includes a palette)
  bytes 884-887  current frame number

This document will refer to this structure as the SmackStruct.

MPC is interested in the following functions exported by the Smacker
DLL:

 * SmackSoundUseDirectSound
 * SmackOpen
 * SmackGoto
 * SmackDoFrame
 * SmackToBuffer
 * SmackNextFrame
 * SmackClose

The API description for each function follows:

  int SmackSoundUseDirectSound(IDirectSound *pDS);

SmackSoundUseDirectSound initializes the audio playback system to allow
the Smacker DLL to output decoded audio data directly through
DirectSound.
  pDS: A pointer to an IDirectSound interface.
  Returns: 0 if the function failed, non-zero on success.

  SmackStruct *SmackOpen(HANDLE *SmackFile, UINT32 flags, INT32 unknown);

SmackOpen opens and initializes a Smacker file for playback.
  SmackFile: A Windows file HANDLE that refers to the Smacker file to
    be read.
  flags: The meaning of all the flags is unclear, but MPC calls
    SmackOpen with 0xff400 in this parameter.
  unknown: The meaning of this parameter is unclear, but MPC calls
    SmackOpen with -1 in this parameter.
  Returns: A pointer to a SmackStruct that will be used for playing the
    Smacker file.

  void SmackGoto(SmackStruct *Smack, UINT32 FrameNumber);

SmackGoto signals the playback engine to reposition the Smacker file to
a requested frame.
  Smack: A pointer to the SmackStruct returned by SmackOpen.
  FrameNumber: The frame number to which the stream should be
    positioned.
  Returns: No known return value as MPC does not check for one.

  int SmackDoFrame(SmackStruct *Smack);

SmackDoFrame processes the next frame in the Smacker file.
  Smack: A pointer to the SmackStruct returned by SmackOpen.
  Returns: MPC's code comments indicate that this function does return
    a value, but the meaning is not specified.
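
Because most of the SmackStruct is unknown, one way to use the layout
listed earlier is to copy individual fields out at their documented byte
offsets rather than declaring the whole structure. The following is a
minimal sketch under that assumption. The names smack_field, SmackInfo,
smack_info and the SMACK_OFF_* constants are hypothetical and are not
exported by the Smacker DLL, and the fields are treated as unsigned
32-bit values, which the layout does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets from the SmackStruct layout in this document.
 * The constant names are hypothetical. */
enum {
    SMACK_OFF_VERSION       = 0,
    SMACK_OFF_WIDTH         = 4,
    SMACK_OFF_HEIGHT        = 8,
    SMACK_OFF_FRAME_COUNT   = 12,
    SMACK_OFF_CURRENT_FRAME = 884,
    SMACK_STRUCT_MIN_SIZE   = 888   /* last documented byte is 887 */
};

/* Read one native-endian 32-bit field at a byte offset. memcpy is used
 * so the read does not depend on alignment. Assumption: the field is
 * unsigned. */
uint32_t smack_field(const void *smack, size_t offset)
{
    uint32_t value;

    assert(smack != NULL);
    assert(offset + sizeof(value) <= (size_t)SMACK_STRUCT_MIN_SIZE);
    memcpy(&value, (const unsigned char *)smack + offset, sizeof(value));
    return value;
}

/* Hypothetical summary of the fields a player typically needs. */
typedef struct {
    uint32_t width;
    uint32_t height;
    uint32_t frame_count;
    uint32_t current_frame;
} SmackInfo;

/* Collect the documented fields from a pointer such as the one
 * returned by SmackOpen. */
SmackInfo smack_info(const void *smack)
{
    SmackInfo info;

    info.width         = smack_field(smack, SMACK_OFF_WIDTH);
    info.height        = smack_field(smack, SMACK_OFF_HEIGHT);
    info.frame_count   = smack_field(smack, SMACK_OFF_FRAME_COUNT);
    info.current_frame = smack_field(smack, SMACK_OFF_CURRENT_FRAME);
    return info;
}
```

In a real player the pointer passed to smack_info would be the value
returned by SmackOpen. Reading by offset keeps the sketch independent of
the 864 unknown bytes in the middle of the structure.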

  void SmackToBuffer(SmackStruct *Smack, uint32 Unknown1,
    uint32 Unknown2, uint32 Stride, uint32 FrameHeightInPixels,
    BYTE *OutBuffer, uint32 Flags);

SmackToBuffer decodes a frame of video into the specified memory buffer.
  Smack: A pointer to the SmackStruct returned by SmackOpen.
  Unknown1, Unknown2: Unknown, but MPC sets these both to 0.
  Stride: The width of a single line in the output buffer, in bytes. If
    the output colorspace is 16-bit RGB, this field is pixel width * 2.
  FrameHeightInPixels: The pixel height of the frame in the output
    buffer.
  OutBuffer: The memory buffer where the decoded frame is to be output.
    This buffer needs to be at least Stride * FrameHeightInPixels bytes
    large.
  Flags: A series of 32 bits specifying the mode of operation. MPC sets
    the top 2 bits to 1 (0xc0000000), which apparently outputs 16-bit
    RGB data.
  Returns: No known return value as MPC does not check for one.

  void SmackNextFrame(SmackStruct *Smack);

SmackNextFrame signals the playback engine to advance the Smacker file
to the next frame.
  Smack: A pointer to the SmackStruct returned by SmackOpen.
  Returns: No known return value as MPC does not check for one.

  void SmackClose(SmackStruct *Smack);

SmackClose gracefully closes a Smacker file and releases any allocated
resources.
  Smack: A pointer to the SmackStruct returned by SmackOpen.
  Returns: No known return value as MPC does not check for one.

Decoding a Smacker file with the Smacker DLL involves the following
steps:

1) Set up the sound system using SmackSoundUseDirectSound.

2) Call SmackOpen with a handle to the file. The function will allocate
   and return a SmackStruct data structure which will be used through
   the rest of the Smacker functions to manipulate and decode the file.
   This structure contains parameters regarding the file's video
   properties, such as width and height.

3) Call SmackDoFrame to decode the next frame of video.

4) Call SmackToBuffer to fetch the decoded frame into your own
   application's buffer.

5) Call SmackNextFrame to advance to the next frame in the file.

6) Repeat from step 3 while there are frames remaining in the file (the
   number of frames is specified in the SmackStruct).

7) Call SmackClose to deallocate the resources used for decoding the
   Smacker file.

8) Also, SmackGoto can be called to reposition the file during playback.


Rad Game Tools Bink API
-----------------------
Some games that use Rad Game Tools Bink files for their multimedia files
distribute a handy Win32 DLL called binkw32.dll. According to the Media
Player Classic (MPC) application (see references), the various Bink file
manipulation functions pass around a structure with the following layout
(all multi-byte numbers are native endian; pointers are 32-bit memory
addresses):

  bytes 0-3      video width
  bytes 4-7      video height
  bytes 8-11     frame count
  bytes 12-15    current frame
  bytes 16-19    last frame
  bytes 20-23    frames/second multiplier
  bytes 24-27    frames/second divisor
  bytes 28-31    unknown
  bytes 32-35    flags
  bytes 36-295   unknown
  bytes 296-299  current plane
  bytes 300-303  pointer to plane 0
  bytes 304-307  pointer to plane 1
  bytes 308-315  unknown
  bytes 316-319  Y plane width
  bytes 320-323  Y plane height
  bytes 324-327  U&V plane width
  bytes 328-331  U&V plane height

This document will refer to this structure as the BinkStruct.

MPC is interested in the following functions exported by the Bink DLL:

 * BinkSetSoundSystem
 * BinkOpenDirectSound
 * BinkOpen
 * BinkGoto
 * BinkDoFrame
 * BinkNextFrame
 * BinkClose

The API description for each function follows:

  int BinkSetSoundSystem(SOUND_FUNC SoundFunction, IDirectSound *pDS);

BinkSetSoundSystem initializes the audio playback subsystem.
  SoundFunction: This appears to be the function that will be invoked
    in order to play back the audio. MPC passes in BinkOpenDirectSound
    as the parameter. BinkOpenDirectSound must meet the qualifications
    to be a SOUND_FUNC (a type name contrived for this description).
  pDS: A pointer to an IDirectSound interface.
  Returns: 0 if the function failed, non-zero on success.

  BinkStruct *BinkOpen(HANDLE BinkFile, UINT32 Flags);

BinkOpen opens and initializes a Bink file for playback.
  BinkFile: A Windows file HANDLE that refers to the Bink file to be
    read.
  Flags: The meaning of all the flags is unclear, but MPC calls
    BinkOpen with 0x00800000.
  Returns: A pointer to a BinkStruct that will be used for playing the
    Bink file.

  void BinkGoto(BinkStruct *Bink, UINT32 FrameNumber, UINT32 unknown);

BinkGoto signals the playback engine to reposition the Bink file to a
requested frame.
  Bink: A pointer to the BinkStruct returned by BinkOpen.
  FrameNumber: The frame number to which the stream should be
    positioned.
  unknown: MPC sets this parameter to 0.
  Returns: No known return value as MPC does not check for one.

  int BinkDoFrame(BinkStruct *Bink);

BinkDoFrame processes the next frame in the Bink file.
  Bink: A pointer to the BinkStruct returned by BinkOpen.
  Returns: MPC's code comments indicate that this function does return
    a value, but the meaning is not specified.

  void BinkNextFrame(BinkStruct *Bink);

BinkNextFrame signals the playback engine to advance the Bink file to
the next frame.
  Bink: A pointer to the BinkStruct returned by BinkOpen.
  Returns: No known return value as MPC does not check for one.

  void BinkClose(BinkStruct *Bink);

BinkClose gracefully closes a Bink file and releases any allocated
resources.
  Bink: A pointer to the BinkStruct returned by BinkOpen.
  Returns: No known return value as MPC does not check for one.

Decoding a Bink file with the Bink DLL involves the following steps:

1) Set up the sound system using BinkSetSoundSystem. The Bink DLL
   provides the BinkOpenDirectSound function for audio playback and
   handles the audio itself.

2) Call BinkOpen with a handle to the file. The function will allocate
   and return a BinkStruct data structure. This structure contains
   parameters regarding the file's video properties, such as width and
   height.
   There are pointers to two different planes. The Bink DLL uses a
   double-buffering scheme when decoding video, and decodes to YUV 4:2:0
   data (alias YV12, YUV420P). After decoding a frame of video, one of
   the two plane pointers will refer to a buffer that contains all of
   the Y data, all of the U data, and all of the V data, all back to
   back. The current plane is indicated in the BinkStruct. The
   BinkStruct also provides the dimensions of the Y plane and the U&V
   planes so that the data can be properly sorted out. The data in the
   buffer is ordered YUV unless bits 15 and 16 in the BinkStruct's flags
   are set to 1 (BinkStruct.flags & 0x00018000), in which case the data
   is ordered YVU.

3) Call BinkDoFrame to decode the next frame of video. Fetch the video
   from the BinkStruct and display, convert, or manipulate it as the
   application sees fit.

4) Call BinkNextFrame to advance to the next frame in the file.

5) Repeat from step 3 while there are frames remaining in the file (the
   number of frames is specified in the BinkStruct).

6) Call BinkClose to deallocate the resources used for decoding the
   Bink file.

7) Also, BinkGoto can be called to reposition the file during playback.


Linspire/Microsoft Video Binary API
-----------------------------------
Linspire is a Linux distribution that aims to rival Microsoft in desktop
ease of use for the end-user. To that end, they have licensed certain of
Microsoft's audio and video codecs that are not yet supported in open
source. The decoders for these codecs link into FFmpeg's libavcodec
library.

The video codecs supported by this method are Microsoft's Windows Media
Video versions 8 and 9 (henceforth WMV8 and WMV9, respectively). Each
binary module has an initialization function, a packet decoding
function, and a cleanup function.

The initialization function has the following declaration:

  int init(void *private_data, unsigned char *extradata,
    int extradata_size, int codec_tag, int width, int height);

  private_data: A private data structure used to store the internal
    decoder state. The libavcodec glue code declares this as an array
    of 1024 bytes.
  extradata: The extradata bytes at the end of the BITMAPINFO header
    from the Microsoft media file (AVI or ASF).
  extradata_size: The size of the buffer indicated by extradata.
  codec_tag: The 32-bit four-character code (FOURCC) of the codec data.
  width: The width in pixels of the video frames to be decoded.
  height: The height in pixels of the video frames to be decoded.

The initialization function returns 0 on success, non-zero if there was
a problem.

The packet decoding function has the following declaration:

  int packet_decode(void *private_data, void *data, int *data_size,
    unsigned char *buf, int buf_size, Bits *bit_structure,
    unsigned char *work_buffer);

  private_data: A private data structure used to store the internal
    decoder state. The libavcodec glue code declares this as an array
    of 1024 bytes.
  data: The void *data parameter passed into the libavcodec decoding
    function.
  data_size: The int *data_size parameter passed into the libavcodec
    decoding function.
  buf: The encoded bytestream passed in from libavcodec.
  buf_size: The size of the encoded bytestream.
  bit_structure: A structure of type Bits that has the following
    declaration:

      typedef struct {
        unsigned char *data[4];
        int linesize[4];
      } Bits;

    The data and linesize fields are the same as those in a libavcodec
    AVFrame structure.
  work_buffer: A buffer provided by libavcodec and allocated to be
    (width * height * 4) bytes large.

The packet_decode function returns a negative number in the event of an
error. Whatever integer is returned can be returned from the libavcodec
decode function as part of the normal libavcodec API.
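
The init and packet_decode declarations above can be exercised with a
small amount of glue. The following is a minimal sketch, not the actual
libavcodec glue code: VideoGlue, glue_open, glue_decode, glue_close,
stub_init and stub_packet are all hypothetical names, and the two stub
functions merely stand in for the binary module so that the sketch is
self-contained. It shows the 1024-byte private_data array and the
(width * height * 4) byte work buffer described in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Bits structure as declared in the text. */
typedef struct {
    unsigned char *data[4];
    int linesize[4];
} Bits;

/* Function pointer types matching the documented declarations. */
typedef int (*video_init_fn)(void *private_data, unsigned char *extradata,
                             int extradata_size, int codec_tag,
                             int width, int height);
typedef int (*video_packet_fn)(void *private_data, void *data,
                               int *data_size, unsigned char *buf,
                               int buf_size, Bits *bit_structure,
                               unsigned char *work_buffer);

/* Hypothetical glue-side state. */
typedef struct {
    unsigned char private_data[1024];  /* decoder state, per the text */
    unsigned char *work_buffer;        /* width * height * 4 bytes */
    video_packet_fn packet_decode;
} VideoGlue;

/* Allocate the work buffer and call the module's init function.
 * Returns 0 on success, -1 on failure. */
int glue_open(VideoGlue *g, video_init_fn init, video_packet_fn packet,
              unsigned char *extradata, int extradata_size,
              int codec_tag, int width, int height)
{
    assert(g != NULL && init != NULL && packet != NULL);
    if (width <= 0 || height <= 0)
        return -1;
    memset(g, 0, sizeof(*g));
    g->packet_decode = packet;
    g->work_buffer = malloc((size_t)width * (size_t)height * 4);
    if (g->work_buffer == NULL)
        return -1;
    /* init returns 0 on success, non-zero on a problem. */
    if (init(g->private_data, extradata, extradata_size,
             codec_tag, width, height) != 0) {
        free(g->work_buffer);
        g->work_buffer = NULL;
        return -1;
    }
    return 0;
}

/* Pass one packet to the module; the result (negative on error) is
 * returned unchanged, as the text suggests. */
int glue_decode(VideoGlue *g, void *data, int *data_size,
                unsigned char *buf, int buf_size, Bits *bits)
{
    return g->packet_decode(g->private_data, data, data_size,
                            buf, buf_size, bits, g->work_buffer);
}

/* Release the work buffer owned by the glue. */
void glue_close(VideoGlue *g)
{
    free(g->work_buffer);
    g->work_buffer = NULL;
}

/* Stand-ins for the binary module, for illustration only. */
int stub_init(void *private_data, unsigned char *extradata,
              int extradata_size, int codec_tag, int width, int height)
{
    (void)private_data; (void)extradata; (void)extradata_size;
    (void)codec_tag; (void)width; (void)height;
    return 0;
}

int stub_packet(void *private_data, void *data, int *data_size,
                unsigned char *buf, int buf_size, Bits *bit_structure,
                unsigned char *work_buffer)
{
    (void)private_data; (void)data; (void)buf;
    (void)bit_structure; (void)work_buffer;
    *data_size = 1;   /* pretend one frame was produced */
    return buf_size;  /* pretend the whole packet was consumed */
}
```

With a real module, the function pointers would refer to the module's
own init and packet_decode functions instead of the stubs.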

The cleanup (end) function has the following declaration:

  int end(void *private_data);

  private_data: A private data structure used to store the internal
    decoder state. The libavcodec glue code declares this as an array
    of 1024 bytes.

The end function returns an integer that can be returned from the
libavcodec end function as part of the normal libavcodec API.


Linspire/Microsoft Audio Binary API
-----------------------------------
As with certain video codecs, Linspire also licensed certain audio
codecs from Microsoft in order to make multimedia more accessible to the
desktop end-user. The binary decoder modules distributed with Linspire
are Windows Media Audio 2 and 3. As of this writing, only Windows Media
Audio 3 (WMA3) is called by the libavcodec glue code. However, both
"Pro" and "Lossless" variants are supported with this module.

The binary decoder for the WMA3 codec links into FFmpeg's libavcodec.
The binary module has three interface functions: wma3init, wma3packet,
and wma3end.

The wma3init function has the following declaration:

  int wma3init(void *private_data, unsigned char *extradata,
    long extradata_size, long sample_rate, long block_size,
    int wma3pro);

  private_data: A private data structure used to store the internal
    decoder state. The libavcodec glue code declares this as an array
    of 1024 bytes.
  extradata: The extradata bytes at the end of the WAVEFORMATEX header
    from the Microsoft media file (AVI or ASF).
  extradata_size: The size of the buffer indicated by extradata.
  sample_rate: The sampling frequency of the decompressed PCM audio.
  block_size: The length of an individual block of encoded audio data.
  wma3pro: This flag is set to 1 if initializing the decoder for
    decoding WMA3 "Pro" data. The flag is 0 if initializing the decoder
    for decoding WMA3 "Lossless" data.
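
A call to wma3init might be wrapped as follows. This is a minimal
sketch under stated assumptions: wma3_glue, wma3_variant,
wma3_glue_init, stub_state and stub_wma3init are hypothetical names, and
the stub only records its arguments so that the sketch can run without
the binary module. The text does not document the meaning of wma3init's
return value, so the wrapper passes it through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WMA3_PRIVATE_DATA_SIZE 1024  /* size used by the glue code */

/* Function pointer type matching the documented declaration. */
typedef int (*wma3init_fn)(void *private_data, unsigned char *extradata,
                           long extradata_size, long sample_rate,
                           long block_size, int wma3pro);

/* Hypothetical names for the two values of the wma3pro flag. */
enum wma3_variant {
    WMA3_VARIANT_LOSSLESS = 0,
    WMA3_VARIANT_PRO      = 1
};

/* Hypothetical glue-side state holding the decoder's private data. */
struct wma3_glue {
    unsigned char private_data[WMA3_PRIVATE_DATA_SIZE];
};

/* Clear the private data and initialize the decoder. The module's
 * return value is handed back as-is because its meaning is not
 * documented. */
int wma3_glue_init(struct wma3_glue *glue, wma3init_fn init,
                   unsigned char *extradata, long extradata_size,
                   long sample_rate, long block_size,
                   enum wma3_variant variant)
{
    assert(glue != NULL && init != NULL);
    memset(glue->private_data, 0, sizeof(glue->private_data));
    return init(glue->private_data, extradata, extradata_size,
                sample_rate, block_size,
                variant == WMA3_VARIANT_PRO ? 1 : 0);
}

/* Stand-in for the binary module: records what it was given. */
struct stub_state {
    long sample_rate;
    long block_size;
    int wma3pro;
};

int stub_wma3init(void *private_data, unsigned char *extradata,
                  long extradata_size, long sample_rate,
                  long block_size, int wma3pro)
{
    struct stub_state st;

    (void)extradata;
    (void)extradata_size;
    st.sample_rate = sample_rate;
    st.block_size = block_size;
    st.wma3pro = wma3pro;
    memcpy(private_data, &st, sizeof(st));
    return 0;
}
```

With the real module, wma3init itself would be passed in place of
stub_wma3init, and the same private_data array would later be handed to
wma3packet and wma3end.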

The wma3packet function has the following declaration:

  int wma3packet(void *private_data, void *data, int *data_size,
    unsigned char *buf, int buf_size);

  private_data: A private data structure used to store the internal
    decoder state. The libavcodec glue code declares this as an array
    of 1024 bytes.
  data: The void *data parameter passed into the libavcodec decoding
    function.
  data_size: The int *data_size parameter passed into the libavcodec
    decoding function.
  buf: The encoded bytestream passed in from libavcodec.
  buf_size: The size of the encoded bytestream.

The wma3packet function returns an integer that can be returned from the
libavcodec decode_packet function as part of the normal libavcodec API.

The wma3end function has the following declaration:

  int wma3end(void *private_data);

  private_data: A private data structure used to store the internal
    decoder state. The libavcodec glue code declares this as an array
    of 1024 bytes.

The wma3end function returns an integer that can be returned from the
libavcodec end function as part of the normal libavcodec API.


References
----------
Media Player Classic:
  http://sourceforge.net/projects/guliverkli/

Linspire:
  http://www.linspire.com/

FFmpeg/libavcodec:
  http://ffmpeg.sourceforge.net/


Changelog
---------
v1.1: July 10, 2005
- Linspire/Microsoft Video Binary API
- Linspire/Microsoft Audio Binary API

v1.0: February 4, 2004
- initial release, covering the Rad Game Tools Smacker and Bink DLLs


GNU Free Documentation License
------------------------------
see http://www.gnu.org/licenses/fdl.html