AVI Files: Tips & Quirks
by Arpad (A'rpi) Gereoffy
editing and introduction by Mike Melanson (mike at multimedia.cx)
v1.1: March 13, 2003

Copyright (c) 2002-2003 Arpad Gereoffy
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled "GNU
Free Documentation License".

Introduction
------------
AVI stands for Audio/Video Interleaved. For a long time, it was the de
facto standard for multimedia files on Windows (recently, ASF has
supplanted AVI on the Windows platform). While there is some contention
regarding the originator of the format, the fact remains that there were,
and still are, a wide variety of computer applications that create AVI
files. This leads to a lot of fragmentation and application-specific
nuances in a standard that was never particularly well-defined in the
first place.

A'rpi is the originator of the MPlayer media application for Linux. It's
an open source movie player that can decode AVI files, as well as a number
of other file formats. He has encountered a lot of AVIs created by a lot
of different programs and is qualified to write about some of the quirks
and nuances a programmer might encounter when writing a general purpose
AVI file decoder.

Random AVI Tips From A'rpi
--------------------------
In short, these are some things I discovered while writing/fixing my AVI
demuxer:
- AVI files are built from variable length chunks.
- Each chunk has a 4-byte fourcc and a 4-byte length (dword).
- If the chunk size is bad/broken, it will kill the whole demuxer process.
- Chunks are padded to an even (2*n) offset.

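
The chunk rules listed above can be sketched in Python roughly as follows.
This is only an illustration, assuming a seekable binary stream positioned
at a chunk boundary; the name iter_chunks and the decision to raise
ValueError on an impossible length are hypothetical choices, not taken from
MPlayer. RIFF stores the length dword in little-endian byte order.

```python
import io
import struct


def iter_chunks(stream):
    """Yield (fourcc, payload_offset, size) for consecutive chunks.

    Hypothetical helper: stops quietly at EOF or on a truncated header,
    and raises ValueError when a declared size runs past the end of the
    data, so that one broken length cannot derail the whole demuxer.
    """
    start = stream.tell()
    total = stream.seek(0, io.SEEK_END)  # seek() returns the new position
    stream.seek(start)
    while True:
        pos = stream.tell()
        header = stream.read(8)
        if len(header) < 8:
            return  # EOF, or a header cut off by the end of the file
        # 4-byte fourcc followed by a 4-byte little-endian length
        fourcc, size = struct.unpack("<4sI", header)
        if pos + 8 + size > total:
            raise ValueError("chunk %r at %d claims %d bytes" % (fourcc, pos, size))
        yield fourcc, pos + 8, size
        # Skip the payload, plus one pad byte when the size is odd,
        # so the next header starts on an even offset.
        stream.seek(pos + 8 + size + (size & 1))
```

A caller would typically wrap the loop in a try/except and fall back to
resynchronising or stopping when ValueError is raised, instead of trusting
the broken size.
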
AVI files usually have:
- a RIFF avi header, containing general parameters (used for file type
  detection)
- stream headers, containing a common-format stream descriptor and a
  type-specific audio/video/other header
- a single 'movi' chunk, containing the audio and video packets
- an index chunk, containing the index table (16 bytes for each chunk in
  'movi')

The AVI header has a dwFlags field. It contains useful information, like
the type of interleaving, whether the file has an index chunk, and so on.
Ignore it. Really. It's broken in too many files. Windows players ignore
it too.

AVI docs say that 'XXdb' are uncompressed and 'XXdc' are compressed video
chunk fourccs. (XX = stream id in HEX. Some specs say it's in DEC. Funny.)
Ignore that distinction. Just use the first 2 chars as a hex number, and
get the stream type from the stream header for that stream id. I've even
seen XXim fourccs...

The stream header has some interesting fields:
- dwRate, dwScale: These specify the playback samplerate of the stream
  (dwRate/dwScale samples per second).
- dwStart: Specifies the delay of the stream; rarely used, but must be
  supported.
- dwSampleSize: This is the sample size (bytes / sample). It may be 0,
  which means variable sample sizes -> 1 chunk == 1 sample. For a non-zero
  samplesize, chunks may contain more than one sample.

Regarding VBR audio in AVI, see the VirtualDub site mentioned in the
references. The 3 AVI parsers in Windows behave differently with such
streams: one is normal (0 = VBR), one is tricky (rounds zero up to
blockalign), and one crashes.

In AVI, the audio-specific header contains WAVEFORMATEX and the
video-specific header contains BITMAPINFOHEADER. Both can have optional
codec-dependent extra data appended after the struct. Don't crop it, or
decoding will break!

About the movi chunk: Recently I got some AVI files with bad movi chunk
sizes. So, I have to say: Ignore it. Read chunks from the file until EOF,
rather than while filepos < movi_end.

About index:
- It contains chunk pos, chunk size, chunk fourcc and flags. Bit 4 of the
  flags field (flags & 0x10) means that the chunk represents a keyframe.
- Offset is relative to `cat /dev/urandom`. Really. Or dunno.
- I calculate an offset_of_offset value from the movi_start and the first
  chunk offset. It works in 99% of cases. I saw different methods in other
  players, handling some common cases (such as relative to avi chunk,
  relative to movi chunk, etc.) and falling back to the absolute value.
- Chunk info in the chunk header (first 4+4 bytes of each chunk) and in the
  index table should be equal. They aren't. Sometimes the size values
  differ by +/-1. Strange. Sometimes the 'type' part of the fourcc (last 2
  chars) differs. Even more strange. Sometimes they leave out the chunk
  header. I think Windows parsers don't use chunk headers at all, and use
  only the index. This may be why they are unable to play files without an
  index.

On the subject of interleaving, there are 3 categories of interleaving for
AVI files (taken from DOCS/tech/formats.txt in the MPlayer distribution):

1) Interleaved: Audio and video content is interleaved. It's faster and
   requires only 1 reading thread, so it's recommended (and most commonly
   used).
2) Non-interleaved: Audio and video aren't interleaved. The file stores
   all of the video data followed by all the audio data. Such a file
   requires 2 reading processes, or 1 reading process with lots of
   seeking. This is very bad when playing the data from a network or
   CD-ROM.
3) Badly-interleaved streams: Some AVI files claim to be interleaved but
   have bad sync. These files should be treated as non-interleaved.

About A/V sync, you should rely on samplerate (dwRate/dwScale), samplesize
and stream positions. Use integers, not floating-point numbers, for byte
positions when calculating the time of each frame:

  time = (dwSampleSize ? (bytepos / dwSampleSize) : chunkpos) * dwScale / dwRate

Floats will gradually drift into error.

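
The A/V sync rule above can be sketched in Python along these lines.
Because dwRate/dwScale is the number of samples per second, the time in
seconds is the sample position multiplied by dwScale/dwRate. The function
names are hypothetical, and returning an exact Fraction is just one way to
keep the arithmetic in integers; dwStart is left out of this sketch.

```python
from fractions import Fraction


def sample_position(sample_size, byte_pos, chunk_pos):
    """Return the stream position in samples, using integers only.

    sample_size mirrors dwSampleSize: 0 means one chunk is one sample,
    otherwise the position is derived from the byte position.
    """
    if sample_size:
        return byte_pos // sample_size
    return chunk_pos


def stream_time(sample_size, byte_pos, chunk_pos, rate, scale):
    """Return the presentation time in seconds as an exact Fraction.

    rate and scale mirror dwRate and dwScale (rate/scale samples per
    second), so seconds = samples * scale / rate.
    """
    if rate == 0:
        raise ValueError("dwRate is zero; stream header is unusable")
    samples = sample_position(sample_size, byte_pos, chunk_pos)
    return Fraction(samples * scale, rate)
```

For example, video chunk 30 of a stream with dwRate=30000 and dwScale=1001
is computed as stream_time(0, 0, 30, 30000, 1001). Converting the result to
a float (or to integer milliseconds) only at the very end avoids the
accumulated rounding the text warns about.
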
References
----------
The MPlayer Media Application
http://www.mplayerhq.hu/

John McGowan's AVI Overview
http://www.jmcgowan.com/avi.html

VirtualDub
http://www.virtualdub.org/

ChangeLog
---------
v1.1: March 13, 2003
- licensed under GNU Free Documentation License
- minor cosmetic changes
v1.0: February 9, 2002
- initial release

GNU Free Documentation License
------------------------------
see http://www.gnu.org/licenses/fdl.html