Description of the Nullsoft Video (NSV) Format
by Mike Melanson (mike at multimedia.cx)
v1.0: May 19, 2003

[July 10, 2005: Note: You no longer have to struggle with these
incomplete specifications. The formal NSV specs are available here:
  http://multimedia.cx/mirror/NSVFormat.rtf ]

Copyright (c) 2003 Mike Melanson
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts. A copy of the license is included in the section entitled "GNU
Free Documentation License".

Contents
--------
 * Introduction
 * File Format
 * References
 * Acknowledgements
 * Changelog
 * GNU Free Documentation License

Introduction
------------
Nullsoft, the entity behind the ubiquitous Winamp MP3 / general-purpose
multimedia player, offers a multimedia container format designed with
network streaming in mind. The format is called Nullsoft Video and bears
the file extension '.nsv'.

File Format
-----------
All multi-byte numbers are in little endian format.

An NSV file has the following overall structure:

  'NSVf' optional metadata chunk
  'NSVs' audio/video data chunk
  [NSVs chunk]
  ..
  [NSVs chunk]

NSV files may start with an optional info and index chunk that is marked
by the characters 'NSVf'. The chunk has the following layout:

  bytes 0-3    'NSVf' signature
  bytes 4-7    size of chunk, including signature and size fields
  bytes 8-11   total size of file
  bytes 12-19  unknown
  bytes 20-23  number of table entries
  bytes 24-27  number of table entries (again?)
  [arbitrary length info string]
  ..
  [arbitrary length info string]
  [data table]

Following the first 28 bytes of the info block is any number of
arbitrary length info strings. The strings take the format of:

  STRING=`value`

The value is delimited by the backtick (`) character, a.k.a. ASCII 0x60.
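
The NSVf layout and the info string format described so far can be
sketched in Python as follows. This is only an illustration under stated
assumptions: the helper names parse_nsvf_header and parse_info_strings
are hypothetical, the sample header values are made up, and the regular
expression assumes that string names are alphanumeric and that values
never contain a backtick. The document does not say how the strings are
separated or where they end, so the caller is expected to pass in only
the bytes that hold them.

```python
import re
import struct

# Little endian: signature, chunk size, file size, 8 unknown bytes,
# and the table entry count (which appears twice). 28 bytes in total.
NSVF_HEADER = struct.Struct("<4sII8sII")


def parse_nsvf_header(buf):
    """Decode the fixed 28-byte NSVf header (hypothetical helper)."""
    if len(buf) < NSVF_HEADER.size or buf[:4] != b"NSVf":
        raise ValueError("not an NSVf chunk")
    _sig, chunk_size, file_size, unknown, entries, entries_again = (
        NSVF_HEADER.unpack_from(buf, 0)
    )
    return {
        "chunk_size": chunk_size,
        "file_size": file_size,
        "unknown": unknown,
        "table_entries": entries,
        "table_entries_again": entries_again,
    }


def parse_info_strings(raw):
    """Collect NAME=`value` pairs from the bytes that follow the header."""
    # Assumption: the text can be decoded as Latin-1 and names match \w+.
    text = raw.decode("latin-1")
    return dict(re.findall(r"(\w+)=`([^`]*)`", text))


# Made-up header values, used only to exercise the parser.
sample = struct.pack("<4sII8sII", b"NSVf", 64, 1000, bytes(8), 2, 2)
print(parse_nsvf_header(sample))
print(parse_info_strings(b"TITLE=`deer video` ASPECT=`1.125`"))
```

Keeping bytes 12-19 as an opaque value mirrors the text: their meaning
is unknown here, so the sketch returns them untouched rather than
guessing at an interpretation.
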
Known strings include TITLE and ASPECT for the NSV file's title and
aspect ratio, respectively. Examples:

  TITLE=`deer video`
  ASPECT=`1.125`

Following the metadata strings is a table of incrementing 32-bit
numbers. The number of entries in this table is specified in the NSVf
chunk header, apparently twice. The meaning of these numbers is unclear.

The meat of an NSV file (encoded audio and video chunks) is stored in a
series of NSVs data chunks. Each NSVs chunk can contain multiple video
and/or audio chunks. An NSVs chunk has the following header:

  bytes 0-3    'NSVs' signature
  bytes 4-7    video codec fourcc
  bytes 8-11   audio codec fourcc
  bytes 12-13  video width, divisible by 16
  bytes 14-15  video height, divisible by 16
  byte 16      framerate
    bit 7      1 = lower 7 bits indicate a standard fractional framerate
               0 = lower 7 bits indicate an absolute framerate
    bits 6-0   framerate
  bytes 17-18  unknown

If a file does not have audio or video, the corresponding codec fourcc
will be 'NONE'. Common video fourccs are 'VP31' and 'VP3 ' which
indicate On2 VP3 video. Common audio fourccs are 'MP3 ' for MPEG layer
III audio and 'PCM ' for raw PCM audio.

The MSB of byte 16 appears to indicate that the lower 7 bits represent a
standard fractional framerate. For example, 0x81 equates to 29.97 fps,
0x85 equates to 14.98 fps, while 0x0F simply represents 15 fps.

After the NSVs header are 5 bytes which provide the following length
information:

  v? vv vv aa aa

The lower nibble of byte 0 is unknown. The upper nibble of byte 0, along
with bytes 1 and 2, comprise the length of the video data in bytes.
Since there are 5 hex digits to describe the length, the maximum video
chunk size is 2^20 - 1 bytes, or just under 1 megabyte. Bytes 3-4 are
the 16-bit length of the audio chunk. Consider this example:

  80 B7 00 D1 00

The first 3 bytes, 80 B7 00, are read as a 24-bit little endian number,
0x00B780. Then the number is shifted right by 4 bits to give a video
chunk length of 0xB78 bytes.
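
The NSVs header layout and the video length arithmetic just shown can be
sketched in Python as follows. This is a minimal illustration, not a
demuxer: the helper names parse_nsvs_header and decode_lengths are
hypothetical, and the framerate byte is only split into its flag and
7-bit value because the mapping to an actual rate is not given here. The
same helper also decodes the 16-bit audio length from bytes 3-4.

```python
import struct

# Little endian: signature, video fourcc, audio fourcc, width, height,
# framerate byte, and two unknown bytes. 19 bytes in total.
NSVS_HEADER = struct.Struct("<4s4s4sHHBH")


def parse_nsvs_header(buf):
    """Decode the fixed 19-byte NSVs header (hypothetical helper)."""
    if len(buf) < NSVS_HEADER.size or buf[:4] != b"NSVs":
        raise ValueError("not an NSVs chunk")
    _sig, vfourcc, afourcc, width, height, rate, unknown = (
        NSVS_HEADER.unpack_from(buf, 0)
    )
    return {
        "video_fourcc": vfourcc.decode("latin-1"),
        "audio_fourcc": afourcc.decode("latin-1"),
        "width": width,
        "height": height,
        "rate_is_fractional": bool(rate & 0x80),  # bit 7
        "rate_value": rate & 0x7F,                # bits 6-0, left undecoded
        "unknown": unknown,
    }


def decode_lengths(five):
    """Split the 5 length bytes into (video_len, audio_len, low_nibble)."""
    if len(five) != 5:
        raise ValueError("expected exactly 5 bytes")
    packed = int.from_bytes(five[0:3], "little")   # 80 B7 00 -> 0x00B780
    video_len = packed >> 4                        # drop the unknown nibble
    audio_len = int.from_bytes(five[3:5], "little")
    return video_len, audio_len, packed & 0x0F


video_len, audio_len, low_nibble = decode_lengths(bytes.fromhex("80B700D100"))
print(hex(video_len), hex(audio_len), low_nibble)  # 0xb78 0xd1 0
```
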
The audio chunk length bytes are D1 00, or 0x00D1 when read as little
endian.

After the first video/audio chunk pair in an NSV file, there will be a
BEEF marker before the next pair. That is, the hex number 0xBEEF encoded
in little endian (EF BE). After the marker are another 5 bytes encoding
the video and audio chunk lengths as described above, followed by
another frame of video and audio data. This BEEF-length-data pattern
continues until the end of the NSVs chunk.

A small note on PCM audio: If the audio data is encoded with fourcc
'PCM ', each audio data chunk will contain the following 4-byte header:

  byte 0     unknown
  byte 1     number of channels
  bytes 2-3  sample rate

References
----------
NSV website, home to samples and SDK:
  http://www.nullsoft.com/nsv/

Acknowledgements
----------------
Thanks to Roberto Togni (rtogni at bresciaonline dot it) and Arpad
"A'rpi" Gereoffy (arpi at mplayerhq dot hu) for further investigation
into the format.

Changelog
---------
v1.0: May 19, 2003
- sorted out NSVs chunk formatting
- document promoted to 1.0 status since enough information has been
  uncovered to create functional demuxers
v0.2: March 13, 2003
- licensed under GNU Free Documentation License
- expanded information regarding NSVs data chunks
v0.1: February 11, 2003
- initial release

GNU Free Documentation License
------------------------------
see http://www.gnu.org/licenses/fdl.html