Breaking Eggs And Making Omelettes

Topics On Multimedia Technology and Reverse Engineering


Survey of CD Image Formats

April 29th, 2013 by Multimedia Mike

In the course of exploring and analyzing the impressive library of CD images curated at the Internet Archive’s Shareware CD collection, one encounters a wealth of methods for copying a complete CD image onto other media for transport. In researching the formats, I have found that many of them are native to various binary, proprietary CD programs that operate under Windows. Since I have an interest in interpreting these image formats and I would also like to do so outside of Windows, I thought to conduct a survey to determine if enough information exists to write processing tools of my own.

Remember from my Grand Unified Theory of Compact Disc that CDs, from a high enough level of software abstraction, are just strings of 2352-byte sectors broken up into tracks. The difference among various types of CDs comes down to the specific meaning of these 2352 bytes.
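To make the "specific meaning of these 2352 bytes" concrete, here is a minimal sketch that walks a raw image sector by sector and pulls the 2048 user-data bytes out of a Mode 1 data sector. It assumes the standard Mode 1 layout (12 sync bytes, a 4-byte header whose last byte is the mode, 2048 data bytes, then error detection and correction data). The function names are made up for illustration, and an audio sector would simply be 2352 bytes of PCM samples with no such framing.

```python
SECTOR_SIZE = 2352
USER_DATA_SIZE = 2048
# Every raw data sector starts with this 12-byte sync pattern.
SYNC = b"\x00" + b"\xff" * 10 + b"\x00"


def iter_sectors(raw):
    """Yield each complete 2352-byte sector from a raw image buffer."""
    for offset in range(0, len(raw) - SECTOR_SIZE + 1, SECTOR_SIZE):
        yield raw[offset:offset + SECTOR_SIZE]


def mode1_user_data(sector):
    """Return the 2048 user-data bytes of a Mode 1 sector."""
    if len(sector) != SECTOR_SIZE or sector[:12] != SYNC:
        raise ValueError("not a raw data sector (audio, or wrong size)")
    if sector[15] != 1:  # byte 15 is the mode byte of the 4-byte header
        raise ValueError("not a Mode 1 sector")
    return sector[16:16 + USER_DATA_SIZE]
```

Concatenating `mode1_user_data()` over every sector of a Mode 1 data track yields the same bytes that a plain 2048-bytes-per-sector image would hold.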

Most imaging formats rip these strings of sectors into a giant file and then record some metadata information about the tracks and sectors.

ISO
This is perhaps the most common method for storing CD images. It’s generally only applicable to data CD-ROMs. Image files generally end with a .iso extension, which refers to ISO-9660, the standard CD filesystem.

Sometimes, disc images ripped from other types of discs (like Xbox/360 or GameCube discs) bear the extension .iso, which is a bit of a misnomer since they aren’t formatted using the ISO-9660 filesystem. But the extension sort of stuck.
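One hedged way to tell a genuine ISO-9660 image from a merely .iso-named file is to look for the volume descriptor that the standard places at sector 16. The sketch below assumes 2048-byte sectors and checks for the "CD001" identifier that follows the one-byte descriptor type. The function name is invented for this example.

```python
import io

ISO_SECTOR_SIZE = 2048
FIRST_DESCRIPTOR_SECTOR = 16  # sectors 0-15 are the reserved system area


def iso9660_descriptor_type(f):
    """Return the type byte of the first volume descriptor, or None.

    `f` is a binary file object positioned anywhere; a type of 1 means a
    Primary Volume Descriptor. None means no ISO-9660 signature was found.
    """
    f.seek(FIRST_DESCRIPTOR_SECTOR * ISO_SECTOR_SIZE)
    head = f.read(6)
    if len(head) < 6 or head[1:6] != b"CD001":
        return None
    return head[0]
```

An image ripped from a disc that does not use ISO-9660 would be expected to return `None` here, whatever its extension says.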

BIN / CUE
I see the BIN & CUE file format combination quite frequently. Reportedly, a program named CDRWIN deployed this format first. This format can handle a mixed mode CD (e.g., starts with a data track and is followed by a series of audio tracks), whereas ISO can only handle the data track. The BIN file contains the raw data while the CUE file is a text file that defines how the BIN file is formatted (how many bytes in a sector, how many sectors to each individual track).
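Since the CUE file is plain text, a small parser is enough to recover the track layout. The following is a rough sketch rather than a complete implementation: it only understands the FILE, TRACK and INDEX 01 commands, and the dictionary keys are names chosen for this example. CUE timestamps are MM:SS:FF, with 75 frames (sectors) per second.

```python
import shlex

FRAMES_PER_SECOND = 75  # CD addressing: 75 sectors ("frames") per second


def msf_to_frames(msf):
    """Convert an MM:SS:FF timestamp into an absolute sector count."""
    minutes, seconds, frames = (int(part) for part in msf.split(":"))
    return (minutes * 60 + seconds) * FRAMES_PER_SECOND + frames


def parse_cue(text):
    """Return one dict per TRACK with its file, mode, sector size and start."""
    tracks = []
    current_file = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        keyword = line.split()[0].upper()
        if keyword == "FILE":
            # shlex copes with quoted file names that contain spaces
            current_file = shlex.split(line)[1]
        elif keyword == "TRACK":
            _, number, mode = line.split()[:3]
            # "MODE1/2352" carries the sector size; AUDIO is always 2352
            sector_size = int(mode.split("/")[1]) if "/" in mode else 2352
            tracks.append({
                "number": int(number),
                "mode": mode,
                "file": current_file,
                "sector_size": sector_size,
                "start_frame": None,
            })
        elif keyword == "INDEX" and tracks:
            _, index_number, msf = line.split()[:3]
            if int(index_number) == 1:  # INDEX 01 marks the track start
                tracks[-1]["start_frame"] = msf_to_frames(msf)
    return tracks
```

When every track in the BIN uses the same sector size, the byte offset of a track is simply `start_frame * sector_size`. Images that mix sector sizes need a running total instead.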

CDI
This originates from a program called DiscJuggler. This is extremely prevalent in the Sega Dreamcast hobbyist community for some reason. I studied the raw hex dumps of some sample CDI files but there was no obvious metadata at the start (mostly 0s). There is an open source utility called cdi2iso which is able to extract an ISO image from a CDI file. The program’s source clued me in that the metadata is actually sitting at the end of the image file. This makes sense when you consider how a ripping program needs to operate: copy tracks, sector by sector, and then do something with the metadata after the fact. Options include: 1) write metadata at the end of the file (as seen here); 2) write metadata into a separate file (seen in other formats on this list); 3) write the metadata at the beginning of the file, which would require a full rewrite of the entire (usually large) image file (I haven’t seen this yet).
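The "metadata at the end" approach can be sketched as follows. Everything specific here is an assumption to verify against the cdi2iso source rather than a documented fact: that the final 8 bytes hold a little-endian version tag followed by a header offset, the three version constants, and the rule that the newest version measures the offset back from the end of the file.

```python
import io
import os
import struct

# ASSUMED version tags; check them against cdi2iso before relying on them.
CDI_V2 = 0x80000004
CDI_V3 = 0x80000005
CDI_V35 = 0x80000006


def locate_cdi_metadata(f):
    """Return (version, metadata_start) read from the trailer of a CDI file.

    ASSUMPTION: the trailer is two little-endian uint32 values, and for
    CDI_V35 the offset is relative to the end of the file.
    """
    f.seek(0, os.SEEK_END)
    length = f.tell()
    if length < 8:
        raise ValueError("file too short to hold a trailer")
    f.seek(length - 8)
    version, header_offset = struct.unpack("<II", f.read(8))
    if version not in (CDI_V2, CDI_V3, CDI_V35):
        raise ValueError("unrecognized version tag: %#x" % version)
    start = length - header_offset if version == CDI_V35 else header_offset
    return version, start
```

The design point is that the reader never needs to scan the bulk sector data: it seeks to the end, reads a fixed-size trailer, and jumps straight to the metadata.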

Anyway, I believe I have enough information to write a program that can interpret a CDI file. The reason this format is favored for Dreamcast disc images is likely due to the extreme weirdness of Dreamcast discs (it’s complicated, but eventually fits into my Grand Unified Theory of CDs, if you look at it from a high level).

MDF / MDS
MDF and MDS pairs come from a program called Alcohol 120%. The MDF file has the data while the MDS file contains the metadata. The metadata is in an opaque binary format, though. Thankfully, the Wikipedia page links to a description of the format. That’s another image format down.

CCD / SUB / IMG
The CloneCD Control File is one I just ran across today thanks to a new image posted at the IA Shareware Archive (see Super Duke Volume 2). I haven’t found any definitive documentation on this, but it also doesn’t seem too complicated. The .ccd file is a text file that is pretty self-explanatory. The sample linked above, however, only has a .ccd file and a .sub file. I’m led to believe that the .sub file contains subchannel information while a .img file is supposed to contain the binary data. So this rip might be incomplete (nope, the .img file is on the page, in the sidebar; thanks to Phil in the comments for pointing this out). The .sub file is a bit short compared to the Archive’s description of the disc’s contents (only about 4.6 MB of data) and when I briefly scrolled through, it didn’t look like it contained any real computer data. So it probably is just the disc’s subchannel data (something I glossed over in my Grand Unified Theory).
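A quick sanity check supports the subchannel reading. Assuming the .sub file stores 96 bytes of subchannel data per sector (the usual figure for raw subchannel dumps, but treat it as an assumption here), the file size gives the sector count, and from that the expected size of the matching raw image:

```python
SUBCHANNEL_BYTES_PER_SECTOR = 96   # assumed .sub layout
RAW_SECTOR_BYTES = 2352


def estimate_from_sub(sub_size_bytes):
    """Return (sector_count, expected_img_bytes) for a given .sub size."""
    sectors = sub_size_bytes // SUBCHANNEL_BYTES_PER_SECTOR
    return sectors, sectors * RAW_SECTOR_BYTES


sectors, img_bytes = estimate_from_sub(4_600_000)  # "about 4.6 MB"
print(sectors, img_bytes)  # 47916 112698432
```

Roughly 48,000 sectors and an image a little over 110 MB is in the same ballpark as the ~120 MB .IMG file Phil points out in the comments, so the numbers are consistent with the .sub file being pure subchannel data.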

CSO
I have dealt with the CISO (compressed ISO) format before. It’s basically the same as a .iso file described above except that each individual 2048-byte data sector is compressed using zlib. The format boasts up to 9 compression levels, which shouldn’t be a big surprise since that correlates to zlib’s own compression tiers.
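Per-sector compression means a reader can decompress any single sector without touching the rest. The sketch below assumes the commonly described CISO layout: a 24-byte header (magic "CISO", header size, total uncompressed bytes, block size, version, alignment shift), followed by a table of 32-bit offsets in which the top bit marks a block stored uncompressed, with compressed blocks being raw deflate streams. Treat those details as assumptions to check against a real file.

```python
import io
import struct
import zlib

# magic, header_size, total_bytes, block_size, version, align, 2 pad bytes
HEADER = struct.Struct("<4sIQIBB2x")
PLAIN_FLAG = 0x80000000  # assumed: top bit set means "stored uncompressed"


def read_ciso_block(f, block_number):
    """Return the uncompressed bytes of one block from a CISO file object."""
    f.seek(0)
    magic, _hsize, total_bytes, block_size, _version, align = HEADER.unpack(
        f.read(HEADER.size))
    if magic != b"CISO":
        raise ValueError("not a CISO file")
    if not 0 <= block_number < total_bytes // block_size:
        raise IndexError("block number out of range")
    # The index has one entry per block plus a final end-of-data entry.
    f.seek(HEADER.size + 4 * block_number)
    entry, next_entry = struct.unpack("<II", f.read(8))
    start = (entry & ~PLAIN_FLAG) << align
    end = (next_entry & ~PLAIN_FLAG) << align
    f.seek(start)
    if entry & PLAIN_FLAG:
        return f.read(block_size)
    # wbits=-15 selects raw deflate (no zlib header or checksum)
    return zlib.decompressobj(-15).decompress(f.read(end - start))
```

The compression level (1 through 9) only matters to the program writing the file; the reader above handles any level the same way.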

Others
Wikipedia has a category for optical disc image formats. Of course, there are numerous others. However, I haven’t encountered them in the wild for the purpose of broad image distribution.

Posted in General | 14 Comments »

14 Responses

  1. astrange Says:

    The only other format I’ve seen in the wild is .nrg. Luckily I’ve never had to interpret one of those.

    Now, audio CDs released as .tak + .cue (in a random Windows codepage) + thumbnail .png is quite annoying.

  2. Multimedia Mike Says:

    @astrange: Yeah, I think I have seen .nrg on some Dreamcast sites. Thankfully, I haven’t seen it on the Internet Archive.

  3. Phil Says:

    Actually you’re wrong about the Super Duke Volume 2 set being incomplete – there’s a ~120MB .IMG file there too.

    On the left hand side, under “Play/Download” there’s a “CD/DVD” link which points to the .IMG file.

  4. Jason Scott Says:

    Thanks for doing this, Mike.

    As I pull the images from a wide variety of sources, the many ways people have used to rip CD-ROMs and DVD-ROMs are coming into play. We are likely to see a few more of these formats pop out, like NRG, when we do things like absorb entire hard drives.

  5. pprkut Says:

    Did you know about libmirage (http://cdemu.sourceforge.net/about/libmirage/)?
    It supports a wide variety of image formats, and even if you don’t want to use it for your own processing tools, it might serve well as inspiration/documentation on certain formats.

  6. Justin Kerk Says:

    Hey there – I’ve uploaded several hundred of the images in that collection. There are some other oddball formats in there:

    NRG
    https://archive.org/details/cdrom-speciali-win-magazine-2011-11-12

    sparseimage
    http://archive.org/details/cdrom-aztech-maccube-1995

    DMG
    https://archive.org/details/Macaddict51November2000

    toast
    http://archive.org/details/MacAddict124cd

    As far as I know the .img part of a CloneCD rip is identical to the .bin part of a .bin/.cue and in the case of single-track data discs (most of the items in the collection) the contents can be read easily with free tools such as The Unarchiver: https://code.google.com/p/theunarchiver/

    For formats like .nrg or .mdf I have tried to convert to ISO (using Daemon Tools + ImgBurn) and include both where possible. The Mac formats I don’t have an easy way of dealing with currently.

  7. Multimedia Mike Says:

    @Phil: Thanks for keeping me honest. I’ve downloaded enough disk images from the IA that I should have figured that out.

    @Jason: The collection will never wither away, forgotten.

    @pprkut: Thanks! I hadn’t heard of libmirage but I am absolutely going to investigate it. I am not averse to seeking out pre-made solutions. Based on the feature list, it’s probably missing one feature I was hoping for. But I might be able to add that.

    @Justin Kerk: Great; I knew I would bump into NRG and a few others eventually.

  8. Bill Kendrick Says:

    From the couple of times I burned some homebrew Sega Dreamcast games (in fact, some of my Linux games that people ported to DC), I seem to recall something about having 1 or 2 seconds of silence at the beginning of the disc. That might explain those 0s. But what do I know? :) It was also a decade ago!

  9. Multimedia Mike Says:

    @Bill Kendrick: That’s exactly it. In order to burn a CD-R that will boot on the Dreamcast, there is a bunch of weirdness involving multisession, an initial short audio track, and binary scrambling.

    I found a shell script that helps me burn DC-bootable CDI images. Ironically, rather than exploiting the features of the CDI image format, it uses cdi2iso to extract out the ISO image and then manually performs all the ancillary work to burn the bootable CD-R. But it works (DC burning can be so painful).

  10. lockecole2 Says:

    I think I went through a fifth of a spindle of (expensive!) CD-Rs figuring out how to burn Dreamcast discs on a Mac using Toast 4 burning at 2x.

    I also seem to remember that it was recommended to pad out the discs to the limit so that the actual data would reside at the outer edges of the disc rather than the inner portions, making the process take especially long.

    I think .CDI was prevalent on the Dreamcast due to the ability for the format to easily express a multi-session disc, as BIN/CUE can only express single session.

    There is also BIN/TOC, but that’s fairly rare to see as it’s only produced by cdrdao. I ended up using that a lot to get a foolproof way to burn Dreamcast discs.

    Notable Dreamcast burning references:
    http://mc.pp.se/dc/
    http://web.archive.org/web/20021017111223/http://dcxeal.virtualave.net/buffer.html

  11. Reimar Says:

    Just in case you are not aware: The major point of formats like nrg and mdf is storing extra information, mostly things that are needed to trick copy protection schemes.
    One of the first ones was the subchannel data, but it is also things like intentionally corrupted sectors (a .iso doesn’t have a way of saying “reading this sector will always fail”) or geometry information (“seeking from sector A to sector B will take longer than A to C” – well, something like this, I never quite figured out how this scheme could work reliably – actually I expect the answer is “it doesn’t work reliably on real hardware, only on an image file copy”).
    There might be even other tricks, for example it would be possible to put a sector onto a CD/DVD twice, which one would be returned would depend on which is read first, which might depend on where the read head was just before.

  12. yoshi314 Says:

    i would just like to nitpick on CISO a little bit.

    Not every sector has to be compressed – some advanced conversion tools can ignore certain files in original iso image if they are known not to compress well or if the user expresses that they be left uncompressed.

    This produces a mixed compression ciso file, with some sectors compressed, some not.

    Probably every ciso handling tool can uncompress such images, even if it does not support such a selective compression scheme.

  13. Derek Says:

    I remember seeing a lot of BlindWrite images (.B6I and pals) at one point. Tons. Maybe it was just a fad…

  14. Jonathan Wilson Says:

    Another interesting (if quite specialized) format is the CHD format used by the MAME arcade emulator. It’s a format designed originally to hold hard disk images for arcade machines (and to store them as a 1:1 replica of the original arcade hard disk, but compressed to be smaller). It’s been expanded to hold all kinds of optical media (and other things too, including, I think, floppy disks from various systems).
