After completing gcfuse and xbfuse, I am hungry for more game-related filesystem hacking. I set my sights on another format — I wondered what lay in hiding on those universal media discs (UMDs) that the Sony PlayStation Portable uses.
You probably don’t have the tools or the know-how to rip sectors off of these discs by yourself. If you happen to come across a pre-ripped image in the shadier portions of the internet, you may find that the rip has an extension of .cso. This stands for compressed ISO. Would you be surprised to learn that there is not a lot of documentation about this format out there? There are some closed but free Windows tools out there to convert between ISO and CISO and some other formats. What knowledge is out there indicates that the format offers 9 different compression levels. So does stock zlib.
Internally, a CISO format begins with a ‘CISO’ signature and eventually starts in with an enormous table of indices. These seem to be absolute offsets into the file. My first theory was that they are individual files. Combined with the above zlib speculation, I figured that each file was zlib compressed. That’s when I enlisted the help of the fellow who recently implemented native zlib functionality in FFmpeg — Mans Rullgard.
What he managed to figure out was that each index in the table actually references a 2-kilobyte sector. Further, the offset has its high bit cleared to indicate that the sector is the last of a file (a UMD has a max capacity of 1.8 GB which can be expressed in only 31 bits). Thus, the deltas between the indices imply the length of the sector, which is almost alway 2048 bytes, except for (statistically) the last sector of the file. So this leads us to the revolutionary compression technology on display here–
Do not store the final sectors of individual files with implicit zero-padding. (Not true; read further analysis in the comments.)
Hey, it counts as compression. The technique exploits a characteristic inherent in a specific type of data. It seemed a bit silly at first, in a “would that actually make a difference?” kind of way. But the math does work. At least for the few representative samples observed, the CISO image is about 75% of the size of the “uncompressed” image. If a disc uses its full 1.8 GB, then 1,800,000,000 bytes / 2048 bytes/sector = 878907 sectors. 878907 sectors * 4 index bytes/sector = 3515628 bytes. It’s 3.5 MB, but not nearly enough to blow the 2 GB limit necessary to make this format work in the worst case scenario.
Honestly, I haven’t had this much fun collaboratively cracking a file format since a bunch of us got together back in the day and worked out Nullsoft Video (NSV)… and then the official spec was released anyway.
There are still a number of mysteries:
- None of this explains the advertised compression levels feature. I am starting to think that perhaps the entire index table can be zlib compressed, but that it simply wasn’t done for the observed samples.
- There is no real filesystem data present. It stands to reason that the original UMD must have some notion of a filesystem since some utility knew where the files lived and how long they were. The Wikipedia article claims that these discs use an ISO-9660 filesystem. I generally know what ISO-9660 filesystem data looks like, and I’m not seeing it here. There isn’t much familiar, except for some PNG files which are difficult to interpret, leading to…
- Is the filesystem fragmented? That would be highly uncharacteristic of a filesystem built to live on an optical, read-only medium. But cursory investigations have indicated that there might be some fragmentation.
- Exactly what is the usage model for these images? Real-time access from a memory stick, I hope? Because Mans determined that stock bzip2 performs better than CISO on a raw image.
Here are the format details we have worked out so far. If anyone knows the original author of the CISO format, send him over here. In my experience, people who design these types of formats are not necessarily trying to keep anything secret; they just haven’t gotten around to writing up and publishing a formal description, and have no compelling reason to release their tools’ source code into a Windows world where most people can’t compile anyway.