Author Archives: Multimedia Mike

PS3 Notes

I have been working (and occasionally playing) with my PlayStation 3 recently. I upgraded the 80 GB internal hard drive to a 1/2 TB one. Since I have the old 80 GB HD lying around, of course I have to plug it in and see if there’s anything familiar about the data. It’s a short exploration: As you might suspect, the HD is completely impenetrable. No partition table is reported through Linux fdisk. No human-readable strings can be seen when running ‘strings’ over the raw HD sectors. Based on forum postings I have read stating that one PS3’s HD can’t be transplanted to another PS3 with its data intact (the HD can still be reformatted fresh to work in another PS3), I’m guessing that every sector is encrypted with a key derived at least partially from a unique ID embedded in each console. That’s all the effort I plan to put into this exercise. Next stop for this HD is my Eee PC 701, which is currently struggling to run Ubuntu Linux on a mere 4 GB SSD.

I downloaded a free movie trailer through the PlayStation store. When I inspected the information through the PS3’s XMB menu, the filetype was reported as “MNV”. A little Googling ties this to the paid content format of the PS3 store. I’m not especially confident about this format since the trailer that I downloaded doesn’t even play correctly on the PS3. The video stutters back and forth, almost as though it’s swapping pairs of frames during playback: 1, 0, 3, 2, 5, 4, 7, 6, etc. The XMB allows me to “backup” this media. This option needs to be distinguished from “copy”, which is sometimes an option. “Copy” implies an unlocked version that can be copied onto removable media and used anywhere. “Backup” implies that it can be copied onto removable media but is still keyed to — and can only be used on — this console. I backed it up and was able to inspect the data on the USB drive. It turns out that the MNV file is still a stock MP4, just with custom DRM. When FFmpeg is aimed at this file, this is the result:

[h264 @ 0x1004000]AVC: nal size -2055117847
[h264 @ 0x1004000]no frame!
[...repeated many times...]
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1002600]max_analyze_duration reached

Seems stream 0 codec frame rate differs from container frame rate:
 48000.00 (48000/1) -> 23.98 (24000/1001)
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
 '/Volumes/KINGSTON/PS3/EXPORT/VIDEOBKP/20091220-220733-00000001/20091220-220733-00000001.001':
  Metadata:
    major_brand     : MGSV
    minor_version   : 20842393
    compatible_brands: MGSVmp42isom
  Duration: 00:01:46.64, start: 0.000000, bitrate: 8651 kb/s
    Stream #0.0(und): Video: h264, 2205 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 48k tbc
    Stream #0.1(eng): Audio: aac, 48000 Hz, stereo, s16, 264 kb/s
    Stream #0.2(eng): Audio: aac, 48000 Hz, 5.1, s16, 395 kb/s
    Stream #0.3(und): Data: mp4s / 0x7334706D, 759552 kb/s
Video pixel format is unknown, stream cannot be decoded

I remember some patches flying around the FFmpeg-devel list recently which would allow the program to print warnings and bail out if it encountered a known DRM scheme. When I shove an Apple-encrypted file through FFmpeg, it doesn’t tell me anything special so I don’t think the patch is in yet. However, FFmpeg should probably detect this type of DRM file as well.
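
Detecting this particular flavor wouldn’t even require deep parsing, since the major brand in the ftyp atom gives it away. Here’s a minimal Python sketch of the check; treating the ‘MGSV’ brand seen in the dump above as a reliable DRM marker is my assumption, not something documented:

  import struct
  import sys

  def mp4_major_brand(path):
      """Return the major brand from the leading 'ftyp' box, or None."""
      with open(path, "rb") as f:
          header = f.read(8)
          if len(header) < 8:
              return None
          size, box_type = struct.unpack(">I4s", header)
          if box_type != b"ftyp":
              return None
          return f.read(4).decode("latin-1")

  if __name__ == "__main__":
      brand = mp4_major_brand(sys.argv[1])
      if brand == "MGSV":
          # brand observed on the PS3 store trailer above; assumed to indicate DRM
          print("looks like a PS3 store DRM-wrapped MP4")
      else:
          print("major brand:", brand)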

Better Compression For ISOs

Pursuant to my project idea of archiving a bunch of obscure, old, CD-ROM-based games, I got an idea about possibly compressing ISO-9660 filesystems more efficiently than could be done with standard lossless compressors such as gzip, bzip2, and lzma. This is filed under “outlandish brainstorms” since I don’t intend to get around to doing this anytime in the near future, but I wanted to put the idea out there anyway.

Game developers throughout the era of optical disc-based games have been notoriously lazy about data efficiency. A typical manifestation of this is when one opens a disc to find dozens or hundreds of sizable, uncompressed WAV audio files. Same goes for uncompressed graphical assets. General lossless compressors are able to squeeze some bytes out of uncompressed audio and image data, but specialized algorithms (like FLAC for audio and PNG for images) perform better.

Here’s the pitch: Create a format that analyzes individual files in an ISO filesystem and uses more appropriate compression algorithms. If there’s a 16-bit PCM WAV file, use FLAC to compress it. If there’s an uncompressed BMP file, compress it as PNG. Stuff them all into a new file with an index that maps each compressed payload back to its original file.

As an additional constraint, it obviously needs to be possible to reconstruct a bit-exact ISO from one of these compressed images. So the index will need to store information about what sector ranges different files occupied.
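
To make the idea a little more concrete, here’s a rough sketch of what an index entry and the codec-selection step might look like. Everything here (field names, the set of codecs, the magic-byte checks) is hypothetical; a real format would need to handle many more cases:

  from dataclasses import dataclass

  @dataclass
  class IndexEntry:
      path: str            # file's path inside the ISO-9660 tree
      first_sector: int    # first 2048-byte sector it occupied in the original ISO
      sector_count: int    # how many sectors, so the ISO can be rebuilt bit-exactly
      codec: str           # which compressor was applied ("flac", "png", "lzma", ...)
      payload_offset: int  # where the compressed payload lives in the archive file

  def choose_codec(data: bytes) -> str:
      """Pick a compressor based on a file's magic bytes (grossly simplified)."""
      if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
          return "flac"    # uncompressed PCM audio shrinks better with FLAC
      if data[:2] == b"BM":
          return "png"     # uncompressed bitmaps shrink better as PNG
      return "lzma"        # everything else gets a general-purpose compressor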

The only other attempt I have seen at a specialized ISO compression format is the CISO format used for compressing ISO filesystems for storage on flash memory to be read in PSP units. That format uses zlib to compress the image sector by sector, which has the advantage that a CISO image can be mounted and used in place without decompressing the entire filesystem. That should be possible for this format as well by using a FUSE driver: mount the filesystem and the driver presents the file tree upon request and decompresses the files on demand.
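
The per-sector trick is easy to sketch in Python. The code below makes no attempt to match the real CISO header layout (the names and structure are mine); it just shows why random access stays cheap: an offset table maps every 2048-byte sector to its compressed blob, so a driver only inflates the sectors a reader actually touches:

  import zlib

  SECTOR_SIZE = 2048

  def compress_image(data: bytes):
      """Compress an ISO image sector by sector; return (offset table, blob)."""
      offsets, blob = [], bytearray()
      for pos in range(0, len(data), SECTOR_SIZE):
          offsets.append(len(blob))
          blob += zlib.compress(data[pos:pos + SECTOR_SIZE])
      offsets.append(len(blob))   # end marker so every sector's length is known
      return offsets, bytes(blob)

  def read_sector(offsets, blob, n):
      """Decompress a single sector on demand."""
      return zlib.decompress(blob[offsets[n]:offsets[n + 1]])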

Many compression algorithms have assorted methods of operation depending on what is appropriate for a particular range of data. This scheme is merely an extension of that principle. I have no idea whether this would hold water in the general case. But thanks to my archiving brainstorm, I expect I will have quite a lot of data to analyze.

See Also: Archivist’s Burden

Archivist’s Burden

Because I don’t have enough unfinished projects in my life, and because I have recently fallen in with a crowd of fanatical archivists, I started to ponder the possibility of systematically archiving all of the CD-based video games that I have collected. One day, some historian is going to want to study the proliferation of wedding simulators and licensed Barbie games. Then I noticed that I have an unused 2x750GB RAID brick (configured for redundancy, so 750 GB capacity), and that I haven’t had enough bad experiences with RAID technologies yet. Put the 2 together: why not try archiving a bunch of these games on said RAID brick?

So, that’s the pitch: archive a bunch of game discs onto this RAID brick. Assumption: a ‘disc’ is a string of 2048-byte sectors comprising a CD-ROM mode 1/form 1 data track, plus 1 or more Red Book CD audio tracks if they exist. Some questions:

  • What filesystem should the RAID brick be formatted in?
  • Should the data tracks be compressed? If so, what format?
  • Should the audio tracks be compressed (losslessly)? If so, what format?
  • How should the information be organized on the disc?
  • How should this process be automated?

Be advised that these are problems for which there is no “right” solution. There are many solutions and choosing one is all about evaluating trade-offs. Here are some solutions I am planning for:

  • I have formatted the brick with Linux Ext3. At one point I had the brick formatted with FAT32 so it could easily be shared between many operating systems. Unfortunately, I had a bad experience with data integrity, which I chalked up to either another strike against RAID tech in general or a faulty FAT32 driver in either Linux or Mac OS X. Further, there’s the issue that I can’t abide Mac OS X’s insistence on pooping metadata files all over every filesystem it touches. I have that problem with my pocket USB drive. At least if the brick isn’t in a Mac-native filesystem, I won’t have to worry about that.
  • The data track should be compressed using one of the usual suspects in Unix-land: gzip, bzip2, or lzma (at least, those are the ready-made solutions; I have a theory about better compression to be expounded upon in a later post). Which is best? Lzma is generally the champ in terms of size reduction, though it’s worth considering the time tradeoff: I was recently trying to compress some very large files for internet transfer and noticed that lzma took substantially longer than the other options. But since this is for long-term storage, it’s probably worth it to squeeze out the largest compression factor.
  • For compressing the audio tracks, the 2 frontrunners are FLAC and Apple Lossless (ALAC). FLAC is obviously in consideration because it’s free and open source and all that. However, ALAC, while not necessarily free and open source in the purest sense, is still free and open source enough for this purpose (FFmpeg has both a decoder and a competitive encoder). Archiving using ALAC also makes it possible to easily listen to the music using iTunes, my usual audio program. Thus, I’m leaning towards ALAC here.
  • I need to come up with a directory structure for the master archive disc. I imagine something along the lines of “/game title” for single-disc games and “/game title/disc n” for multi-disc games. Each directory will have “track01.iso” and 0 or more “tracknn.m4a” files depending on the presence of audio tracks. That’s the general pattern, anyway.
  • As for automating this, it will naturally be done with a Python script of my own creation. I wager I could actually make something that queries a disc directly (Python has ioctl libraries) and then reads the raw data and audio tracks. That’s how I implemented the CD playback support in xine. This time, I suspect I’ll just be invoking cdparanoia and parsing the output to determine the number of tracks and for subsequent ripping; a rough sketch of what I mean follows this list.
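
Here is roughly the rip-and-compress step I have in mind. This is a loose sketch: the device path and output layout are placeholders, the cdparanoia TOC parsing is simplified, and it naively assumes the data track is track 1:

  import re
  import subprocess

  DEVICE = "/dev/cdrom"   # placeholder device node

  def audio_track_numbers():
      """Ask cdparanoia for the table of contents and pull out the track numbers.
      cdparanoia prints its TOC to stderr; this parsing is deliberately simplistic."""
      result = subprocess.run(["cdparanoia", "-Q", "-d", DEVICE],
                              capture_output=True, text=True)
      return [int(m.group(1))
              for m in re.finditer(r"^\s*(\d+)\.", result.stderr, re.MULTILINE)]

  def rip_disc(dest_dir):
      # data track: straight sector copy, then squeeze it with lzma
      subprocess.run(["dd", f"if={DEVICE}", f"of={dest_dir}/track01.iso"], check=True)
      subprocess.run(["lzma", f"{dest_dir}/track01.iso"], check=True)

      # audio tracks: rip with cdparanoia, then transcode losslessly to ALAC
      for n in audio_track_numbers():
          wav = f"{dest_dir}/track{n:02d}.wav"
          subprocess.run(["cdparanoia", "-d", DEVICE, str(n), wav], check=True)
          subprocess.run(["ffmpeg", "-i", wav, "-acodec", "alac",
                          f"{dest_dir}/track{n:02d}.m4a"], check=True)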

I am thinking that the automated Python script should maintain a centralized SQLite database at the root of the archive disc. This database should keep track of archived disc locations, the ripped tracks, and their before- and after-compression sizes. Further, it should store records of each file (with relative path) in the filesystem along with its timestamp and size. Also, the database needs to log information about errors encountered during the ripping process (i.e., store the stderr text from the data copy operation). Bonus points for running the Unix ‘file’ command against each file and also storing the output. This last bit of information would be useful for finding, e.g., all of the Smacker files in every game in the archive. There are still specialized game-related formats that the ‘file’ type database will not recognize. For this reason, it may also be useful to record the first, say, 32 bytes of a file in the central database as well for later identification. Maybe that’s overkill; I’m still tossing around ideas here.
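
As a starting point, the schema might look something like this (table and column names are placeholders, not a finished design):

  import sqlite3

  def create_archive_db(path):
      """Create the archive-tracking database; this schema is a rough first draft."""
      db = sqlite3.connect(path)
      db.executescript("""
          CREATE TABLE IF NOT EXISTS discs (
              id           INTEGER PRIMARY KEY,
              title        TEXT,
              disc_number  INTEGER,
              archive_path TEXT,     -- e.g. 'game title/disc 2'
              rip_errors   TEXT      -- stderr captured from the data copy operation
          );
          CREATE TABLE IF NOT EXISTS tracks (
              disc_id      INTEGER REFERENCES discs(id),
              track_number INTEGER,
              track_type   TEXT,     -- 'data' or 'audio'
              size_before  INTEGER,  -- bytes before compression
              size_after   INTEGER   -- bytes after compression
          );
          CREATE TABLE IF NOT EXISTS files (
              disc_id       INTEGER REFERENCES discs(id),
              relative_path TEXT,
              size          INTEGER,
              mtime         TEXT,
              file_output   TEXT,    -- output of the Unix 'file' command
              first_bytes   BLOB     -- first 32 bytes, for later identification
          );
      """)
      db.commit()
      return db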

Open questions:

  1. How do I rip a data track that is not the first track of an optical disc? My standard method is to use the Unix command ‘dd if=/dev/disc of=file.iso’ but that only works if the first track contains data. I have encountered at least one disc on which the data track isn’t first. Come to think of it, I guess this is standard for “enhanced” audio CDs and this particular demo disc might have fallen into that category.
  2. How do I mount an ISO filesystem for browsing without needing root permissions? Since I want to put as much brainpower into this automated script as possible and I don’t want to run the script as root, it will be necessary to at least be able to get a listing of the directory structure, and preferably to mount the filesystem and run ‘file’ against each file. There is the option of mounting via the loopback device (maybe loosening the permissions on the loopback device will help). FUSE seems like an option, in theory, but the last time I checked, there were no ISO-9660 drivers for FUSE.
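
One root-free possibility, sketched below: skip mounting entirely and let isoinfo (from cdrtools/genisoimage) enumerate and extract the files, piping each one through ‘file’. I’m going from memory on isoinfo’s behavior, so treat this as a starting point rather than a tested recipe; it also pretends every listed entry is a regular file:

  import subprocess

  def iso_file_list(iso_path):
      """List the files in an ISO image without mounting it (no root needed)."""
      out = subprocess.run(["isoinfo", "-f", "-R", "-i", iso_path],
                           capture_output=True, text=True, check=True).stdout
      return [line for line in out.splitlines() if line.strip()]

  def identify_files(iso_path):
      """Run 'file' against each file's contents by extracting it to a pipe."""
      results = {}
      for path in iso_file_list(iso_path):
          data = subprocess.run(["isoinfo", "-R", "-i", iso_path, "-x", path],
                                capture_output=True).stdout
          ident = subprocess.run(["file", "-b", "-"], input=data,
                                 capture_output=True).stdout
          results[path] = ident.decode(errors="replace").strip()
      return results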

Archiving a bunch of old games seems so simple. Leave it to me to make it complicated.

One Gluttonous if Statement

I’m still haunted by the slow VP3/Theora decoder in FFmpeg. While profiling reveals that most of the decoding time is spent in a function named unpack_vlcs(), I have been informed that finer-grained profiling indicates that the vast majority of that time is spent evaluating the following if statement:

  /* this fragment already has a coefficient at this index; skip it */
  if (coeff_counts[fragment_num] > coeff_index)
    continue;

I put counters before and after that statement and ran the good old Big Buck Bunny 1080p movie through it. These numbers indicate, per frame, the number of times the if statement is evaluated and the number of times execution actually makes it past the statement to do further work (i.e., the number of times the condition is false):

[theora @ 0x1004000]3133440, 50643
[theora @ 0x1004000]15360, 722
[theora @ 0x1004000]15360, 720
[theora @ 0x1004000]0, 0
[theora @ 0x1004000]13888, 434
[theora @ 0x1004000]631424, 10711
[theora @ 0x1004000]2001344, 36922
[theora @ 0x1004000]1298752, 22897
[...]

Thus, while decoding the first frame, the if statement is evaluated over 3 million times but further action (i.e., decoding a coefficient) is only performed 50,000 times. Based on the above sample, each frame sees about 2 orders of magnitude more evaluations than are necessary.

Clearly, that if statement needs to go.
