It has not been a very productive year for blogging. But I started the year by describing an unfinished project that I developed for the Sega Dreamcast, so I may as well end the year the same way. The previous project was a media player. That initiative actually met with some amount of success and could have developed into something interesting if I had kept at it.
By contrast, this post describes an effort that was ultimately a fool’s errand that I spent way too much time trying to make work.
Problem Statement
In my neverending quest to analyze the structure of video games while also hoarding a massive collection of them (though I’m proud to report that I did play at least a few of them this past year), I wanted to be able to extract the data from my many Dreamcast titles, both games and demo discs. I had a tool called the DC Coder’s Cable, a serial cable that enables communication between a Dreamcast and a PC. With the right software, you could dump an entire Dreamcast GD-ROM, which contained a gigabyte worth of sectors.
Problem: The dumping software (named ‘dreamrip’ and written by noted game hacker BERO) operated in a very basic mode, methodically dumping sector after sector and sending it down the serial cable. This meant that it took about 28 hours to extract all the data on a single disc by running at the maximum speed of 115,200 bits/second, or about 11 kilobytes/second. I wanted to create a faster method.
The Pitch
I formed a mental model of dreamrip’s operation that looked like this:
As an improvement, I envisioned this beautiful architecture:
Architectural Assumptions
My proposed architecture was predicated on the assumption that the disc reading and serial output functions were both I/O-bound operations and that the CPU would be idle much of the time. My big idea was to use that presumably idle CPU time to compress the sectors before sending them over the wire. As long as the CPU can compress the data faster than 11 kbytes/sec, it should be a win. In order to achieve this, I broke the main program into 3 threads:
- The first thread reads the sectors; more specifically, it asks the drive firmware to please read the sectors and make the data available in system RAM
- The second thread waits for sector data to appear in memory and then compresses it
- The third thread takes the compressed data when it is ready and shuffles it out through the serial cable
Simple and elegant, right?
For data track compression, I wanted to start with zlib in order to prove the architecture, but then also try bzip2 or lzma. As long as they could compress data faster than the serial port could write it, then it should be a win. For audio track compression, I wanted to use the Flake FLAC encoder. According to my notes, I did get both bzip2 compression and the Flake compressor working on the Dreamcast. I recall choosing Flake over the official FLAC encoder because it was much simpler and had fewer dependencies, always an important consideration for platforms such as this.
Problems
I worked for quite awhile on this project. I have a lot of notes recorded but a lot of the problems I had remain a bit vague in my memory. However, there was one problem I discovered that eventually sunk the entire initiative:
The serial output operation is CPU-bound.
My initial mental model was that the a buffer could be “handed off” to the serial subsystem and the CPU could go back to doing other work. Nope. Turns out that the CPU was participating at every step of the serial transfer.
Further, I eventually dug into the serial driver code and learned that there was already some compression taking place via the miniLZO library.
Lessons Learned
- Recognize the assumptions that you’re making up front at the start of the project.
- Prototype in order to ensure plausibility
- Profile to make sure you’re optimizing the right thing (this is something I have learned again and again).
Another interesting tidbit from my notes: it doesn’t matter how many sectors you read at a time, the overall speed is roughly the same. I endeavored to read 1000 2048-byte data sectors, 1 or 10 or 100 at a time, or all 1000 at once. My results:
- 1: 19442 ms
- 10: 19207 ms
- 100: 19194 ms
- 1000: 19320 ms
No difference. That surprised me.
Side Benefits
At one point, I needed to understand how BERO’s dreamrip software was operating. I knew I used to have the source code but I could no longer find it. Instead, I decided to try to reverse engineer what I needed from the SH-4 binary image that I had. It wasn’t an ELF image; rather, it was a raw binary meant to be loaded at a particular memory location which makes it extra challenging for ‘objdump’. This led to me asking my most viewed and upvoted question on Stack Overflow: “Disassembling A Flat Binary File Using objdump”. The next day, it also led me to post one of my most upvoted answers when I found the solution elsewhere.
Strangely, I have since tried out the command line shown in my answer and have been unable to make it work. But people keep upvoting both the question and the answer.
Eventually this all became moot when I discovered a misplaced copy of the source code on one of my computers.
I strongly recall binging through the Alias TV show while I was slogging away on this project, so I guess that’s a positive association since I got so many fun screenshots out of it.
The Final Resolution
Strangely, I was still determined to make this project work even though the Dreamcast SD adapter arrived for me about halfway through the effort. Part of this was just stubbornness, but part of it was my assumptions about serial port speeds, in particular, my assumption that there was a certain speed-of-light type of limitation on serial port speeds so that the SD adapter, operating over the DC’s serial port, would not be appreciably faster than the serial cable.
This turned out to be very incorrect. In fact, the SD adapter is capable of extracting an entire gigabyte disc image in 35-40 minutes. This is the method I have since been using to extract Dreamcast disc images.
I assume the Dreamcast’s serial port can be used in either asynchronous or synchronous mode, like on the Saturn? On the Saturn I successfully used an FTDI FT2232H module in synchronous serial mode with a 1.2MHz serial clock rate, the Dreamcast can probably go even higher.
@asdf: Thanks for the feedback. It makes me want to go experiment with it, against my better judgment. :-) That’s the thing about these rabbit holes.