I got Google’s libvpx VP8 codec library to compile and run on the Sega Dreamcast with its Hitachi/Renesas SH-4 200 MHz CPU. So give Google/On2 their due credit for writing portable software. I’m not sure how best to illustrate this so please accept this still photo depicting my testbench Dreamcast console driving video to my monitor:
Why? Because I wanted to try my hand at porting some existing software to this console and because I tend to be most comfortable working with assorted multimedia software components. This seemed like it would be a good exercise.
You may have observed that the video is blue. Shortest, simplest answer: Pure laziness. Short, technical answer: Path of least resistance for getting through this exercise. Longer answer follows.
Update: I did eventually realize that the Dreamcast can work with YUV textures. Read more in my followup post.
Process and Pitfalls
libvpx comes with a number of little utilities including
decode_to_md5.c. The first order of business was porting over enough source files to make the VP8 decoder compile along with the MD5 testbench utility.
Again, I used the KallistiOS (KOS) console RTOS (aside: I’m still working to get modern Linux kernels compiled for the Dreamcast). I started by configuring and compiling libvpx on a regular desktop Linux system. From there, I was able to modify a number of configuration options to make the build more amenable to the embedded RTOS.
I had to create a few shim header files that mapped various functions related to threading and synchronization to their KOS equivalents. For example, KOS has a threading library cleverly named kthreads which is mostly compatible with the more common pthread library functions. KOS apparently also predates stdint.h, so I had to contrive a file with those basic types.
So I got everything compiled and then uploaded the binary along with a small VP8 IVF test vector. Imagine my surprise when an MD5 sum came out of the serial console. Further, visualize my utter speechlessness when I noticed that the MD5 sum matched what my desktop platform produced. It worked!
Almost. When I tried to decode all frames in a test vector, the program would invariably crash. The problem was that the file that manages motion compensation (reconinter.c) needs to define MUST_BE_ALIGNED which compiles byte-wise block copy functions. This is necessary for CPUs like the SH-4 which can’t load unaligned data. Apparently, even ARM CPUs these days can handle unaligned memory accesses which is why this isn’t a configure-time option.
Showing The Work
I completed the first testbench application which ran the MD5 test on all 17 official IVF test vectors. The SH-4/Dreamcast version aces the whole suite.
However, this is a video game console, so I had better be able to show the decoded video. The Dreamcast is strictly RGB– forget about displaying YUV data directly. I could take the performance hit to convert YUV -> RGB. Or, I could just display the intensity information (Y plane) rendered on a random color scale (I chose blue) on an RGB565 texture (the DC’s graphics hardware can also do paletted textures but those need to be rearranged/twiddled/swizzled).
So, can the Dreamcast decode VP8 video in realtime? Sure! Well, I really need to qualify. In the test depicted in the picture, it seems to be realtime (though I wasn’t enforcing proper frame timings, just decoding and displaying as quickly as possible). Obviously, I wasn’t bothering to properly convert YUV -> RGB. Plus, that Big Buck Bunny test vector clip is only 176×144. Obviously, no audio decoding either.
So, realtime playback, with a little fine print.
On the plus side, it’s trivial to get the Dreamcast video hardware to upscale that little blue image to fullscreen.
I was able to tally the total milliseconds’ worth of wall clock time required to decode the 17 VP8 test vectors. As you can probably work out from this list, when I try to play a 320×240 video, things start to break down.
- Processed 29 176×144 frames in 987 milliseconds.
- Processed 49 176×144 frames in 1809 milliseconds.
- Processed 49 176×144 frames in 704 milliseconds.
- Processed 29 176×144 frames in 255 milliseconds.
- Processed 49 176×144 frames in 339 milliseconds.
- Processed 48 175×143 frames in 2446 milliseconds.
- Processed 29 176×144 frames in 432 milliseconds.
- Processed 2 1432×888 frames in 2060 milliseconds.
- Processed 49 176×144 frames in 1884 milliseconds.
- Processed 57 320×240 frames in 5792 milliseconds.
- Processed 29 176×144 frames in 989 milliseconds.
- Processed 29 176×144 frames in 740 milliseconds.
- Processed 29 176×144 frames in 839 milliseconds.
- Processed 49 175×143 frames in 2849 milliseconds.
- Processed 260 320×240 frames in 29719 milliseconds.
- Processed 29 176×144 frames in 962 milliseconds.
- Processed 29 176×144 frames in 933 milliseconds.