I got Google’s libvpx VP8 codec library to compile and run on the Sega Dreamcast with its Hitachi/Renesas SH-4 200 MHz CPU. So give Google/On2 their due credit for writing portable software. I’m not sure how best to illustrate this, so please accept this still photo depicting my testbench Dreamcast console driving video to my monitor:
Why? Because I wanted to try my hand at porting some existing software to this console and because I tend to be most comfortable working with assorted multimedia software components. This seemed like it would be a good exercise.
You may have observed that the video is blue. Shortest, simplest answer: Pure laziness. Short, technical answer: Path of least resistance for getting through this exercise. Longer answer follows.
Update: I did eventually realize that the Dreamcast can work with YUV textures. Read more in my followup post.
Process and Pitfalls
libvpx comes with a number of little utilities, including decode_to_md5.c. The first order of business was porting over enough source files to make the VP8 decoder compile along with the MD5 testbench utility.
Again, I used the KallistiOS (KOS) console RTOS (aside: I’m still working to get modern Linux kernels compiled for the Dreamcast). I started by configuring and compiling libvpx on a regular desktop Linux system. From there, I was able to modify a number of configuration options to make the build more amenable to the embedded RTOS.
I had to create a few shim header files that mapped various functions related to threading and synchronization to their KOS equivalents. For example, KOS has a threading library cleverly named kthreads which is mostly compatible with the more common pthread library functions. KOS apparently also predates stdint.h, so I had to contrive a file with those basic types.
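For instance, the stdint shim is just a page of typedefs. Here's a minimal sketch of such a header, assuming the SH-4's 32-bit ILP32 data model; it's illustrative, not the exact file I wrote:

```c
/* stdint.h -- minimal shim for a toolchain without the C99 header.
   Assumes the SH-4's ILP32 data model: 16-bit short, 32-bit int
   and pointers, 64-bit long long. */
#ifndef _STDINT_H_SHIM
#define _STDINT_H_SHIM

typedef signed char        int8_t;
typedef unsigned char      uint8_t;
typedef short              int16_t;
typedef unsigned short     uint16_t;
typedef int                int32_t;
typedef unsigned int       uint32_t;
typedef long long          int64_t;
typedef unsigned long long uint64_t;

typedef int                intptr_t;
typedef unsigned int       uintptr_t;

#endif
```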
So I got everything compiled and then uploaded the binary along with a small VP8 IVF test vector. Imagine my surprise when an MD5 sum came out of the serial console. Further, visualize my utter speechlessness when I noticed that the MD5 sum matched what my desktop platform produced. It worked!
Almost. When I tried to decode all frames in a test vector, the program would invariably crash. The problem was that reconinter.c, the file that manages motion compensation, needs MUST_BE_ALIGNED defined so that it compiles byte-wise block copy functions. This is necessary for CPUs like the SH-4 that can’t load unaligned data. Apparently, even ARM CPUs these days can handle unaligned memory accesses, which is why this isn’t a configure-time option.
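To illustrate the failure mode, here is a contrived sketch of the two copy strategies for an 8-byte-wide block (this is not libvpx's actual reconinter.c code):

```c
/* The "fast" path: copy two 32-bit words per row. Unaligned 32-bit
   loads are tolerated on x86 but raise an exception on the SH-4
   whenever src or dst isn't 4-byte aligned -- hence the crash. */
static void copy_block_wordwise(unsigned char *dst, const unsigned char *src,
                                int stride, int height)
{
    int r;
    for (r = 0; r < height; r++) {
        ((unsigned int *)dst)[0] = ((const unsigned int *)src)[0];
        ((unsigned int *)dst)[1] = ((const unsigned int *)src)[1];
        src += stride;
        dst += stride;
    }
}

/* The MUST_BE_ALIGNED path: byte loads are always aligned, so this
   works everywhere, just more slowly. */
static void copy_block_bytewise(unsigned char *dst, const unsigned char *src,
                                int stride, int height)
{
    int r, c;
    for (r = 0; r < height; r++) {
        for (c = 0; c < 8; c++)
            dst[c] = src[c];
        src += stride;
        dst += stride;
    }
}
```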
Showing The Work
I completed the first testbench application, which runs the MD5 test on all 17 official IVF test vectors. The SH-4/Dreamcast version aces the whole suite.
However, this is a video game console, so I had better be able to show the decoded video. The Dreamcast is strictly RGB; forget about displaying YUV data directly. I could take the performance hit to convert YUV -> RGB. Or I could just display the intensity information (the Y plane) rendered on an arbitrary color scale (I chose blue) as an RGB565 texture. (The DC’s graphics hardware can also do paletted textures, but those need to be rearranged/twiddled/swizzled.)
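The Y-to-blue packing is trivial. A minimal sketch of the idea (function and parameter names here are mine, not from the real program):

```c
#include <stdint.h>

/* Pack the Y plane into an RGB565 texture using only the blue channel.
   RGB565 keeps 5 bits of blue in the low bits of each 16-bit texel, so
   dropping the Y sample's low 3 bits is the entire conversion -- which
   is why the picture comes out blue. tex_stride is in texels. */
static void y_to_rgb565_blue(uint16_t *texels, int tex_stride,
                             const uint8_t *y_plane, int y_stride,
                             int width, int height)
{
    int row, col;
    for (row = 0; row < height; row++) {
        for (col = 0; col < width; col++)
            texels[col] = y_plane[col] >> 3;
        texels  += tex_stride;
        y_plane += y_stride;
    }
}
```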
Results
So, can the Dreamcast decode VP8 video in realtime? Sure! Well, that claim needs some qualification. In the test depicted in the photo, playback appears to be realtime (though I wasn’t enforcing proper frame timings, just decoding and displaying as quickly as possible). I also wasn’t bothering to properly convert YUV -> RGB, that Big Buck Bunny test vector clip is only 176×144, and there’s no audio decoding either.
So, realtime playback, with a little fine print.
On the plus side, it’s trivial to get the Dreamcast video hardware to upscale that little blue image to fullscreen.
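For the curious, the upscale amounts to drawing one bilinear-filtered textured quad with the PVR. A rough sketch against the KOS PVR API follows; the 256×256 texture size, NONTWIDDLED format, and setup details are my assumptions, not the exact code:

```c
#include <kos.h>

/* Draw one bilinear-filtered quad covering the 640x480 screen.
   Assumes "txr" is a 256x256 RGB565 texture already uploaded with
   pvr_txr_load(), holding the 176x144 frame in its top-left corner. */
static void draw_fullscreen_frame(pvr_ptr_t txr)
{
    pvr_poly_cxt_t cxt;
    pvr_poly_hdr_t hdr;
    pvr_vertex_t   vert;
    /* fraction of the 256x256 texture covered by the 176x144 frame */
    const float u_max = 176.0f / 256.0f;
    const float v_max = 144.0f / 256.0f;

    pvr_poly_cxt_txr(&cxt, PVR_LIST_OP_POLY,
                     PVR_TXRFMT_RGB565 | PVR_TXRFMT_NONTWIDDLED,
                     256, 256, txr, PVR_FILTER_BILINEAR);
    pvr_poly_compile(&hdr, &cxt);

    pvr_wait_ready();
    pvr_scene_begin();
    pvr_list_begin(PVR_LIST_OP_POLY);
    pvr_prim(&hdr, sizeof(hdr));

    vert.flags = PVR_CMD_VERTEX;
    vert.argb  = 0xFFFFFFFF;   /* no color modulation */
    vert.oargb = 0;
    vert.z     = 1.0f;

    vert.x = 0.0f;   vert.y = 0.0f;   vert.u = 0.0f;  vert.v = 0.0f;
    pvr_prim(&vert, sizeof(vert));                 /* top left     */
    vert.x = 640.0f; vert.y = 0.0f;   vert.u = u_max; vert.v = 0.0f;
    pvr_prim(&vert, sizeof(vert));                 /* top right    */
    vert.x = 0.0f;   vert.y = 480.0f; vert.u = 0.0f;  vert.v = v_max;
    pvr_prim(&vert, sizeof(vert));                 /* bottom left  */
    vert.flags = PVR_CMD_VERTEX_EOL;
    vert.x = 640.0f; vert.y = 480.0f; vert.u = u_max; vert.v = v_max;
    pvr_prim(&vert, sizeof(vert));                 /* bottom right */

    pvr_list_finish();
    pvr_scene_finish();
}
```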
I tallied the total wall-clock time, in milliseconds, required to decode each of the 17 VP8 test vectors. As you can probably work out from this list, things start to break down when I try to play 320×240 video.
- Processed 29 176×144 frames in 987 milliseconds.
- Processed 49 176×144 frames in 1809 milliseconds.
- Processed 49 176×144 frames in 704 milliseconds.
- Processed 29 176×144 frames in 255 milliseconds.
- Processed 49 176×144 frames in 339 milliseconds.
- Processed 48 175×143 frames in 2446 milliseconds.
- Processed 29 176×144 frames in 432 milliseconds.
- Processed 2 1432×888 frames in 2060 milliseconds.
- Processed 49 176×144 frames in 1884 milliseconds.
- Processed 57 320×240 frames in 5792 milliseconds.
- Processed 29 176×144 frames in 989 milliseconds.
- Processed 29 176×144 frames in 740 milliseconds.
- Processed 29 176×144 frames in 839 milliseconds.
- Processed 49 175×143 frames in 2849 milliseconds.
- Processed 260 320×240 frames in 29719 milliseconds.
- Processed 29 176×144 frames in 962 milliseconds.
- Processed 29 176×144 frames in 933 milliseconds.
Comments

This is super cool!
yay Mike!
Do you need to upsample the chroma in those clips? If so, that too is not a free operation.
YUV2RGB is 4 table lookups, 1 shift, and 4 integer adds per pixel, using the most accurate algorithm I know of (if you don’t have vector units or fixed-function hardware).
@Z.T.: I probably won’t persist in this particular experiment but there are several approaches I could take to convert the YUV. There is the table-based approach. There is direct calculation; along these lines, it might be possible to use the CPU’s matrix/vector multiplication units (sort of SIMD). There is also a hardware blending solution (which would offload upscaling to the hardware but has other limitations).
I would have to empirically test the performance of each. And, darn it, this is starting to sound like an interesting project. :-)
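For reference, the table-based approach Z.T. describes looks roughly like this; a sketch using BT.601 coefficients in 16.16 fixed point, which I have not benchmarked on the DC:

```c
#include <stdint.h>

/* Four table lookups plus adds and clamps per pixel; the shift is
   folded into table construction here. */
static int16_t tab_r_v[256], tab_g_u[256], tab_g_v[256], tab_b_u[256];

static void init_yuv_tables(void)
{
    int i;
    for (i = 0; i < 256; i++) {
        int d = i - 128;
        tab_r_v[i] = (int16_t)(( 91881 * d) >> 16);  /* 1.402 * (V-128) */
        tab_g_u[i] = (int16_t)(( 22554 * d) >> 16);  /* 0.344 * (U-128) */
        tab_g_v[i] = (int16_t)(( 46802 * d) >> 16);  /* 0.714 * (V-128) */
        tab_b_u[i] = (int16_t)((116130 * d) >> 16);  /* 1.772 * (U-128) */
    }
}

static uint8_t clamp8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Convert one pixel; the caller is responsible for upsampling the
   4:2:0 chroma first, which -- as noted above -- is not free either. */
static void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                       uint8_t *r, uint8_t *g, uint8_t *b)
{
    *r = clamp8(y + tab_r_v[v]);
    *g = clamp8(y - tab_g_u[u] - tab_g_v[v]);
    *b = clamp8(y + tab_b_u[u]);
}
```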
Probably a stupid question but why did you use libvpx and not ffvp8?
@kierank: Valid question. I just wanted to re-acquaint myself with programming this platform by using a smaller library, one that I expected would not have a lot of dependencies.
I do want to get FFmpeg on here somehow. The SH CPU architecture is conspicuously absent from FATE. However, I tend to think I need to be successful in getting a current Linux kernel built for the platform before I can reasonably get FFmpeg running.
Actually, I did port some of FFmpeg to the DC/KOS platform back in 2003 (codecs, specifically). I’m trying to find that code.
Hi Mike,
I’ve just read the “Using Advanced 3D Texturing Hardware to Convert Planar YUV to RGB” paper you posted to the VP8 list, and I was thinking maybe this would work for my Java VP8 decoder (where some Graphics2D functions are hardware accelerated). Do you know of anyone trying this?
@Brooss: Nice work on the pure-Java VP8 decoder. I don’t know of anyone who has done the blending trick described in that paper aside from myself and the other author. I believe this type of thing is handled using pixel shaders these days. The paper’s technique might be quite out of date by now but I suppose it depends on what kind of graphics facilities you can access from Java.
Hello Mike,
Nice job! Is it possible to get access to your source code? I’m trying to port Theora to the Dreamcast and I’m very curious to see your port.
Thanks, if it is possible!
@Patbier: I’d like to eventually release a bunch of this DC stuff I’m working on right now. It’s all so specialized that I’m not really sure how useful it would be for anyone. But I guess that’s not how one should judge whether to release something as open source.
As for porting the VP8 source, I didn’t really do anything special except for defining MUST_BE_ALIGNED. Otherwise, I just created a program to call the VP8 decoding function and display the decompressed data.
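In outline, the program is little more than this sketch of the libvpx decode calls; display_y_plane() stands in for my blue RGB565 blit and is hypothetical:

```c
#include <stdint.h>
#include "vpx/vpx_decoder.h"
#include "vpx/vp8dx.h"

/* hypothetical: blits the Y plane into the blue RGB565 texture */
extern void display_y_plane(const unsigned char *y, int stride,
                            unsigned int w, unsigned int h);

/* Decode one VP8 frame (already pulled out of the IVF container)
   and display whatever comes out. */
static int decode_and_show(vpx_codec_ctx_t *codec,
                           const uint8_t *frame, unsigned int frame_size)
{
    vpx_codec_iter_t iter = NULL;
    vpx_image_t *img;

    if (vpx_codec_decode(codec, frame, frame_size, NULL, 0))
        return -1;  /* vpx_codec_error(codec) has the details */

    while ((img = vpx_codec_get_frame(codec, &iter)) != NULL)
        display_y_plane(img->planes[VPX_PLANE_Y], img->stride[VPX_PLANE_Y],
                        img->d_w, img->d_h);
    return 0;
}

/* One-time setup elsewhere:
       vpx_codec_ctx_t codec;
       vpx_codec_dec_init(&codec, vpx_codec_vp8_dx(), NULL, 0);       */
```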
Actually, the Dreamcast’s video hardware does have some support for YUV textures. Specifically, it does support texturing with YUV422 data, and has internal conversion for YUV420->YUV422.
That said, I’ve never worked with YUV stuff directly, even though I am one of the KOS developers, so I can’t really comment on how one might use it (and unfortunately, things like that tend to be woefully undocumented).
@BlueCrab: Indeed, I discovered the YUV422 texturing and put it to use in this application shortly after this post:
http://multimedia.cx/eggs/notes-on-linux-for-dreamcast/
I would be curious about hardware support for YUV420->YUV422. I have a theory about how to do it but haven’t wired up an experiment.
Yeah, I didn’t see that later post until just now (someone linked me directly to this one). I know I have seen documents around that talk about the YUV420->YUV422 conversion, but I don’t remember specifically where I found them.