ZMBV Tinkering

Pursuant to that outlandish PAVC idea of mine (that I first started writing about 3 years ago), I began to tinker with the DosBox Video Codec, a.k.a. Zip Motion Blocks Video or ZMBV for short, which was suggested as an alternative the last time I posted about this topic. I have met with some mixed results.

Briefly, the goal of PAVC is to efficiently and losslessly compress frames of video generated by early video game systems, where “early” is defined as anything including and preceding the Super Nintendo Entertainment System. ZMBV is designed to compress, in real time, data from old DOS games running at a number of color modes, with careful consideration given to 8-bit palette data. FFmpeg had an independent ZMBV decoder soon after the codec appeared in DosBox, and also features an independent encoder as well; both the encoder and decoder come courtesy of Kostya.

Item 1: This past week, I found a way to improve the motion estimator in FFmpeg’s ZMBV encoder by changing the assumption it used to compute error between 2 blocks. Michael, being the compulsive mathematical gambler that he is, said, “I’ll see your rudimentary error minimizer and I’ll raise you a 0th-order entropy approximater.” No, I don’t understand what that means either. But the net result is that all 8-bit files that FFmpeg converts to ZMBV are notably more compact. Score 1.

Item 2: The smallest frame that ZMBV can possibly create, even for a frame that is completely unchanged from the prior frame, is in the neighborhood of 11-12 bytes. The type of data I want to compress has a lot of unchanged frames so this seems like a good opportunity to shave off more bytes. I think it’s possible to store 0-length payloads in AVI video frames which indicate that the frame is unchanged. I am having trouble hooking this up in FFmpeg, though. For the time being, I co-opted one of the reserved flags in the first ZMBV byte as an “unchanged” flag and modified the decoder to recognize it. At least now, unchanged frames only occupy a single byte.

Item 3: Colorspace conversion. The emulators generate video in 24- or 32-bit RGB colorspaces, even though the frames are more or less guaranteed to contain well under 256 unique RGB colors. It’s conducive for the video pipeline: The emulator is probably already creating a 24- or 32-bit RGB bitmap for display, so just take that same image and stuff it in an AVI file with a frame header. The emulators are open source so I could modify this assumption myself. Or I could see what can be done in FFmpeg after the fact.


RGB color cube

I was surprised to see that FFmpeg had no trouble converting the RGB video to PAL8 video. I guess I had never tried before. However, something seemed off about the colors. A little FFmpeg source code excavation turned up a function called build_rgb_palette(). This is responsible for computing a 216-entry 6x6x6 RGB color cube that is later used to find the best match for each RGB pixel in the input frame. This turns out to be imperfect:

Castlevania 4 (SNES) -- good palette Castlevania 4 (SNES) -- bad palette
Super Castlevania 4 (SNES) with proper palette Super Castlevania 4 (SNES) with FFmpeg’s palette matching algorithm

Though it is a reasonable algorithm for the type of data that FFmpeg handles on any given day. I modified the ZMBV encoder to accept 24- and 32-bit RGB data directly and converted it to PAL8 myself. The good news is that this technically worked and I got accurate data for compression. The bad news is that compression becomes less efficient than when using FFmpeg’s palette matching algorithm. But this makes sense– FFmpeg was basically quantizing the input data which improved the ensuing compression. That’s a step backwards from my overall goal.

Item 4: Keyframe frequency. I’m not completely sure how the keyframe system operates in FFmpeg. But looking at the current ZMBV encoder, it looks like the engine recommends to the video encoder when it should insert a keyframe. It looks to me like new keyframes are occurring entirely too frequently. Fewer keyframes would likely improve compression, especially since this is such high framerate data (either 50 or 60 fps).

Item 5: ZMBV algorithm modifications. ZMBV already has a version field so it should be possible to upgrade the existing codec rather than spinning off a separate codec. To my knowledge, there are only 2 implementations in existence– FFmpeg, and the original DosBox codebase. It might be possible to get well-reasoned algorithm improvements into the standard.

Going above and beyond the change flag from item 2 above, I hypothesize that it might be useful to change some things about the core algorithm. For example, version 1 of the codec only allows changing the block size at the start of a group of pictures (GoP). It might be useful to allow individual frames to optionally set a new block size and to also allow it to be larger than 255. I am thinking that games with no scrolling and little character movement might be able to benefit by making the entire screen a block and storing the XOR against the previous frame, thus capturing what little movement has occurred and not having to compress a stream of empty 16-bit block info chunks.

It might also be useful to slice the field into tall strips when the video is scrolling horizontally or long strips when the video is scrolling vertically. This is already possible under the current algorithm (stipulating that the dominant strip dimension is less than 256), but hard to use unless you can anticipate the scrolling direction in advance of the keyframe which sets the block size.

And I still think it would be useful to define an encoding mode where the entire frame is scrolled (+/- X pixels, +/- Y pixels) in relation to the previous frame, diff the scrolled section using the usual XOR method, and define a way to fill in the new pixels that have just entered the field around the borders.

I should have proposed all of the above items and more as a “lab-rat” project for one of FFmpeg’s Summer of Code slots.

Fundamentally, any ZMBV algorithmic improvement needs to be informed by a basic understanding of how the DEFLATE algorithm operates since whatever pre-processing takes place eventually gets shoved through zlib for the main compression (come to think of it, that’s pretty much how PNG operates as well). I have come to appreciate this aspect of the algorithm, however, because I am pondering the idea of implementing a ZMBV decoder in Flash’s ActionScript 3 language; AS3 provides APIs for decoding zlib data, greatly facilitating decompression.

2 thoughts on “ZMBV Tinkering

  1. Kostya

    First of all, Michael’s code explained: he used table to estimate entropy (well, base term from the Shannon’s theory of information) of the XOR’ed block. In other words, the higher count of the same symbols, the lower entropy and the better compression of the block. 0-th order means he didn’t use context to estimate entropy. Arithmetic coding can enlight you more ;)

    As for the codec, my intent was to have lossless encoder especially for 8-bit videos, for FFmpeg could not code them efficiently.

    (Un)fortunately, ZMBV was not designed with extensibility in mind, so next version bitstream will be totally different from current one. So you can think up a totally new scheme.

    Some of my thoughts:
    * everybody ends up with block-based compression (well, not always, but exceptions are rare)
    * there are usually those methods applied to each block: totally new block, skipping, applying changes (with a mask), filling block with colour, filling block with N colours using some pattern (coded or one from the standard ones); motion compensation is a bit rare and so it quadtree encoding employed by KMVC
    * none of the fancy things like transform, spatial prediction and context-dependent coding have been tried. They are too slow for the supercool 386SX I guess.

    Also you may want to look at
    http://x264dev.blogspot.com/2008/05/introducing-photon.html
    Looks like a lot of developers want to create their own codec ;)

    And the last thing – I think skip frames can be coded as the single byte by ZMBV too if you skip compressing at all (and decoder should handle it too).

Comments are closed.