Basic Video Palette Conversion

How do you take a 24-bit RGB image and convert it to an 8-bit paletted image for the purpose of compression using a codec that requires 8-bit input images? Seems simple enough and that’s what I’m tackling in this post.

Ask FFmpeg/Libav To Do It
Ideally, FFmpeg / Libav should be able to handle this automatically. Indeed, FFmpeg used to be able to, at least at the time I wrote this post about ZMBV and was unhappy with FFmpeg’s default results. Somewhere along the line, FFmpeg and Libav lost the ability to do this. I suspect it got removed during some swscale refactoring.

Still, there’s no telling if the old system would have computed palettes correctly for QuickTime files.

Distance Approach
When I started writing my SMC video encoder, I needed to convert RGB (from PNG files) to PAL8 colorspace. The path of least resistance was to match the pixels in the input image to the default 256-color palette that QuickTime assumes (and is hardcoded into FFmpeg/Libav).

How to perform the matching? Find the palette entry that is closest to a given input pixel, where “closest” is the minimum distance as computed by the usual distance formula (square root of the sum of the squares of the diffs of all the components).

That means for each pixel in an image, check the pixel against 256 palette entries (early termination is possible if an acceptable threshold is met). As you might imagine, this can be a bit time-consuming. I wondered about a faster approach…
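The brute-force match can be sketched like this (a minimal Python illustration, not the encoder's actual source; `palette` stands in for any list of (R, G, B) tuples, such as the QuickTime default palette):

```python
import math

def nearest_index(pixel, palette, threshold=0.0):
    """Find the palette entry closest to pixel by Euclidean distance,
    with optional early termination once a distance meets threshold."""
    r, g, b = pixel
    best_index, best_dist = 0, float("inf")
    for i, (pr, pg, pb) in enumerate(palette):
        dist = math.sqrt((r - pr) ** 2 + (g - pg) ** 2 + (b - pb) ** 2)
        if dist < best_dist:
            best_index, best_dist = i, dist
            if dist <= threshold:
                break
    return best_index
```

Since the square root is monotonic, comparing squared distances gives the same winner and skips up to 256 sqrt calls per pixel.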

Lookup Table
I think this is the approach that FFmpeg used to use, but I went and derived it for myself after studying the default QuickTime palette table. There’s a pattern there– all of the RGB entries are comprised of combinations of 6 values — 0x00, 0x33, 0x66, 0x99, 0xCC, and 0xFF. If you mix and match these for red, green, and blue values, you come up with 6 * 6 * 6 = 216 different colors. This happens to be identical to the web-safe color palette.

The first (0th) entry in the table is (FF, FF, FF), followed by (FF, FF, CC), (FF, FF, 99), and on down to (FF, FF, 00); then the green component gets knocked down a step and the next color is (FF, CC, FF). The first 36 palette entries in the table all have a red component of 0xFF. Thus, if an input RGB pixel has a red color closest to 0xFF, it must map to one of those first 36 entries.

I created a table which maps each component value 0..255 to an index from 5..0. Each of the R, G, and B components of an input pixel is used to index into this table, deriving 3 indices ri, gi, and bi. Finally, the index into the palette table is given by:

  index = ri * 36 + gi * 6 + bi

For example, the pixel (0xFE, 0xFE, 0x01) would yield ri, gi, and bi values of 0, 0, and 5. Therefore:

  index = 0 * 36 + 0 * 6 + 5

The palette index is 5, which maps to color (0xFF, 0xFF, 0x00).
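In Python, my reconstruction of the table approach looks like this (illustrative, not the encoder's actual code); the table rounds each 8-bit component to the nearest of the 6 levels, then flips it into the descending 0xFF..0x00 order the palette uses:

```python
# Round each component value 0..255 to the nearest of the 6 levels
# (0x00, 0x33, 0x66, 0x99, 0xCC, 0xFF), then flip to the descending
# order of the color cube (0xFF -> 0, ..., 0x00 -> 5).
COMPONENT_TO_INDEX = [5 - ((v + 0x19) // 0x33) for v in range(256)]

def palette_index(r, g, b):
    """Map an RGB pixel to its index in the 216-color cube."""
    ri = COMPONENT_TO_INDEX[r]
    gi = COMPONENT_TO_INDEX[g]
    bi = COMPONENT_TO_INDEX[b]
    return ri * 36 + gi * 6 + bi
```

So instead of up to 256 distance computations per pixel, the lookup costs 3 table reads, 2 multiplications, and 2 additions.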

So I was pretty pleased with myself for coming up with that. Now, ideally, swapping out one algorithm for another in my SMC encoder should yield identical results. That wasn’t the case, initially.

One problem is that the regulation QuickTime palette actually has 40 more entries above and beyond the typical 216-entry color cube (rounding out the grand total of 256 colors). Thus, using the distance approach with the full default table provides for a little more accuracy.

However, there still seems to be a problem. Let’s check our old standby, the Big Buck Bunny logo image:

Distance approach using the full 256-color QuickTime default palette

Distance approach using the 216-color palette

Table lookup approach using the 216-color palette

I can’t quite account for that big red splotch there. That’s the most notable difference between images 1 and 2 and the only visible difference between images 2 and 3.

To prove to myself that the distance approach is equivalent to the table approach, I wrote a Python script to iterate through all possible RGB combinations and verify the equivalence. If you’re not up on your base 2 math, that’s 2^24 or 16,777,216 colors to run through. I used Python’s multiprocessing module to great effect and really maximized a Core i7 CPU with 8 hardware threads.
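The skeleton of such a script might look like the following (a sketch in the same spirit, not the script linked below; it carves the 24-bit space into chunks and hands them to a process pool):

```python
import multiprocessing

LEVELS = [0xFF, 0xCC, 0x99, 0x66, 0x33, 0x00]
PALETTE = [(r, g, b) for r in LEVELS for g in LEVELS for b in LEVELS]
TABLE = [5 - ((v + 0x19) // 0x33) for v in range(256)]
CHUNK = 4096  # colors verified per work unit

def distance_index(r, g, b):
    """Brute-force nearest palette entry (squared distance; the square
    root is monotonic, so it can be skipped when only comparing)."""
    best_i, best_d = 0, 1 << 30
    for i, (pr, pg, pb) in enumerate(PALETTE):
        d = (r - pr) ** 2 + (g - pg) ** 2 + (b - pb) ** 2
        if d < best_d:
            best_i, best_d = i, d
    return best_i

def table_index(r, g, b):
    return TABLE[r] * 36 + TABLE[g] * 6 + TABLE[b]

def check_chunk(start):
    """Verify the two approaches agree on CHUNK consecutive 24-bit colors."""
    for color in range(start, start + CHUNK):
        r, g, b = (color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF
        if distance_index(r, g, b) != table_index(r, g, b):
            return False
    return True

if __name__ == "__main__":
    # the full proof maps check_chunk over range(0, 2**24, CHUNK),
    # keeping every core busy; spot-check three chunks here
    with multiprocessing.Pool() as pool:
        assert all(pool.map(check_chunk, [0x000000, 0x7FF000, 0xFFF000]))
```

Since each chunk is independent, `Pool.map` trivially saturates all available cores.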

So I’m confident that the palette conversion techniques are sound. The red spot is probably attributable to a bug in my WIP SMC encoder.

Source Code
Update August 23, 2011: Here’s the Python code I used for proving equivalence between the 2 approaches. In terms of leveraging multiple CPUs, it’s possibly the best program I have written to date.

14 thoughts on “Basic Video Palette Conversion”

  1. Kostya

    That’s why lazy people would simply use existing vector quantisation routines. The only complication is when you want a stable palette for a sequence of frames without jitter.

  2. Reimar

    I am quite sure libswscale can still do the old conversion, you just have to set it up differently.
    It never did proper palette conversion though, it just converted to RGB8/BGR8 (so 2 – 3 bits per component) and set the palette properly for that.
    Not sure whether having 4 values for blue and 8 each for red and green is better or worse than 6 for all. Probably a bit worse, particularly since some grey values will get a coloured tint…

  3. Reimar

    I forgot: an advantage would probably be that dithering is a lot easier to do, and I think libswscale should be able to convert to BGR8/RGB8 with dithering.
    Indeed using mplayer with -vf format=bgr8,scale seems to confirm it does so by default (I admit the ordered dither is not the most beautiful though).

  4. Anonymous

    Side note: when you post pre-rendered text (like this distance formula) please turn off subpixel rainbowing.

  5. Multimedia Mike Post author

    @Kostya & Narr: One constraint I need to operate with is fitting within the default QuickTime 256-color palette. QT can of course do custom palettes but that would take extra effort to get that support into the muxer.

    @Reimar: I still get this weird feeling that this wouldn’t work with the QuickTime muxer without a little bit of rework.

  6. Multimedia Mike Post author

    @Anonymous: I wrote up that formula in Google Docs’ word processing component and then captured a screenshot. That must be where any anti-aliasing artifacts manifested. I’m at a loss to perceive them; what sort of video hardware are you viewing on?

  7. Anonymous

    Those are not artifacts. I’m talking about subpixel rendering of the fonts. It depends on the subpixel layout of the monitor’s matrix, so screenshotted text may look distorted on a different monitor. Also, such anti-aliasing looks annoying to some people.

  8. Reimar

    Mike: if you scale up the image or look at it on a different monitor you will see strangely coloured borders.
    I’ve always hated that, I think that the messed-up colour is always visible, even when correctly configured for the specific viewing device. I know quite a few other people who always turn it off for the same reason.

  9. Reimar

    Concerning FFmpeg QT muxer: Yes, you’d have to add support for writing a custom palette.
    However as far as I can tell the mov muxer doesn’t support muxing any paletted (or generally <= 8 bpp) raw formats at all, and I suspect that if paletted compressed formats work that is purely by chance, not because it was implemented properly.

  10. Multimedia Mike Post author

    @Reimar: Setting the bits per sample to 8 (or 4 or 2 or 1) in the video track’s stsd atom will signal to the decoder that a default palette should be selected, unless a palette is appended to the stsd atom. In this way, FFmpeg/Libav’s current QT muxer supports paletted colorspaces without any modification.

    Yeah, it is a bit “by chance” when it comes right down to it.

  11. Ronald S. Bultje

    You can use K-means to find the ideal palette instead of the default. You can optionally use a “distance” function other than the default one you used here to give more weight to green over blue, since blue artifacts are harder to see than green. Then, with this “ideal-per-frame” palette (or ideal-per-movie palette, if you wish), you can use Atkinson-style dither to find palette index entries for a source frame and prevent banding artifacts (no dither) or “cross-hatch artifacts” (ordered dither).

    As Reimar alluded to earlier, you can use an RGB4 palette (which is 4R * 8G * 4B) or, slightly more fair, a 6R * 7G * 6B default palette, which will leave only 4 entries unused, and spend a few of those unused ones on green expansion.

Comments are closed.