Monthly Archives: September 2007

What RE Looks Like

People ask me what binary reverse engineering is like. It’s tedious, that’s what it is. When it gets right down to it, you just have to concentrate on little (in the sense that they do not do much individually) instructions and carefully trace the bigger picture from huge blocks of these baby steps.

If you want to see what RE really looks like, come inside my notebook. Here are 3 choice pages, illustrating the process I used to figure out one of the inverse transforms that RealVideo 4 (and 3, for that matter; both H.264 prototypes) employ:

page 1, page 2, page 3

In this example, I used a lot of back substitution in order to figure out a series of math formulas. This case was greatly simplified by the fact that there were not very many mystery parameters to deal with. However, the property that complicated matters here is that there were few if any straightforward imul (integer multiplication) instructions. Even though multiplication figures heavily into the transform, most multiplications were performed by sequences of additions, bit shifts, and compound add/shift instructions (lea).

Chocolate Milk Puzzle

So I had a little carton of chocolate milk with my lunch today (longtime readers have guessed that I must still be in elementary school). On one side of the carton was a word search puzzle that challenged me to find 6 words hidden in the jumble of letters. It occurred to me that this represented an allegory for reverse engineering.

I generally liken RE to an extraordinarily complex, or just tedious, puzzle. This is a more straightforward puzzle.

Task: Find 6 strings of characters. What strings? I wasn’t sure exactly. Lesser word searches explicitly tell you which words to find. Those puzzles can be solved via a brute force algorithm that most of us figure out by the age of 5. I knew that these had to be coherent English words. Eventually, I re-read the instructions and noticed that the words were supposed to pertain to “foods made from grain”. Lesson: Gather all the intelligence you can before delving into the meat of the puzzle.

Next, there is domain knowledge. I have done a few word search puzzles before and so I know that I am supposed to look for strings of characters running left -> right, right -> left, top -> bottom, bottom -> top, and also diagonally. And I have some clue about foods made from grain. So I am off.

Then, I started theorizing about how the original puzzler (or whatever you call a person who develops puzzles) must have put this puzzle together. Since there are 6 words, I hypothesized that 2 would be horizontal (one for each direction), 2 would be vertical (again, one for each direction), and 2 would be diagonal. This would make good use of the parameters of a stock word search. I refined my initial search using these assumptions.

I was correct about the 2 horizontal and 2 vertical. I also found one more horizontal word. I methodically scanned every diagonal possibility, much to the disdain of my overworked eyes. I could not find any word on that orientation. So I revised my original assumption and assumed the remaining word must also be horizontal.

Finally, there’s the matter of validation: determining if one’s RE results are correct. Unlike binary RE, this puzzle makes it easy. So I check and… I don’t believe it! I only got 4 of them correct. I got another partially correct. And yeah, I didn’t think ‘bean’ was a valid grain.

How Many Multiplications?

While many are still dogged by the quandary of how many licks it takes to get to the chewy center of a certain popular confection, I found myself wondering how many multiplication operations it takes to decode a frame of certain video codecs. Multiplication tends to be one of the more intensive parts of video decoding and much effort is spent minimizing the number of multiplications that are absolutely necessary. If my numbers and assumptions are correct, a 320×240 VP3/Theora intraframe requires 747,680 multiplications to decode. It goes something like this:

Each 8×8 block of the image needs to be transformed and dequantized. How many blocks are in the image?

 Y plane: (320 * 240) pixels / (8 * 8) pixels/block = 1200 blocks
 U plane: (160 * 120) pixels / (8 * 8) pixels/block =  300 blocks
 V plane: (160 * 120) pixels / (8 * 8) pixels/block =  300 blocks

So there are 1800 blocks in the frame. Each block needs to be run through an inverse discrete cosine transform (IDCT). Each of the 64 samples requires 64 mults. That means each block requires 64² mults. However, optimizations (usually by factoring out redundant mults) result in a transform that can be performed with N*log₂(N) mults.

  64 * log₂(64) = 384 mults/block

This assumes that all of the blocks in the intraframe have enough non-zero coefficients that more optimal IDCTs (like 1-, 3-, and 10-element) are not warranted. Then there is the matter of dequantization. Assuming an average of 16 non-zero coefficients in need of dequantization, the number of mults per block rises to 400.

 400 mults/block * 1800 blocks/frame = 720000 mults/frame

The same number would hold true for JPEG images or MPEG-1 intraframes, given the same (possibly fraudulent) assumptions. I am not sure if 16 non-zero coefficients per block is too generous and too conservative an estimate. However, the vast majority of the blocks in a VP3/Theora intraframe will also require 3 multiplications in order to reverse the DC prediction. So I’ll just wave my hand and factor DC prediction in with the dequantization budget of 16 mults/block.

Now, if we want to deploy the recommended post processing method, that requires 8 more multiplications per edge, where ‘edge’ is a boundary between 2 blocks that have been rendered in this frame (and all blocks are rendered for an intraframe). How many edges are in the frame?

vertical edges = (width_in_blocks – 1) * (height_in_blocks)
horizontal edges = (width_in_blocks) * (height_in_blocks – 1)

 Y plane: 40x30 blocks, (40 - 1) * 30 + 40 * (30 - 1) = 1170 + 1160 = 2330 edges
 U plane: 20x15 blocks, (20 - 1) * 15 + 20 * (15 - 1) =  285 +  280 =  565 edges
 V plane: 20x15 blocks, (20 - 1) * 15 + 20 * (15 - 1) =  285 +  280 =  565 edges
 
 3460 edges/frame * 8 PP mults/edge = 27680 PP mults/frame

Thus:

 720000 mults/frame + 27680 mults/frame = 747680 mults/frame

It should be noted that VP3/Theora’s method for post-processing involves a 4-tap filter with coefficients of (1, -1, 3, -3). Trivially, the multiplication by 3 can be factored out (which is how the assumption of one mult per sample, or 8 per edge, is derived). Further, it can easily be broken down to a bit shift and addition (3*n = (n << 1) + n) so that the post processing phase does not strictly require any multiplications. Naturally, the parameters for a nominal VP3/Theora interframe vary wildly and it's a little harder to produce an average number of multiplications for that case. I'm sure someone will be along in the comments if I messed up the math or, more likely, the assumptions.

Silverlight Codecs

Today, Microsoft released some framework for delivering rich internet applications called Silverlight. One of its larger selling points is the ability to stream HD-quality multimedia over the internet. Naturally, I wonder what codecs the system uses. Various press releases play up the VC-1 codec, which of course only pertains to video. I can only assume that WMA3 would be a standard audio codec, and likely WMA2.

But is that all? I started asking this question after I wondered how or if someone could possibly create a Silverlight YouTube clone. The secret to YouTube’s success is, of course, free software. The site runs FFmpeg and MEncoder (see my previous post on this matter) on top of free operating systems and web servers (and I read a forum post somewhere that alleges that Python scripts schedule the multimedia conversion jobs). Sure, you could build the same kind of site with purely Microsoft OSes and multimedia conversion software, but it would cost lots of money in software licenses. This is money that doesn’t need to be spent in, e.g., YouTube’s case and can go towards expanding the hardware infrastructure or lining executive pockets.

Maybe it will be possible to build a Silverlight-based YouTube after all. Here’s a post in a Silverlight forum that describes what codecs Silverlight is alleged to support: On the video side, there is WMV1, WMV2, WMV3, WMVA, and something called WMVC1 (not a proper FourCC): Windows Media Video Advanced Profile. For audio, the post lists WMA 1, 2, and 3, and MP3. Free software encoding support for MP3 is quite abundant. FFmpeg also has support for encoding WMA2. Out of that list of video codecs, FFmpeg can encode WMV7 and 8 (no XIntra8-type frames, of course, but the encoded streams are still compliant).

As is my custom, I have started a MultimediaWiki page tracking what is known about the multimedia formats supported in Microsoft Silverlight. It would be nice to put together a simple Silverlight platform to empirically test the capabilities. I just can’t get past thinking that Silverlight should also support raw PCM, though that’s not as sexy as MP3 for mentioning in a press release. Fairness dictates I should do the same for Adobe’s Flash Player (Silverlight’s obvious competitor), so I have started an appropriate MultimediaWiki page on that topic.

Meanwhile, there’s that Moonlight project offshoot from the Mono Project that is supposed to provide Linux support for the new hotness that is Silverlight. For some reason, the meme going around today is that MS Silverlight is officially out, and Linux has 100% perfect support (or is right around the corner). My cursory investigation leads me to believe that this is not the case. I have read stuff about the Silverlight install for Linux being a one-click affair. I have some direct experience with that type of goal so I am a tad dubious. My guess is that the one-click thing is supposed to be reliable for SuSE Linux (Microsoft partner Novell’s distribution) and hopefully adaptable, with modifications, to other distros. But based on the official status page, it’s not quite there.

But thinking about my primary obsession, multimedia codecs– how will Moonlight cope with decoding the aforementioned formats? The project status page recommends developers and early adopters download and install a particular SVN revision of FFmpeg. That can’t be relied upon to decode all the formats called for above. I have read some rumblings about how Microsoft will release binary codecs that will only be allowed to run with browser plugins. So there’s something to look forward in the binary-supporting components of the free Linux multimedia players.

I should probably mention that I help develop a program that operates in a similar space as Silverlight.

Breaking Eggs And Making Omelettes

Topics On Multimedia Technology and Reverse Engineering

Monthly Archives: September 2007

What RE Looks Like

Chocolate Milk Puzzle

How Many Multiplications?

Silverlight Codecs