Yearly Archives: 2007

Breaking Eggs And Making Omelettes: The Paper

I presented a paper at LinuxTag in 2005, nearly 2 years ago. I can’t believe I never got around to posting it. Here it is in PDF format, named after this blog: “Breaking Eggs and Making Omelettes: Intelligence Gathering For Open Source Software Development.” It’s a fairly high-level overview of the RE craft, written mostly for those who have not had much exposure to it.

linuxtag-2005-re-paper.pdf (~100 KB)

ASM: What It Can Do For You

I’ve had my nose deep in the multimedia mess for so long that I forget that people outside the field might not understand some of the ideas that those of us “skilled in the art” believe are intuitive. One such concept is that of manually programming in assembly language (colloquially referred to as ASM).

“But no one codes in ASM these days,” they exclaim. “Compilers are smart enough to do all the optimization you will ever need.” First, I would like to state that I place very little faith in compilers to always do the right thing, particularly during the optimization phase, but that’s neither here nor there. Allow me to explain why manual ASM programming is often done during multimedia programming.

First, it’s true: very few people write actual, usable applications purely in ASM. Where ASM comes in handy is optimization: invoking sequences of special instructions in ways that compilers generally can’t leverage on their own.

That’s the short answer about why people still manually code some program components in ASM these days. The rest of this post will be devoted to explaining exactly how certain special instructions are useful in multimedia applications.

A lot of modern consumer-level CPUs (x86_32, x86_64, PPC) have special categories of instructions designed to perform the same operation on lots of data elements at the same time. Remember when Intel rolled out the Pentium MMX instructions? That’s what those were good for: performing lots of the same operation in parallel. Since then, there have been various upgrades to the concept from various manufacturers (and some branching out, such as special floating point operations as well as cache control instructions). But the main idea that is useful in multimedia programming remains the same, and it’s called “single instruction, multiple data,” or SIMD: you execute one instruction and it operates on multiple data elements in parallel.

SIMD instructions are invaluable in multimedia programming because processing multimedia typically involves performing the same operation on every data element in a huge buffer. If you can do the same operation on 2 data elements, or 4, or 8, or 16 at a time, rather than just one at a time, that’s great.

Here’s a more tangible example: You have a large array of bytes. You have to add the quantity 5 to each byte. With the traditional load-do_math-store sequence, this looks like:

  load next data byte into reg1
  reg1 = reg1 + 5
  store reg1 back to memory
  [repeat for each byte]
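In C, the scalar sequence above might look like the following sketch (the function and buffer names are illustrative, not from any particular codebase):

```c
#include <stddef.h>

/* Add 5 to every byte, one element per iteration.
 * Note the sum wraps around on overflow (255 + 5 becomes 4). */
static void add5_scalar(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(buf[i] + 5); /* load, add, store */
}
```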

Even if you have a 32-bit CPU, you can’t effectively load 4 data bytes at a time and add 0x05050505, because if any individual sum exceeds 255, the carry will spill into the next most significant byte. However, Intel’s MMX provides 8 64-bit registers (really just aliases of the FP registers, but that’s trivia). Depending on the SIMD instruction issued, a register may be treated as 8 8-bit elements, 4 16-bit elements, or 2 32-bit elements. Check out the parallel approach for solving the previous problem:

  init mm1 (mmx register) as 0x0505050505050505
  load next 8 bytes into mm2
  mm2 = mm2 + mm1, treat data as 8 unsigned bytes
  store 8 bytes back to memory
  [repeat for each block of 8 bytes]

Powerful, huh? You’re not performing one addition at a time; rather, there are 8 happening in parallel. That’s something that a compiler generally can’t pick up on and optimize for you behind the scenes (though I’ve heard legends that Intel’s compiler can perform such feats, which I would like to see). But there’s more:

  • Intel’s SSE2 and the PowerPC’s AltiVec SIMD facilities provide 128-bit registers. Thus, this same process can operate on 16 bytes in parallel.
  • The data doesn’t have to be treated as unsigned bytes; you can also ask for it to be treated as signed bytes, or signed or unsigned words or doublewords. It all depends on the instruction issued.
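To make the idea concrete, here is a sketch of the same add-5 loop using SSE2’s 128-bit registers via compiler intrinsics, so 16 bytes are processed per instruction. This assumes an x86 target with SSE2 and a buffer length that is a multiple of 16 (a real routine would handle the leftover tail); like the scalar version, this plain packed add still wraps on overflow:

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>

/* Add 5 to 16 bytes per iteration with one packed-add (paddb).
 * Assumes len is a multiple of 16. No saturation: sums wrap. */
static void add5_sse2(unsigned char *buf, size_t len)
{
    const __m128i fives = _mm_set1_epi8(5);
    for (size_t i = 0; i < len; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        v = _mm_add_epi8(v, fives);            /* 16 additions at once */
        _mm_storeu_si128((__m128i *)(buf + i), v);
    }
}
```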

There’s also saturation. This goes by many names such as clamping and clipping. In the above example, if we want to make sure that none of the bytes go above 255 (and wrap around to 0 as a result), we must perform this tedious process for each byte:

  unsigned int next_byte = *buffer;  /* widen so the sum can exceed 255 */
  if (next_byte + 5 > 255)
    next_byte = 255;
  else
    next_byte += 5;
  *buffer++ = (unsigned char)next_byte;
  [repeat for each byte]

Not so with SIMD instructions, which typically have saturation modes. When you invoke the saturating variant of the add instruction, it automatically ensures that no data element goes above the limit (255 for an unsigned byte, or 127 for a signed byte).
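With SSE2 intrinsics, the saturating version is literally a one-instruction change from the wrapping version: swap the plain packed add for the unsigned saturating add (the paddusb instruction). Again, this sketch assumes an x86/SSE2 target and a length that is a multiple of 16:

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>

/* Add 5 to 16 bytes per iteration with unsigned saturation:
 * any sum above 255 is clamped to 255 instead of wrapping.
 * Assumes len is a multiple of 16. */
static void add5_saturated(unsigned char *buf, size_t len)
{
    const __m128i fives = _mm_set1_epi8(5);
    for (size_t i = 0; i < len; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        v = _mm_adds_epu8(v, fives);           /* paddusb: clamps at 255 */
        _mm_storeu_si128((__m128i *)(buf + i), v);
    }
}
```

Compare this with the six-line scalar clamp above: the per-byte compare-and-branch disappears entirely, and 16 saturated additions happen per instruction.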

What would be the practical application of adding a constant value to each byte in a buffer? This is one of the earliest examples I found related to SIMD programming. The example pertained to brightening the intensity information of an image. This is particularly pertinent to me since I remember working briefly with an image manipulation program that did not take saturation into account when brightening an image. It simply wrapped the values right around to 0 and kept adding. As demonstrated, using hand-coded SIMD ASM in this case would have yielded results that were both correct and fast.

What about applications in multimedia video codecs? Parallel saturated addition happens to come in handy there as well. Many mainstream video codecs operate on 8×8 blocks of samples; further, they have to decode one block and then add it to a block from the previous frame. This turns out to be a fantastic use for parallel addition with saturation.

These are but a few simple applications for hand-coded ASM code. I may write more on the matter, if I am so inspired.

Dynamic Uninteresting Movie-Based Adventure System Simulator

I’ve been suffering through a wave of interactive movie schlock for my Gaming Pathology project. This has led me to hypothesize about what I can do to help share these wretched I-movie experiences with a broader audience and even preserve this misery for future generations to revile. To that end, I have brainstormed about the Dynamic Uninteresting Movie-Based Adventure System Simulator (thanks to Cyril Z. for helping me with the expository project name).

Loosely, an I-movie is a “game” that relies heavily on pre-rendered full motion video (FMV) sequences. The sequences can be used to show transitions from one location to another or — more often than not — exhibit a marginal actor disgracing his chosen craft in order to advance the sheer confusion that passes for a storyline. Many of these so-called games also feature one or more puzzles so as not to rely entirely on one type of gameplay.

After playing through a number of these I-movie titles, I can’t help but notice certain programmatic similarities. For example, games like D and Of Light And Darkness feign immersive 3D environments with a combination of FMV files. Start at point A. Define a series of hotspots for the current scene that map to other FMV files. When the player clicks in one of those hotspots, play the next FMV file. There; that’s 90% of the game engine right there.

Internally, the games probably have a little of what might be termed a virtual machine in order to track the game’s state. At least, I postulate that it could be forced into some kind of virtual machine structure. This is used for tracking how far the player has progressed in the game, which hoops he has jumped through, and, by extension, what special things need to happen on certain screens. This would also pertain to the various puzzles, which typically consist of series of FMV files.

So here’s the pitch: A portable virtual machine in the spirit of ScummVM that knows how to interpret the data files for a variety of these I-movies and force them into a common model. This would probably entail a graph data structure describing a map and which FMV files get played when the player chooses to transition from point A to point B. Further, there would be some list of game goals to progress through. Most of these I-movies couldn’t possibly be much more involved than that. From my understanding, most of the FMV formats that these games use are already well supported by portable, open source software.
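The graph model described above could be sketched as something like the following. This is purely hypothetical: every struct and field name is illustrative, not taken from any real engine, but it captures the “hotspots map to FMVs and successor scenes” idea:

```c
#include <stddef.h>

/* Hypothetical common model for an I-movie engine: each scene node
 * maps click hotspots to a transition FMV and a successor scene. */
struct hotspot {
    int x, y, w, h;             /* clickable rectangle */
    const char *transition_fmv; /* FMV played when clicked */
    int target_scene;           /* index of the next scene node */
};

struct scene {
    const char *idle_fmv;       /* FMV looped while awaiting input */
    const struct hotspot *hotspots;
    size_t num_hotspots;
};

/* Return the scene reached by clicking at (x, y), or -1 if the
 * click lands outside every hotspot. */
static int click(const struct scene *s, int x, int y)
{
    for (size_t i = 0; i < s->num_hotspots; i++) {
        const struct hotspot *h = &s->hotspots[i];
        if (x >= h->x && x < h->x + h->w &&
            y >= h->y && y < h->y + h->h)
            return h->target_scene;
    }
    return -1;
}
```

Layer a list of game-state flags on top of this (to gate certain transitions until the right hoops have been jumped through) and you have most of the proposed virtual machine.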

It’s just crazy enough to work. And to what end? That should be obvious– to continue humiliating the people responsible for these tarnishes on the good name of computer gaming.

VMware Video Specs

Many thanks to VMware for contributing to the existing MultimediaWiki description for their own VMware Video codec. This is a lossless codec used for efficiently capturing action from virtualized computer desktops running under VMware’s Workstation product. The community had worked out a number of the codec’s details, but VMware people have filled in the gaps.