Back in 2000, I came across this Advogato article about proper coding guidelines for the coming wave of 64-bit machines. The most interesting part, I thought, was comment #2 (“C is portable, if you let it be”), which offers some very sane guidelines for declaring variable types so that the compiler can do its job effectively. This is why I usually just declare ints for numbers rather than using uint32_t everywhere. There is often no reason to try to force particular types.
Don’t think that you’re saving space by declaring a uint8_t instead of an int; chances are that you aren’t. I’ve disassembled enough C code compiled into 32-bit x86 machine code to know that a compiler will usually allocate 32 bits for that 8-bit variable. In fact, here is a small piece of code to drive the point home:
Compile with: gcc -Wall stack.c -o stack
Disassemble with: objdump -d -Mintel stack
080483a0 <main>:
80483a0: 55 push ebp
80483a1: 89 e5 mov ebp,esp
80483a3: 83 ec 08 sub esp,0x8
80483a6: 83 e4 f0 and esp,0xfffffff0
80483a9: b8 00 00 00 00 mov eax,0x0
80483ae: 29 c4 sub esp,eax
80483b0: e8 07 ff ff ff call 80482bc <random@plt>
80483b5: 88 45 ff mov BYTE PTR [ebp-1],al
80483b8: 66 0f be 45 ff movsx ax,BYTE PTR [ebp-1]
80483bd: 40 inc eax
80483be: 66 89 45 fc mov WORD PTR [ebp-4],ax
Notice that, despite strictly needing only 3 bytes of local variable storage, 8 bytes were allocated from the stack. 32-bit machines like the i386 really, really like dealing with 32-bit quantities.
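For reference, here is a minimal sketch of C source that would plausibly compile to code of this shape. It is inferred from the disassembly above, not copied from the original stack.c: the variable names and the int8_t/int16_t types are assumptions chosen to match the BYTE PTR and WORD PTR stores, and the helper name narrow_locals is hypothetical (the disassembled program appears to do this directly in main).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper mirroring the disassembly: one 8-bit local and one
 * 16-bit local, which is 3 bytes of storage as far as C is concerned. */
int16_t narrow_locals(void)
{
    int8_t a;    /* assumed to be the byte stored at [ebp-1] */
    int16_t b;   /* assumed to be the word stored at [ebp-4] */

    a = (int8_t)random();   /* narrow the result to 8 bits (the AL store) */
    b = (int16_t)(a + 1);   /* sign-extend, increment, store 16 bits */

    /* The language-level footprint really is 1 + 2 bytes. */
    assert(sizeof(a) + sizeof(b) == 3);
    return b;
}
```

Calling this from main and building with the gcc command above should show a similar pattern, though the exact frame size will depend on the compiler version and flags.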
When I started mounting GameCube disc images with my gcfuse utility, perhaps the strangest thing I found (apart from 11,500+ Ogg Vorbis files on one title) was intact CVS directory structures on a number of discs. Of course, CVS directories don’t give away much interesting detail; it’s not like there’s leaked source code living inside. Perhaps the most interesting thing is comparing the CVSROOT strings to information found in the MobyGames database. So, Yasunari Soejima, Hiroki Sotoike, and Fumihisa Sato: I just wanted to helpfully point out that you neglected to delete the CVS directories before creating the final disc images for certain GameCube games.
Pursuant to my last post on black box reverse engineering, a piece of silverware (hi spoon!) emailed me and tipped me off to various programs hosted at a site called OpenRCE. The RCE stands for reverse code engineering. Nice logo, too.
I do appreciate it when people clue me in to other resources out there dedicated to the fine art of reverse engineering. The articles and utilities hosted at OpenRCE appear to be a little more focused on understanding malware, which is a domain with somewhat different characteristics than multimedia, but certainly no less challenging.
In particular, my attention was directed to a Python-based RE framework called Pai Mei as well as another tool named Process Stalker.
Reverse engineering an algorithm from binary code is tough enough. However, there is a larger issue of validation. One idea I have been thinking about for a while is some method of hooking into an RE target during runtime and trapping data as it goes in and out of a particular function. The collected data would later be used as test vectors for the new implementation. However, it occurred to me that this method could also be the RE tool itself. For example, if you are pretty sure that a particular piece of binary code operates as an inverse DCT, use the previously described method to observe data coming in and out. This can save you the trouble of tracing through a tedious stretch of code to determine that it actually is an IDCT. Plus, you can figure out if it is identical to, e.g., the standard MPEG/JPEG IDCT.
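The record-and-replay half of that idea can be sketched in plain C. This only illustrates the bookkeeping: it assumes the target can be called through a function pointer, which sidesteps the hard part of hooking into a real binary. All of the names here (transform_fn, traced_call, replay_vectors, and the stand-in functions) are hypothetical, the 8-element block size is an arbitrary assumption, and the stand-ins are not real IDCTs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_LEN 8      /* assumed block size for this sketch */
#define MAX_VECTORS 16   /* arbitrary capture capacity */

/* Signature assumed for the function under study: transforms in place. */
typedef void (*transform_fn)(int16_t block[BLOCK_LEN]);

struct test_vector {
    int16_t in[BLOCK_LEN];
    int16_t out[BLOCK_LEN];
};

static struct test_vector vectors[MAX_VECTORS];
static int vector_count = 0;

/* Call the target through this wrapper to capture what goes in and out. */
void traced_call(transform_fn target, int16_t block[BLOCK_LEN])
{
    struct test_vector *v = NULL;

    assert(target != NULL);
    if (vector_count < MAX_VECTORS) {
        v = &vectors[vector_count];
        memcpy(v->in, block, sizeof(v->in));
    }
    target(block);
    if (v != NULL) {
        memcpy(v->out, block, sizeof(v->out));
        vector_count++;
    }
}

/* Replay captured inputs through a candidate reimplementation and
 * return the number of vectors whose output does not match. */
int replay_vectors(transform_fn candidate)
{
    int mismatches = 0;

    for (int i = 0; i < vector_count; i++) {
        int16_t work[BLOCK_LEN];
        memcpy(work, vectors[i].in, sizeof(work));
        candidate(work);
        if (memcmp(work, vectors[i].out, sizeof(work)) != 0)
            mismatches++;
    }
    return mismatches;
}

/* Stand-in for the unknown binary function (doubles each element). */
void mystery_target(int16_t block[BLOCK_LEN])
{
    for (int i = 0; i < BLOCK_LEN; i++)
        block[i] = (int16_t)(block[i] * 2);
}

/* A candidate that matches the stand-in target bit for bit. */
void candidate_good(int16_t block[BLOCK_LEN])
{
    for (int i = 0; i < BLOCK_LEN; i++)
        block[i] = (int16_t)(block[i] + block[i]);
}

/* A candidate that does not match. */
void candidate_bad(int16_t block[BLOCK_LEN])
{
    for (int i = 0; i < BLOCK_LEN; i++)
        block[i] = (int16_t)(block[i] + 1);
}
```

A bit-exact memcmp is the strictest possible comparison; when checking against a reference IDCT it may make more sense to allow a small per-coefficient tolerance, since conforming implementations can differ in rounding.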
Colleague Benjamin Larsson noted that this would be referred to as black box reverse engineering.
Another basic application of this technique would be to monitor the bitstream parsing function for a given input bitstream. Many multimedia decoders delegate all of their bitstream parsing duties to a small number of functions and this would be a great way to validate that a new decoder is chopping up a bitstream in the correct manner.
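On the new-decoder side, this could be as simple as logging every request that goes through the bit reader, so that the log can be compared against whatever is trapped from the original binary. The sketch below is one way that might look. The struct layout, the function name get_bits_traced, and the MSB-first bit order are all assumptions made for illustration; some formats read bits LSB-first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACE 64   /* arbitrary log capacity for this sketch */

struct bit_reader {
    const uint8_t *buf;
    size_t size_bits;   /* total number of bits available */
    size_t pos_bits;    /* current read position, in bits */
};

struct trace_entry {
    int nbits;          /* how many bits were requested */
    uint32_t value;     /* what came back */
};

static struct trace_entry trace_log[MAX_TRACE];
static int trace_len = 0;

/* Read n bits (1..32), MSB first, and log the request and its result.
 * Bits past the end of the buffer read as zero. */
uint32_t get_bits_traced(struct bit_reader *br, int n)
{
    uint32_t value = 0;

    assert(n >= 1 && n <= 32);
    for (int i = 0; i < n; i++) {
        uint32_t bit = 0;
        if (br->pos_bits < br->size_bits) {
            uint8_t byte = br->buf[br->pos_bits >> 3];
            bit = (byte >> (7 - (br->pos_bits & 7))) & 1u;
        }
        value = (value << 1) | bit;
        br->pos_bits++;
    }
    if (trace_len < MAX_TRACE) {
        trace_log[trace_len].nbits = n;
        trace_log[trace_len].value = value;
        trace_len++;
    }
    return value;
}
```

If both decoders produce the same sequence of (nbits, value) pairs for the same input file, the new one is at least chopping up the bitstream the same way, whatever it does with the values afterwards.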
How to accomplish this? I recently sat down and actually read through the entire GNU Debugger manual to see what interesting features I might have been missing all these years. I discovered tracepoints. These apparently let you gather data about a program without stopping the program periodically. Unfortunately, I don’t think the facility is flexible enough to do what I outline above.
Are there tools that can already do what is described here? Or will it take some custom tools? If it takes custom tools, I already have a head start with some of my experiments.