Moving From Java Back To Coffee Beans

The Jad Java Decompiler has a wonderful logo:

[Image: Jad logo]

Oh Java, for so long I wished I would not have to deal with you in any meaningful way. Now, I welcome your bloated, verbose, object-glorifying code. What changed?

For years I was rather ambivalent about the Java programming language. I left it alone as long as it left me alone. I had very little reason to care about the language. That all changed last year when a colleague notified me that certain multimedia technology companies were actually porting their closed formats to Java.

Why is this important? Because compiled Java classes are ridiculously simple to reverse engineer. Of course, this assertion is relative to my experience in RE’ing C/C++ code that has been compiled to Intel i386 machine code.

Naysayers will claim that responsible software companies run their Java code through an obfuscator before shipping the class files. Indeed, On2 uses an obfuscator named RetroGuard. It’s fiendishly good, too. But it can only do so much.

There are generally two huge challenges when disassembling and RE’ing machine code:

  1. Understanding the original code flow and structure
  2. Decoding the data identifiers (guessing the original names of variables and functions)

Check this out: Challenge 1 is rendered unnecessary with decompiled Java classes; the class files retain most of the code-flow knowledge from the original files. Even switch-case blocks are decompiled cleanly. Anyone who has tried to decompile a switch-case or an optimized if-then-else sequence compiled from C knows how much of a relief this is.
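To make the switch-case point concrete, here is a minimal sketch. The class name, method name, and frame-type codes are all invented for illustration; none of it comes from a real codec. The Java compiler turns a dense switch like this one into a single tableswitch bytecode instruction (a sparse one becomes lookupswitch), so the case values and branch targets sit right there in the class file for a decompiler to map back.

```java
public class FrameTypeDemo {
    // Hypothetical frame-type codes, made up for this example.
    static String describe(int frameType) {
        // A dense run of case values like 0, 1, 2 compiles to a
        // tableswitch instruction in the class file.
        switch (frameType) {
            case 0:
                return "key frame";
            case 1:
                return "inter frame";
            case 2:
                return "golden frame";
            default:
                return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(1));
    }
}
```

If you compile this and run javap -c on the class, the listing shows the tableswitch with each case value next to its jump target. Compare that with hunting for a jump table in i386 disassembly.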

That just leaves the identifier guessing, assuming the class creators had the presence of mind to run a code obfuscator. This quickly turns into a common type of puzzle known as a cryptogram. You figure out the obvious identifiers. All class files need to have at least some human-readable public identifiers that the outside world can call, e.g. ‘DecodeFrame()’. Then you use those as clues to figure out less obvious identifiers. Combined with domain knowledge of your target, it should only be a matter of time.
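Here is roughly what the cryptogram looks like in practice. This is a hand-written stand-in, not output from any real obfuscator or decoder: the single-letter names, the doubling step, and the guesses in the comments are all assumptions made up for illustration. Only the public entry point keeps a readable name, and it is the first clue.

```java
public class CryptogramDemo {
    // Hypothetical stand-in for a decompiled, obfuscated decoder class.
    static class a {
        private final int a; // guess: width, since it is multiplied by b to size a buffer
        private final int b; // guess: height

        a(int a, int b) {
            this.a = a;
            this.b = b;
        }

        // The public entry point keeps its name so outside code can call it.
        public int[] DecodeFrame(byte[] c) {
            int[] d = new int[a * b]; // guess: one output sample per pixel
            for (int e = 0; e < d.length && e < c.length; e++) {
                d[e] = b(c[e] * 2);
            }
            return d;
        }

        // Forces a value into 0..255, so a good guess for the real name is "clamp".
        private static int b(int f) {
            return f < 0 ? 0 : (f > 255 ? 255 : f);
        }
    }

    // Small driver so the behavior can be checked.
    static int[] run() {
        return new a(2, 1).DecodeFrame(new byte[] {100, -5});
    }

    public static void main(String[] args) {
        int[] out = run();
        System.out.println(out[0] + " " + out[1]);
    }
}
```

Starting from DecodeFrame(), the a * b buffer size suggests dimensions, and the 0..255 range check in b() is a pattern anyone who has written pixel code will recognize as a clamp. Each solved name makes the next one easier, just like filling in letters in a cryptogram.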