More On Automated Java De-obfuscation | Breaking Eggs And Making Omelettes

I am not especially proficient at understanding software projects written in an excessively object-oriented manner, which is the style that languages like C++ and Java encourage. So I had trouble getting my head around the GPL’d source code for the Retroguard Java obfuscator, which I had hoped to subvert into a source code de-obfuscator. Fortunately, Doxygen proved invaluable for generating documentation and hierarchy diagrams that helped illustrate the program’s architecture (I think Javadoc can be used for the same purpose, but I find that Doxygen is easier to install). I think I now see where I can hook in to get a basic de-obfuscator.

Retroguard has an abstract NameMaker class that is extended by the KeywordNameMaker and OverloadNameMaker classes. These classes implement the getNextName() method, which is responsible for coming up with names like _mthelse(), _mthif(), _mthcase(), and so on. It seems reasonable that, as a first pass, a new NameMaker subclass could be created that returns more descriptive names. I see that the code distinguishes method names from field names, which could also be thought of as verbs vs. nouns. Perhaps if there were two classes, one drawing on a large dictionary of nouns and the other on a dictionary of verbs, they could output names that would make reverse engineering simpler, at least from a psychological standpoint.
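To make the idea concrete, here is a minimal sketch of what a dictionary-backed name maker might look like. The NameMaker class below is a hypothetical stand-in that I wrote for illustration; Retroguard's real class may declare getNextName() with a different signature, and DictionaryNameMaker, NameMakerDemo, and the word lists are my own invention.

```java
import java.util.List;

// Hypothetical stand-in for Retroguard's abstract NameMaker.
// The real class may have a different signature; this is an assumption.
abstract class NameMaker {
    abstract String getNextName();
}

// Hands out words from a dictionary in order. Once the list is exhausted it
// starts over and appends a round number, so every returned name is unique
// (assuming no dictionary word already ends in a digit).
class DictionaryNameMaker extends NameMaker {
    private final List<String> words;
    private int counter = 0;

    DictionaryNameMaker(List<String> words) {
        if (words.isEmpty()) {
            throw new IllegalArgumentException("word list must not be empty");
        }
        this.words = List.copyOf(words);
    }

    @Override
    String getNextName() {
        String word = words.get(counter % words.size());
        int round = counter / words.size();
        counter++;
        return round == 0 ? word : word + round;
    }
}

class NameMakerDemo {
    public static void main(String[] args) {
        // One maker per kind of identifier: verbs for methods, nouns for fields.
        NameMaker methodNames = new DictionaryNameMaker(List.of("fetch", "parse", "render"));
        NameMaker fieldNames = new DictionaryNameMaker(List.of("buffer", "index"));

        for (int i = 0; i < 4; i++) {
            System.out.println(methodNames.getNextName() + "()");
        }
        for (int i = 0; i < 3; i++) {
            System.out.println(fieldNames.getNextName());
        }
    }
}
```

A real dictionary would also need to be filtered for Java keywords and reserved words, since the whole point is to get away from names like else and case.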

Ultimately, it would still be desirable to modify the code to work out each variable’s type and prefix its name accordingly, e.g., integers with ‘i_’.
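As a rough sketch of the prefixing idea: Java class files record each field's type as a descriptor string (I for int, J for long, Z for boolean, a leading [ for arrays, L...; for object types), so a prefix could be chosen from that. The TypePrefixer class and every prefix other than ‘i_’ are my own hypothetical choices, and I have not checked how Retroguard exposes descriptors internally.

```java
// Hypothetical helper: picks a name prefix from a JVM field descriptor.
// Only the 'i_' prefix comes from the text above; the rest are assumptions.
class TypePrefixer {
    static String prefixFor(String descriptor) {
        if (descriptor == null || descriptor.isEmpty()) {
            throw new IllegalArgumentException("empty descriptor");
        }
        switch (descriptor.charAt(0)) {
            case 'I': return "i_";   // int
            case 'J': return "l_";   // long
            case 'Z': return "b_";   // boolean
            case 'F': return "f_";   // float
            case 'D': return "d_";   // double
            case 'B': return "by_";  // byte
            case 'C': return "c_";   // char
            case 'S': return "s_";   // short
            case '[':                // array: 'a' plus the element type's prefix
                return "a" + prefixFor(descriptor.substring(1));
            case 'L':                // object type, with a special case for String
                return descriptor.equals("Ljava/lang/String;") ? "str_" : "o_";
            default:
                throw new IllegalArgumentException("unknown descriptor: " + descriptor);
        }
    }

    // Combines the type prefix with a generated name.
    static String rename(String descriptor, String name) {
        return prefixFor(descriptor) + name;
    }
}
```

With this, TypePrefixer.rename("I", "index") yields i_index and TypePrefixer.rename("[I", "buffer") yields ai_buffer.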