Category Archives: Reverse Engineering

Brainstorming and case studies relating to craft of software reverse engineering.

Vedanti and Max Sound vs. Google

Vedanti Systems Limited (VSL) and Max Sound Coporation filed a lawsuit against Google recently. Ordinarily, I wouldn’t care about corporate legal battles. However, this one interests me because it’s multimedia-related. I’m curious to know how coding technology patents might hold up in a real court case.

Here’s the most entertaining complaint in the lawsuit:

Despite Google’s well-publicized Code of Conduct — “Don’t be Evil” — which it explains is “about doing the right thing,” “following the law,” and “acting honorably,” Google, in fact, has an established pattern of conduct which is the exact opposite of its claimed piety.

I wonder if this is the first known case in which Google has been sued over its long-obsoleted “Don’t be evil” mantra?

Researching The Plaintiffs
Continue reading

Reverse Engineering Italian Literature

Some time ago, Diego “Flameeyes” Pettenò tried his hand at reverse engineering a set of really old CD-ROMs containing even older Italian literature. The goal of this RE endeavor would be to extract the useful literature along with any structural metadata (chapters, etc.) and convert it to a more open format suitable for publication at, e.g., Project Gutenberg or Archive.org.

Unfortunately, the structure of the data thwarted the more simplistic analysis attempts (like inspecting for blocks of textual data). This will require deeper RE techniques. Further frustrating the effort, however, is the fact that the binaries that implement the reading program are written for the now-archaic Windows 3.1 operating system.

In pursuit of this RE goal, I recently thought of a way to glean more intelligence using DOSBox.

Prior Work
There are 6 discs in the full set (distributed along with 6 sequential issues of a print magazine named L’Espresso). Analysis of the contents of the various discs reveals that many of the files are the same on each disc. It was straightforward to identify the set of files which are unique on each disc. This set of files all end with the extension “LZn”, where n = 1..6 depending on the disc number. Further, the root directory of each disc has a file indicating the sequence number (1..6) of the CD. Obviously, these are the interesting targets.

The LZ file extensions stand out to an individual skilled in the art of compression– could it be a variation of the venerable LZ compression? That’s actually unlikely because LZ — also seen as LIZ — stands for Letteratura Italiana Zanichelli (Zanichelli’s Italian Literature).

The Unix ‘file’ command was of limited utility, unable to plausibly identify any of the files.

Progress was stalled.

Saying Hello To An Old Frenemy
I have been showing this screenshot to younger coworkers to see if any of them recognize it:


DOSBox running Window 3.1

Not a single one has seen it before. Senior computer citizen status: Confirmed.

I recently watched an Ancient DOS Games video about Windows 3.1 games. This episode showed Windows 3.1 running under DOSBox. I had heard this was possible but that it took a little work to get running. I had a hunch that someone else had probably already done the hard stuff so I took to the BitTorrent networks and quickly found a download that had the goods ready to go– a directory of Windows 3.1 files that just had to be dropped into a DOSBox directory and they would be ready to run.

Aside: Running OS software procured from a BitTorrent network? Isn’t that an insane security nightmare? I’m not too worried since it effectively runs under a sandboxed virtual machine, courtesy of DOSBox. I suppose there’s the risk of trojan’d OS software infecting binaries that eventually leave the sandbox.

Using DOSBox Like ‘strace’
strace is a tool available on some Unix systems, including Linux, which is able to monitor the system calls that a program makes. In reverse engineering contexts, it can be useful to monitor an opaque, binary program to see the names of the files it opens and how many bytes it reads, and from which locations. I have written examples of this before (wow, almost 10 years ago to the day; now I feel old for the second time in this post).

Here’s the pitch: Make DOSBox perform as strace in order to serve as a platform for reverse engineering Windows 3.1 applications. I formed a mental model about how DOSBox operates — abstracted file system classes with methods for opening and reading files — and then jumped into the source code. Sure enough, the code was exactly as I suspected and a few strategic print statements gave me the data I was looking for.

Eventually, I even took to running DOSBox under the GNU Debugger (GDB). This hasn’t proven especially useful yet, but it has led to an absurd level of nesting:


GDB runs DOSBox runs Windows 3.1

The target application runs under Windows 3.1, which is running under DOSBox, which is running under GDB. This led to a crazy situation in which DOSBox had the mouse focus when a GDB breakpoint was triggered. At this point, DOSBox had all desktop input focus and couldn’t surrender it because it wasn’t running. I had no way to interact with the Linux desktop and had to reboot the computer. The next time, I took care to only use the keyboard to navigate the application and trigger the breakpoint and not allow DOSBox to consume the mouse focus.

New Intelligence
Continue reading

Deobfuscation Redux: JavaScript

Google recently released version 12 of their Chrome browser. This version adds a new feature that automatically allows deobfuscating obfuscated JavaScript source code.

Before:



After:



As a reverse engineering purist, I was a bit annoyed. Not at the feature, just the naming. This is clearly code beautification but not necessarily deobfuscation. The real obfuscation comes not from removing whitespace but from renaming variable and function names to terse 1- and 2-letter identifiers. True automated deobfuscation — which entails recovering the original variable and function identifiers as well as source code comments — is basically impossible.

Still, it makes me wonder if there is any interest in a JavaScript deobfuscator that operates similar to my Java deobfuscator which was one of the first things I published on this blog. The general idea is automatically replace function names with random English verbs (since functions correspond to actions) and variable names with random animal names (I decided “English nouns” encompassed too broad a category of words). I suspect the day that someone releases a proprietary multimedia codec in a pure (though obfuscated) JavaScript format is that day that I will try to accomplish this, if it hasn’t been done already.

See also:

Internecine Legal Threats

FFmpeg and associated open source multimedia projects such as xine, MPlayer, and VLC have long had a rebel mystique about them; a bunch of hackers playing fast and loose with IP law in order to give the world the free multimedia experience it deserved. We figured out the algorithms using any tools available, including the feared technique of binary reverse engineering. When I gave a presentation about FFmpeg at Linuxtag in 2007, I created this image illustrating said mystique:



It garnered laughs. But I made the point that we multimedia hackers just press on, doing our thing while ignoring legal threats. The policy has historically worked out famously for us– to date, I seem to be the only person on the receiving end of a sort-of legal threat from the outside world.

Who would have thought that the most credible legal threat to an open source multimedia project would emanate from a fork of that very project? Because that’s exactly what has transpired:



Click for full threat

So it came to pass that Michael Niedermayer — the leader of the FFmpeg project — received a bona fide legal nastygram from Mans Rullgard, a representative of the FFmpeg-forked Libav project. The subject of dispute is a scorched-earth matter involving the somewhat iconic FFmpeg zigzag logo:

   
Original 2D logo enhanced 3D logo

To think of all those years we spent worrying about legal threats from organizations outside the community. I’m reminded of that time-honored horror trope/urban legend staple: Get out! The legal threats are coming from inside the house!

I’m interested to see how this all plays out, particularly regarding jurisdiction, as we have a U.K. resident engaging an Italian lawyer outfit to deliver a legal threat to an Austrian citizen regarding an image hosted on a server in Hungary. I suspect I know why that law firm was chosen, but it’s still a curious jurisdictional setup.

People often used to ask me if we multimedia hackers would get sued to death for doing what we do. My response was always, “There’s only one way to know for sure,” by which I meant that we would just have to engage in said shady activities and determine empirically if lawsuits resulted. So I’m a strong advocate for experimentation to push the limits. Kudos to Michael and Mans for volunteering to push the legal limits.