Breaking Eggs And Making Omelettes

Topics On Multimedia Technology and Reverse Engineering


Archives:

Reverse Engineering Italian Literature

June 30th, 2014 by Multimedia Mike

Some time ago, Diego “Flameeyes” Pettenò tried his hand at reverse engineering a set of really old CD-ROMs containing even older Italian literature. The goal of this RE endeavor would be to extract the useful literature along with any structural metadata (chapters, etc.) and convert it to a more open format suitable for publication at, e.g., Project Gutenberg or Archive.org.

Unfortunately, the structure of the data thwarted the more simplistic analysis attempts (like inspecting for blocks of textual data). This will require deeper RE techniques. Further frustrating the effort, however, is the fact that the binaries that implement the reading program are written for the now-archaic Windows 3.1 operating system.

In pursuit of this RE goal, I recently thought of a way to glean more intelligence using DOSBox.

Prior Work
There are 6 discs in the full set (distributed along with 6 sequential issues of a print magazine named L’Espresso). Analysis of the contents of the various discs reveals that many of the files are the same on each disc. It was straightforward to identify the set of files which are unique on each disc. This set of files all end with the extension “LZn”, where n = 1..6 depending on the disc number. Further, the root directory of each disc has a file indicating the sequence number (1..6) of the CD. Obviously, these are the interesting targets.

The LZ file extensions stand out to an individual skilled in the art of compression– could it be a variation of the venerable LZ compression? That’s actually unlikely because LZ — also seen as LIZ — stands for Letteratura Italiana Zanichelli (Zanichelli’s Italian Literature).

The Unix ‘file’ command was of limited utility, unable to plausibly identify any of the files.

Progress was stalled.

Saying Hello To An Old Frenemy
I have been showing this screenshot to younger coworkers to see if any of them recognize it:


DOSBox running Window 3.1

Not a single one has seen it before. Senior computer citizen status: Confirmed.

I recently watched an Ancient DOS Games video about Windows 3.1 games. This episode showed Windows 3.1 running under DOSBox. I had heard this was possible but that it took a little work to get running. I had a hunch that someone else had probably already done the hard stuff so I took to the BitTorrent networks and quickly found a download that had the goods ready to go– a directory of Windows 3.1 files that just had to be dropped into a DOSBox directory and they would be ready to run.

Aside: Running OS software procured from a BitTorrent network? Isn’t that an insane security nightmare? I’m not too worried since it effectively runs under a sandboxed virtual machine, courtesy of DOSBox. I suppose there’s the risk of trojan’d OS software infecting binaries that eventually leave the sandbox.

Using DOSBox Like ‘strace’
strace is a tool available on some Unix systems, including Linux, which is able to monitor the system calls that a program makes. In reverse engineering contexts, it can be useful to monitor an opaque, binary program to see the names of the files it opens and how many bytes it reads, and from which locations. I have written examples of this before (wow, almost 10 years ago to the day; now I feel old for the second time in this post).

Here’s the pitch: Make DOSBox perform as strace in order to serve as a platform for reverse engineering Windows 3.1 applications. I formed a mental model about how DOSBox operates — abstracted file system classes with methods for opening and reading files — and then jumped into the source code. Sure enough, the code was exactly as I suspected and a few strategic print statements gave me the data I was looking for.

Eventually, I even took to running DOSBox under the GNU Debugger (GDB). This hasn’t proven especially useful yet, but it has led to an absurd level of nesting:


GDB runs DOSBox runs Windows 3.1

The target application runs under Windows 3.1, which is running under DOSBox, which is running under GDB. This led to a crazy situation in which DOSBox had the mouse focus when a GDB breakpoint was triggered. At this point, DOSBox had all desktop input focus and couldn’t surrender it because it wasn’t running. I had no way to interact with the Linux desktop and had to reboot the computer. The next time, I took care to only use the keyboard to navigate the application and trigger the breakpoint and not allow DOSBox to consume the mouse focus.

New Intelligence
Read the rest of this entry »

Posted in Reverse Engineering | 6 Comments »

Deobfuscation Redux: JavaScript

June 19th, 2011 by Multimedia Mike

Google recently released version 12 of their Chrome browser. This version adds a new feature that automatically allows deobfuscating obfuscated JavaScript source code.

Before:



After:



As a reverse engineering purist, I was a bit annoyed. Not at the feature, just the naming. This is clearly code beautification but not necessarily deobfuscation. The real obfuscation comes not from removing whitespace but from renaming variable and function names to terse 1- and 2-letter identifiers. True automated deobfuscation — which entails recovering the original variable and function identifiers as well as source code comments — is basically impossible.

Still, it makes me wonder if there is any interest in a JavaScript deobfuscator that operates similar to my Java deobfuscator which was one of the first things I published on this blog. The general idea is automatically replace function names with random English verbs (since functions correspond to actions) and variable names with random animal names (I decided “English nouns” encompassed too broad a category of words). I suspect the day that someone releases a proprietary multimedia codec in a pure (though obfuscated) JavaScript format is that day that I will try to accomplish this, if it hasn’t been done already.

See also:

Posted in Reverse Engineering | 4 Comments »

Internecine Legal Threats

June 1st, 2011 by Multimedia Mike

FFmpeg and associated open source multimedia projects such as xine, MPlayer, and VLC have long had a rebel mystique about them; a bunch of hackers playing fast and loose with IP law in order to give the world the free multimedia experience it deserved. We figured out the algorithms using any tools available, including the feared technique of binary reverse engineering. When I gave a presentation about FFmpeg at Linuxtag in 2007, I created this image illustrating said mystique:



It garnered laughs. But I made the point that we multimedia hackers just press on, doing our thing while ignoring legal threats. The policy has historically worked out famously for us– to date, I seem to be the only person on the receiving end of a sort-of legal threat from the outside world.

Who would have thought that the most credible legal threat to an open source multimedia project would emanate from a fork of that very project? Because that’s exactly what has transpired:



Click for full threat

So it came to pass that Michael Niedermayer — the leader of the FFmpeg project — received a bona fide legal nastygram from Mans Rullgard, a representative of the FFmpeg-forked Libav project. The subject of dispute is a scorched-earth matter involving the somewhat iconic FFmpeg zigzag logo:

   
Original 2D logo enhanced 3D logo

To think of all those years we spent worrying about legal threats from organizations outside the community. I’m reminded of that time-honored horror trope/urban legend staple: Get out! The legal threats are coming from inside the house!

I’m interested to see how this all plays out, particularly regarding jurisdiction, as we have a U.K. resident engaging an Italian lawyer outfit to deliver a legal threat to an Austrian citizen regarding an image hosted on a server in Hungary. I suspect I know why that law firm was chosen, but it’s still a curious jurisdictional setup.

People often used to ask me if we multimedia hackers would get sued to death for doing what we do. My response was always, “There’s only one way to know for sure,” by which I meant that we would just have to engage in said shady activities and determine empirically if lawsuits resulted. So I’m a strong advocate for experimentation to push the limits. Kudos to Michael and Mans for volunteering to push the legal limits.

Posted in Legal/Ethical | 11 Comments »

Reverse Engineering Radius VideoVision

April 2nd, 2011 by Multimedia Mike

I was called upon to help reverse engineer an old video codec called VideoVision (FourCC: PGVV), ostensibly from a company named Radius. I’m not sure of the details exactly but I think a game developer has a bunch of original FMV data from an old game locked up in this format. The name of the codec sounded familiar. Indeed, we have had a sample in the repository since 2002. Alex B. did some wiki work on the codec some years ago. The wiki mentions that there existed a tool to transcode PGVV data into MJPEG-B data, which is already known and supported by FFmpeg.

The Software
My contacts were able to point me to some software, now safely archived in the PGVV samples directory. There is StudioPlayer2.6.2.sit.hqx which is supposed to be a QuickTime component for working with PGVV data. I can’t even remember how to deal with .sit or .hqx data. Then there is RadiusVVTranscoder101.zip which is the tool that transcodes to MJPEG-B.

Disassembling for Reverse Engineering
Since I could actually unpack the transcoder, I set my sights on that. Unpacking the archive sets up a directory structure for a component. There is a binary called RadiusVVTranscoder under RadiusVVTranscoder.component/Contents/MacOS/. Basic deadlisting disassembly is performed via ‘otool’ as shown:

  otool -tV RadiusVVTranscoder | c++filt

This results in a deadlisting of both PowerPC and 32-bit x86 code, as the binary is a “fat” Mac OS X binary designed to run on both architectures. The command line also demangles C++ function signatures which gives useful insight into the parameters passed to a function.

Pretty Pictures
The binary had a lot of descriptive symbols. As a basis for reverse engineering, I constructed call graphs using these symbols. Here are the 2 most relevant portions (click for larger images).

The codec initialization generates Huffman tables relevant to the codec:



The main decode function calls AddMJPGFrame which apparently does the heavy lifting for the transcode process:



Based on this tree, I’m guessing that luma blocks can be losslessly transcoded (perhaps with different Huffman tables) which chroma blocks may rely on a different quantization method.

Assembly Constructs
I started looking at the instructions (the x86 ones, of course). The binary uses a calling convention I haven’t seen before, at least not for the x86: Rather than pushing function arguments onto the stack, the code manually subtracts, e.g., 12 from the ESP register, loads 3 32-bit arguments into memory relative to ESP, and then proceeds with the function call.

I’m also a little unclear on constructs such as “call ___i686.get_pc_thunk.bx” seen throughout relevant functions such as MakeRadiusQuantizationTables().

I’m just presenting what I have so far in case anyone else wants to try their hand.

Posted in Reverse Engineering | 10 Comments »

WebM Cabal

October 7th, 2010 by Multimedia Mike

I traveled to a secret clubhouse today to take part in a clandestine meeting to discuss exactly how WebM will rule over all that you see and hear on the web. I can’t really talk about it. But I can show you the cool hat I got:



Yeah, you’re jealous.

The back of the hat has an Easter egg for video codec nerds– the original Duck Corporation logo (On2′s original name):



Former employees of On2 (now Googlers) were well-represented. It was an emotional day of closure as I met the person — the only person to date — who contacted me with a legal threat so many years ago. He still remembered me too.

I met a lot of people involved in creating various Duck and On2 codecs and learned a lot of history and lore behind then– history I hope to be able to document one day.

I’m glad I got that first rough draft of a toy VP8 encoder done in time for the meeting. It was the subject of much mirth.

Posted in On2/Duck, VP8 | 6 Comments »

Resurrecting SCD

August 5th, 2010 by Multimedia Mike

When I became interested in reverse engineering all the way back in 2000, the first Win32 disassembler I stumbled across was simply called “Win32 Program Disassembler” authored by one Sang Cho. I took to calling it ‘scd’ for Sang Cho’s Disassembler. The original program versions and source code are still available for download. I remember being able to compile v0.23 of the source code with gcc under Unix; 0.25 is no go due to extensive reliance on the Win32 development environment.

I recently wanted to use scd again but had some trouble compiling. As was the case the first time I tried compiling the source code a decade ago, it’s necessary to transform line endings from DOS -> Unix using ‘dos2unix’ (I see that this has been renamed to/replaced by ‘fromdos’ on Ubuntu).

Beyond this, it seems that there are some C constructs that used to be valid but are now rejected by gcc. The first looks like this:

  1. return  (int) c = *(PBYTE)((int)lpFile + vCodeOffset);

Ahem, “error: lvalue required as left operand of assignment”. Removing the “(int)” before the ‘c’ makes the problem go away. It’s a strange way to write a return statement in general. However, ‘c’ is a global variable that is apparently part of some YACC/BISON-type output.

The second issue is when a case-switch block has a default label but with no code inside. Current gcc doesn’t like that. It’s necessary to at least provide a break statement after the default label.

Finally, the program turns out to not be 64-bit safe. It is necessary to compile it in 32-bit mode (compile and link with the ‘-m32′ flag or build on a 32-bit system). The static 32-bit binary should run fine under a 64-bit kernel.

Alternatively: What are some other Win32 disassemblers that work under Linux?

Posted in Reverse Engineering | 7 Comments »

VP8: The Savior Codec

April 16th, 2010 by Multimedia Mike

This past week, the internet picked up — and subsequently sprinted like a cheetah withan unsourced and highly unsubstantiated rumor that Google will open source the VP8 video codec, recently procured through their On2 acquisition. I wager that the FSF is already working on their press release claiming full credit should this actually come to pass. I still retain my “I’ll believe it when I see it” attitude. However, I thought this would be a good opportunity to consolidate all of the public knowledge regarding On2′s VP8 codec.



Pictured: All the proof you need that VP8 is superior to H.264
Update: The preceding comment is meant in sarcastic jest. Read on

The Official VP8 Facts:
Read the rest of this entry »

Posted in HTML5, Multimedia PressWatch, On2/Duck, VP8 | 22 Comments »

Book Review: Reversing: Secrets of Reverse Engineering

February 25th, 2010 by Multimedia Mike

I borrowed this book from a colleague since it covers half of the charter described at the top of this blog (“Topics on multimedia technology and reverse engineering”). It’s called Reversing: Secrets of Reverse Engineering by Eldad Eilam (Amazon also has a Kindle edition). Basically, if you have never reverse engineered anything from binary code before but are interested in coming up to speed rather quickly, drop the cash for this book and read it from cover to cover.


Book cover: Reversing: Secrets of Reverse Engineering

I’m feeling a bit sentimental this month since I distinctly recall it was 10 years ago, February 2000, that I developed this focus on multimedia. While I often explain that I just wanted to play QuickTime movie trailers on my Linux computer, here is when I got really interested: I had gone all-Linux, all the time at home by then. I downloaded a Real video file from the internet. I tried out Real’s Linux player. It was horrible. Forget about all the spyware/malware reputations of the Windows and Mac versions; this didn’t have any of that but couldn’t even keep basic A/V sync. Still looking to find my place in the world, deciding which niche I would try to fill, that’s when I wondered what it would really require to take apart such a file, decode the audio and video, and play them in sync. And that’s when I took up my hex editor and disassembler.

So multimedia was always the primary focus. RE was secondary; I didn’t really mean to learn so much about it but the study was necessary. Over the years, I have wanted to write down more of what I have learned and other ideas and experiments I have developed (one of my primary motivations for starting this blog, in fact).

How this all connects to the book is: This is the book I would have liked to write about RE. Frankly, the book didn’t really teach me anything new. It was a compendium of everything I’ve read, learned, and independently discovered over the past 10 years regarding RE. And that’s exactly why I think it’s such a valuable book. I’ve encountered no shortage of people who wish to learn these darks arts of binary RE. This book is a great starting point. It’s the book I wish I had started with 10 years ago (I see that it was first published 5 years ago, which still was too late for me).

One shortcoming I did observe during my skimming of the more than 500 pages is that the RE targets are mostly things like cryptographic algorithms, malware, copy protection, and DRM. My focus has always been to reverse engineer some rather large and tedious multimedia decompression algorithms. It’s a different domain with some different problems and assumptions.

Posted in Reverse Engineering | 2 Comments »

On Open Sourcing On2

February 22nd, 2010 by Multimedia Mike

I have been reading way too many statements from people who confidently assert that Google will open source all of On2′s IP based on no more evidence than… the fact that they really, really hope it happens. Meanwhile, I have found myself pettily hoping it doesn’t happen simply due to the knowledge that the FSF will claim total credit for such a development (don’t believe me? They already claim credit for Apple dropping DRM from music purchases: “Our Defective by Design campaign has a successful history of targeting Apple over its DRM policies… and under the pressure Steve Jobs dropped DRM on music.”)

But for the sake of discussion, let’s run with the idea: Let’s assume that Google open sources any of On2′s intellectual property. Be advised that if you’re the type who believes that all engineering problems large and small can be solved by applying, not thought, but a mystical, nebulous force called “open source”, you can go ahead and skip this post.

The Stack

Read the rest of this entry »

Posted in On2/Duck, Open Source Multimedia | 5 Comments »

On2 Acquisition

February 19th, 2010 by Multimedia Mike

I’ve been hearing it ever since last August:

Google owns On2. They are going to open source all of On2′s codecs.
Read the rest of this entry »

Posted in On2/Duck, VP3/Theora | 9 Comments »

« Previous Entries