Monthly Archives: March 2012

G.I. Joe Custom Multimedia

I received this 3-disc set of G.I. Joe CD-ROMs today:



Copyright 2003, and labeled as PC ONLY. Each disc claims to have 2 episodes. So are these some sort of video discs? Any gaming elements? I dove in to investigate.

So, it turns out that there are some games on these discs, done in Flash Player (which tells me that these were probably available on the web at some point). Here’s a shooting gallery game from the first disc:



As promised by the CD-ROM copy, the menu does grant access to 2 classic G.I. Joe episodes. Selecting either one launches this:



Powered by C-ezy? Am I interpreting that correctly? Anyway, the video player goes fullscreen and looks fine (given the source material). I can’t capture screenshots and controls are limited to: space for pause, ESC to exit player, and up/down to control volume. No seeking and certainly no onscreen controls. Pretty awful player.

Studying the first disc, I find a 550 MB file with the name 5859Hasbro.egm. Coupled with ep58.cfg and ep59.cfg files in the same directory, I gather that the disc has G.I. Joe episodes 58 and 59 (though the exact episodes, “There’s No Place Like Springfield” parts 1 and 2, are listed on Wikipedia as being episodes 154 and 155; but who’s counting?). The cfg files contain this text:

ep58.cfg:
EGM_GIJOE.exe
5859Hasbro.egm /noend /track:0 /singletrack 

ep59.cfg:
EGM_GIJOE.exe
5859Hasbro.egm /noend /track:1 /singletrack 
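
Presumably the disc’s menu just reads one of these .cfg files and launches the command described inside. Here’s a hypothetical Python sketch of that logic (the two-line format, executable on the first line and arguments on the second, is my inference from the files above):

# launch_episode.py: hypothetical re-creation of what the disc's menu does.
# Assumed .cfg format: line 1 = player executable, line 2 = its arguments.
import sys

with open(sys.argv[1]) as f:   # e.g. python launch_episode.py ep58.cfg
    lines = f.read().splitlines()

exe = lines[0].strip()         # EGM_GIJOE.exe
args = lines[1].split()        # 5859Hasbro.egm /noend /track:0 /singletrack
print('Menu would run:', ' '.join([exe] + args))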

The big EGM file starts with the string “Egenie Player”. After that, I see absolutely no clues. The supporting EGM_GIJOE.exe file has some interesting strings: “Decore Bits Per Pixel” (I know I have seen “Decore” used to mean “decoding core” in some libraries), “Egenie Player – %s, Version:%s”, “4th June 2002”, a list of common FourCC tags seen in AVI files, “Brought to you by Martin, Patrick Bob and Bren” (do you suppose “Patrick Bob” is one person’s name?), a list of command line options…

Aha! A URL: http:\\www.e-genie.tv (yep, backslashes, not forward slashes). e-genie.tv seems to redirect to mygenie.tv, which… doesn’t appear to be strictly related to video technology these days.
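
For anyone who wants to do this kind of string-fishing in a binary and doesn’t have the Unix ‘strings’ utility handy, a few lines of Python will do the job; this is just a generic sketch, not my exact process:

# pystrings.py: crude stand-in for the Unix 'strings' utility; prints runs
# of printable ASCII found in a binary file, along with their byte offsets.
# Usage: python pystrings.py EGM_GIJOE.exe
import re
import sys

MIN_LEN = 6  # ignore runs shorter than this

with open(sys.argv[1], 'rb') as f:
    data = f.read()  # fine for the .exe; mmap would be kinder to the 550 MB .egm

for match in re.finditer(rb'[\x20-\x7e]{%d,}' % MIN_LEN, data):
    print('0x%08x  %s' % (match.start(), match.group().decode('ascii')))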

ANSI Code Coverage Followup

The people behind sixteencolors.net noticed my code coverage project concerning the ANSI video decoder and asked what they could do to help. I had already downloaded about 350 of their roughly 4000 artpacks but didn’t want to download the remainder if I could avoid it. They offered to run my tool against their local collection of files.

Aside: They have all of the artpacks archived on GitHub.

The full corpus of nearly 4000 artpacks contains over 146,000 files. Versus my sampling of 350 artpacks and 13,000 files that covered all but 45 lines of the ansi.c source file, the full corpus has files to exercise… 6 more of those lines. Whee. This means that there are files which exercise the reverse and concealed attributes, all 3 “erase in line” modes, and one more error path (which probably wasn’t a valid file anyway).
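
Computing that delta is nothing fancier than a set difference over the logged line numbers; something along these lines, with hypothetical file names:

# Which exercisable lines of ansi.c did the full corpus hit that my
# 350-artpack sample missed? Each input file is assumed to hold one
# covered line number per line.
def load_covered(path):
    with open(path) as f:
        return {int(n) for n in f if n.strip()}

sample = load_covered('covered-350-artpacks.txt')
full = load_covered('covered-full-corpus.txt')
print(sorted(full - sample))   # the 6 extra lines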

Missing features mostly cluster around different video modes, including: 320×200 (25 rows), 640×200 (25 rows), 640×350 (43 rows), and 640×480 (60 rows); on the plus side, nothing tripped the “unsupported screen mode” case. There are no files that switch modes during playback.

I guess statistical sampling theory holds up here: a small set of randomly chosen files does a fine job of covering the code. But this experiment is about finding the statistical outliers.

Finding Optimal Code Coverage

A few months ago, I published a procedure for analyzing code coverage of the test suites exercised in FFmpeg and Libav. I used it to add some more tests and I have it on good authority that it has helped other developers fill in some gaps as well (beginning with students helping out with the projects as part of the Google Code-In program). Now I’m wondering about ways to do better.

Current Process
When adding a test that depends on a sample (like a demuxer or decoder test), it’s ideal to add a sample that A) is small, and B) exercises as much of the code as possible. When I was studying code coverage statistics for the WC4-Xan video decoder, I noticed that the sample didn’t exercise one of the 2 possible frame types. So I scouted samples until I found one that covered both types, trimmed the sample down, and updated the coverage suite.

I started wondering about a method for finding the optimal test sample for a given piece of code, one that exercises every code path in a module. Okay, so that’s foolhardy in the vast majority of cases (although I was able to add one test spec that pushed a module’s code coverage from 0% all the way to 100% — but the module in question only had 2 exercisable lines). Still, given a large enough corpus of samples, how can I find the smallest set of samples that exercise the complete codebase?

This is essentially the minimum set cover problem, which is NP-hard. But why should that stop me from trying to find a solution?

Science Project
Here’s the pitch:

  • Instrument FFmpeg with code coverage support
  • Download lots of media to exercise a particular module
  • Run FFmpeg against each sample and log code coverage statistics
  • Distill the resulting data in some meaningful way in order to obtain more optimal code coverage

That second step sounds harsh: downloading lots and lots of media. Fortunately, there is at least one multimedia format in the projects that tends to be extremely small: ANSI. These are files that are designed to display elaborate scrolling graphics using text mode. Further, the FATE sample currently deployed for this test (TRE_IOM5.ANS) only exercises a little less than 50% of the code in libavcodec/ansi.c. I believe this makes the ANSI video decoder a good candidate for this experiment.

Procedure
First, find a site that hosts a lot of ANSI files. Hi, sixteencolors.net. This site hosts lots of artpacks (on the order of 4000), which are ZIP archives that contain multiple ANSI files (and sometimes some other files). I scraped a list of all the artpack names.

In an effort to be responsible, I randomized the list of artpacks and downloaded them periodically and with limited bandwidth ('wget --limit-rate=20k').
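
The scraping and downloading boils down to something like this sketch (the index URL and the link pattern are placeholder guesses, not a transcript of my actual script):

# Sketch: grab the artpack list, shuffle it, and fetch the archives politely.
import random
import re
import subprocess
import time
import urllib.request

INDEX_URL = 'http://sixteencolors.net/packs/'    # hypothetical index page

html = urllib.request.urlopen(INDEX_URL).read().decode('utf-8', 'replace')
artpacks = sorted(set(re.findall(r'href="([^"]+\.zip)"', html)))

random.shuffle(artpacks)                         # randomize the download order
for name in artpacks:
    subprocess.run(['wget', '--limit-rate=20k', INDEX_URL + name])
    time.sleep(60)                               # space the downloads out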

Run ‘gcov’ on ansi.c in order to gather the full set of line numbers to be covered.

For each artpack, unpack the contents, run the instrumented FFmpeg on each file inside, run ‘gcov’ on ansi.c, and log statistics including the file’s size, the file’s location (artpack.zip:filename), and a comma-separated list of line numbers touched.
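
In Python, that loop looks roughly like the following sketch (the binary name, directory layout, and gcov parsing are simplified assumptions rather than my script verbatim):

# Sketch: per-file coverage logging for libavcodec/ansi.c. Assumes an FFmpeg
# build compiled with gcc's -fprofile-arcs -ftest-coverage, run from the
# build directory so the .gcda counter files land where gcov expects them.
import glob
import os
import subprocess
import zipfile

FFMPEG = './ffmpeg_g'           # instrumented binary (name is illustrative)
SOURCE = 'libavcodec/ansi.c'

def covered_lines():
    """Run gcov and return the set of line numbers executed at least once."""
    subprocess.run(['gcov', '-o', 'libavcodec', SOURCE],
                   check=True, capture_output=True)
    lines = set()
    with open('ansi.c.gcov') as f:
        for row in f:
            count, number, _ = row.split(':', 2)
            # gcov marks non-executable lines '-' and unexecuted lines '#####'
            if count.strip() not in ('-', '#####', '====='):
                lines.add(int(number))
    return lines

def reset_counters():
    for gcda in glob.glob('libavcodec/*.gcda'):
        os.remove(gcda)

with open('coverage-log.txt', 'w') as log:
    for artpack in sorted(glob.glob('artpacks/*.zip')):
        with zipfile.ZipFile(artpack) as z:
            z.extractall('tmp')
            for name in z.namelist():
                if name.endswith('/'):          # skip directory entries
                    continue
                path = os.path.join('tmp', name)
                reset_counters()
                subprocess.run([FFMPEG, '-i', path, '-f', 'null', '-'],
                               capture_output=True)
                stats = ','.join(str(n) for n in sorted(covered_lines()))
                log.write('%d|%s:%s|%s\n' % (os.path.getsize(path),
                          os.path.basename(artpack), name, stats))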

Definition of ‘Optimal’
The foregoing procedure worked and yielded useful raw data. Now I have to figure out how to analyze it.

I think it’s most desirable to have the smallest files (in terms of bytes) that exercise the most lines of code. To that end, I sorted the results by file size, ascending. A Python script initializes a set of all exercisable line numbers in ansi.c, then iterates through each file’s stats line, adding the file to the list of candidate samples if its set of exercised lines can remove any line numbers from the overall set. Ideally, that set should devolve to an empty set.
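
Here is that selection pass as a Python sketch (the log format matches the logging step above; ‘ansi-exercisable-lines.txt’, holding gcov’s full list of executable line numbers, is a hypothetical placeholder):

# Sketch: greedily pick the smallest files that still add new coverage.
# 'coverage-log.txt' holds one record per file in the form:
#   size|artpack.zip:filename|comma,separated,line,numbers
def parse_log(path):
    records = []
    with open(path) as f:
        for row in f:
            size, location, lines = row.rstrip('\n').split('|')
            records.append((int(size), location,
                            {int(n) for n in lines.split(',') if n}))
    return records

# all exercisable line numbers in ansi.c, taken from the initial gcov run
# (stored one number per line in this hypothetical file)
with open('ansi-exercisable-lines.txt') as f:
    remaining = {int(n) for n in f if n.strip()}

candidates = []
for size, location, lines in sorted(parse_log('coverage-log.txt'),
                                    key=lambda rec: rec[0]):
    if lines & remaining:            # does this file exercise anything new?
        candidates.append(location)
        remaining -= lines
    if not remaining:                # ideally the set devolves to empty
        break

print('%d samples selected; %d lines never covered' %
      (len(candidates), len(remaining)))

This is just the classic greedy heuristic for set cover, so it only approximates the smallest possible sample set, but an approximation is plenty good for this purpose.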

I think a second possible approach is to find the single sample that exercises the most code and then proceed with the previously described method.

Initial Results
So far, I have analyzed 13,324 samples from 357 different artpacks provided by sixteencolors.net.

WMA Lossless and ProRes Encoder

The projects (FFmpeg / Libav) just got a WMA Lossless decoder. For those keeping score, this means that there are open source methods for decoding every one of Microsoft’s proprietary Windows Media Audio (WMA) codecs: WMA v1, WMA v2, WMA9/Pro, WMA Voice, and now WMA Lossless. Currently, it’s only advertised to decode 16-bit audio (not 24-bit). Also, when I first tried it a few days ago, it didn’t decode the very end of the single sample file I concocted many years ago (luckynight.wma). But that might be cleared up by now.

Some other recent developments in the projects that I wanted to call out: an Apple ProRes encoder from Kostya; XWD (X Window Dump) image decoding and encoding from Paul B. Mahol; and a Sun rasterfile encoder from Aneesh Dogra.

And then there’s the new playback system for CDXL files, also courtesy of Paul B. Mahol. I wasn’t familiar with this format until I started writing this post, which is surprising given the format’s vintage. This was a CD-ROM FMV format favored on Amiga computers. Here it is in all its 160x120x10fps glory:



That’s the amigaball.cdxl sample available in the repository. The sample is 3835910 bytes large and plays for about 24 seconds. This yields a data rate of about 159 kbytes/second. So, yeah, single-speed CD-ROM FMV.