Author Archives: Multimedia Mike

ANSI Code Coverage Followup

The people behind sixteencolors.net noticed my code coverage project concerning the ANSI video decoder and asked what they could do to help. I had already downloaded 350 / 4000 of their artpacks but didn’t want to download the remainder if I could avoid it. They offered to run my tool against their local collection of files.

Aside: They have all of the artpacks archived at Github.

The full corpus of nearly 4000 artpacks contains over 146,000 files. Versus my sampling of 350 artpacks and 13,000 files that covered all but 45 lines of the ansi.c source file, the full corpus has files to exercise… 6 more of those lines. Whee. This means that there are files which exercise the reverse and concealed attributes, all 3 “erase in line” modes, and one more error path (which probably wasn’t a valid file anyway).

Missing features mostly cluster around different video modes, including: 320×200 (25 rows), 640×200 (25 rows), 640×350 (43 rows), and 640×480 (60 rows); on the plus side, nothing tripped the “unsupported screen mode” case. There are no files that switch modes during playback.

I guess statistical sampling theory holds up here: a small set of randomly chosen files does a fine job of covering the code. But this experiment is about finding the statistical outliers.

Finding Optimal Code Coverage

A few months ago, I published a procedure for analyzing code coverage of the test suites exercised in FFmpeg and Libav. I used it to add some more tests and I have it on good authority that it has helped other developers fill in some gaps as well (beginning with students helping out with the projects as part of the Google Code-In program). Now I’m wondering about ways to do better.

Current Process
When adding a test that depends on a sample (like a demuxer or decoder test), it’s ideal to add a sample that’s A) small, and B) exercises as much of the codebase as possible. When I was studying code coverage statistics for the WC4-Xan video decoder, I noticed that the sample didn’t exercise one of the 2 possible frame types. So I scouted samples until I found one that covered both types, trimmed the sample down, and updated the coverage suite.

I started wondering about a method for finding the optimal test sample for a given piece of code, one that exercises every code path in a module. Okay, so that’s foolhardy in the vast majority of cases (although I was able to add one test spec that pushed a module’s code coverage from 0% all the way to 100% — but the module in question only had 2 exercisable lines). Still, given a large enough corpus of samples, how can I find the smallest set of samples that exercise the complete codebase?

This almost sounds like an NP-complete problem. But why should that stop me from trying to find a solution?

Science Project
Here’s the pitch:

  • Instrument FFmpeg with code coverage support
  • Download lots of media to exercise a particular module
  • Run FFmpeg against each sample and log code coverage statistics
  • Distill the resulting data in some meaningful way in order to obtain more optimal code coverage

That second step sounds harsh: downloading lots and lots of media. Fortunately, there is at least one multimedia format in the projects that tends to be extremely small: ANSI. These are files that are designed to display elaborate scrolling graphics using text mode. Further, the FATE sample currently deployed for this test (TRE_IOM5.ANS) only exercises a little less than 50% of the code in libavcodec/ansi.c. I believe this makes the ANSI video decoder a good candidate for this experiment.

Procedure
First, find a site that hosts a lot of ANSI files. Hi, sixteencolors.net. This site has lots of artpacks (on the order of 4000), which are ZIP archives that contain multiple ANSI files (and sometimes some other files). I scraped a list of all the artpack names.

In an effort to be responsible, I randomized the list of artpacks and downloaded periodically and with limited bandwidth ('wget --limit-rate=20k').
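
A rough sketch of that polite download plan, for illustration only. The `--limit-rate=20k` flag is the one quoted above; the function name `build_download_plan` and the `base_url` argument are hypothetical, and the real artpack URL layout is not shown here.

```python
import random

def build_download_plan(artpack_names, base_url, seed=None):
    """Shuffle the artpack list and build one rate-limited wget command per pack.

    base_url is a hypothetical prefix; substitute the real location of the ZIPs.
    """
    names = list(artpack_names)
    random.Random(seed).shuffle(names)  # randomized order, reproducible if seeded
    return [["wget", "--limit-rate=20k", base_url + name] for name in names]

# Each command could then be handed to subprocess.run(), with a pause
# (time.sleep) between downloads to spread the load out over time.
```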

Run ‘gcov’ on ansi.c in order to gather the full set of line numbers to be covered.

For each artpack, unpack the contents, run the instrumented FFmpeg on each file inside, run ‘gcov’ on ansi.c, and log statistics including the file’s size, the file’s location (artpack.zip:filename), and a comma-separated list of line numbers touched.
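
A minimal sketch of the logging step, assuming gcov's usual annotated text output, in which each row reads `count:lineno:source`, a count of `-` marks a non-executable line, and `#####` or `=====` marks an executable line that never ran. The function names and the tab-separated stats layout are guesses for illustration, not the script actually used.

```python
import re

# One annotated row of a .gcov file: "<count>:<lineno>:<source text>"
_GCOV_ROW = re.compile(r"^\s*([^:]+):\s*(\d+):")

def parse_gcov(text):
    """Return (executable, touched) sets of line numbers from .gcov text."""
    executable, touched = set(), set()
    for row in text.splitlines():
        match = _GCOV_ROW.match(row)
        if not match:
            continue  # e.g. function or branch summary rows
        count, lineno = match.group(1).strip(), int(match.group(2))
        if lineno == 0 or count == "-":
            continue  # header rows and non-executable lines
        executable.add(lineno)
        if count not in ("#####", "====="):
            touched.add(lineno)
    return executable, touched

def stats_line(size, location, touched):
    """Format one log record: size, artpack.zip:filename, touched lines."""
    numbers = ",".join(str(n) for n in sorted(touched))
    return "%d\t%s\t%s" % (size, location, numbers)
```

Running `parse_gcov` once on a baseline ansi.c.gcov gives the full set of lines to be covered; running it after each sample gives that sample's touched set.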

Definition of ‘Optimal’
The foregoing procedure worked and yielded useful, raw data. Now I have to figure out how to analyze it.

I think it’s most desirable to have the smallest files (in terms of bytes) that exercise the most lines of code. To that end, I sorted the results by filesize, ascending. A Python script initializes a set of all exercisable line numbers in ansi.c, then iterates through each file’s stats line, adding the file to the list of candidate samples if its set of exercised lines can remove any line numbers from the overall set of lines. Ideally, that set of lines should devolve to an empty set.
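
The selection pass described above might look something like this. It is a sketch under assumptions: each sample is taken to be a `(size, location, lines)` tuple already parsed from the stats log, and the name `select_by_size` is made up for illustration.

```python
def select_by_size(samples, all_lines):
    """Walk samples from smallest to largest, keeping any that cover new lines.

    samples: iterable of (size_in_bytes, location, set_of_line_numbers)
    all_lines: every exercisable line number in the module
    Returns (chosen_locations, lines_still_uncovered).
    """
    remaining = set(all_lines)
    chosen = []
    for size, location, lines in sorted(samples, key=lambda s: (s[0], s[1])):
        if remaining & lines:       # this file exercises something new
            chosen.append(location)
            remaining -= lines
        if not remaining:           # everything is covered, stop early
            break
    return chosen, remaining
```

Whatever is left in the second return value is the set of lines that no file in the corpus manages to reach.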

I think a second possible approach is to find the single sample that exercises the most code and then proceed with the previously described method.
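
Here is one way that second approach could be sketched, again assuming `(size, location, lines)` tuples and a made-up function name. The problem resembles classic set cover, and this is just one heuristic for it, not a guaranteed minimum.

```python
def select_seeded(samples, all_lines):
    """Seed with the sample covering the most lines, then fill in by size.

    samples: iterable of (size_in_bytes, location, set_of_line_numbers)
    Returns (chosen_locations, lines_still_uncovered).
    """
    samples = list(samples)
    remaining = set(all_lines)
    chosen = []
    if samples:
        # Most lines covered wins; ties go to the smaller file.
        seed = max(samples, key=lambda s: (len(remaining & s[2]), -s[0]))
        if remaining & seed[2]:
            chosen.append(seed[1])
            remaining -= seed[2]
    # Then the smallest-first pass; the seed adds nothing new, so it is skipped.
    for size, location, lines in sorted(samples, key=lambda s: (s[0], s[1])):
        if remaining & lines:
            chosen.append(location)
            remaining -= lines
    return chosen, remaining
```

The trade-off: seeding with one big file can shorten the list of samples, but it may cost more total bytes than a handful of tiny files would.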

Initial Results
So far, I have analyzed 13324 samples from 357 different artpacks provided by sixteencolors.net.

WMA Lossless and ProRes Encoder

The projects (FFmpeg / Libav) just got a WMA lossless decoder. For those keeping score, this means that there are open source methods for decoding every single one of Microsoft’s proprietary audio codecs (Windows Media Audio, or WMA): WMA v1, WMA v2, WMA9/Pro, WMA Voice, and now WMA lossless. Currently, it’s only advertised to decode 16-bit audio (no 24-bit). Also, when I first tried it a few days ago, it didn’t decode the very end of the single sample file I concocted many years ago (luckynight.wma). But that might be cleared up by now.

Some other recent developments in the projects that I wanted to call out: an encoder for the Apple ProRes codec from Kostya; XWD (X window dump) image decoding and encoding from Paul B. Mahol; a Sun rasterfile encoder from Aneesh Dogra.

And then there’s the new playback system for CDXL files, also courtesy of Paul B. Mahol. I wasn’t familiar with this format until I wrote this post, which is surprising, given the format’s vintage. This was a CD-ROM FMV format favored for Amiga computers. Here it is in all its 160x120x10fps glory:



That’s the amigaball.cdxl sample available in the repository. The sample is 3835910 bytes large and plays for about 24 seconds. This yields a data rate of about 159 kbytes/second. So, yeah, single-speed CD-ROM FMV.
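
For the curious, the arithmetic can be checked in a few lines. The size and duration come from the paragraph above; the 150 KiB/s figure is the nominal single-speed CD-ROM data rate, and the names are just for illustration.

```python
def data_rate(size_bytes, seconds):
    """Average data rate in bytes per second."""
    return size_bytes / seconds

SINGLE_SPEED_CDROM = 150 * 1024  # nominal 1x CD-ROM data rate, bytes/second

rate = data_rate(3835910, 24)    # the duration is approximate ("about 24 seconds")
print("%.0f bytes/second" % rate)
print("%.2fx single-speed CD-ROM" % (rate / SINGLE_SPEED_CDROM))
```

Since the duration is only approximate, the result lands in the neighborhood of 1x rather than exactly on it.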

Pushing Projects to Github

I finally got around to importing some old projects into my Github account. I guess it’s good to have a backup out there in the cloud.

GhettoRSS
https://github.com/multimediamike/GhettoRSS
I describe this as a true offline RSS reader. Technically, it’s arguably not a true offline RSS reader. Rather, it does what most people actually want an offline RSS reader to do.

I wrote this about 2 years ago when I had a long daily train ride with a disconnected netbook. I quickly learned that I couldn’t count on offline RSS readers simply because most RSS feeds do not contain much meat. Thus, I created a program that follows URLs in RSS feeds, downloads web pages and supporting images and CSS files, and caches them in an offline database which can be read via a local web browser.

I wrote more information about this little project 2 years ago (here is part 1 and here is part 2). I fixed a few bugs in preparation for posting it but I probably won’t work on this anymore since I don’t have any use for it (the commute is long gone, but I didn’t even use it when I was commuting because I decided I just didn’t care enough to read the feeds on the train).

xbfuse
https://github.com/multimediamike/xbfuse
This is a FUSE module for mounting Xbox/360 optical disc filesystems. Here is when I first discussed it. The tool has had its own little homepage for a long time. This tool has seen some development, as I learned from Googling for “xbfuse”. Regrettably, no one who has modified the tool has ever contacted me about it (at least, not that I can recall). This is unfortunate because the patches I have seen floating around which fix my xbfuse for various installations usually boil down to replacing many occurrences of an include path in the autotool-generated build system. There is probably a simpler, cleaner fix.

gcfuse
https://github.com/multimediamike/gcfuse
Written prior to xbfuse, this is a FUSE module for mounting GameCube optical disc filesystems. I first discussed this here and here. This tool has not seen too much direct development although someone eventually used it as the basis for WiiFuse which, as you can predict, mounts optical disc filesystems from Nintendo Wii games.