November, 2011 | Breaking Eggs And Making Omelettes

A little while ago, I got a big head over the fact that I owned and controlled the feared and revered MPlayer samples archive. This is the repository that retains more than a decade of multimedia samples.

Conflict
Where once there was one multimedia project (FFmpeg), there are now 2 (also Libav). There were various political and technical snafus regarding the previous infrastructure. I volunteered to take over hosting the vast samples archive (53 GB at the time) at samples.mplayerhq.hu (s.mphq for this post).

However, a brand new server is online at samples.libav.org (s.libav for this post).

Policies
The server at s.libav will be the authoritative samples repository going forward. Why does s.libav receive the honor? Mostly by virtue of having more advanced features. My simple (yet bandwidth-rich) web hosting plan does not provide for rsync or anonymous FTP services, both of which have traditionally been essential for the samples server. In the course of hosting s.mphq for the past few months, a few more discrepancies have come to light– apparently, the symlinks weren’t properly replicated. And perhaps most unusual is that if a directory contains a README file, it won’t be displayed in the directory listing (which frustrated me greatly when I couldn’t find this README file that I carefully and lovingly crafted years ago).

The s.mphq archive will continue to exist — nay, must exist — going forward since there are years’ worth of web links pointing into it. I’ll likely set up a mirroring script that periodically (daily) rsyncs from s.libav to my local machine and then uses lftp (the best facility I have available) to mirror the files up to s.mphq.

Also, since we’re starting fresh with a new upload directory, I think we need to be far more ruthless about policing its content. This means making sure that anything that is uploaded has an accompanying file which explains why it’s there and ideally links the sample to a bug report somewhere. No explanation = sample terminated.

RSS
I think it would be nifty to have an RSS feed that shows the latest samples to appear in the repository. I figure that I can use the Unix ‘find’ command on my local repository in concert with something like PyRSS2Gen to accomplish this goal.

Monetization
In the few months that I have been managing the repository, I have had numerous requests for permission to leech the entire collection in one recursive web-suck. These requests often from commercial organizations who wish to test their multimedia product on a large corpus of diverse samples. Personally, I believe the archive makes a rather poor corpus for such an endeavor, but so be it. Go ahead; hosting this archive barely makes a dent in my fairly low-end web hosting plan. However, at least one person indicated that it might be easier to mail a hard drive to me, have me copy it, and send it back.

This got me thinking about monetization opportunities. Perhaps, I should provide a service to send HDs filled with samples for the cost of the HD, shipping, and a small donation to the multimedia projects. I immediately realized that that is precisely the point at which the vast multimedia samples archive — with all of its media of questionable fair use status — would officially run afoul of copyright laws.

Which brings me to…

Clean Up
I think we need to clean up some samples, starting with the ones that were marked not-readable in the old repository. Apparently, some ‘samples’ were, e.g., full anime videos and were responsible for a large bandwidth burden when linked from various sources.

We multimedia nerds are a hoarding lot, never willing to throw anything away. This will probably the most challenging proposal to implement.

Last year, I delved into code coverage tools and their usage with FFmpeg. I learned about using GNU gcov, which is powerful but pretty raw about the details it provides to you. I wrote a script to help interpret its output and later found another script called gcovr to do the same, only much better.

I later found another tool called lcov which is absolutely amazing for understanding code coverage of your software. I’ve been meaning to use it to further FATE test coverage for the multimedia projects.

Click for larger image

Basic Instructions
Install the lcov tool, of course. In Ubuntu, 'apt-get install lcov' will do the trick.

Build the project with code coverage support, i.e.,

./configure --enable-gpl --samples=/path/to/fate/samples \
 --extra-cflags="-fprofile-arcs -ftest-coverage" \
 --extra-ldflags="-fprofile-arcs -ftest-coverage"
make

Clear the coverage data:

lcov --directory . --zerocounters

Run the software (in this case, the FATE test suite):

make fate

Let lcov work its magic:

lcov --directory . --capture --output-file coverage.info
mkdir html-output
genhtml -o html-output coverage.info

At this point, you can aim your web browser at html-output/index.html to learn everything you could possibly want to know about code coverage of the test suite. You can sort various columns in order to see which modules have the least code coverage. You can drill into individual source files and see highlighted markup demonstrating which lines have been executed.

As you can see from the screenshot above, FFmpeg / Libav are not anywhere close to full coverage. But lcov provides an exquisite roadmap.

Breaking Eggs And Making Omelettes

Topics On Multimedia Technology and Reverse Engineering

Monthly Archives: November 2011

The New Samples Regime

Using lcov With FFmpeg/Libav