A little while ago, I got a big head over the fact that I owned and controlled the feared and revered MPlayer samples archive. This is the repository that retains more than a decade of multimedia samples.
Where once there was one multimedia project (FFmpeg), there are now 2 (also Libav). There were various political and technical snafus regarding the previous infrastructure. I volunteered to take over hosting the vast samples archive (53 GB at the time) at samples.mplayerhq.hu (s.mphq for this post).
However, a brand new server is online at samples.libav.org (s.libav for this post).
The server at s.libav will be the authoritative samples repository going forward. Why does s.libav receive the honor? Mostly by virtue of having more advanced features. My simple (yet bandwidth-rich) web hosting plan does not provide for rsync or anonymous FTP services, both of which have traditionally been essential for the samples server. In the course of hosting s.mphq for the past few months, a few more discrepancies have come to light– apparently, the symlinks weren’t properly replicated. And perhaps most unusual is that if a directory contains a
README file, it won’t be displayed in the directory listing (which frustrated me greatly when I couldn’t find this README file that I carefully and lovingly crafted years ago).
The s.mphq archive will continue to exist — nay, must exist — going forward since there are years’ worth of web links pointing into it. I’ll likely set up a mirroring script that periodically (daily) rsyncs from s.libav to my local machine and then uses lftp (the best facility I have available) to mirror the files up to s.mphq.
Also, since we’re starting fresh with a new upload directory, I think we need to be far more ruthless about policing its content. This means making sure that anything that is uploaded has an accompanying file which explains why it’s there and ideally links the sample to a bug report somewhere. No explanation = sample terminated.
I think it would be nifty to have an RSS feed that shows the latest samples to appear in the repository. I figure that I can use the Unix ‘find’ command on my local repository in concert with something like PyRSS2Gen to accomplish this goal.
In the few months that I have been managing the repository, I have had numerous requests for permission to leech the entire collection in one recursive web-suck. These requests often from commercial organizations who wish to test their multimedia product on a large corpus of diverse samples. Personally, I believe the archive makes a rather poor corpus for such an endeavor, but so be it. Go ahead; hosting this archive barely makes a dent in my fairly low-end web hosting plan. However, at least one person indicated that it might be easier to mail a hard drive to me, have me copy it, and send it back.
This got me thinking about monetization opportunities. Perhaps, I should provide a service to send HDs filled with samples for the cost of the HD, shipping, and a small donation to the multimedia projects. I immediately realized that that is precisely the point at which the vast multimedia samples archive — with all of its media of questionable fair use status — would officially run afoul of copyright laws.
Which brings me to…
I think we need to clean up some samples, starting with the ones that were marked not-readable in the old repository. Apparently, some ‘samples’ were, e.g., full anime videos and were responsible for a large bandwidth burden when linked from various sources.
We multimedia nerds are a hoarding lot, never willing to throw anything away. This will probably the most challenging proposal to implement.