Breaking Eggs And Making Omelettes

Topics On Multimedia Technology and Reverse Engineering


Archives:

Meta:

Salty Game Music

May 31st, 2011 by Multimedia Mike

Have you heard of Google’s Native Client (NaCl) project? Probably not. Basically, it allows native code modules to run inside a browser (where ‘browser’ is defined pretty narrowly as ‘Google Chrome’ in this case). Programs are sandboxed so they aren’t a security menace (or so the whitepapers claim) but are allowed to access a variety of APIs including video and audio. The latter API is significant because sound tends to be forgotten in all the hullabaloo surrounding non-Flash web technologies. At any rate, enjoy NaCl while you can because I suspect it won’t be around much longer.

After my recent work upgrading some old music synthesis programs to use more modern audio APIs, I got the idea to try porting the same code to run under NaCl in Chrome (first Nosefart, then Game Music Emu/GME). In this exercise, I met with very limited success. This blog post documents some of the pitfalls in my excursion.



Infrastructure
People who know me know that I’m rather partial — to put it gently — to straight-up C vs. C++. The NaCl SDK is heavily skewed towards C++. However, it does provide a Python tool called init_project.py which can create the skeleton of a project and can do so in C with the '-c' option:

./init_project.py -c -n saltynosefart

This generates something that can be built using a simple ‘make’. When I added Nosefart’s C files, I learned that the project Makefile has places for project-necessary CFLAGS but does not honor them. The problem is that the generated Makefile includes a broader system Makefile that overrides the CFLAGS in the project Makefile. Going into the system Makefile and changing "CFLAGS =" -> "CFLAGS +=" solves this problem.

Still, maybe I’m the first person to attempt building something in Native Client so I’m the first person to notice this?

Basic Playback
At least the process to create an audio-enabled NaCl app is well-documented. Too bad it doesn’t seem to compile as advertised. According to my notes on the matter, I filled in PPP_InitializeModule() with the appropriate boilerplate as outlined in the docs but got a linker error concerning get_browser_interface().

Plan B: C++
Obviously, the straight C stuff is very much a second-class citizen in this NaCl setup. Fortunately, there is already that fully functional tone generator example program in the limited samples suite. Plan B is to copy that project and edit it until it accepts Nosefart/GME audio instead of a sine wave.

The build system assumes all C++ files should have .cc extensions. I have to make some fixes so that it will accept .cpp files (either that, or rename all .cpp to .cc, but that’s not very clean).

Making Noise
You’ll be happy to know that I did successfully swap out the tone generator for either Nosefart or GME. Nosefart has a slightly fickle API that requires revving the emulator frame by frame and generating a certain number of audio samples. GME’s API is much easier to work with in this situation — just tell it how many samples it needs to generate and give it a pointer to a buffer. I played NES and SNES music play through this ad-hoc browser plugin, and I’m confident all the other supported formats would have worked if I went through the bother of converting the music data files into C headers to be included in the NaCl executable binaries (dynamically loading data via the network promised to be a far more challenging prospect reserved for phase 3 of the project).

Portable?
I wouldn’t say so. I developed it on Linux and things ran fine there. I tried to run the same binaries on the Windows version of Chrome to no avail. It looks like it wasn’t even loading the .nexe files (NaCl executables).

Thinking About The (Lack Of A) Future
As I was working on this project, I noticed that the online NaCl documentation materialized explicit banners warning that my NaCl binaries compiled for Chrome 11 won’t work for Chrome 12 and that I need to code to the newly-released 0.3 SDK version. Not a fuzzy feeling. I also don’t feel good that I’m working from examples using bleeding edge APIs that feature deprecation as part of their naming convention, e.g., pp::deprecated::ScriptableObject().

Ever-changing API + minimal API documentation + API that only works in one browser brand + requiring end user to explicitly enable feature = … well, that’s why I didn’t bother to release any showcase pertaining to this little experiment. Would have been neat, but I strongly suspect that this is yet another one of those APIs that Google decides to deprecate soon.

See Also:

Posted in General | 3 Comments »

Revisiting Nosefart and Discovering GME

May 29th, 2011 by Multimedia Mike

I found the following screenshot buried deep in an old directory structure of mine:



I tried to recall how this screenshot came to exist. Had I actually created a functional KDE frontend to Nosefart yet neglected to release it? I think it’s more likely that I used some designer tool (possibly KDevelop) to prototype a frontend. This would have been sometime in 2000.

However, this screenshot prompted me to revisit Nosefart.

Nosefart Background
Nosefart is a program that can play Nintendo Sound Format (NSF) files. NSF files are files containing components that were surgically separated from Nintendo Entertainment System (NES) ROM dumps. These components contain the music playback engines for various games. An NSF player is a stripped down emulation system that can simulate the NES6502 CPU along with the custom hardware (2 square waves, 1 triangle wave, 1 noise generator, and 1 limited digital channel).

Nosefart was written by Matt Conte and eventually imported into a Sourceforge project, though it has not seen any development since then. The distribution contains standalone command line players for Linux and DOS, a GTK frontend for the Linux command line version, and plugins for Winamp, XMMS, and CL-Amp.

The Sourceforge project page notes that Nosefart is also part of XBMC. Let the record show that Nosefart is also incorporated into xine (I did that in 2002, I think).

Upgrading the API
When I tried running the command line version of Nosefart under Linux, I hit hard against the legacy audio API: OSS. Remember that?

In fairly short order, I was able to upgrade the CL program to use PulseAudio. The program is not especially sophisticated. It’s a single-threaded affair which checks for a keypress, processes an audio frame, and sends the frame out to the OSS file interface. All that was needed was to rewrite open_hardware() and close_hardware() for PA and then replace the write statement in play(). The only quirk that stood out is that including <pulse/pulseaudio.h> is insufficient for programming PA’s simple API. <pulse/simple.h> must be included separately.

For extra credit, I adapted the program to ALSA. The program uses the most simplistic audio output API possible — just keep filling a buffer and sending it out to the DAC.

Discovering GME
I’m not sure what to do with the the program now since, during my research to attempt to bring Nosefart up to date, I became aware of a software library named Game Music Emu, or GME. It’s a pure C++ library that can essentially play any classic video game format you can possible name. Wow. A lot can happen in 10 years when you’re not paying attention.

It’s such a well-written library that I didn’t need any tutorial or documentation to come up to speed. Just a quick read of the main gme.h header library enabled me in short order to whip up a quick C program that could play NSF and SPC files. Path of least resistance: Client program asks library to open a hardcoded file, synthesize 10 seconds of audio, and dump it into a file; ask the FLAC command line program to transcode raw data to .flac file; use ffplay to verify the results.

I might develop some other uses for this library.

Posted in Game Hacking | 5 Comments »

Method For Crawling Google

May 27th, 2011 by Multimedia Mike

I wanted to crawl Google in order to harvest a large corpus of certain types of data as yielded by a certain search term (we’ll call it “term” for this exercise). Google doesn’t appear to offer any API to automatically harvest their search results (why would they?). So I sat down and thought about how to do it. This is the solution I came up with.



FAQ
Q: Is this legal / ethical / compliant with Google’s terms of service?
A: Does it look like I care? Moving right along…

Manual Crawling Process
For this exercise, I essentially automated the task that would be performed by a human. It goes something like this:

  1. Search for “term”
  2. On the first page of results, download each of the 10 results returned
  3. Click on the next page of results
  4. Go to step 2, until Google doesn’t return anymore pages of search results

Google returns up to 1000 results for a given search term. Fetching them 10 at a time is less than efficient. Fortunately, the search URL can easily be tweaked to return up to 100 results per page.

Expanding Reach
Problem: 1000 results for the “term” search isn’t that many. I need a way to expand the search. I’m not aiming for relevancy; I’m just searching for random examples of some data that occurs around the internet.

My solution for this is to refine the search using the “site” wildcard. For example, you can ask Google to search for “term” at all Canadian domains using “site:.ca”. So, the manual process now involves harvesting up to 1000 results for every single internet top level domain (TLD). But many TLDs can be more granular than that. For example, there are 50 sub-domains under .us, one for each state (e.g., .ca.us, .ny.us). Those all need to be searched independently. Same for all the sub-domains under TLDs which don’t allow domains under the main TLD, such as .uk (search under .co.uk, .ac.uk, etc.).

Another extension is to combine “term” searches with other terms that are likely to have a rich correlation with “term”. For example, if “term” is relevant to various scientific fields, search for “term” in conjunction with various scientific disciplines.

Algorithmically
My solution is to create an SQLite database that contains a table of search seeds. Each seed is essentially a “site:” string combined with a starting index.

Each TLD and sub-TLD is inserted as a searchseed record with a starting index of 0.

A script performs the following crawling algorithm:

  • Fetch the next record from the searchseed table which has not been crawled
  • Fetch search result page from Google
  • Scrape URLs from page and insert each into URL table
  • Mark the searchseed record as having been crawled
  • If the results page indicates there are more results for this search, insert a new searchseed for the same seed but with a starting index 100 higher

Digging Into Sites
Sometimes, Google notes that certain sites are particularly rich sources of “term” and offers to let you search that site for “term”. This basically links to another search for ‘term site:somesite”. That site gets its own search seed and the program might harvest up to 1000 URLs from that site alone.

Harvesting the Data
Armed with a database of URLs, employ the following algorithm:

  • Fetch a random URL from the database which has yet to be downloaded
  • Try to download it
  • For goodness sake, have a mechanism in place to detect whether the download process has stalled and automatically kill it after a certain period of time
  • Store the data and update the database, noting where the information was stored and that it is already downloaded

This step is easy to parallelize by simply executing multiple copies of the script. It is useful to update the URL table to indicate that one process is already trying to download a URL so multiple processes don’t duplicate work.

Acting Human
A few factors here:

  • Google allegedly doesn’t like automated programs crawling its search results. Thus, at the very least, don’t let your script advertise itself as an automated program. At a basic level, this means forging the User-Agent: HTTP header. By default, Python’s urllib2 will identify itself as a programming language. Change this to a well-known browser string.
  • Be patient; don’t fire off these search requests as quickly as possible. My crawling algorithm inserts a random delay of a few seconds in between each request. This can still yield hundreds of useful URLs per minute.
  • On harvesting the data: Even though you can parallelize this and download data as quickly as your connection can handle, it’s a good idea to randomize the URLs. If you hypothetically had 4 download processes running at once and they got to a point in the URL table which had many URLs from a single site, the server might be configured to reject too many simultaneous requests from a single client.

Conclusion
Anyway, that’s just the way I would (and did) do it. What did I do with all the data? That’s a subject for a different post.

Adorable spider drawing from here.

Posted in Big Data | 6 Comments »

Who Invented FLIC?

May 25th, 2011 by Multimedia Mike

I have been reading through “All Your Base Are Belong To Us: How 50 Years of Video Games Conquered Pop Culture” by Harold Goldberg. Despite the title, Zero Wing has yet to be mentioned (I’m about halfway done).



I just made it through the chapter describing early breakthrough CD-ROM games, including Myst, The 7th Guest, and The 11th Hour. Some interesting tidbits:

The 7th Guest
Of course, Graeme Devine created a new FMV format (called VDX, documented here) for The 7th Guest. The player was apparently called PLAY and the book claims that Autodesk was so impressed by the technology that it licensed the player for use in its own products. When I think of an Autodesk multimedia format, I think of FLIC. The VDX coding format doesn’t look too much like FLIC, per my reading.

Here’s the relevant passage (pp 118-119):

Devine began working on creating software within the CD-ROM disk that would play full-motion video. Within days he had a robust but small ninety-kilobyte player called PLAY that was so good, it was licensed by Autodesk, the makers of the best 3-D animation program at the time. Then Devine figured out a way to compress the huge video files so that they would easily fit on two CD-ROMs.

Googling for “autodesk trilobyte play program” (Trilobyte was the company behind 7th Guest) led me to this readme file for a program called PLAY73 (hosted at Jason Scott’s massive CD-ROM archive, and it’s on a disc that, incidentally, I donated to the archive; so, let’s here it for Jason’s tireless archival efforts! And for Google’s remarkable indexing prowess). The file — dated September 10, 1991 — mentions that it’s a FLICK player, copyright Trilobyte software.



However, it also mentions being a Groovie Player. Based on ScummVM’s reimplementation of the VDX format, Groovie might refer to the engine behind The 7th Guest.

So now I’m really interested: Did Graeme Devine create the FLIC file format? Multimedia nerds want to know!

I guess not. Thanks to Jim Leonard for digging up this item: “I developed the flic file format for the Autodesk Animator.” Jim Kent, Dr. Dobbs Magazine, March 1993.

The PLAY73 changelog reveals something from the bad old days of DOS/PC programming: The necessity of writing graphics drivers for 1/2 dozen different video adapters. The PLAY73 readme file also has some vintage contact address for Graeme Devine; remember when addresses looked like these?

If you have any comments, please send them to:
	Compuserve: 72330,3276
	Genie     : G.DEVINE
	Internet  : 72330,3276@compuserve.com

The 11th Hour
The book didn’t really add anything I didn’t already know regarding the compression format (RoQ) used in 11th Hour. I already knew how hard Devine worked at it. This book took pains to emphasize the emotional toll taken on the format’s creator.

I wonder if he would be comforted to know that, more than 15 years later, people are still finding ways to use the format.

Posted in Multimedia History | 9 Comments »

Dreamcast Archival

May 23rd, 2011 by Multimedia Mike

Console homebrew communities have always had a precarious relationship with console pirates. The same knowledge and skills useful for creating homebrew programs can usually be parlayed into ripping games and cajoling a console into honoring ripped copies. For this reason, the Dreamcast homebrew community tried hard to distance itself from pirates, rippers, and other unsavory characters.


Lot of 9 volumes of the Official Sega Dreamcast Magazine

Funny how times change. While I toed the same line while I was marginally a part of the community back in the day, now I think I’m performing a service for video game archivists and historians by openly publishing the same information. I know of at least one solution already. But I think it’s possible to do much better.

Pre-existing Art
Famed Japanese game hacker BERO (FFmpeg contributors should recognize his name from a number of Dreamcast-related multimedia contributions including CRI ADX and SH-4 optimizations) crafted a program called dreamrip based on KOS’s precursor called libdream. This is the program I used to extract 4XM multimedia files from Alone in the Dark: The New Nightmare.

Fun facts: The Sega Dreamcast used special optical discs called GD-ROMs. The GD stands for ‘GigaDisc’ which implied that they could hold roughly a gigabyte of data. How long do you think it takes to transfer that much data over a serial cable operating at 115,200 bits/second (on the order of 11 Kbytes/sec)? I seem to recall entire discs requiring on the order of 27-28 hours to archive.

If only I possessed some expertise in data compression which might expedite this process.

KallistiOS’ Unwitting Help
The KallistiOS (KOS) console-oriented RTOS provides all the software infrastructure necessary for archiving (that’s what we’ll call it in this post) Dreamcast games. KOS exposes the optical disc’s filesystem via the /cd mount point on the VFS. From there, KOS provides functions for communicating with a host computer via ethernet (broadband adapter) or serial line (DC coder’s cable). To this end, KOS exposes another mount point on the VFS named /pc which allows direct access to the host PC’s filesystem.

Thus, it’s pretty straightforward to use KOS to access the files (or raw sectors) of the Dreamcast disc and then send them over the communication line to the host PC. Simple.

Compressing Before Transfer
Right away, I wonder about compiling 3 different compression libraries: libz, libbz2, and liblzma. The latter 2 are exceptionally CPU-intensive to compress. Then again, it doesn’t really matter how long the compressor takes to do its job as long as it can average better than 11 Kbytes/sec on a 200MHz Hitachi SH-4 CPU. KOS can be set up in a preemptive threading mode which means it should be possible to read sectors and compress them while keeping the UART operating at full tilt.

A 4th compression algorithm should be in play here as well: FLAC. Since some of these discs contain red book CD audio tracks that need archival, lossless audio compression should be useful.

This post serves as a rough overview for possible future experiments. Readers might have further brainstorms.

Posted in Sega Dreamcast | 13 Comments »

CD-R Read Speed Experiments

May 21st, 2011 by Multimedia Mike

I want to know how fast I can really read data from a CD-R. Pursuant to my previous musings on this subject, I was informed that it is inadequate to profile reading just any file from a CD-R since data might be read faster or slower depending on whether the data is closer to the inside or the outside of the disc.

Conclusion / Executive Summary
It is 100% true that reading data from the outside of a CD-R is faster than reading data from the inside. Read on if you care to know the details of how I arrived at this conclusion, and to find out just how much speed advantage there is to reading from the outside rather than the inside.

Science Project Outline

  • Create some sample CD-Rs with various properties
  • Get a variety of optical drives
  • Write a custom program that profiles the read speed

Creating The Test Media
It’s my understanding that not all CD-Rs are created equal. Fortunately, I have 3 spindles of media handy: Some plain-looking Memorex discs, some rather flamboyant Maxell discs, and those 80mm TDK discs:



My approach for burning is to create a single file to be burned into a standard ISO-9660 filesystem. The size of the file will be the advertised length of the CD-R minus 1 megabyte for overhead– so, 699 MB for the 120mm discs, 209 MB for the 80mm disc. The file will contain a repeating sequence of 0..0xFF bytes.

Profiling
I don’t want to leave this to the vagaries of any filesystem handling layer so I will conduct this experiment at the sector level. Profiling program outline:

  • Read the CD-ROM TOC and get the number of sectors that comprise the data track
  • Profile reading the first 20 MB of sectors
  • Profile reading 20 MB of sectors in the middle of the track
  • Profile reading the last 20 MB of sectors

Unfortunately, I couldn’t figure out the raw sector reading on modern Linux incarnations (which is annoying since I remember it being pretty straightforward years ago). So I left it to the filesystem after all. New algorithm:

  • Open the single, large file on the CD-R and query the file length
  • Profile reading the first 20 MB of data, 512 kbytes at a time
  • Profile reading 20 MB of sectors in the middle of the track (starting from filesize / 2 – 10 MB), 512 kbytes at a time
  • Profile reading the last 20 MB of sectors (starting from filesize – 20MB), 512 kbytes at a time

Empirical Data
I tested the program in Linux using an LG Slim external multi-drive (seen at the top of the pile in this post) and one of my Sega Dreamcast units. I gathered the median value of 3 runs for each area (inner, middle, and outer). I also conducted a buffer flush in between Linux runs (as root: 'sync; echo 3 > /proc/sys/vm/drop_caches').

LG Slim external multi-drive (reading from inner, middle, and outer areas in kbytes/sec):

  • TDK-80mm: 721, 897, 1048
  • Memorex-120mm: 1601, 2805, 3623
  • Maxell-120mm: 1660, 2806, 3624

So the 120mm discs can range from about 10.5X all the way up to a full 24X on this drive. For whatever reason, the 80mm disc fares a bit worse — even at the inner track — with a range of 4.8X – 7X.

Sega Dreamcast (reading from inner, middle, and outer areas in kbytes/sec):

  • TDK-80mm: 502, 632, 749
  • Memorex-120mm: 499, 889, 1143
  • Maxell-120mm: 500, 890, 1156

It’s interesting that the 80mm disc performed comparably to the 120mm discs in the Dreamcast, in contrast to the LG Slim drive. Also, the results are consistent with my previous profiling experiments, which largely only touched the inner area. The read speeds range from 3.3X – 7.7X. The middle of a 120mm disc reads at about 6X.

Implications
A few thoughts regarding these results:

  • Since the very definition of 1X is the minimum speed necessary to stream data from an audio CD, then presumably, original 1X CD-ROM drives would have needed to be capable of reading 1X from the inner area. I wonder what the max read speed at the outer edges was? It’s unlikely I would be able to get a 1X drive working easily in this day and age since the earliest CD-ROM drives required custom controllers.
  • I think 24X is the max rated read speed for CD-Rs, at least for this drive. This implies that the marketing literature only cites the best possible numbers. I guess this is no surprise, similar to how monitors and TVs have always been measured by their diagonal dimension.
  • Given this data, how do you engineer an ISO-9660 filesystem image so that the timing-sensitive multimedia files live on the outermost track? In the Dreamcast case, if you can guarantee your FMV files will live somewhere between the middle and the end of the disc, you should be able to count on a bitrate of at least 900 kbytes/sec.

Source Code
Here is the program I wrote for profiling. Note that the filename is hardcoded (#define FILENAME). Compiling for Linux is a simple 'gcc -Wall profile-cdr.c -o profile-cdr'. Compiling for Dreamcast is performed in the standard KallistiOS manner (people skilled in the art already know what they need to know); the only variation is to compile with the '-D_arch_dreamcast' flag, which the default KOS environment adds anyway.

Read the rest of this entry »

Posted in Science Projects, Sega Dreamcast | 10 Comments »

Programming Language Levels

May 19th, 2011 by Multimedia Mike

I’ve been doing this programming thing for some 20 years now. Things sure do change. One change I ponder from time to time is the matter of programming language levels. Allow me to explain.

The 1990s
When I first took computer classes in the early 1990s, my texts would classify computer languages into 3 categories, or levels. The lower the level, the closer to the hardware; the higher the level, the more abstract (and presumably, easier to use). I recall that the levels went something like this:

  • High level: Pascal, BASIC, Logo, Fortran
  • Medium level: C, Forth
  • Low level: Assembly language

Keep in mind that these were the same texts which took the time to explain the history of computers from mainframes -> minicomputers -> a relatively recent phenomenon called microcomputers or “PCs”.

Somewhere in the mid-late 1990s, when I was at university, I was introduced to a new tier:

  • Very high level: Perl, shell scripting

I think there was some debate among my peers about whether C++ and Java were properly classified as high or very high level. The distinction between high and very high, in my observation, seemed to be that very high level languages had more complex data structures (at the very least, a hash / dictionary / associative array / key-value map) built into the language, as well as implicit memory management.

Modern Day
These days, the old hierarchy is apparently forgotten (much like minicomputers). I observe that there is generally a much simpler 2-tier classification:

  • Low level: C, assembly language
  • High level: absolutely every other programming language in wide use today

I find myself wondering where C++ and Objective-C fit in this classification scheme. Then I remember that it doesn’t matter and this is all academic.

Relevancy
I think about this because I have pretty much stuck to low-level programming all of my life, mostly due to my interest in game and multimedia-type programming. But the trends in computing have favored many higher level languages and programming paradigms. I woke up one day and realized that the kind of work I often do — lower level stuff — is not very common.

I’m not here to argue that low or high level is superior. You know I’m all about using the appropriate tool for the job. But I sometimes find myself caught between worlds, having the defend and explain one to the other.

  • On one hand, it’s not unusual for the multitudes of programmers working at the high level to gasp and wonder why I or anyone else would ever use C or assembly language for anything when there are so many beautiful high level languages. I patiently explain that those languages have to be written in some other language (at first) and that they need to run on some operating system and that most assuredly won’t be written in a high level language. For further reading, I refer them to Joel Spolsky’s great essay called Back to Basics which describes why it can be useful to know at least a little bit about how the computer does what it does at the lowest levels.
  • On the other hand, believe it or not, I sometimes have to defend the merits of high level languages to my low level brethren. I’ll often hear variations of, “Any program can be written in C. Using a high level language to achieve the same will create a slow and bloated solution.” I try to explain that the trade-off in time to complete the programming task weighed against the often-negligible performance hit of what is often an I/O-bound operation in the first place makes it worthwhile to use the high level language for a wide variety of tasks.

    Or I just ignore them. That’s actually the best strategy.

Posted in Programming | 5 Comments »

Cloaked Archive Wiki

May 15th, 2011 by Multimedia Mike

Google's Chrome browser has made me phenomenally lazy. I don't even attempt to type proper, complete URLs into the address bar anymore. I just type something vaguely related to the address and let the search engine take over. I saw something weird when I used this method to visit Archive Team's site:



There's greater detail when you elect to view more results from the site:



As the administrator of a MediaWiki installation like the one that archiveteam.org runs on, I was a little worried that they might have a spam problem. However, clicking through to any of those out-of-place pages does not indicate anything related to pharmaceuticals. Viewing source also reveals nothing amiss.

I quickly deduced that this is a textbook example of website cloaking. This is when a website reports different content to a search engine than it reports to normal web browsers (humans, presumably). General pseudocode:

C:
  1. if (web_request.user_agent_string == CRAWLER_USER_AGENT)
  2.   return cloaked_data;
  3. else
  4.   return real_data;

You can verify this for yourself using the wget command line utility:


$ wget --quiet --user-agent="Mozilla/5.0" \
http://www.archiveteam.org/index.php?title=Geocities -O - | grep \<title\>
<title>GeoCities - Archiveteam</title>

$ wget --quiet --user-agent="Googlebot/2.1" \
http://www.archiveteam.org/index.php?title=Geocities -O - | grep \<title\>
<title>Cheap xanax | Online Drug Store, Big Discounts</title>

I guess the little web prank worked because the phaux-pharma stuff got indexed. It makes we wonder if there's a MediaWiki plugin that does this automatically.

For extra fun, here's a site called the CloakingDetector which purports to be able to detect whether a page employs cloaking. This is just one humble observer's opinion, but I don't think the site works too well:
Read the rest of this entry »

Posted in General | 2 Comments »

Curator of the Samples Archive

May 12th, 2011 by Multimedia Mike

Remember how I mirrored the world-famous MPlayerHQ samples archive a few months ago? Due to a series of events, the original archive is no longer online. However, me and the people who control the mplayerhq.hu domain figured out how to make samples.mplayerhq.hu point to samples.multimedia.cx.

That means... I'm the current owner and curator of our central multimedia samples repository. Such power! This should probably be the fulfillment of a decade-long dream for me, having managed swaths of the archive, most notably the game formats section.

How This Came To Be

If you pay any attention to the open source multimedia scene, you might have noticed that there has been a smidge of turmoil. Heated words were exchanged, authority was questioned, some people probably said some things they didn't mean, and the upshot is that, where once there was one project (FFmpeg), there are now 2 projects (also Libav). And to everyone who has wanted me to mention it on my blog-- there, I finally broke my silence and formally acknowledged the schism.

For my part, I was just determined to ensure that the samples archive remained online, preferably at the original samples.mplayerhq.hu address. There are 10 years worth of web links out there pointing into the original repository.

Better Solution

I concede that it's not entirely optimal to host the repository here at multimedia.cx. While I can offer a crazy amount of monthly bandwidth, I can't offer rsync (invaluable for keeping mirrors in sync), nor can the server provide anonymous FTP or allow me to offer accounts to other admins who can manage the repository.

The samples archive is also mirrored at samples.libav.org/samples. I understand that service is provided by VideoLAN. Right now, both repositories are known to be static. I'm open to brainstorms about how to improve the situation.

Posted in General | 9 Comments »