Category Archives: General

Wiki Counterspam

A brief digression: Roughly once every two days, the MultimediaWiki sustains a drive-by spamming attack. Cleanup usually takes 2-3 minutes, although one morning I woke up to a massive spam attack that took me hours to revert; that’s what prompted me to enforce user registration. What strikes me is how much worse this problem could be. I occasionally get annoyed enough to investigate MediaWiki’s anti-spam features.

Second-order digression: If you think it’s hard to find good documentation on FFmpeg, try finding the documentation you need for a Wiki package, which is — in the time-honored tradition of eating one’s own dog food — all in Wiki form. Why is this a problem? It just feels so… “squishy”. It’s not all there, it’s always in flux, it can give you a general idea of what you want to know but never feels authoritative– the same points of controversy that dog, for example, Wikipedia. In fact, my first encounter with the Wiki paradigm was the online documentation for some open source program or another. The maintainers constructed a Wiki outline and expected users to fill it in. That experience gave me a serious aversion to Wikis for a long time to come. That said, would it be hypocritical for me to mention that I very much want to set up a Wiki-based knowledge base for FFmpeg users and developers?

I have watched the email spam arms race with much interest for many years. I am fascinated by the technical challenges involved and the solutions proposed, each with its pros and cons. Every proposed measure could be thwarted with enough effort. A few years ago, Bayesian filtering caught on, and it always struck me as the tactical nuclear weapon of spam filtering. It did a lot to solve the problem on the client side (though countermeasures at various levels of the email network help matters).

Then blogs, with comments, and Wikis gained prevalence. The spam problem started all over again. What I can’t seem to understand is why the people fighting the good fight on this new frontier have chosen to restart the arms race from square one, banging at the problem with rocks instead of going straight to the nukes. I’m wondering why there aren’t any Bayesian solutions in the Wiki space. (Thankfully, it appears that there are Bayesian comment filtering plugins available for, at least, WordPress.) How would it work? Perhaps initialize it by declaring the entire set of existing pages valid, then allow administrators to mark certain pages as spam, or certain users as known spammers. When an edit is submitted, the Wiki runs the edit through the filter to determine whether it “looks” like spam and rejects it if so. However, one of the underlying operating principles of the Bayesian method as applied to email is that every user’s mailbox looks very different from everyone else’s. A spammer would require knowledge of an individual mailbox in order to reliably thwart the filtering. Unfortunately, the “mailbox”, or body of messages, in this case would be unified and public. This would afford a spammer an ergonomic, interactive environment in which to test spam by dumping in the text of valid pages and tweaking them with spammy URLs until the pages get through.
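To make the idea concrete, here is a minimal sketch of such an edit filter in Python. Everything here — class names, the tokenizer, the training strings — is hypothetical and for illustration only; a real wiki plugin would need smarter tokenization, persistent storage, and proper probability combination.

```python
# Minimal Bayesian-style edit filter sketch. All names and training
# data are hypothetical; this is an illustration, not a wiki plugin.
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase word tokens; URL hostnames survive as tokens like "example.com".
    return re.findall(r"[a-z0-9.]+", text.lower())

class EditFilter:
    def __init__(self):
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.totals = {"ham": 0, "spam": 0}

    def train(self, label, text):
        # Existing pages train as "ham"; pages an admin flags train as "spam".
        for tok in tokenize(text):
            self.counts[label][tok] += 1
            self.totals[label] += 1

    def spam_score(self, text):
        # Sum of per-token log-likelihood ratios with add-one smoothing;
        # a positive score means the edit "looks" spammy.
        score = 0.0
        for tok in tokenize(text):
            p_spam = (self.counts["spam"][tok] + 1) / (self.totals["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.totals["ham"] + 2)
            score += math.log(p_spam / p_ham)
        return score

f = EditFilter()
f.train("ham", "cinepak is a video codec using vector quantization")
f.train("spam", "cheap pills casino click here http://example.com")
print(f.spam_score("buy cheap pills at http://example.com"))  # positive: reject
print(f.spam_score("notes on the cinepak video codec"))       # negative: accept
```

The weakness described above shows up immediately: since the training corpus (the wiki itself) is public, a spammer can iterate against `spam_score` until a doctored page slips under the threshold.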

Okay, so maybe the idea isn’t that straightforward after all. Forget I even brought it up.

Through it all, though, I still stand by the Wiki paradigm.

I Have More Games Than You

Well, probably. In my quest for weird and wacky multimedia samples I have amassed quite a collection of game titles. The number is well in excess of 400 right now– not a hardcore collector’s haul by any stretch of the imagination, but more than enough for any member of the opposite sex to write me off as an emotionally stunted, overgrown adolescent, despite the fact that I rarely actually play any of these games.

I catalog the games’ multimedia technology in my Multimedia Exploration Journal (and I presently have 70-80 games to process for the journal). If you care what all the games are, here is the master spreadsheet I maintain:

Mostly, the spreadsheet is to help track which information still needs to go into MobyGames. I have not updated the spreadsheet in a little while so some of the red and yellow cells might be a little inaccurate w.r.t. the database.


Do you have any dusty old games stuffed in the closet? Do you have at least mild obsessive-compulsive tendencies? Check the MobyGames database and see if you have anything to contribute with those old games– box and media scans, game screenshots, reviews, the game itself if necessary– I have dozens of titles that still don’t exist in the database, expansive though it may be (27,478 games as of this post).

The New FourCC Authority?

One of my original motivations for creating the MultimediaWiki was to expand on the knowledge enumerated at fourcc.org, heretofore the internet’s foremost authority on the curious multimedia concept known as the four-character code (FourCC). With the latest update of the MediaWiki software, I find that I am able to categorize FourCC redirect pages. What this means to the Wiki layperson is that I can use the Wiki to automatically maintain a list of all known video FourCCs, which I have done. I have done the same for audio FourCCs, though the list is not as extensive (it mostly covers QuickTime and Real codecs). I hope to give the same treatment to Microsoft 16-bit audio IDs soon.
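For the curious, the mechanism is just a category tag on a redirect page. A hypothetical redirect for the ‘cvid’ FourCC pointing at a Cinepak article might look like this (the category name is illustrative, not necessarily what the MultimediaWiki uses):

```
#REDIRECT [[Cinepak]]

[[Category:Video FourCCs]]
```

With the tag in place, MediaWiki’s category page then lists every tagged redirect automatically, which is how the FourCC list maintains itself.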

So far, I have only catalogued the FourCCs and codecs that I can prove exist, either because we have corresponding samples, codecs, or both. The fourcc.org list contains dozens of FourCCs for which I can find no samples or codecs. It’s reasonable to believe that they existed, perhaps at the dawn of the consumer multimedia era. It also could be that certain FourCCs were formally registered with Microsoft by ambitious companies that were never able to release the multimedia programs that would have generated the corresponding data.

What to do about these? I don’t wish to categorize them along with the provable FourCCs. I may create a different page or category for these strays until they can be claimed either by the discovery of actual media samples in the wild or by codecs (source or binary, coders or decoders) that can handle the data.

MySQL Disaster Recovery Works

I just want to briefly recognize MySQL for its resilience and robustness. Even in the event that a table gets corrupted, as happened today with the tables driving the MultimediaWiki, ‘CHECK TABLE’ and ‘REPAIR TABLE’ can diagnose and fix the problem and have your MySQL-backed application up and running in short order. Of course, I also have an automated script backing up the database tables off-site, every night, just for good measure.
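The recovery and backup routine can be sketched roughly as follows. The database name, credentials, table names, and paths are all hypothetical stand-ins; ‘CHECK TABLE’ and ‘REPAIR TABLE’ themselves are standard MySQL statements.

```shell
#!/bin/sh
# Diagnose and repair corrupted tables (names and credentials hypothetical).
mysql -u wikiuser -p'secret' wikidb -e "CHECK TABLE page, revision;"
mysql -u wikiuser -p'secret' wikidb -e "REPAIR TABLE page, revision;"

# Nightly off-site backup, e.g. run from cron at 03:30:
#   30 3 * * * /usr/local/bin/backup-wiki.sh
mysqldump -u wikiuser -p'secret' wikidb | gzip > /backups/wikidb-$(date +%F).sql.gz
scp "/backups/wikidb-$(date +%F).sql.gz" backup@offsite.example.com:wiki/
```

Note that REPAIR TABLE applies to MyISAM-style tables; the nightly dump is the real safety net either way.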



I just want any current or future contributors to be secure in the knowledge that the data is safe.