English Phonetic CAPTCHA

Jeff Atwood recently wrote about automated spamming in Designing For Evil. The ensuing discussion presented plenty of technical anti-spam brainstorms as well as the usual violent anti-spammer fantasies. However, one interesting insight I gained from the comment thread concerned the automated nature of Wikipedia’s anti-spam measures:

There is an IRC channel that receives every edit done to Wikipedia; a bot then checks the page for known bad URLs and strings and reverts if necessary.

Aha! So it isn’t just a global network of diligent and vigilant volunteer Wikipedians keeping the content clean. That always struck me as largely intractable and learning this punctures the starry-eyed ethos behind the Wiki concept. But I did a little research and it seems to be a real thing.

I suppose something like that would be vast overkill for the MultimediaWiki. As the discussion also details, not all public discussion forums are created equal in terms of attractiveness to spammers, and the MultimediaWiki would probably be pretty far down the list. Some kind of registration CAPTCHA would probably be adequate. And now that I understand a little more about PHP programming thanks to FATE, I may have enough knowledge to try my hand at such a system.

Hey, here’s a CAPTCHA idea that I have entertained: Call it a phonetic CAPTCHA and challenge the user to type in the proper English word with a certain phonetic pronunciation; for example: KAH MEW NIK AY SHUNZ (communications). I was inspired by Infocom’s old Planetfall interactive fiction game where things were labeled phonetically. Perhaps it discriminates against non-native English speakers (and the less educated among the native set) as well as the spambots, but I guess every measure has its pros and cons.
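For the curious, here is a minimal sketch of how the checking side of such a phonetic CAPTCHA might look. It is written in Python rather than PHP purely for illustration. The names CHALLENGES, new_challenge and check_answer are hypothetical, and only the first table entry comes from the example above; the other two are made up.

```python
import random

# Hypothetical challenge table: phonetic prompt -> expected English word.
# Only the first entry is from the post; the rest are invented examples.
CHALLENGES = {
    "KAH MEW NIK AY SHUNZ": "communications",
    "FOH NET IK": "phonetic",
    "SUB TY TULZ": "subtitles",
}


def new_challenge(rng=random):
    """Pick one phonetic prompt to show on the registration form."""
    return rng.choice(list(CHALLENGES))


def check_answer(prompt, answer):
    """Return True if the typed word matches the prompt.

    Case and surrounding whitespace are ignored; unknown prompts always fail.
    """
    expected = CHALLENGES.get(prompt)
    return expected is not None and answer.strip().lower() == expected
```

A real deployment would presumably need a much larger word list, since a spambot could simply memorize a three-entry table.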

Alternate Subtitles

Kostya recently lamented the matter of subtitle quality. I admit that subtitles are not a topic that I have traditionally cared very deeply about, popular though they may be in the multimedia scene. All the media I care about is generally already in English. Apparently, I’m one of the rare geeks who absolutely detests anime, so I have no reason to care about fansubs for media “imported,” one way or another, from certain Pacific islands.

However, some time ago, I suddenly found a reason to care about subtitles. It turns out that subtitles don’t have to contain bad translations. I’m a huge fan of the old TV show Mystery Science Theater 3000 (a.k.a. MST3K). In a nutshell, the silhouettes of a guy and his 2 robot puppets make fun of rotten movies. They crack an incredibly wide variety of jokes and it’s unlikely anyone can understand every one of them. Leave it to a collaboration of internet geeks to develop an annotation project where users can submit quotes and annotations corresponding to particular timecodes in the lousy movies. These annotations go into a database where they can be downloaded as plaintext .srt subtitle files.
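I don’t know how the annotation project actually generates its files, but the .srt format itself is simple: a counter, a start and end time written as HH:MM:SS,mmm, the text, and a blank line. A rough Python sketch of turning timecoded annotations into that format might look like this. The function names and the (start, end, text) tuple layout are my own assumptions, not the project’s.

```python
def srt_timestamp(seconds):
    """Format a time offset in seconds as the SRT form HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def annotations_to_srt(annotations):
    """Build SRT text from (start_seconds, end_seconds, text) tuples.

    Entries are sorted by start time and numbered from 1,
    with a blank line between consecutive entries.
    """
    blocks = []
    for index, (start, end, text) in enumerate(sorted(annotations), start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


# The timecodes below are invented for the example, not taken from the episode.
print(annotations_to_srt([(3661.5, 3665.0, "Example annotation text")]))
```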


VLC playing MST3K 0904 (Werewolf) with subtitle annotations

“Now what we’re doin’ here, Bob, is gettin’ killed by a werewolf.”

Pictured is an annotation I added for episode 0904 – Werewolf. This is nothing new in the context of DVDs — I remember watching a popup trivia subtitle track on the Spider-Man DVD. But I’m wondering if there are other annotation projects like this one out there on the net for other niche areas of interest.

Git Chat

Actual IM conversation:

[me]: ever use git?
[them]: Why would I do such a thing?
[me]: peer pressure, because all of your co-devs 
      told you it was the cool thing to do

I thought that developing the software that drives FATE would serve as a good opportunity to learn the Git source control software. Foolish.

Git terrifies me. Thing is, I make mistakes. Lots of mistakes. I need a source control management system (SCM) that is sympathetic to my incompetence. As it stands, when I make a mistake, I have to dig through 140 git-* commands on my system to try to guess which one just might offer a shimmering hope of redemption. If I choose poorly, I will only exacerbate the situation as well as pollute the official history log. Such was the case when I tried to revert one particular commit. I can’t remember how that worked out exactly. I guess I got the correct code back eventually, but the log file tells a sordid tale.

More recently, I edited a file but decided I didn’t want the changes; I wanted the previous committed version back. Perhaps use git-revert, like most other SCMs? Goodness, no. Maybe git-reset? Guess again. Turns out git-checkout is what I was looking for (thanks, Mans). Now, I have made the mistake of using git-commit in such a way that actually committed more files than I thought it would (serves me right for following examples and not reading the pedantic documentation first). Now I find myself wanting to undo the commit for one particular file but not actually lose the changes.

Here’s a solution that can’t fail: ‘rm -rf .git/’, followed by a re-reading of how to initialize a local Subversion repository. And whose idea was it to tag revisions with random 160-bit hex codes like 488dfe6a946bbbbb4e095a5d758ad9808f7336b1? (Yeah, I know, they’re SHA-1 codes or some such. I don’t care; it’s still not human-friendly). I hope FFmpeg never gets around to making the switch.
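For what it’s worth, those codes are not random: Git names each object by the SHA-1 hash of a short header plus the object’s contents, which is where the 160 bits (40 hex digits) come from. A small Python sketch of the idea for a file (“blob”) object follows; commit IDs are derived the same way from a commit object, which this does not attempt to reproduce. The function name git_blob_id is mine.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the ID Git assigns to a file with the given contents.

    Git hashes the header b"blob <size>\\0" followed by the raw bytes.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


digest = git_blob_id(b"")
# 40 hex digits at 4 bits each gives the 160 bits mentioned above.
print(digest, len(digest) * 4)
```

The upshot is that the same contents always get the same ID, which is handy for the machine even if it does nothing for the human.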