Author Archives: Multimedia Mike

Cloaked Archive Wiki

Google’s Chrome browser has made me phenomenally lazy. I don’t even attempt to type proper, complete URLs into the address bar anymore. I just type something vaguely related to the address and let the search engine take over. I saw something weird when I used this method to visit Archive Team’s site:



There’s greater detail when you elect to view more results from the site:



As the administrator of a MediaWiki installation like the one that archiveteam.org runs on, I was a little worried that they might have a spam problem. However, clicking through to any of those out-of-place pages does not indicate anything related to pharmaceuticals. Viewing source also reveals nothing amiss.

I quickly deduced that this is a textbook example of website cloaking. This is when a website reports different content to a search engine than it reports to normal web browsers (humans, presumably). General pseudocode:

if (web_request.user_agent_string == CRAWLER_USER_AGENT)
  return cloaked_data;
else
  return real_data;

You can verify this for yourself using the wget command line utility:


$ wget --quiet --user-agent="Mozilla/5.0" \
"http://www.archiveteam.org/index.php?title=Geocities" -O - | grep '<title>'
<title>GeoCities - Archiveteam</title>

$ wget --quiet --user-agent="Googlebot/2.1" \
"http://www.archiveteam.org/index.php?title=Geocities" -O - | grep '<title>'
<title>Cheap xanax | Online Drug Store, Big Discounts</title>
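
The same check can be scripted. What follows is only a minimal sketch using Python's standard library; the function names (extract_title, fetch_title, titles_differ) are made up for illustration, and it assumes the page declares a plain <title> element. It fetches a URL once per user agent and compares the titles, which is all the wget experiment above does by hand.

```python
import re
import urllib.request

# Matches the first <title>...</title> pair, case-insensitively.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html: str) -> str:
    """Return the text of the first <title> element, or "" if there is none."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else ""


def fetch_title(url: str, user_agent: str, timeout: float = 10.0) -> str:
    """Download a page while presenting the given User-Agent string."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_title(html)


def titles_differ(url: str,
                  browser_agent: str = "Mozilla/5.0",
                  crawler_agent: str = "Googlebot/2.1"):
    """Fetch the page as a browser and as a crawler and compare the titles.

    A difference is only a hint of cloaking, not proof: some sites vary
    their titles for legitimate reasons.
    """
    browser_title = fetch_title(url, browser_agent)
    crawler_title = fetch_title(url, crawler_agent)
    return browser_title != crawler_title, browser_title, crawler_title
```

Calling titles_differ() on the GeoCities URL above would repeat the experiment. Whether it still shows a difference depends on the current state of the site.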

I guess the little web prank worked because the phaux-pharma stuff got indexed. It makes me wonder if there’s a MediaWiki plugin that does this automatically.

For extra fun, here’s a site called the CloakingDetector which purports to be able to detect whether a page employs cloaking. This is just one humble observer’s opinion, but I don’t think the site works too well:

Curator of the Samples Archive

Remember how I mirrored the world-famous MPlayerHQ samples archive a few months ago? Due to a series of events, the original archive is no longer online. However, the people who control the mplayerhq.hu domain and I figured out how to make samples.mplayerhq.hu point to samples.multimedia.cx.

That means… I’m the current owner and curator of our central multimedia samples repository. Such power! This should probably be the fulfillment of a decade-long dream for me, having managed swaths of the archive, most notably the game formats section.

How This Came To Be

If you pay any attention to the open source multimedia scene, you might have noticed that there has been a smidge of turmoil. Heated words were exchanged, authority was questioned, some people probably said some things they didn’t mean, and the upshot is that, where once there was one project (FFmpeg), there are now two (FFmpeg and Libav). And to everyone who has wanted me to mention it on my blog: there, I finally broke my silence and formally acknowledged the schism.

For my part, I was just determined to ensure that the samples archive remained online, preferably at the original samples.mplayerhq.hu address. There are 10 years’ worth of web links out there pointing into the original repository.

Better Solution

I concede that it’s not entirely optimal to host the repository here at multimedia.cx. While I can offer a crazy amount of monthly bandwidth, I can’t offer rsync (invaluable for keeping mirrors in sync), nor can the server provide anonymous FTP or allow me to offer accounts to other admins who can manage the repository.

The samples archive is also mirrored at samples.libav.org/samples. I understand that service is provided by VideoLAN. Right now, both repositories are known to be static. I’m open to brainstorms about how to improve the situation.

Creating A Lossless SMC Encoder

Look, I can’t explain how or why I come up with this stuff. For some reason, I thought it would be interesting to write a new encoder for the Apple SMC video codec. I can’t even remember why. I just sat down the other day, started writing, and now I have a lossless SMC encoder that I’m not sure what to do with. Maybe this is to be my new thing— writing encoders for marginal multimedia formats.

Introduction
SMC is a vector quantizer (a lossy method), but I decided to attack it from the angle of lossless encoding. Also known as the Apple Graphics Codec, SMC operates on 4×4 blocks in an 8-bit paletted colorspace. Each 4×4 block can be encoded with 1, 2, 4, 8, or 16 colors. Blocks can also be skipped (copied from the previous frame) or copied from blocks rendered immediately prior within the same frame.
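
To make the lossless angle concrete: a block can be stored exactly whenever its number of distinct colors fits one of the available modes. The sketch below is only an illustration of that selection rule, not the encoder's actual code; the names SMC_COLOR_MODES and smallest_color_mode are hypothetical, and it ignores the skip and copy modes as well as the real byte cost of each mode.

```python
# Color counts supported by the per-block modes described above.
SMC_COLOR_MODES = (1, 2, 4, 8, 16)


def smallest_color_mode(block):
    """Pick the smallest color mode that can hold a 4x4 block losslessly.

    `block` is a sequence of 16 palette indices (one per pixel).
    """
    if len(block) != 16:
        raise ValueError("an SMC block holds exactly 16 pixels")
    unique_colors = len(set(block))
    for mode in SMC_COLOR_MODES:
        if unique_colors <= mode:
            return mode
    # Unreachable: 16 pixels can never contain more than 16 colors.
    raise AssertionError("block has more than 16 distinct colors")
```

A block with 3 distinct colors, for instance, lands in the 4-color mode. A real encoder would also weigh the skip and copy modes, which can be cheaper still.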

Step 1: Validating Infrastructure
The goal of this step is to encode the most braindead SMC frame possible and see if FFmpeg/libav’s QuickTime muxer can create a valid file. I think the simplest frame would be one in which each vector is encoded with the single-color mode, starting with color 0 and incrementing through the palette.

Status: Successful. The only ‘trick’ was to set avctx->bits_per_coded_sample to 8. (For fun, this can also be set to 40 (8 | 0x20) to specify a grayscale palette.)



Step 2: Preprocessing
The video frames will arrive at the encoder as 32-bit RGB. These will need to be converted to a paletted colorspace before encoding. I don’t want to use FFmpeg’s default dithering approach as this will result in a substantial loss of quality as described in this post. I would rather maintain a palette built from observed colors throughout successive frames. If the total number of unique observed colors ever exceeds 256, error out.

That’s what I would like to do. However, I noticed that FFmpeg/libav’s QuickTime muxer has never taken into account the possibility of encoding palettes. The path of least resistance in this case is to dither the input to match QuickTime’s default 8-bit palette (if a paletted QuickTime file does not specify a palette, a default 1-, 2-, 4-, or 8-bit palette is selected).

Status: Successful, if slow. I definitely need to optimize this step later.
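
For reference, the approach I would have preferred (accumulating a palette from observed colors and erroring out past 256) could look roughly like this. It is a sketch with made-up names (PaletteBuilder, PaletteOverflowError), it treats pixels as plain (r, g, b) tuples, and it is not what the encoder currently does.

```python
class PaletteOverflowError(Exception):
    """Raised when more than 256 unique colors have been observed."""


class PaletteBuilder:
    """Assigns palette indices to colors in order of first appearance."""

    MAX_COLORS = 256

    def __init__(self):
        self._index_of = {}   # (r, g, b) -> palette index
        self.colors = []      # palette index -> (r, g, b)

    def index_for(self, rgb):
        """Return the palette index for a color, adding it if it is new."""
        index = self._index_of.get(rgb)
        if index is None:
            if len(self.colors) >= self.MAX_COLORS:
                raise PaletteOverflowError("more than 256 unique colors observed")
            index = len(self.colors)
            self._index_of[rgb] = index
            self.colors.append(rgb)
        return index

    def convert_frame(self, pixels):
        """Map an iterable of (r, g, b) pixels to a list of palette indices."""
        return [self.index_for(pixel) for pixel in pixels]
```

Because the builder persists across frames, a color keeps the same index for the whole stream, which is what a single stored palette would require.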

Step 3: Most Naive Encoding
The most basic encoding is to “encode” each block as a 16-color block. This will actually result in a slightly larger frame size than a raw encoding, since each 4×4 block will be preceded by a one-byte opcode (0xE0 in this case) to indicate the encoding mode. This should demonstrate that the encoder is functioning at the most basic level.
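
A rough sketch of that naive pass is below. It makes several assumptions: the frame is already a flat, row-major sequence of palette indices, the dimensions are multiples of 4, pixels within a block are written row by row, and a 0xE0 opcode with a zero low nibble covers exactly one block. The chunk header that precedes the block data in a real SMC frame is left out, and encode_naive is a hypothetical name.

```python
OPCODE_16_COLOR = 0xE0  # opcode mentioned above; one block per opcode assumed


def encode_naive(indices, width, height):
    """Emit every 4x4 block as opcode 0xE0 followed by its 16 palette indices.

    `indices` is a flat, row-major sequence of 8-bit palette indices.
    The frame/chunk header is deliberately omitted from this sketch.
    """
    if width % 4 or height % 4:
        raise ValueError("this sketch assumes dimensions are multiples of 4")
    if len(indices) != width * height:
        raise ValueError("pixel count does not match the frame dimensions")

    out = bytearray()
    for block_y in range(0, height, 4):
        for block_x in range(0, width, 4):
            out.append(OPCODE_16_COLOR)
            # Copy the block's four rows, four pixels each.
            for row in range(block_y, block_y + 4):
                start = row * width + block_x
                out.extend(indices[start:start + 4])
    return bytes(out)
```

Each block costs 17 bytes instead of 16, which is the small overhead relative to raw encoding mentioned above.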

Status: Successful. Try not to laugh too hard at the Big Buck Bunny dithered to an 8-bit palette:



Step 4: Better Representation

Removing GRUB

I have a Windows/Linux dual-booting computer that I don’t want to be dual boot anymore– the Linux part needs to go. Thus, the GRUB bootloader needs to be removed so that Windows boots normally. I found lots of tips around the internet about how to do this. Of course, none of them worked. So I must add to the general body of knowledge.

I found tips that described how to manually remove GRUB via Linux, by using 'dd' to overwrite the first 446 bytes (and no more) of the boot disk with zeros. This strikes me as a dangerous and unstable proposition. It also wasn’t an option since I had already opted to reformat the formerly Linux partition via the Windows CD-ROM before I endeavored to remove the bootloader.
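
Where the 446 comes from: a classic master boot record is a single 512-byte sector made of 446 bytes of boot code, a 64-byte partition table, and a 2-byte signature. The sketch below shows why the limit matters. It works on an in-memory copy of a sector, never on a real disk, and clear_boot_code is a made-up name.

```python
SECTOR_SIZE = 512
BOOT_CODE_SIZE = 446          # bootloader code lives here
PARTITION_TABLE_SIZE = 64     # four 16-byte partition entries
SIGNATURE = b"\x55\xaa"       # last two bytes of a valid MBR


def clear_boot_code(sector: bytes) -> bytes:
    """Return a copy of an MBR sector with only the boot code zeroed.

    The partition table and signature are preserved; wiping past byte 446
    would destroy the partition table.
    """
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected exactly one 512-byte sector")
    return bytes(BOOT_CODE_SIZE) + sector[BOOT_CODE_SIZE:]
```

Zeroing the boot code removes GRUB's first stage but does not install a replacement, so on its own it would not make Windows boot either.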

Other forums and sites mentioned a combination of utilities found on the Windows CD-ROM including FIXBOOT, FIXMBR, and BOOTCFG. While these programs performed some functions, they didn’t achieve the desired effect– to make Windows boot automatically.

New idea: Repartition the disk such that there is a (relatively) tiny extra partition. Then, well… reinstall Linux. I used a 4 GB partition and Ubuntu 10.10 and let it run its course which ended with installing GRUB… again.

Seems roundabout– installing Linux specifically to boot into Windows. But it works.