{"id":3306,"date":"2011-05-15T16:21:53","date_gmt":"2011-05-15T23:21:53","guid":{"rendered":"http:\/\/multimedia.cx\/eggs\/?p=3306"},"modified":"2020-07-26T00:02:43","modified_gmt":"2020-07-26T07:02:43","slug":"cloaked-archive-wiki","status":"publish","type":"post","link":"https:\/\/multimedia.cx\/eggs\/cloaked-archive-wiki\/","title":{"rendered":"Cloaked Archive Wiki"},"content":{"rendered":"<p>Google&#8217;s Chrome browser has made me phenomenally lazy. I don&#8217;t even attempt to type proper, complete URLs into the address bar anymore. I just type something vaguely related to the address and let the search engine take over. I saw something weird when I used this method to visit <a href=\"http:\/\/archiveteam.org\">Archive Team&#8217;s site<\/a>:<\/p>\n<p><center><br \/>\n<img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/multimedia.cx\/eggs\/wp-content\/uploads\/2011\/05\/archive-team-google.png\" alt=\"\" title=\"Archive Team -- odd Google results\" width=\"342\" height=\"209\" class=\"aligncenter size-full wp-image-3307\" srcset=\"https:\/\/multimedia.cx\/eggs\/wp-content\/uploads\/2011\/05\/archive-team-google.png 342w, https:\/\/multimedia.cx\/eggs\/wp-content\/uploads\/2011\/05\/archive-team-google-300x183.png 300w\" sizes=\"auto, (max-width: 342px) 100vw, 342px\" \/><br \/>\n<\/center><\/p>\n<p>There&#8217;s greater detail when you elect to view more results from the site:<\/p>\n<p><center><br \/>\n<img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/multimedia.cx\/eggs\/wp-content\/uploads\/2011\/05\/archive-team-cloak.png\" alt=\"\" title=\"Archive Team -- more cloaked results\" width=\"533\" height=\"191\" class=\"aligncenter size-full wp-image-3308\" srcset=\"https:\/\/multimedia.cx\/eggs\/wp-content\/uploads\/2011\/05\/archive-team-cloak.png 533w, https:\/\/multimedia.cx\/eggs\/wp-content\/uploads\/2011\/05\/archive-team-cloak-300x107.png 300w\" sizes=\"auto, (max-width: 533px) 100vw, 533px\" \/><br \/>\n<\/center><\/p>\n<p>As the 
administrator of <a href=\"http:\/\/wiki.multimedia.cx\">a MediaWiki installation<\/a> like the one that archiveteam.org runs on, I was a little worried that they might have a spam problem. However, clicking through to any of those out-of-place pages does not show anything related to pharmaceuticals. Viewing source also reveals nothing amiss.<\/p>\n<p>I quickly deduced that this is a textbook example of <a href=\"http:\/\/en.wikipedia.org\/wiki\/Cloaking\">website cloaking<\/a>. This is when a website serves different content to a search engine than it serves to normal web browsers (humans, presumably). General pseudocode:<\/p>\n<pre>\r\nif (web_request.user_agent_string == CRAWLER_USER_AGENT)\r\n  return cloaked_data;\r\nelse\r\n  return real_data;\r\n<\/pre>\n<p>You can verify this for yourself using the <code>wget<\/code> command-line utility:<\/p>\n<p><code><br \/>\n$ wget --quiet --user-agent=\"<strong>Mozilla\/5.0<\/strong>\" \\<br \/>\n  http:\/\/www.archiveteam.org\/index.php?title=Geocities -O - | grep \\&lt;title\\&gt;<br \/>\n&lt;title&gt;GeoCities - Archiveteam&lt;\/title&gt;<\/p>\n<p>$ wget --quiet --user-agent=\"<strong>Googlebot\/2.1<\/strong>\" \\<br \/>\n  http:\/\/www.archiveteam.org\/index.php?title=Geocities -O - | grep \\&lt;title\\&gt;<br \/>\n&lt;title&gt;Cheap xanax | Online Drug Store, Big Discounts&lt;\/title&gt;<br \/>\n<\/code><\/p>\n<p>I guess the little web prank worked because the phaux-pharma stuff got indexed. It makes me wonder if there&#8217;s a MediaWiki plugin that does this automatically.<\/p>\n<p>For extra fun, <a href=\"http:\/\/www.cloakingdetector.com\/\">here&#8217;s a site called the CloakingDetector<\/a> which purports to be able to detect whether a page employs cloaking. 
This is just one humble observer&#8217;s opinion, but I don&#8217;t think the site works too well:<br \/>\n<!--more--><br \/>\n<center><br \/>\n<img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/multimedia.cx\/eggs\/wp-content\/uploads\/2011\/05\/cloaking-detector.png\" alt=\"\" title=\"Cloaking Detector website\" width=\"558\" height=\"394\" class=\"aligncenter size-full wp-image-3309\" srcset=\"https:\/\/multimedia.cx\/eggs\/wp-content\/uploads\/2011\/05\/cloaking-detector.png 558w, https:\/\/multimedia.cx\/eggs\/wp-content\/uploads\/2011\/05\/cloaking-detector-300x211.png 300w\" sizes=\"auto, (max-width: 558px) 100vw, 558px\" \/><br \/>\n<\/center><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is what website cloaking looks like<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-3306","post","type-post","status-publish","format-standard","hentry","category-general"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/3306","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/comments?post=3306"}],"version-history":[{"count":9,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/3306\/revisions"}],"predecessor-version":[{"id":4655,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/3306\/revisions\/4655"}],"wp:attachment":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/media?parent=3306"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp
-json\/wp\/v2\/categories?post=3306"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/tags?post=3306"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}