{"id":832,"date":"2009-01-06T21:30:23","date_gmt":"2009-01-07T05:30:23","guid":{"rendered":"http:\/\/multimedia.cx\/eggs\/?p=832"},"modified":"2009-01-06T21:31:21","modified_gmt":"2009-01-07T05:31:21","slug":"this-is-what-i-was-trying-to-avoid","status":"publish","type":"post","link":"https:\/\/multimedia.cx\/eggs\/this-is-what-i-was-trying-to-avoid\/","title":{"rendered":"This Is What I Was Trying To Avoid"},"content":{"rendered":"<p>I checked my website bandwidth overview tonight. So far for the month of January, the bandwidth served from the main multimedia.cx domain is actually much higher than the bandwidth served by <a href=\"http:\/\/games.multimedia.cx\/\">my gaming blog<\/a>, which never happens (lots more pictures over there). I dug a little deeper into the details and found this:<\/p>\n<p><center><br \/>\n<img decoding=\"async\" src=\"\/eggs\/images\/googlebot-crawling-havoc.png\" alt=\"Googlebot causing bandwidth havoc\" \/><br \/>\n<\/center><\/p>\n<p>So who is 66.249.67.1? Why, none other than crawl-66-249-67-1.googlebot.com. Why has it taken such an interest in my site? 
Oh, little pages like this:<\/p>\n<p>66.249.67.1 - - [01\/Jan\/2009:00:58:01 -0500] &quot;GET \/fate\/index.php?stderr=41851 HTTP\/1.1&quot; 200 69107 &quot;-&quot; &quot;Mozilla\/5.0 (compatible; Googlebot\/2.1; +http:\/\/www.google.com\/bot.html)&quot;<\/p>\n<p>66.249.67.1 - - [01\/Jan\/2009:00:58:06 -0500] &quot;GET \/fate\/index.php?build_record=43652 HTTP\/1.1&quot; 200 3297 &quot;-&quot; &quot;Mozilla\/5.0 (compatible; Googlebot\/2.1; +http:\/\/www.google.com\/bot.html)&quot;<\/p>\n<p>You see, I thought I had administered my <a href=\"http:\/\/fate.multimedia.cx\/\">FATE web database<\/a> responsibly by placing a <a href=\"http:\/\/en.wikipedia.org\/wiki\/Robots.txt\">robots exclusion file<\/a> at <a href=\"http:\/\/fate.multimedia.cx\/robots.txt\">http:\/\/fate.multimedia.cx\/robots.txt<\/a> that simply disallows crawlers. I completely overlooked that <a href=\"http:\/\/multimedia.cx\/fate\/\">http:\/\/multimedia.cx\/fate\/<\/a> is an equally valid route into the site, and one that file does not cover.<\/p>\n<p><a href=\"http:\/\/multimedia.cx\/robots.txt\">Lesson learned<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I learned the hard way that web sites have multiple routes of 
entry<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[101],"tags":[],"class_list":["post-832","post","type-post","status-publish","format-standard","hentry","category-fate-server"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/832","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/comments?post=832"}],"version-history":[{"count":4,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/832\/revisions"}],"predecessor-version":[{"id":836,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/832\/revisions\/836"}],"wp:attachment":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/media?parent=832"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/categories?post=832"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/tags?post=832"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}