The Robots Exclusion Protocol (REP) is not a complicated protocol, and its uses are fairly limited, so SEOs usually give it short shrift. Yet there's more to it than you might think. Robots.txt has been with us for over 14 years, but how many of us knew:

- that in addition to the Disallow directive there's a Noindex directive that Googlebot obeys?
- that noindexed pages don't end up in the index, but disallowed pages do, and the latter can show up in the search results (albeit with less information, since the spiders can't see the page content)?
- that disallowed pages still accumulate PageRank?
- that robots.txt accepts a limited form of pattern matching?
- that, thanks to that last feature, you can selectively disallow not just directories but also particular file types (well, file extensions, to be more exact)?
- that a page disallowed in robots.txt can't be accessed by the spiders, so they can't read and obey a meta robots tag contained within it?

A robots.txt file provides critical information for the search engine spiders that crawl the web. Before these bots (does anyone say the full word "robots" anymore?) access the pages of a site, they check to see whether a robots.txt file exists.
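As a minimal sketch of that check, here is how a well-behaved crawler consults robots.txt before fetching a URL, using Python's standard-library `urllib.robotparser`. Note that the stdlib parser implements the original REP prefix-matching rules only, not Google's wildcard extension, so the example sticks to a plain directory Disallow; the `example.com` rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt blocking the /admin/ directory for all bots.
robots_txt = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant spider asks before it fetches:
print(parser.can_fetch("Googlebot", "https://example.com/admin/login"))   # False
print(parser.can_fetch("Googlebot", "https://example.com/public/page"))   # True
```

Engines that support the pattern-matching extension (Google among them) would additionally honor rules like `Disallow: /*.pdf$` to block a file extension site-wide, which is the selective-disallow trick mentioned above.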
A Deeper Look At Robots.txt, 325 words, posted April 15, 2009.