Google's John Mueller said on Twitter that sharing a single robots.txt file across multiple domains is fine and should work for search. John wrote, "It sounds like you have a shared robots.txt file across domains? That shouldn't be a problem, we might show those cross-domain URLs as errors in Search Console, but if they're on all domains, that should work regardless." … [Read more...] about Google: Shared Robots.txt Across Domains Work Okay
Robots.txt example
As part of its effort to make the robots.txt exclusion protocol a standard, Google proposed some changes and submitted them the other day. Google has now updated its own developer docs on the robots.txt specification to match. Here is a list of what has changed. … [Read more...] about List Of All The GoogleBot Robots.txt Specifications Changes
Robots.txt is a file you place in your website’s top-level directory, the same folder in which a static homepage would go. Inside robots.txt, you can instruct search engines not to crawl content by disallowing file names or directories. There are two parts to a robots.txt directive: the user-agent and one or more disallow instructions. … [Read more...] about Have You Considered Privacy Issues When Using Robots.txt & The Robots Meta Tag?
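To make the two-part structure concrete, here is a minimal sketch that parses a small robots.txt with Python's standard `urllib.robotparser` and asks whether a crawler may fetch two URLs. The rules, the `ExampleBot` user-agent, and the `www.example.com` URLs are all made up for illustration; they do not come from the article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one user-agent line followed by two disallow lines.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/secret.html
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network request is made)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is an assumed crawler name; it falls under the "*" group.
blocked = parser.can_fetch("ExampleBot", "https://www.example.com/private/report.html")
allowed = parser.can_fetch("ExampleBot", "https://www.example.com/index.html")
print(blocked, allowed)
```

The first check falls under the `/private/` rule and the second matches no disallow line, so a well-behaved crawler would skip the first URL and fetch the second.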
Urgent Removal: If you’re a site owner, a system to get pages out of an index in a guaranteed period of time would be very convenient. However, this is probably better handled through the webmaster tools that the search engines offer, as they allow a site owner to proactively trigger a removal rather than waiting for a visit from a crawler, which could take days. Ironically, Google had a system to remove pages quickly. I wrote about it two years ago (see Google Releases Improved Content Removal Tools). But the documentation today is terrible. Little is explained if you’re not logged in. If you are logged in, the link for the webmaster version doesn’t work. The entire feature Google described in 2007 is gone. … [Read more...] about ACAP Versus Robots.txt For Controlling Search Engines
Pages you block with robots.txt disallows may still be in Google’s index and appear in the search results, especially if other sites link to them. Granted, a high ranking is pretty unlikely since Google can’t “see” the page content; it has very little to go on other than the anchor text of inbound and internal links, the URL, and the ODP title and description if the page is listed in ODP/DMOZ. As a result, the URL of the page and, potentially, other publicly available information can appear in search results. However, no content from your pages will be crawled, indexed or displayed. … [Read more...] about A Deeper Look At Robots.txt
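The distinction between "not crawled" and "not listed" can be sketched with a toy model. This is not how Google's indexer works internally; `visible_to_indexer`, the `ExampleBot` user-agent, and the sample URL and anchor text are hypothetical names used only to show which signals remain available for a disallowed page.

```python
from urllib.robotparser import RobotFileParser


def visible_to_indexer(robots_txt, user_agent, url, page_text, inbound_anchor_texts):
    """Toy model: return the signals an indexer could hold for a URL.

    The URL and the anchor text of inbound links are always available,
    because they come from other pages. The page content is only available
    if robots.txt allows the crawler to fetch the URL.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    signals = {"url": url, "anchor_texts": list(inbound_anchor_texts)}
    if parser.can_fetch(user_agent, url):
        signals["content"] = page_text
    return signals


# Hypothetical data for illustration only.
blocked_page = visible_to_indexer(
    "User-agent: *\nDisallow: /private/",
    "ExampleBot",
    "https://www.example.com/private/plan.html",
    "Full text of the page",
    ["Q3 plan"],
)
print(blocked_page)
```

In this model the disallowed page still carries its URL and inbound anchor text, which is why such a page can show up as a bare listing, while its content never enters the picture.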