Disallowing Robots.txt In Robots.txt Doesn't Impact How Google Processes It
Google's John Mueller said on Twitter that even if you try to disallow your robots.txt file within your robots.txt, it won't affect how Google processes and accesses that file. In response to someone asking whether you can disallow your robots.txt, John said, "it doesn't affect how we process the robots.txt, we'll still process it normally." "However, if someone's linking to your robots.txt file and it would otherwise be indexed, we wouldn't be able to index its content & show it in search (for most sites, that's not interesting anyway)," he added. Meaning, Google might not show it in the Google index. Yes, Google does rank robots.txt files if they have content people are searching for. John said this back in 2012, so it isn't exactly new information. Here are the new tweets on the topic: Forum discussion at Twitter. …
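A quick way to see what such a rule does for crawlers that honor it is Python's standard-library robots.txt parser. This is a minimal sketch with an illustrative hostname; per Mueller, the rule has no effect on Google fetching and processing the robots.txt file itself:

```python
import urllib.robotparser

# A robots.txt that tries to disallow itself (illustrative content only).
rules = [
    "User-agent: *",
    "Disallow: /robots.txt",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

# A crawler that honored this rule would skip re-fetching /robots.txt...
print(rp.can_fetch("*", "https://example.com/robots.txt"))  # False
# ...but the rest of the site remains crawlable, and Google still fetches
# and processes the robots.txt file itself regardless of this rule.
print(rp.can_fetch("*", "https://example.com/page"))  # True
```

The rule only matters as an ordinary path rule for obedient clients; it cannot stop the file that contains it from being read.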
List Of All The GoogleBot Robots.txt Specifications Changes
With Google aiming to make the robots.txt exclusion protocol a standard, they proposed some changes and submitted them the other day. Now Google has updated its own developer docs around the robots.txt specification to match. Here is a list of what has changed:
- Removed the "Requirements Language" section in this document because the language is Internet-draft specific.
- Robots.txt now accepts all URI-based protocols.
- Google follows at least five redirect hops. Since no rules have been fetched yet, the redirects are followed for at least five hops, and if no robots.txt is found, Google treats it as a 404 for the robots.txt.
- Handling of logical redirects for the robots.txt file based on HTML content that returns 2xx (frames, JavaScript, or meta-refresh-type redirects) is discouraged; the content of the first page is used for finding applicable rules.
- For 5xx, if the robots.txt is unreachable for more than 30 days, the last cached copy of the robots.txt is used, or if unavailable, Google … [Read more...] about List Of All The GoogleBot Robots.txt Specifications Changes
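The redirect-hop behavior described above can be sketched in a few lines of Python. This is a hypothetical helper, not Google's code: `fetch` is an assumed callback returning `(status, redirect_target_or_body)`, and the 5xx/30-day caching rule is deliberately left out:

```python
def resolve_robots(fetch, url, max_hops=5):
    """Sketch of the documented redirect handling: follow up to five
    redirect hops; if no robots.txt is found by then, treat it as a 404
    (return None, meaning no rules apply)."""
    for _ in range(max_hops):
        status, payload = fetch(url)
        if status in (301, 302, 303, 307, 308):
            url = payload            # follow the redirect hop
        elif 200 <= status < 300:
            return payload           # robots.txt content found
        elif 400 <= status < 500:
            return None              # 4xx: treated as no robots.txt
        else:
            # 5xx handling (serving a cached copy for up to 30 days)
            # is omitted from this sketch.
            raise RuntimeError(f"robots.txt unreachable: {status}")
    return None  # still redirecting after five hops: treated as a 404


# Usage with a fake fetch function and illustrative hostnames:
responses = {
    "https://a.example/robots.txt": (301, "https://b.example/robots.txt"),
    "https://b.example/robots.txt": (200, "User-agent: *\nDisallow: /private/"),
}
body = resolve_robots(lambda u: responses[u], "https://a.example/robots.txt")
```

A redirect chain longer than five hops falls through the loop and is treated the same as a missing file.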
Google Shares Its Robots.txt Parser Code With Open Source World
Google announced yesterday, as part of its effort to standardize the robots exclusion protocol, that it is open sourcing its robots.txt parser. That means the way GoogleBot reads and follows robots.txt files will be available for any crawler or coder to look at or use. It is rare for Google to share anything they do in core search with the open source world - it is their secret sauce - but here Google has published it to GitHub for all to access. Google wrote that they "open sourced the C++ library that our production systems use for parsing and matching rules in robots.txt files. This library has been around for 20 years and it contains pieces of code that were written in the 90's. Since then, the library evolved; we learned a lot about how webmasters write robots.txt files and corner cases that we had to cover for, and added what we learned over the years also to the internet draft when it made sense." Forum discussion at Twitter. … [Read more...] about Google Shares Its Robots.txt Parser Code With Open Source World
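The core of "matching rules" in the robots exclusion protocol is picking which rule applies to a given path. This is a minimal Python sketch of that idea, not Google's C++ code: it assumes `rules` is a list of `(directive, path)` pairs and ignores wildcards (`*`, `$`) entirely:

```python
def is_allowed(rules, path):
    """Sketch of REP rule matching: among rules whose path is a prefix
    of the requested path, the longest match wins, and on a tie the
    less restrictive rule (allow) wins. No rules matching means allowed."""
    best = None  # (match_length, is_allow) - tuple comparison does the work
    for directive, rule_path in rules:
        if path.startswith(rule_path):
            candidate = (len(rule_path), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return best is None or best[1]

# The more specific Allow outranks the blanket Disallow:
print(is_allowed([("disallow", "/"), ("allow", "/public")], "/public/page"))  # True
```

The real library handles far more corner cases (the "20 years" of webmaster quirks Google mentions), but longest-match-wins is the heart of it.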
Google: Shared Robots.txt Across Domains Works Okay
Google's John Mueller said on Twitter that having a shared robots.txt file across multiple domains is fine and should work for search. John wrote, "It sounds like you have a shared robots.txt file across domains? That shouldn't be a problem, we might show those cross-domain URLs as errors in Search Console, but if they're on all domains, that should work regardless." (@JohnMu, May 23, 2018) So if you are set up this way, the Google Search Console errors might look a bit unusual, but as long as you understand the output and the setup, it should make sense to you. I have personally never seen an example of a shared robots.txt file set up like this - have you ever done it? Forum discussion at Twitter. …
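One common way to share a robots.txt across domains is to redirect each secondary domain's /robots.txt to a single canonical copy. This is a hypothetical nginx fragment with example hostnames, not a configuration from the article:

```nginx
# Secondary domains redirect their robots.txt to one canonical copy.
# Hostnames are illustrative; adjust to your own setup.
server {
    server_name mirror1.example mirror2.example;
    location = /robots.txt {
        return 301 https://www.example.com/robots.txt;
    }
}
```

Since Google follows redirect hops when fetching robots.txt, all domains end up governed by the same rules, though Search Console may still flag the cross-domain URLs as errors, as Mueller notes.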
Google Might Change How NoIndex In Robots.txt Works
For years Google has been communicating - and we've been reporting - that Google does not support using the noindex directive within your robots.txt file. Well, people still use it, and now Gary Illyes from Google is on the case - he may end up making sure it doesn't work at all. In short, John was asked about it again and gave the same answer he has been giving for years. Then Gary Illyes stepped in and said this unofficial support may go away soon as he reviews the code behind it. Why fully remove support for this? Well, (1) Google has been telling people not to use it, and (2) Gary said, "Technically, robots.txt is for crawling. The meta tags are for indexing. During indexing they'd be applied at the same stage so there's no good reason to have both of them." Then people said that people do use it and asked him not to drop it. So he said he will investigate and, if that is true, he will try to make a pitch to keep it. I bet the unofficial support does go away at some point - so just don't use it! Forum … [Read more...] about Google Might Change How NoIndex In Robots.txt Works
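If you are relying on noindex in robots.txt, the supported route Gary describes is to keep indexing directives with the page itself. A minimal example of the robots meta tag:

```html
<!-- Supported way to keep a page out of the index: robots meta tag
     in the page's <head>. -->
<meta name="robots" content="noindex">
```

For non-HTML resources, the equivalent is the `X-Robots-Tag: noindex` HTTP response header. Note that either only works if the page is not blocked from crawling in robots.txt, since Google must fetch the page to see the directive.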