Why influence search engine indexing? There are a variety of reasons to control indexing and thus dictate how a search engine deals with websites and links: allow or disallow the following of links, prevent indexing of irrelevant pages, and index duplicate content under only one URL. The goal, of course, is to deliver only relevant HTML pages to the engine. But this doesn't always happen properly. Duplicate content quickly occurs due to technical problems or the ubiquitous 'human factor', which is all too common. But there are ways to keep an index clean and counteract this. Which methods work? I will be covering three methods for influencing the indexing of your site, what they are, and how they can be used. /Robots.txt protocol The /robots.txt file is like a 'bouncer' for search engine crawlers: it specifies which crawlers may access which pages or sections of a domain. … [Read more...] about SEO Basics – Indexing with robots.txt, meta tags and canonicals
Robots.txt vs. the Robots Tag
Understanding the difference between the robots.txt file and the Robots Tag is critical for search engine optimization and security. It can have a profound impact on the privacy of your website and customers as well. The first thing to know is what robots.txt files and Robots Tags are. Robots.txt Robots.txt is a file you place in your website's top-level directory, the same folder in which a static homepage would go. Inside robots.txt, you can instruct search engines not to crawl content by disallowing file names or directories. There are two parts to a robots.txt directive: the user-agent and one or more disallow instructions. The user-agent specifies one or all Web crawlers or spiders. When we think of Web crawlers we tend to think of Google and Bing; however, a spider can come from anywhere, not just search engines, and there are many of them crawling the Internet. Here is a simple robots.txt file telling all Web crawlers that it is okay to spider every page: User-agent: * Disallow: … [Read more...] about Have You Considered Privacy Issues When Using Robots.txt & The Robots Meta Tag?
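The empty Disallow above permits everything. As a quick sketch of how a crawler interprets these rules, Python's standard-library urllib.robotparser can evaluate a hypothetical file that instead blocks a /private/ directory (the directory name and example.com URLs are illustrative, not from the article):

```python
from urllib import robotparser

# Build a parser from a hypothetical rule set; parse() takes the
# file's lines directly, so no network fetch is needed.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# An unrestricted page is crawlable; anything under /private/ is not.
print(rp.can_fetch("*", "https://example.com/index.html"))      # True
print(rp.can_fetch("*", "https://example.com/private/a.html"))  # False
```

This is the same user-agent/disallow pairing the excerpt describes, just evaluated programmatically instead of by a search engine's spider.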
The Robots Exclusion Protocol (REP) is not exactly a complicated protocol and its uses are fairly limited, and thus it’s usually given short shrift by SEOs. Yet there’s a lot more to it than you might think. Robots.txt has been with us for over 14 years, but how many of us knew that in addition to the disallow directive there’s a noindex directive that Googlebot obeys? That noindexed pages don’t end up in the index but disallowed pages do, and the latter can show up in the search results (albeit with less information since the spiders can’t see the page content)? That disallowed pages still accumulate PageRank? That robots.txt can accept a limited form of pattern matching? That, because of that last feature, you can selectively disallow not just directories but also particular filetypes (well, file extensions to be more exact)? That a robots.txt disallowed page can’t be accessed by the spiders, so they can’t read and obey a meta robots tag … [Read more...] about A Deeper Look At Robots.txt
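As a sketch of the pattern matching the excerpt mentions: the `*` wildcard and the `$` end-of-URL anchor are de-facto extensions honored by Googlebot rather than part of the original REP, so other crawlers may ignore them. The paths below are illustrative:

```
User-agent: Googlebot
# Block every PDF on the site; $ anchors the match to the end of the URL
Disallow: /*.pdf$
# Block a directory the conventional way
Disallow: /internal/
```

This is how a file extension can be disallowed selectively without listing individual URLs, as the article describes.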
Google's John Mueller said on Twitter that having a shared robots.txt file across multiple domains is fine and should work for search. John wrote, "It sounds like you have a shared robots.txt file across domains? That shouldn't be a problem, we might show those cross-domain URLs as errors in Search Console, but if they're on all domains, that should work regardless." — John ☆.o(≧▽≦)o.☆ (@JohnMu) May 23, 2018. So if you are set up to do this, the Google Search Console errors might be a bit unusual, but as long as you understand the output and the setup, it should make sense to you. I have personally never seen an example of a shared robots.txt file set up like this; have you ever done it? Forum discussion at Twitter. … [Read more...] about Google: Shared Robots.txt Across Domains Work Okay
Google's John Mueller said on Twitter that even if you try to disallow your robots.txt within your robots.txt, it won't impact how Google processes and accesses that robots.txt. John said in response to someone asking whether you can disallow your robots.txt, "it doesn't affect how we process the robots.txt, we'll still process it normally." "However, if someone's linking to your robots.txt file and it would otherwise be indexed, we wouldn't be able to index its content & show it in search (for most sites, that's not interesting anyway)," he added. Meaning, Google might not show it in the Google index. Yes, Google does rank robots.txt files if they have content people are searching for. John said this back in 2012 as well, so it isn't exactly new information. Here are the new tweets on the topic: Forum discussion at Twitter. … [Read more...] about Disallowing Robots.txt In Robots.txt Doesn’t Impact How Google Processes It
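The self-referencing rule being asked about would look something like this (a sketch, not a recommendation):

```
User-agent: *
Disallow: /robots.txt
```

Per Mueller's comments above, Google still fetches and obeys such a file normally; at most, the directive keeps the file's own content from being indexed and shown in search results if someone links to it.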