There’s No Stopping Bad Behavior
A problem you will have with both robots.txt and the robots meta tag is that neither can enforce its directives; compliance is voluntary. While Google and Bing will certainly respect your instructions, someone using Screaming Frog, Xenu, or their own custom site crawler can simply ignore disallow and noindex directives. … [Read more...] about Have You Considered Privacy Issues When Using Robots.txt & The Robots Meta Tag?
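To make the point concrete, here is a minimal sketch of why the directives are advisory: the check happens inside the crawler, so a crawler that skips the check is not stopped by anything. The rules, the `ExampleBot` user agent, the example.com URLs, and both function names are hypothetical; only Python's standard `urllib.robotparser` module is real.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def polite_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """A well-behaved crawler consults the rules before requesting a URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def rude_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """A crawler that ignores robots.txt: nothing on the server side stops it."""
    return True


private_url = "https://example.com/private/report.html"
public_url = "https://example.com/index.html"

polite_private = polite_may_fetch(ROBOTS_TXT, "ExampleBot", private_url)
polite_public = polite_may_fetch(ROBOTS_TXT, "ExampleBot", public_url)
rude_private = rude_may_fetch(ROBOTS_TXT, "ExampleBot", private_url)
```

The only difference between the two crawlers is whether they choose to ask, which is why anything genuinely private needs real access control rather than a disallow line.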
Where is robots.txt located
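Crawlers look for the file at the path /robots.txt on the root of each host, so its address can be derived from any page URL on that host. The sketch below illustrates this with Python's standard `urllib.parse`; the function name `robots_url` and the example.com URLs are assumptions made for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


blog_robots = robots_url("https://example.com/blog/post?id=3")
shop_robots = robots_url("http://shop.example.com:8080/a/b")
```

Each subdomain and port counts as its own host here, so shop.example.com gets a separate file from example.com.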
Business models are changing, and publishers need a protocol to express permissions of access and use that is flexible and extensible as new business models arise. ACAP will be entirely agnostic with respect to business models, but will ensure that revenues can be distributed appropriately. ACAP presents a win-win for the whole online publishing community, with the promise of more high-quality content and more innovation and investment in the online publishing sector. ACAP is for large publishers as well as small ones, and even for individuals. It will benefit all content providers, whether they are working alone or through publishers. A future without publishers willing and able to invest in high-quality content and get a return on that investment is a future without high-quality content on the net. … [Read more...] about ACAP Versus Robots.txt For Controlling Search Engines
Pages you block with robots.txt disallows may still be in Google’s index and appear in the search results, especially if other sites link to them. Granted, a high ranking is pretty unlikely since Google can’t “see” the page content; it has very little to go on other than the anchor text of inbound and internal links and the URL (plus the ODP title and description, if the page is listed in ODP/DMOZ). As a result, the URL of the page and, potentially, other publicly available information can appear in search results. However, no content from your pages will be crawled, indexed or displayed. … [Read more...] about A Deeper Look At Robots.txt
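One consequence of a crawler not being able to "see" a blocked page is that it also cannot see any robots meta tag on that page. The following is a rough sketch of that interaction, not a description of how any search engine is implemented; the class and function names, the `ExampleBot` user agent, and the sample HTML are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for item in content.split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def visible_directives(robots_txt, user_agent, url, html):
    """Return the meta directives a compliant crawler would read,
    or None when robots.txt keeps it from fetching the page at all."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        return None  # page never fetched, so its meta tags are never read
    meta = RobotsMetaParser()
    meta.feed(html)
    return meta.directives


PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
BLOCKING = "User-agent: *\nDisallow: /private/\n"
URL = "https://example.com/private/a.html"

blocked = visible_directives(BLOCKING, "ExampleBot", URL, PAGE)
unblocked = visible_directives("", "ExampleBot", URL, PAGE)
```

In the blocked case the crawler learns nothing from the page itself, which is consistent with only the URL and link-derived information showing up in results.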
Our goal is to come out with clear information about the actual support for REP across all engines. We have each reported our support separately and at different times, which leaves a long trail that is hard for anyone to piece together. Posting the same spec at the same time provides a sync point for everyone as to the actual similarities or differences between our implementations. We are trying to address the latent concerns around differences across the engines. … [Read more...] about Yahoo!, Google, Microsoft Clarify Robots.txt Support
A side note worth mentioning was the discussion of when it might be appropriate to tell the engines not to index duplicate content areas or possible spider traps such as session IDs and affiliate IDs. However, it may not be appropriate to disallow engines from indexing the style sheets used to format a web page. Dan Crow explained that blocking access to your CSS might give the appearance that you were abusing the CSS, so he did not recommend disallowing engines from indexing your CSS files. … [Read more...] about Up Close & Personal With Robots.txt
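As an illustration of that advice, the sketch below sets up rules that keep crawlers out of duplicate-content and ID-based areas while leaving style sheets reachable, then checks a few URLs against them. The directory names (/print/, /session/, /affiliate/, /css/), the `ExampleBot` user agent, and the example.com URLs are hypothetical. Python's standard `urllib.robotparser` matches rules by path prefix, so the example assumes the IDs live under dedicated path prefixes rather than in query strings.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block duplicate and trap-prone areas, say nothing about /css/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /session/
Disallow: /affiliate/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())


def allowed(path: str) -> bool:
    """Check a path on the example site against the rules above."""
    return rules.can_fetch("ExampleBot", "https://example.com" + path)


css_ok = allowed("/css/site.css")
article_ok = allowed("/articles/robots-txt.html")
print_ok = allowed("/print/robots-txt.html")
session_ok = allowed("/session/abc123/cart.html")
affiliate_ok = allowed("/affiliate/42/landing.html")
```

Because nothing disallows /css/, the style sheet stays fetchable by default; no explicit Allow line is needed for it.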