Overall, there are some ideas in ACAP that would be useful for the search engines to consider. However, many ideas outside of ACAP would also be useful for them to consider. I see nothing within ACAP that provides some crucial control which, if only news publishers had it, would put an end to all their online woes. What the news publishers really want are licensing agreements, and given that Google already has several of these without using ACAP (see Josh Cohen Of Google News On Paywalls, Partnerships & Working With Publishers), I can’t see how having ACAP would advance any change in business models. … (From “ACAP Versus Robots.txt For Controlling Search Engines”)
Sitemap URL in robots.txt
The Robots Exclusion Protocol (REP) is not exactly a complicated protocol, and its uses are fairly limited, so SEOs usually give it short shrift. Yet there’s a lot more to it than you might think. Robots.txt has been with us for over 14 years, but how many of us knew that, in addition to the Disallow directive, there’s a Noindex directive that Googlebot obeys? That noindexed pages don’t end up in the index but disallowed pages do, and that the latter can show up in the search results (albeit with less information, since the spiders can’t see the page content)? That disallowed pages still accumulate PageRank? That robots.txt accepts a limited form of pattern matching? That, because of that last feature, you can selectively disallow not just directories but also particular file types (well, file extensions, to be more exact)? That a page disallowed in robots.txt can’t be accessed by the spiders, so they can’t read and obey a meta robots tag … (From “A Deeper Look At Robots.txt”)
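The pattern matching mentioned here is what makes file-extension blocking possible. Below is a simplified, hypothetical matcher for Google-style rules, where `*` matches any run of characters and a trailing `$` anchors the rule to the end of the URL. It assumes Disallow rules only and ignores Allow lines and rule precedence, so treat it as an illustration rather than a full REP parser; the names `pattern_to_regex` and `is_disallowed` are made up for this sketch.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex (simplified sketch)."""
    # A trailing "$" means the rule must match up to the end of the URL path.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then let each "*" match any sequence of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_disallowed(path, disallow_rules):
    """Return True if any Disallow rule matches the path (hypothetical helper)."""
    # re.match only matches at the start of the string, so rules are prefix rules
    # unless they contain "*" or end in "$".
    return any(pattern_to_regex(rule).match(path) for rule in disallow_rules)


rules = ["/*.pdf$"]  # block every URL whose path ends in .pdf
print(is_disallowed("/reports/q3.pdf", rules))             # True
print(is_disallowed("/reports/q3.pdf?download=1", rules))  # False: "$" requires .pdf at the very end
print(is_disallowed("/reports/", rules))                   # False
```

The `$` anchor is what turns "a directory" rule into "a file extension" rule: without it, `/*.pdf` would also block `/reports/q3.pdf?download=1` and even `/my.pdfs/index.html`.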
to go at the root level of a web site. If you don’t put them there, then you … (From “Google Offers Robots.txt Generator”)
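Because crawlers look for robots.txt only at the root of a host, a copy placed in a subdirectory is never fetched. This minimal Python sketch derives the single robots.txt location that governs any page URL, keeping the scheme, host and port and discarding everything else; the helper name `robots_url_for` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url):
    """Return the robots.txt URL that applies to page_url (hypothetical helper).

    Only the root of the scheme + host (+ port) is consulted, so the page's
    path, query string and fragment are dropped.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("https://www.example.com/blog/2008/post.html?ref=rss"))
# https://www.example.com/robots.txt
```

Note that `netloc` includes any port and subdomain, so `blog.example.com` and `www.example.com` each need their own root-level file.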
"Preserve," with similar time limits available for "index," would stipulate whether a copy may be stored in a search engine’s cache. … (From “ACAP Launches, Robots.txt 2.0 For Blocking Search Engines?”)
One side note worth mentioning was the discussion of when it might be appropriate to tell the engines not to index duplicate-content areas or possible spider traps such as session IDs and affiliate IDs. However, it may not be appropriate to block engines from the style sheets used to format a web page. Dan Crow explained that blocking access to your CSS might give the appearance that you were abusing it, so he did not recommend disallowing engines from your CSS files. … (From “Up Close & Personal With Robots.txt”)
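One way to act on Crow’s advice is to check a draft robots.txt against the site’s stylesheet paths before publishing it. This sketch uses Python’s standard `urllib.robotparser`, which handles the plain path-prefix rules used here; the robots.txt content, the `/search/` and `/styles/` paths, and the `blocked_paths` helper are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /search/ stands in for a duplicate-content area,
# while /styles/ holds the site's CSS and has been blocked by mistake.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /styles/
"""


def blocked_paths(robots_txt, paths, agent="Googlebot"):
    """Return the paths that robots_txt disallows for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]


stylesheets = ["/styles/main.css", "/assets/print.css"]
for path in blocked_paths(ROBOTS_TXT, stylesheets):
    print("Warning: stylesheet disallowed:", path)
# Warning: stylesheet disallowed: /styles/main.css
```

Running a check like this as part of a deploy step catches an overly broad Disallow line before the engines ever see it.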