“Preserve,” with similar time limits available for “index,” would stipulate whether a copy may be stored in a search engine’s cache. … (From “ACAP Launches, Robots.txt 2.0 For Blocking Search Engines?”)
Robots.txt sitemap
    #Nothing interesting to see here, but there is a dance party
    #happening over here: http://www.youtube.com/watch?v=9vwZ5FQEUFg
    User-agent: *
    Disallow: /api/user?*
    Disallow:
    Sitemap: http://www.seomoz.org/blog-sitemap.xml
    Sitemap: http://www.seomoz.org/ugc-sitemap.xml
    Sitemap: http://www.seomoz.org/profiles-sitemap.xml
    Sitemap: http://app.wistia.com/sitemaps/2.xml

If you are unfamiliar with robots.txt, be sure to read these pages: … (From “Have You Considered Privacy Issues When Using Robots.txt & The Robots Meta Tag?”)
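The robots.txt file quoted above mixes a comment, one `User-agent: *` group with a wildcard rule, an empty `Disallow:` (which disallows nothing), and several `Sitemap:` lines that apply to the whole file. The following is a minimal Python sketch of how a crawler might read it, assuming the `*` wildcard, trailing `$` anchor, and "longest match wins, Allow wins ties" precedence described in RFC 9309. It keeps only the `User-agent: *` group, and `parse_robots` and `is_allowed` are illustrative names, not a library API. (The standard library's `urllib.robotparser` compares rule paths as literal prefixes, so it would not treat the `*` in `Disallow: /api/user?*` as a wildcard.)

```python
import re

ROBOTS_TXT = """\
#Nothing interesting to see here, but there is a dance party
#happening over here: http://www.youtube.com/watch?v=9vwZ5FQEUFg
User-agent: *
Disallow: /api/user?*
Disallow:
Sitemap: http://www.seomoz.org/blog-sitemap.xml
Sitemap: http://www.seomoz.org/ugc-sitemap.xml
Sitemap: http://www.seomoz.org/profiles-sitemap.xml
Sitemap: http://app.wistia.com/sitemaps/2.xml
"""


def parse_robots(text):
    """Return (rules for the '*' group, sitemap URLs).

    Simplification: only groups that list 'User-agent: *' are kept; each rule
    is an (allow, pattern) pair. Empty Disallow values are skipped because an
    empty Disallow matches nothing."""
    rules, sitemaps = [], []
    in_star_group = False
    prev_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "sitemap":
            sitemaps.append(value)  # Sitemap lines are not tied to a group
        elif field == "user-agent":
            if not prev_was_agent:
                in_star_group = False  # a new group starts here
            in_star_group = in_star_group or value == "*"
        elif field in ("allow", "disallow") and in_star_group and value:
            rules.append((field == "allow", value))
        prev_was_agent = field == "user-agent"
    return rules, sitemaps


def pattern_to_regex(pattern):
    """'*' matches any run of characters; a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Longest matching pattern wins; on a tie, Allow wins. No match means allowed."""
    best_len, allowed = -1, True
    for allow, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, allowed = len(pattern), allow
    return allowed


rules, sitemaps = parse_robots(ROBOTS_TXT)
print(len(sitemaps))                          # 4
print(is_allowed(rules, "/api/user?id=42"))   # False
print(is_allowed(rules, "/blog"))             # True
```

Because the rule ends in `?*`, only `/api/user` URLs that carry a query string are blocked; a path such as `/api/users` does not match.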
Cloaking: Those savvy about search engines know that Google hates cloaking, the practice of showing a search engine something different from what a human visitor would see. It is often associated with spam. There are plenty of cases where people have shown misleading content to a search engine in hopes of ranking well. One example dates from 1999, when the FTC took action against a site that used cloaked content to rank for “innocent” searches such as Oklahoma tornadoes and then sent searchers to porn sites. A publisher forcing a search engine to allow cloaking would be somewhat like a newspaper being forced to print whatever a subject demanded be written about them. … (From “ACAP Versus Robots.txt For Controlling Search Engines”)
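Cloaking typically means a server branches on who is asking, for example on the `User-Agent` request header. One rough way to spot it is to request the same URL as a crawler and as a browser and compare the results. The Python sketch below assumes a caller-supplied `fetch(url, user_agent)` function so it runs without a network; the user-agent strings, the toy servers, and the 0.9 similarity threshold are all hypothetical choices for illustration, not how Google detects cloaking.

```python
import difflib
from typing import Callable

# Example user-agent strings (illustrative only).
CRAWLER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def looks_cloaked(fetch: Callable[[str, str], str], url: str,
                  threshold: float = 0.9) -> bool:
    """Flag the URL if crawler and browser see noticeably different text."""
    # Collapse whitespace so trivial formatting differences are ignored.
    as_crawler = " ".join(fetch(url, CRAWLER_UA).split())
    as_browser = " ".join(fetch(url, BROWSER_UA).split())
    similarity = difflib.SequenceMatcher(None, as_crawler, as_browser).ratio()
    return similarity < threshold


def cloaking_server(url: str, user_agent: str) -> str:
    # Toy server that shows the crawler one page and humans another.
    if "Googlebot" in user_agent:
        return "Oklahoma tornadoes: safety tips and storm tracking"
    return "Completely unrelated content"


def honest_server(url: str, user_agent: str) -> str:
    # Toy server that returns the same page to everyone.
    return "Oklahoma tornadoes: safety tips and storm tracking"


print(looks_cloaked(cloaking_server, "http://example.com/"))  # True
print(looks_cloaked(honest_server, "http://example.com/"))    # False
```

Real pages often vary between requests (ads, timestamps), which is why this sketch uses a similarity ratio rather than an exact comparison; the threshold would need tuning in practice.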
Pages you block with robots.txt Disallow rules may still be in Google’s index and appear in search results, especially if other sites link to them. A high ranking is unlikely, though: since Google can’t “see” the page content, it has little to go on beyond the anchor text of inbound and internal links, the URL, and the ODP title and description if the page is listed in ODP/DMOZ. As a result, the URL of the page and, potentially, other publicly available information can appear in search results. However, no content from your pages will be crawled, indexed or displayed. … (From “A Deeper Look At Robots.txt”)
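The distinction above, that robots.txt stops crawling but not discovery, can be shown with a toy model. In this Python sketch every URL is "discovered" through links and keeps their anchor text, but page content is only fetched when `urllib.robotparser` says the URL may be crawled. The site, link graph, page bodies, bot name, and the `build_index` helper are all hypothetical, and real search engines are far more involved.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical link graph: (linking page, target URL, anchor text).
LINKS = [
    ("https://other-site.example/post", "https://example.com/private/report.html", "annual sales report"),
    ("https://example.com/", "https://example.com/private/report.html", "our report"),
    ("https://example.com/", "https://example.com/about.html", "about us"),
]

# Hypothetical page bodies a crawler would receive if it fetched them.
PAGES = {
    "https://example.com/private/report.html": "Confidential figures",
    "https://example.com/about.html": "We are a small company.",
}


def build_index(robots_txt, links, pages, agent="ExampleBot"):
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    index = {}
    # Discovery: every linked URL gets an entry with its anchor text,
    # whether or not robots.txt allows crawling it.
    for _source, target, anchor in links:
        entry = index.setdefault(target, {"anchors": [], "content": None})
        entry["anchors"].append(anchor)
    # Crawling: content is only recorded for URLs robots.txt allows.
    for url, entry in index.items():
        if parser.can_fetch(agent, url):
            entry["content"] = pages.get(url)
    return index


for url, entry in build_index(ROBOTS_TXT, LINKS, PAGES).items():
    print(url, entry)
```

The blocked report still ends up with an entry built from its URL and the anchor text “annual sales report” and “our report”, while its content stays `None`, mirroring the behavior the paragraph describes.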