In the battle between search engines and some mainstream news publishers, ACAP has been lurking for several years. ACAP — the Automated Content Access Protocol — has consistently been positioned by some news executives as a cornerstone of reestablishing the control they feel has been lost over their content. However, the reality is that publishers already have more control, even without ACAP, than is commonly believed. In addition, ACAP currently provides no “DRM” or licensing mechanisms over news content. But the system does offer some ideas well worth considering. Below, a look at how it measures up against the current systems for controlling search engines. ACAP started development in 2006 and formally launched a year later with version 1.0 (see ACAP Launches, Robots.txt 2.0 For Blocking Search Engines?). This year, in October, ACAP 1.1 was released and has been installed by over 1,250 publishers worldwide, says the organization, which is backed by the European … [Read more...] about ACAP Versus Robots.txt For Controlling Search Engines
The Robots Exclusion Protocol (REP) is not exactly a complicated protocol, and its uses are fairly limited, so it’s usually given short shrift by SEOs. Yet there’s a lot more to it than you might think. Robots.txt has been with us for over 14 years, but how many of us knew that in addition to the disallow directive there’s a noindex directive that Googlebot obeys? That noindexed pages don’t end up in the index but disallowed pages do, and the latter can show up in the search results (albeit with less information, since the spiders can’t see the page content)? That disallowed pages still accumulate PageRank? That robots.txt can accept a limited form of pattern matching? That, because of that last feature, you can selectively disallow not just directories but also particular filetypes (well, file extensions to be more exact)? That a robots.txt disallowed page can’t be accessed by the spiders, so they can’t read and obey a meta robots tag … [Read more...] about A Deeper Look At Robots.txt
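To make those features concrete, here is a minimal sketch of a robots.txt file that uses them. The paths are hypothetical, and the Noindex line is the unofficial directive Googlebot has been reported to obey, not part of the formal standard:

    User-agent: Googlebot
    # Standard directive: block crawling of an entire directory
    Disallow: /private/
    # Limited pattern matching: * matches any characters, $ anchors the end,
    # so this blocks a particular file extension rather than a directory
    Disallow: /*.pdf$
    # Unofficial directive Googlebot has honored: keeps these URLs out of
    # the index, rather than merely uncrawled
    Noindex: /printer-friendly/

    User-agent: *
    Disallow: /private/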
Google’s rolled out a new tool at Google Webmaster Central: a robots.txt generator. It’s designed to allow site owners to easily create a robots.txt file, one of the two main ways (along with the meta robots tag) to prevent search engines from indexing content. Robots.txt generators aren’t new. You can find many of them out there by searching. But this is the first time a major search engine has provided a generator tool of its own. It’s nice to see the addition. Robots.txt files aren’t complicated to create. You can write them using a text editor such as Notepad with just a few simple commands. But they can still be scary or hard for some site owners to contemplate. To access the tool, log in to your Google Webmaster Tools account, then click on the Tools menu option on the left-hand side of the screen after you select one of your verified sites. You’ll see a "Generate robots.txt" link among the tool options. That’s what you want. By default, the tool is … [Read more...] about Google Offers Robots.txt Generator
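For anyone who would rather skip the tool, the "few simple commands" amount to user-agent and disallow lines like the sketch below; the paths and bot name are hypothetical examples, not output from Google’s generator:

    # Let every crawler in, except for the /admin/ area
    User-agent: *
    Disallow: /admin/

    # Shut out one particular crawler entirely (hypothetical bot name)
    User-agent: ExampleBot
    Disallow: /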
After a year of discussions, ACAP — Automated Content Access Protocol — was released today as a sort of robots.txt 2.0 system for telling search engines what they can or can’t include in their listings. However, none of the major search engines support ACAP, and its future remains firmly one of "watch and see." Below, more about the how and why of ACAP. Let’s start with some history. ACAP got going in September 2006, backed by major European newspaper and publishing groups that in particular felt Google was using content without proper permission and wanted a more flexible means of granting it than allowed by the long-standing robots.txt and meta robots standards. These two standards are documented at robotstxt.org, and ACAP has often referred to them as the "Robots Exclusion Protocol," or REP, though within the SEO world they’re generally known by their actual names. Robots.txt was born in 1994 as a way to block content on a server-wide basis; meta robots … [Read more...] about ACAP Launches, Robots.txt 2.0 For Blocking Search Engines?
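As a rough sketch of how the two existing standards divide the work (the paths here are hypothetical): robots.txt sits at the server root and controls crawling across the whole site, while the meta robots tag sits inside an individual page’s HTML and controls how that one page is handled.

    # robots.txt, served from example.com/robots.txt:
    User-agent: *
    Disallow: /archive/

    <!-- meta robots tag, placed in the <head> of a single page: -->
    <meta name="robots" content="noindex, nofollow">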
The Robots.txt Summit at Search Engine Strategies New York 2007 was the latest in a series of special sessions intended to open a dialog between search engine representatives and web site publishers. Past summits featured discussion of comment spam on blogs, indexing issues, and redirects. The subject of this latest summit was the humble but terribly important robots.txt file. Danny Sullivan moderated, with panelists Keith Hogan, Director of Program Management, Search Technology, Ask.com; Sean Suchter, Director of Yahoo Search Technology, Yahoo Search; Dan Crow, Product Manager, Google; and Eytan Seidman, Senior Program Manager Lead, Live Search. The Robots.txt Summit session was not about how to use the robots.txt file; rather, as Danny Sullivan explained, “We’re assuming you know how to use it and are frustrated with it. This is about how you want to see it evolve.” For a potentially dry and technical subject, the panel turned out to be quite … [Read more...] about Up Close & Personal With Robots.txt