# Nothing interesting to see here, but there is a dance party
# happening over here: http://www.youtube.com/watch?v=9vwZ5FQEUFg

User-agent: *
Disallow: /api/user?*
Disallow:

Sitemap: http://www.seomoz.org/blog-sitemap.xml
Sitemap: http://www.seomoz.org/ugc-sitemap.xml
Sitemap: http://www.seomoz.org/profiles-sitemap.xml
Sitemap: http://app.wistia.com/sitemaps/2.xml

If you are unfamiliar with robots.txt, be sure to read these pages: … [Read more...] about Have You Considered Privacy Issues When Using Robots.txt & The Robots Meta Tag?
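Rules like the ones above can be checked programmatically. A minimal sketch using Python's standard library (the rules and URLs here are illustrative, not SEOmoz's live file; note that `urllib.robotparser` implements the original REP draft with simple prefix matching, so Googlebot-style wildcards such as the `?*` above are not interpreted):

```python
# Minimal sketch: testing URLs against robots.txt rules with the
# standard library. These rules are illustrative, not SEOmoz's file.
import urllib.robotparser

rules = """\
User-agent: *
Disallow: /api/
Disallow: /private/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Blocked: /api/ is a disallowed prefix for this path
print(rp.can_fetch("MyBot", "https://example.com/api/user"))   # False
# Allowed: no Disallow rule matches this path
print(rp.can_fetch("MyBot", "https://example.com/blog/post"))  # True
```

In practice you would point the parser at a live file with `set_url(...)` and `read()` instead of `parse()`; the matching logic is the same either way.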
Overall, there are some ideas in ACAP that would be useful for the search engines to consider. However, there are many ideas outside of ACAP that would also be useful for them to consider. Nothing I see within ACAP provides some crucial control that, if only news publishers had it, would end their online woes. What the news publishers really want are licensing agreements, and given that Google already has several of these without using ACAP (see Josh Cohen Of Google News On Paywalls, Partnerships & Working With Publishers), I can’t see that adopting it would somehow advance any business model changes. … [Read more...] about ACAP Versus Robots.txt For Controlling Search Engines
The Robots Exclusion Protocol (REP) is not exactly a complicated protocol and its uses are fairly limited, and thus it’s usually given short shrift by SEOs. Yet there’s a lot more to it than you might think. Robots.txt has been with us for over 14 years, but how many of us knew that in addition to the disallow directive there’s a noindex directive that Googlebot obeys? That noindexed pages don’t end up in the index but disallowed pages do, and the latter can show up in the search results (albeit with less information since the spiders can’t see the page content)? That disallowed pages still accumulate PageRank? That robots.txt can accept a limited form of pattern matching? That, because of that last feature, you can selectively disallow not just directories but also particular filetypes (well, file extensions to be more exact)? That a robots.txt disallowed page can’t be accessed by the spiders, so they can’t read and obey a meta robots tag … [Read more...] about A Deeper Look At Robots.txt
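The pattern matching described above can be sketched in a short robots.txt fragment (the paths are hypothetical; `*` and `$` matching are extensions honored by the major engines rather than part of the original protocol, and the `noindex` directive was an unofficial Googlebot behavior that Google has since stopped supporting):

```
User-agent: Googlebot
# Disallow a particular file type: every URL ending in .pdf
# ($ anchors the match to the end of the URL)
Disallow: /*.pdf$
# Disallow a directory the traditional way
Disallow: /private/
```

To keep a page out of the index while still letting spiders crawl it, the page itself would carry `<meta name="robots" content="noindex">`; as the excerpt notes, a crawler can only see and obey that tag if the page is not disallowed in robots.txt.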
Why are the search engines coming together to talk about their varied support for traditional methods for blocking access to web content? A Microsoft spokesperson told me that while robots.txt has been the de facto standard for some time, the search engines had never come together to detail how they support it and said the aim is to “make REP more intuitive and friendly to even more publishers on the web.” Google similarly said that “doing a joint post allows webmasters to see how we all honor REP directives, the majority of which are identical, but we also call out those that are not used by all of us.” … [Read more...] about Yahoo!, Google, Microsoft Clarify Robots.txt Support
Danny Sullivan moderated, with panelists Keith Hogan, Director of Program Management, Search Technology, Ask.com; Sean Suchter, Director of Yahoo Search Technology, Yahoo Search; Dan Crow, Product Manager, Google; and Eytan Seidman, Senior Program Manager Lead, Live Search. The Robots.txt summit session was not about how to use the robots.txt file; rather, as Danny Sullivan explained, “We’re assuming you know how to use it and are frustrated with it. This is about how you want to see it evolve.” … [Read more...] about Up Close & Personal With Robots.txt