I’ve found this very annoying, myself: Google has indexed my RSS feeds, sending some searchers in unexpected directions. I’m sure I don’t need to tell most people who concern themselves with SEO: when people end up somewhere they didn’t want to be, they don’t look deeper and they don’t come back. They just leave. Joost de Valk of SEO Egghead proposes the following intriguing solution:
I won’t go into why search engines seem to be indexing feeds; the fact is, they do. The feed for my personal blog has PageRank 4 at the moment, which goes to show that Google even assigns some weight to it. Now think about it: wouldn’t it be cool if you had the equivalent of a noindex, follow (not nofollow) robots meta tag for RSS feeds? That way, the feed could be followed, search engines could spider and assign weight to the links within, yet the contents of your feed wouldn’t show up in the SERPs.
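For readers unfamiliar with the directive Joost is referring to, here is what it looks like on an ordinary HTML page (this is standard robots meta tag syntax; his point is that a feed, being XML, has no place to put it):

```html
<!-- Tells crawlers: don't list this page in results,
     but do follow and credit the links on it. -->
<meta name="robots" content="noindex, follow">
```

One way to get the same effect for a non-HTML resource such as a feed is the `X-Robots-Tag` HTTP response header (e.g. `X-Robots-Tag: noindex, follow`), which Google honors; how you configure your server to send it will vary, so treat this as a sketch rather than a recipe.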
I bet I already know what you’re thinking: “Oh, yes, Tom. That’s just what we need right now, more complexity for RSS!” Well, I agree that RSS is a horribly disorganized pseudo-standard right now, and that writing for RSS is a pain. However, are you really worrying that much about your feeds? Wouldn’t it be far, far more productive to create a basic feed and leave the coding to someone like FeedBurner? I certainly think so.
Regardless, if you really care about your readers, and thus about your ranking, you should be concerned about what does and does not get indexed from your site. Google serving up links that are guaranteed to drive users off your site for good helps no one.
My question for him is: why would you want your feed indexed at all? His solution to the indexing problem is to allow Google, MSN and the others to crawl your feed and index the links off of it, but simply not surface the feed itself in search results. That’s predicated on the notion that you would actually want that information indexed, presumably a second time.
One of the big challenges, especially where blogs are concerned, is eliminating duplicate content: information that can be found on your site more than once in exactly the same format. For example, Google may index the root of your blog and grab an article once, then the two or three category pages on which that article appears (indexing the same article again each time), and then the actual article permalink as well.
Most of us understand this to be an invitation to a duplicate-content penalty, since search engines may treat duplicate content as an attempt to rank a page higher than it’s actually worth. It seems to me that allowing the feed to be crawled but not indexed is more complexity than it’s worth.
Far better to keep the feed out of the index altogether: leave the feeds alone and stick with the content pages. The only good reason I can think of for bots to crawl feeds would be to catalogue the feeds available on the Internet, but as far as I know, no such service exists at Google, Yahoo! or MSN.
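Keeping bots away from the feeds entirely can be done with a few lines of robots.txt. A minimal sketch, assuming WordPress-style feed URLs (adjust the paths to wherever your blog actually serves its feeds):

```
# Keep crawlers out of the feeds; they'll still crawl the content pages.
User-agent: *
Disallow: /feed/
Disallow: /comments/feed/
```

Note the trade-off: a robots.txt `Disallow` stops crawling altogether, so unlike the hypothetical "noindex, follow" Joost describes, the links inside the feed won’t be followed either. Since those same links exist on the content pages anyway, that’s usually no loss.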