The NoIndex Metatag Debate

Matt Cutts has just put out a posting on how search engines handle the “noindex” metatag. He makes some interesting observations about the results as follows:

  • Google doesn’t show the page in any way
  • Ask doesn’t show the page in any way
  • MSN shows a url reference and Cached link, but no snippet. Clicking the cached link doesn’t return anything.
  • Yahoo! shows a url reference and Cached link, but no snippet. Clicking on the cached link returns the cached page.

Matt goes on to note that it would be great if all the search engines treated the noindex metatag, well, like a noindex request (translation, “DON’T index it” – my words, not Matt’s). I think that this is an excellent idea.

Danny Sullivan, still of Search Engine Watch, adds his thoughts to the noindex discussion. Danny observes that Google treats the robots.txt file in a way that many webmasters may not expect. In particular, pages that are marked as “don’t crawl” in the robots.txt file may still be indexed by Google.

The one case I have seen where this happens is when pages are marked in robots.txt as “disallow”, yet third party pages link to those pages. In the past, Google has reserved the right to still index these pages.

In fact, Google may still index a page even if it is marked with the noindex metatag. This can happen if it is also listed in the robots.txt file as “disallow”. The reason is that Google treats the robots.txt as being a higher priority than the noindex metatag: a page that is disallowed is never fetched, so the metatag on it is never read.

Now it turns out that this is pretty simple to address. All you need to do is not use the robots.txt file, and rely solely on the noindex metatag. In this event, Google will reliably not index the page, in all cases (that I know of).
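
To make the conflict concrete, here is a minimal sketch that flags pages carrying a noindex metatag that are also disallowed in robots.txt. It uses only the Python standard library. The function names, the example.com URLs, and the "Googlebot" user agent string are my own assumptions for illustration, and the script says nothing about how any search engine is actually implemented.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html):
    """True when the page carries a robots metatag containing noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def noindex_is_hidden(robots_txt, url, html, user_agent="Googlebot"):
    """True when the page asks for noindex but robots.txt blocks the crawl.

    user_agent is a hypothetical default chosen for this example.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    blocked = not rules.can_fetch(user_agent, url)
    return blocked and has_noindex(html)
```

Any page for which noindex_is_hidden returns True is a candidate for the fix described above: take it out of robots.txt and let the metatag do the work.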

Danny repeats a call I have seen him make in the past for improved standardization of search engine behavior. More standardization by the search engines sounds like another excellent idea. We all recognize that there are things that the search engines need to keep close to the vest, and treat as proprietary. But making life easier on SEOs and search engines by following agreed-upon standards will benefit us all.

You can read more about metatags and SEO here.

The Cost of Duplicate Content

Let’s talk about the cost of duplicate content. At first blush, it seems like a relatively minor issue. In principle, a search engine wants to include only one copy of a page in its index. So if you have multiple pages with the same content, the search engine picks only one. This means one copy of your content is ignored.

So far it does not sound too bad, does it? However, there are other less obvious consequences to duplicate content. For example, it can’t be good that crawlers come to your site and crawl pages that they will never index. In fact, it’s our understanding that crawlers come to your site with the goal of crawling a certain number of pages. So if they crawl pages that will not be indexed, they are not crawling pages that will. This could result in fewer pages of your site getting indexed.

In addition, there are tons of ways to end up with unintentional duplicate content. Here are just a few:

  1. Running an affiliate program
  2. Syndicating content
  3. Failure to 301 redirect from the non-www version of your site to the www version of your site (or vice versa)
  4. Code implementations that cause sub-domain pages to automatically mirror to your site
  5. Code implementations that cause different URL paths to render the same content
  6. Pages with “different content”, but that are not different enough – this can happen with database-driven sites
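
As a rough way to spot the exact-duplicate cases in the list above (such as www versus non-www, or two URL paths rendering the same page), here is a small sketch that groups URLs by a hash of their normalized text. The function names and the example URLs are assumptions made for illustration. It only catches pages whose text matches after whitespace and case are normalized, so it will not find the “not different enough” pages in item 6.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """Hash of the text after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(pages):
    """pages maps URL -> page text; returns lists of URLs sharing content."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl results: the first two URLs serve the same page.
pages = {
    "http://example.com/": "Welcome to  our store",
    "http://www.example.com/": "welcome to our store",
    "http://www.example.com/about": "About us",
}
duplicates = find_duplicates(pages)
```

Each group returned is a set of URLs that should probably collapse to a single address, for instance with a 301 redirect.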

I am sure that there are many more ways to create duplicate content. Each of these scenarios has its own issues and problems. But one problem with nearly all of them is link dilution. Your site has a certain amount of page rank to spread around. Links that go to pages that will never be indexed are wasted. This means that less page rank is poured into those pages that are indexed. This will likely result in lower rankings for those pages.
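
To put a number on the link dilution idea, here is a toy calculation. It assumes every link carries an equal share of value, which is a simplification of mine and not how any search engine computes rankings. The function name and the sample URLs are hypothetical.

```python
def wasted_link_share(links, indexed):
    """Fraction of links whose target is not in the indexed set."""
    if not links:
        return 0.0
    wasted = sum(1 for target in links if target not in indexed)
    return wasted / len(links)


# Four internal links, two of which point at duplicate URLs of /a
# that (in this made-up example) never make it into the index.
links = ["/a", "/a?ref=1", "/b", "/a?ref=2"]
indexed = {"/a", "/b"}
share = wasted_link_share(links, indexed)
```

In this made-up case half of the links point at pages that are never indexed, so under the equal-share assumption half of the value they could have passed along goes nowhere.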

So the bottom line is potentially fewer pages indexed and lower rankings for the pages that are indexed. This sounds like an extremely high cost to me. You can read more about problems with, and solutions for, duplicate content here.

Danny Sullivan’s Contributions to SEO

Leave it to Danny Sullivan. While the SEO world is abuzz with the news of his leaving Search Engine Watch (SEW), and Search Engine Strategies (SES) as well, Danny focuses on the issue of losing Pluto as a planet. As someone who has been at 6 or so SES shows, I can tell you that this is typical of Danny’s character.

SEW and SES may have led the way in popularizing SEO, but Danny played a key role in humanizing it. He has brought humor, humility, and dignity to our profession, and I know I am not alone in celebrating his contributions. We should all join together in a hearty thanks to Danny for what he has accomplished.

I remember one SES where he was chairing a “Meet the Search Engines” panel. While the presence of folks like Matt Cutts, Rahul Lahiri, Tim Mayer, and Eytan Seidman is already good stuff, Danny added so much to the session. For example, he challenged the search engines to get together and agree on new standards such as the “nofollow” tag, the “robots” tags, and so forth.

The search engines already had the idea that discussion of common issues made sense. But Danny put forth the idea that the webmaster and SEO community could place demands on the search engines. The idea being that we could say “We need you to …”. SEW and SES became major vehicles for this type of communication between webmasters and search engines. The communication channel got opened and the rest is history unfolding.

Danny is by no means done, unless he chooses to be. The SEO world has recognized his leadership, as every single major blog I know of has written about this event. Now we all need to wait and see what he does next. We will all be paying close attention. I wish you the best of luck, Danny, but I don’t think you will need it.

Social Media Optimization

Rohit Bhargava of Influential Marketing started an excellent thread of blog posts with his post titled: 5 Rules of Social Media Optimization. In it, he starts us off with a list of rules for engaging the social web community and building traffic in the process. This initial post has since been enhanced by others, resulting in a list of 16 rules so far. The additional contributors include:

  1. Jeremiah Owyang contributed Rules 6 and 7.
  2. Cameron Olthuis contributed Rules 8 through 11.
  3. Loren Baker contributed Rules 12 and 13.
  4. Lee Odden contributed Rules 14 through 16.
  5. Jean-Marie Le Ray has translated the initial 16 rules into French.

As you will see, Rohit Bhargava has done a nice job of following his own rules. He has updated his post on a regular basis to add links to those who have added new rules. This speaks to the recommendations of several of the people above of engaging in a dialogue and communicating. To that end, I would like to propose a new rule, Rule 17.

17. Engage your peers: Find out who your peers are. Specifically seek them out and engage them in discussions. Read their writings regularly. Add comments to their posts. Send them emails. Ask for their opinions. In the world of the social web, people are looking for value-added dialogue.

This thread of blog posts is an excellent example of people of like minds looking to build out a concept to its fullest. It’s this type of discussion that helps a concept achieve its most complete definition.

In addition, I would like to propose a modification to Rule 11, as proposed by Cameron Olthuis. The original Rule 11 was “Be real – The community does not reward fakers”. I would simply change it to “Be real / Earn your readers’ trust – The community does not reward fakers”. The modification is simply intended to broaden the task a bit, as there are many things that you can do to earn trust from your readers, such as being accurate, providing timely info and insight, etc.

That said, let’s look at the total list, including the new proposed Rule 17, and the proposed modification to Rule 11.

  1. Increase your linkability
  2. Make tagging and bookmarking easy
  3. Reward inbound links
  4. Help your content travel
  5. Encourage the mashup
  6. Be a User Resource, even if it doesn’t help you
  7. Reward helpful and valuable users
  8. Participate
  9. Know how to target your audience
  10. Create content
  11. Be real / Earn your readers’ trust
  12. Don’t forget your roots, be humble
  13. Don’t be afraid to try new things, stay fresh
  14. Develop a SMO strategy
  15. Choose your SMO tactics wisely
  16. Make SMO part of your process and best practices
  17. Engage your peers

Directory Links

Recently I came across a post about some of the best Directories at the AvivaDirectory web site. This posting lists directories as organized by “Page Strength”, using a tool developed for this purpose by SEOMoz. The posting also includes information on the cost for submission to the directory.

Anyone familiar with STC knows that we recommend that you do not buy links. So are we recommending that you pay to submit to directories? Yes, we are. Why? Because there is a big difference. Getting your site listed in a reputable directory helps the search engines, provided that the following criteria are met:

  1. The directory uses real human editors who actually look at your site before placement in the directory.
  2. The directory makes you pay in advance, yet reserves the right to reject your submission and keep the cash. This is a big key, because it acts as a spam filter.
  3. The directory retains the right to change the title and description you submitted for your site.
  4. The editors at the directory site also have the right to change the directory page on which you are listed from the one you suggested.

Provided all these criteria are met, you basically have an independent, mostly objective, evaluation of your site by human eyes. This is great info for a crawler in understanding what your site is about. The fact that they can keep the cash and not give you a listing is a strong disincentive to spammers, who, generally speaking, don’t apply.

In the listing on the AvivaDirectory site, Yahoo, DMOZ, and Business.com are appropriately listed as the top 3. These provide the best results. Note that getting into DMOZ may be free, but it’s a crap shoot. Apply to the right category, and then forget it. You get in or you don’t. The editorial help is all volunteer, and there are large gaps in available bandwidth to review your listing.

I would not spend any time looking at any of the directories that have a low page strength score (below 5), unless they have a high level of focus on the topic area of your web site. This is a good topic all by itself. If the directory does not have many listings on the topic of your site, you might want to decline to submit your site to them.

Such a listing does not bring much relevance value to your site. In addition, while I have no information on how search engines react to these types of links, it starts to look like you are just trying to buy page rank at that point (not a good thing).

So approach the big 3 (Yahoo, DMOZ, and Business.com), and then select judiciously from the rest those that fit your site from a relevance perspective.

Buying Older Sites and Domains

Greg Boser of Web Guerilla made an interesting post about buying older sites and domains. In his post he shows an example of a domain, amishfurniturecrafts.com, that resolves to the same IP address as gokartsusa.com.

The result is that the site ranks highly for keywords like “Amish furniture”, “Amish gokarts”, and “mini bike furniture”. As a side effect, this can also overtly hurt the rankings of the phrases that the site originally ranked for. Why? While we don’t have the data to prove it (we did not look at this site prior to seeing Greg’s post), inbound anchor text, and the pages and sites linking to you, play a key role in developing the “relevance” picture of your site. When the messages get mixed, so do the results.

Presumably, the Amish furniture domain was acquired by GoKartsUSA in order to gather the value of the inbound links to their site. Greg points out that there is a possibility that this could be an act of competitive sabotage. We consider this scenario a bit unlikely, though. It’s an awful lot of work for someone to go through with uncertain results. Not only that, one would suspect that the majority of GoKartsUSA’s competitors would not have the SEO expertise to figure this one out.

It looks to me like GoKartsUSA probably got some bad SEO advice. 1-2 years ago, relevance calculations were not as sophisticated as they are today, and a link was a link. What we do get is a splendid example of the power of anchor text. The inbound anchor text is literally overriding the on-page content.

So content may be king (you can’t get good links without it), but links rule. Do your link building with care, and keep it on topic.