In this study we set out to see just how well some of the world’s top e-commerce sites use SEO tags and robots.txt to manage their faceted navigation. The results I am reporting here today will show you how often these sites get it right, and in some cases, just how horribly wrong they get it.
I’ll give it to you straight out – it’s bad out there. Only 38% of the URLs we looked at across 20 highly prominent e-tail sites were using SEO tags in what we consider an optimal manner. Even worse, 23% of the URLs we looked at were using overtly conflicting tags.
(Jump to the bottom of the page to watch a video summary of this study!)
What is Faceted Navigation?
Three types of navigation are typically considered faceted navigation:
- Sort orders: for example, showing products from highest to lowest price vs. from lowest to highest price.
- Filtered navigation: for example, showing only products that are under $100, or only products that are red.
- Pagination: for example, when you have 100 products and show only 10 products a page, so they get split across 10 pages.
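To make the three facet types concrete, here is a minimal sketch that guesses which facets a URL represents from its query string. The parameter names (`sort`, `order`, `page`, `p`) are assumptions for illustration only. Real sites use their own naming, and some encode facets in the path instead of the query string.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical parameter names; adjust these to match the site being audited.
SORT_PARAMS = {"sort", "order"}
PAGE_PARAMS = {"page", "p"}


def facet_types(url: str) -> set:
    """Return the facet types ('sort', 'filter', 'pagination') implied by a URL's query string."""
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    found = set()
    for name in params:
        key = name.lower()
        if key in SORT_PARAMS:
            found.add("sort")
        elif key in PAGE_PARAMS:
            found.add("pagination")
        else:
            # Assumption: any other query parameter narrows the product set.
            found.add("filter")
    return found
```

For example, `facet_types("https://example.com/shoes?color=red&page=2")` reports both a filter and pagination, while a bare category URL reports no facets at all.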
The reason we want to use SEO tags on these pages is that they can easily be seen by search engines as thin, poor quality, or duplicate content. The correct SEO tagging strategy will instruct search engines on how to view the various facets, and reduce the chances of that becoming a problem.
Best Use of SEO Tags
The optimal way to use these tags is quite simple, and is as follows:
Key to all of these tagging schemes is that no other tagging should be done. That means no NoIndex, NoFollow, or Disallowing in robots.txt. These will only confuse the situation and potentially break your SEO for these pages altogether. Use ONLY the tagging outlined in the above table.
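As a quick way to check the robots.txt part of that rule, here is a minimal sketch using Python's standard-library robots.txt parser. The function name, the sample rules, and the example.com URLs are illustrative assumptions, not taken from any site in the study. The standard-library parser does simple prefix matching, so it may not treat wildcard rules the way a search engine does.

```python
from urllib.robotparser import RobotFileParser


def is_disallowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text blocks `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# Hypothetical robots.txt that blocks a filter path.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /shoes/filter/
"""

if __name__ == "__main__":
    # A faceted URL that is both tagged and disallowed is sending mixed signals.
    print(is_disallowed(SAMPLE_ROBOTS, "https://example.com/shoes/filter/red"))
    print(is_disallowed(SAMPLE_ROBOTS, "https://example.com/shoes"))
```

If a faceted URL comes back as disallowed, the search engine cannot crawl the page and so never sees whatever tags you placed on it.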
Different e-commerce sites have different types of product attributes in need of faceted navigation. The elements we looked at included:
These are all pretty common types of attributes, and we looked for those categories on all 20 sites we included in this study. For some of the sites not all categories applied. For example, a site might not have had a “Size” filter.
The Gross and the Stupid
Two of the sites we looked at used rel=canonical on their sorts and filters, but pointed the rel=canonical to their home page. This violates the basic purpose of a rel=canonical, which is to point to a page that contains a substantial portion of the content on the page carrying the tag. Here is exactly what Google says about it: “A large portion of the duplicate page’s content should be present on the canonical version.”
The normal response of Google to seeing something like this is to ignore the canonical tag altogether, and it appears that Google has done so in the case of the two sites we saw with this problem.
The proper implementation is to point the canonical to a page that contains a superset of the information on the sorted or filtered page. That is usually a nearby parent category page. You can learn more about how to implement a rel=canonical here.
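Here is a minimal sketch of what that looks like for a sorted or filtered URL. It assumes the facets live entirely in the query string, so that stripping the query string yields the parent category page. That assumption will not hold on every site, and the function name and example URL are hypothetical.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_tag_for_facet(url: str) -> str:
    """Build a rel=canonical tag pointing at the parent category of a faceted URL.

    Assumption: the parent category is the same URL with the query string
    and fragment removed.
    """
    parts = urlsplit(url)
    parent = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return f'<link rel="canonical" href="{escape(parent, quote=True)}">'


if __name__ == "__main__":
    print(canonical_tag_for_facet("https://example.com/shoes?sort=price_desc&color=red"))
```

The point of the sketch is the target: the canonical lands on the category page the facet was derived from, not on the home page.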
And, just for fun, there is one other scenario we saw worth mentioning in this section. We saw one site that implemented rel=prev/next tags on pages that had no pagination!
Other Examples of Conflicting Tag Scenarios
Of course, implementing a rel=canonical to the home page of the site is not the only example of problems we saw. Some of the other scenarios we saw were:
Some of you may question why I see the use of a canonical with a NoFollow as a problem. Put simply, the NoFollow tells search engines how to handle PageRank flow from the page, basically by telling them to block all PageRank flow out of the page. Yet the rel=canonical says to pass all link value to the page that the canonical tag points to – hence the conflict.
Some of you may also be wondering what a “hashbang URL” is. Basically, it’s part of a method for allowing Google to crawl AJAX pages that Google supported for a number of years. This method has now been officially deprecated by Google, meaning that they no longer recommend it.
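For those who want to check their own pages, here is a rough sketch of how a few of the conflicts described in this article could be flagged automatically. It is not the tooling we used for the study. The class and function names are hypothetical, it assumes the page being checked is a sort, filter, or pagination page rather than the home page itself, and it covers only three scenarios: a canonical to the home page, a canonical combined with NoFollow, and rel=prev/next on a page with no pagination.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class HeadTagCollector(HTMLParser):
    """Collect the canonical URL, robots directives, and prev/next links from a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = set()
        self.has_prev_next = False

    def handle_starttag(self, tag, attrs):
        attr = {k.lower(): (v or "") for k, v in attrs}
        rel = attr.get("rel", "").lower().split()
        if tag == "link" and "canonical" in rel:
            self.canonical = attr.get("href", "")
        elif tag == "link" and ("prev" in rel or "next" in rel):
            self.has_prev_next = True
        elif tag == "meta" and attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.robots |= {d.strip().lower() for d in content.split(",")}


def find_conflicts(html: str, is_paginated: bool) -> list:
    """Return a list of conflict descriptions for a faceted page's HTML."""
    tags = HeadTagCollector()
    tags.feed(html)
    issues = []
    if tags.canonical:
        # A canonical whose path is empty or "/" points at the home page.
        if urlsplit(tags.canonical).path in ("", "/"):
            issues.append("canonical points to home page")
        if "nofollow" in tags.robots:
            issues.append("canonical combined with nofollow")
    if tags.has_prev_next and not is_paginated:
        issues.append("rel=prev/next on a page with no pagination")
    return issues
```

You have to tell the function whether the page is actually paginated, since that cannot be reliably inferred from the tags alone.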
It’s likely that Google is deprecating this protocol because they don’t plan to index any content on a page that requires a user to click on something to expose it. You can read more about that here.
So let’s get a bit of a closer look at the details! Note that we won’t be outing anyone for their mistaken SEO practices here. With that settled, here is how it broke out when looking at what percentage of SEO tags were conflicting per what was outlined above:
Ouch! 23% of the time, the sites in question provide two or more conflicting instructions to search engines on the target web pages. What’s a poor search engine to do? From the publisher’s perspective, the reason this is bad is that it leaves it up to the search engine to figure it out. And, as a result, there is a chance that the search engine will get it wrong – and the whole point of this exercise is to make it easier for them in the first place.
Next up, let’s look at sub-optimal use of tags. In addition to the conflicting tag scenarios, this includes pages where publishers implemented no tags, a NoIndex instead of a rel=canonical, and similar situations:
Double Ouch! Optimal use of tagging was done on these sites only 38% of the time. And, this is on some of the top e-commerce sites on the planet. That’s pretty frightening.
Important footnote here – I did count implementing NoIndex on sort orders and filters as an optimal solution. It does solve the problem of pulling those pages out of the index. I still consider the rel=canonical the superior solution (as outlined in the above table on expected tagging behavior) because it passes all its PageRank back to the target page, whereas NoIndex simply passes PageRank through all the links on the page, and that’s quite inefficient.
The bottom line is that slightly less than 2/3 of the scenarios we examined (62%) were implemented in a less than optimal way. Clearly people are confused about how these tags work, and how to use them. We saw similar things in a study we published about a year ago on the use of rel=author tags.
The reality is that figuring out how to use them is not hard (see the table above), but most developers appear to not be taking the time to figure out their proper use. Put another way, getting this right is simply not getting enough priority from these major e-commerce sites.
It’s very important that you do take the time to get this right. Google, and the other search engines, defined these tags for a reason, and that is to make sure they handle your pages properly. The very fact that the tags are needed means that, left to their own devices, the search engines may get it wrong. The last thing you want to do is to make it even harder on them by misusing the tags.