Google’s Cloaking Policies

Google Blogoscoped has a post today titled “Does Google Allow Cloaking When They Like the Site?” There is no real doubt in my mind that the answer to this question is yes.

I have known for years that CD Universe does cloaking. You can see it quite simply. Search for “metallica CDs” in Google, and click through to the page listed for CD Universe (it shows as #3 in the results I see). Then view the source for the page, select it all, and copy it into a text editor that provides line numbers. You will see that this file has 294 lines of code in it.

Now go back to your Google search and click on the “cached” link for the same page. View the source, select it all, and copy it into the same text editor. You will see that the page has 34 lines of source in it.

This is cloaking in action. The 34-line version of the page presents the page's unique text and links very prominently in the file. I suspect that this has worked very well for them, and as far as I know they have never been banned.
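The manual comparison above can also be scripted. What follows is only a minimal sketch, assuming you have already saved the live page source and the cached page source as two strings; the function name `compare_versions` and the sample strings are hypothetical, made up for illustration. It reports the two line counts plus a rough line-level similarity score.

```python
import difflib


def compare_versions(live_source: str, cached_source: str) -> dict:
    """Compare two copies of a page's HTML source, line by line.

    Hypothetical helper: the names and the similarity measure are
    assumptions for illustration, not anything Google is known to use.
    """
    live_lines = live_source.splitlines()
    cached_lines = cached_source.splitlines()
    # ratio() is 1.0 for identical line lists and falls toward 0.0
    # as fewer lines match between the two versions.
    similarity = difflib.SequenceMatcher(None, live_lines, cached_lines).ratio()
    return {
        "live_lines": len(live_lines),
        "cached_lines": len(cached_lines),
        "similarity": similarity,
    }


if __name__ == "__main__":
    # Made-up sample sources standing in for the two copied files.
    live = "<html>\n<script>menu()</script>\n<p>Metallica CDs</p>\n</html>"
    cached = "<html>\n<p>Metallica CDs</p>\n</html>"
    print(compare_versions(live, cached))
```

A large gap between the two line counts, or a low similarity score, is only a hint that the two audiences are being served different files. It says nothing about whether the difference is deceptive.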

So, is this fair? After all, it’s pretty easy to think of scenarios that represent “good cloaking”. Here are two examples:

  • Your site uses session IDs to track users, and you simply want to feed the bot the URLs without the session IDs.
  • Your site uses lots of Flash and/or JavaScript, and you want to give the bot something easier to chew on, but continue to present the same actual content.

In both these cases, you are not trying to deceive anyone. You are just trying to address basic problems, in a reasonably simple way.
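To make the first scenario concrete, here is a rough sketch of the URL cleanup it describes: removing a session ID parameter from a link before it is handed out. The parameter names in `SESSION_PARAMS` and the function name `strip_session_id` are assumptions for illustration, since real sites name their session parameters in many different ways.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed parameter names; adjust to whatever your site actually uses.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def strip_session_id(url: str) -> str:
    """Return the URL with any session ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    # Rebuild the URL with only the non-session parameters.
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    # example.com is a placeholder domain, not a real store URL.
    print(strip_session_id("http://example.com/cds?artist=metallica&sid=abc123"))
```

The cleanup itself is harmless. What makes it cloaking is applying it only when the visitor is identified as a search engine bot.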

I have a client who wanted to resolve the session ID problem by cloaking. I had a dialogue with Google engineers about this, and the message back was: don’t do it. The tone of the message suggested that the reason was simply that it is not safe to do.

While I don’t work at Google, and don’t have any particular inside information, I am very confident that this is what’s going on:

  1. Google does do various things to detect cloaking.
  2. When they detect cloaking, they will, in fact, make some effort to distinguish “good” vs. “bad” cloaking.
  3. However, none of the techniques they use are deterministic, nor do they want to accept the obligation to make them deterministic.
  4. Google will not make any formal communication about this, because the suggestion that some cloaking is OK would cause an outcry for a well-defined policy that includes “fairness”.
  5. As a result, the policy is “don’t cloak”.

Personally, I am OK with all of this. It’s a “cloaker beware” policy. Sure, if your implementation is clean, you may not end up being punished. But our advice remains the same: Don’t cloak. It’s not worth the risk.
