Googlebot Crawling CSS Files

Barry Schwartz spotted a thread in the Cre8asite Forums discussing a report by Ekstreme.com that Googlebot requested a CSS file. This is an interesting, but not surprising, development.

It’s been known for some time that you can use CSS to hide text. Search engine representatives at SES Chicago also stated clearly that blocking crawling of your CSS or JavaScript include files is a no-no.
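For reference, that kind of blocking is typically done in robots.txt. Here is a hypothetical example of the sort of rule the representatives were warning against (the directory names are purely illustrative):

    # Don't do this: it hides your CSS and JS from crawlers
    User-agent: *
    Disallow: /css/
    Disallow: /js/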

The real question is how pervasive this will become. Is Google going to start crawling everyone’s CSS? Are they simply going to trigger off manual or algorithmic flags to do this for some sites? Or, for that matter, are they going to do random crawls of CSS on some sites as a spot check?

In his post, Barry quotes “pageoneresults” from a WebmasterWorld forum:

Google has a hard enough time now dealing with html/xhtml. Parsing CSS files and determining whether something is hidden or not is not a solution. Now the bot would need to determine why that CSS exists. There are many valid uses of display:none or display:hidden. For those who may be hiding things through CSS or negatively positioning content off screen to manipulate page content, I surely wouldn’t do that with any long term projects. ;)

The penalty for getting busted using this technique I would imagine is a permanent ban. No if’s, and’s, or but’s, you’re history. You’ll need a pardon from the Governor to be reconsidered for inclusion. ;)

It may indeed be hard to algorithmically distinguish an illicit use of display:none or visibility:hidden from a legitimate one, but such usage can certainly serve as a flag.
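To make the distinction concrete, here is a minimal sketch of the two cases (the selectors and content are made up for illustration):

    /* Legitimate: a dropdown submenu hidden until the user hovers */
    .submenu { display: none; }

    /* Manipulative: keyword-stuffed text pushed off screen */
    .keywords { position: absolute; left: -9999px; }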

We need to remember that this all operates in the context of “trust,” a topic that Matt McGee does an excellent job of discussing in a recent post. There are many trust flags that search engines look for. Usually, no single bad thing is going to lead to your being penalized (unless it’s REALLY bad). But many things can be used as flags:

  1. Buying links
  2. Too many reciprocal links
  3. Cloaking
  4. Abnormal changes in the rate of link acquisition
  5. No trusted links
  6. Blocking crawling of your CSS files
  7. Using display:none or visibility:hidden in your CSS files

All of these things can be used as triggers. Amass too many flags, and a site becomes worth a review. And, of course, some of these things trigger algorithmic penalties (e.g. too high a percentage of reciprocal links).

Of course, Google could take a simpler approach, based on creating FUD: read in a bunch of CSS files here and there and ignore them. It gets us all talking and wondering what they are doing, doesn’t it? Be that as it may, when it comes to hiding things with CSS, I would take pageoneresults’ statement above to heart: “I surely wouldn’t do that with any long term projects.”

Comments

  1. Anup says

    I think it would be very difficult (impossible, perhaps?) for Google to determine through algorithms *alone* that such CSS techniques are being used maliciously.

    There are very valid accessibility reasons (as well as other reasons, such as implementing richer user interfaces) to position content off the screen or to hide it: for example, headings, labels, or instructions for screen readers that aren’t needed in the visual design.
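    For example, a common off-screen pattern looks something like this (the class name is just illustrative):

        /* Removed from the visual layout, but still announced by screen readers */
        .screen-reader-text {
            position: absolute;
            left: -9999px;
            width: 1px;
            height: 1px;
            overflow: hidden;
        }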

    There is no way (I would think) that Google would therefore *automatically* ban such usage without getting a real person to look at it first. And when a reviewer sees a legitimate use like that, presumably it would not count as a factor toward a ban.

    At least that is what I hope ;)

  2. Eric Enge says

    Anup,

    I hope so too. But as I say above, they can use it as another flag to cause them to go take a closer look.

  3. Anup says

    Fair point.

    But, in a way, if it is only going to be a flag, and not grounds for automatic removal, then that is fine by me.

    Given it is a legitimate CSS technique (especially moving content off the screen for accessibility purposes — something I have been doing on various projects for a few years now, and something being discussed in some web standards blogs recently), then I am hopeful that it will not result in automatic removal.

    It sounds like you’d have to be doing a lot of bad things intentionally (or just be VERY unlucky) for Google to then consider getting someone to review your site, to determine whether to blacklist you or not.

    Maybe I am being a bit naive as to how easy or difficult it is to get so many things wrong unintentionally…

  4. says

    What doesn’t make sense about crawling CSS for display:none is that the folks who really wanted to hide those CSS declarations could do so by writing the CSS directly to the browser with some JavaScript document.write calls.
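    For instance, a minimal sketch (the class name is made up for the example):

        <script type="text/javascript">
            // A crawler that fetches CSS files but doesn't execute
            // JavaScript never sees this rule; browsers still apply it.
            document.write('<style type="text/css">.keywords { display: none; }<\/style>');
        </script>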
