Drupal and Search Engine Optimization

Drupal is known for being a very SEO-friendly content management system (CMS). The way it assembles its pages is crawler friendly, which makes it a popular choice for people looking to build dynamic websites. However, Drupal also has a number of potential SEO problems, and these need to be dealt with to ensure that you get optimal results.

The very fact that Drupal is such a dynamic system is a factor that leads to some of its SEO problems. The content is stored in a database and retrieved at runtime. Almost all information is stored as a “node”, a basic, unstructured unit of content. Often, each “node” is associated with groups of keywords, known as “taxonomies”, and Drupal makes it easy to retrieve and sort information by these taxonomies. Since all content can be retrieved dynamically, Drupal generates generic URLs for the content, such as www.example.com/?q=node/3 or www.example.com/node/3.

These “internal” URLs are always present in Drupal, even though Drupal provides features that allow you to hide them, and instead present much friendlier URLs, known as aliases, to web site users. There are multiple optional modules that may affect the generation of pages and the naming of URLs, and there are many modules that remain aware of the internal naming conventions, even when user-friendly URLs are being used. As a result Drupal may expose both the internal URLs and the user-friendly URLs to users and web crawlers.

As a result of these kinds of architectural issues, many Drupal sites end up exposing content to the web via multiple URLs. When this happens, the multiple URLs can be crawled by the search engines, creating duplicate content problems. Here are some examples of duplicate content issues, and some other problems that can arise in Drupal.

1. Problem: duplicate content from aliases

Example: www.example.com/node/5 and www.example.com/content/how-to-surf, both pointing at the same physical document.

Solution: use robots.txt to disallow URLs that include “/node/”. For example, you can include the following lines in robots.txt:

Disallow: /node
Disallow: /*/node/

Considerations: note that this assumes all URLs are available via friendly aliases, which should be the case if you’re using the Pathauto module.
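As a quick sanity check, Python’s standard-library robots.txt parser can confirm the effect of the plain prefix rule. Note that urllib.robotparser does simple prefix matching only and does not implement the Googlebot-style * wildcard, so only the first rule is verified here; example.com and the paths are placeholders:

```python
from urllib import robotparser

# robotparser accepts the rules as a list of lines.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /node",
])

# The internal node URL is blocked (prefix match on /node)...
print(rp.can_fetch("*", "http://www.example.com/node/5"))
# ...while the friendly alias remains crawlable.
print(rp.can_fetch("*", "http://www.example.com/content/how-to-surf"))
```

Running this prints False for the internal URL and True for the alias, which is exactly the split we want the crawlers to see.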

2. Problem: Drupal’s default robots.txt has errors.

Example: the default robots.txt uses “Disallow: /search”. This blocks only the bare /search path, not all of Drupal’s internal search results pages, which is what you actually want blocked.

Solution: update the robots.txt to read:

Disallow: /search/

3. Problem: Pathauto can create many extra pages on the site if configured incorrectly.

Example: if you turn on “Create index aliases” and you have a hierarchical alias (e.g., a page with a path containing a slash, such as music/concert/beethoven), Drupal automatically generates index pages that list all pages in each category, for example all music and all concerts.

Solution: do not check the “Create index aliases” check box in the Pathauto module.

4. Problem: In a production environment, an incorrect setting of the Pathauto “Update action” can cause the URLs of published pages, which may already be indexed by the search engines, to change.

Solution: In development mode (before exposing the site to the search engines), use “Create a new alias, replacing the old one” to regenerate URLs whenever necessary (for example, if your Pathauto rules change). In production, once the site is exposed, set this to “Do nothing, leaving the old alias intact”.

5. Problem: Some modules, such as Forums and Views, create sortable lists that can generate multiple URLs with duplicate content.

Solution: If you use such a module, be sure to exclude the sorted variations using the following robots.txt rule:

Disallow: /*sort=
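A rule like this depends on * wildcard support, which major crawlers such as Googlebot provide but the original robots.txt convention (and simple parsers) do not. The matching behavior can be sketched by translating the pattern into a regular expression anchored at the start of the path-plus-query string; this is an illustrative approximation, not Google’s exact implementation:

```python
import re

def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Approximate Googlebot-style rule matching: '*' matches any run
    of characters, a trailing '$' anchors the end, and otherwise the
    pattern must match a prefix of the path (including the query string)."""
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"
    return re.match(regex, path) is not None

# A sorted Views/Forum listing is caught by the rule...
print(robots_pattern_matches("/*sort=", "/forum/5?sort=asc&order=desc"))
# ...but the plain, unsorted listing is not.
print(robots_pattern_matches("/*sort=", "/forum/5"))
```

The first call prints True and the second False, so only the sorted duplicates get excluded while the canonical listing stays crawlable.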

6. Problem: The Forward module adds a link on each page that allows the page to be forwarded to a friend. You can easily end up with hundreds or thousands of such low-quality pages that are essentially boilerplate.

Solution: If you use this module, be sure to exclude the forward pages using the following robots.txt rule:

Disallow: /forward/
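Putting the robots.txt fixes from items 1, 2, 5, and 6 together, the additions to a Drupal site’s robots.txt might look like the following (this assumes, as noted above, that every node has a friendly alias, e.g. via Pathauto):

```
User-agent: *
# Internal node URLs (item 1); friendly aliases exist for all content
Disallow: /node
Disallow: /*/node/
# Internal search results pages (item 2)
Disallow: /search/
# Sorted list variations from Forums/Views (item 5)
Disallow: /*sort=
# Forward-to-a-friend pages (item 6)
Disallow: /forward/
```

The * wildcard rules rely on crawler-specific extensions supported by the major search engines; simpler crawlers may treat those lines as literal prefixes and ignore them.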

These problems can crop up on many Drupal systems, and all Drupal users should review their sites for these issues. Drupal may also have other issues, depending on the site and the degree of customization. For example, on several sites, we’ve seen Drupal generate complex CSS hierarchies that end up building hidden text into the pages. While search engines try to detect hidden text scenarios that are not a result of bad intent, this is a risk you don’t need. As long as you recognize what the issues are, they can be dealt with, and Drupal can be a great choice as a content management system. Most content management systems present even greater challenges to SEO.


  1. says

    Great post. Drupal takes a lot of work that’s for sure. Joomla is even worse. And I hate putting too many resources into a CMS and being dependent on versions that become outdated (WordPress 2.3 is breaking plugins everywhere). But I still use Drupal (and Joomla, but lots less now) for some projects.

  2. says

    I’d also add another item similar to #6. If you use the Printer-Friendly Pages module to give users printable versions of your site, sometimes Google will index the printable ones higher than the actual page with ads, thus reducing your revenue. It was a big problem with the module for about a year and a half as the module made URLs like node/5/print, which you can’t really exclude using robots.txt. There was some discussion about this here.

It seems as though the module owner has since fixed the problem and the URLs are now in the form of print/node/5, but you’ll probably still want to exclude these in your robots.txt using the following:

    Disallow: /print/

  3. says

    Good article.

    I read through your solutions and wanted to make you aware of the Global Redirect module which handles almost all of the duplicate content issues by preventing the trailing slash problem. In addition, if you handle the canonical domain name issues in .htaccess and you set your preferred domain in Google Webmaster Central, I think you will solve most of Drupal’s SEO problems that it has out of the box.

  4. says

I have used the Drupal CMS and I find it very search engine friendly, as I can change the links to static. Thanks for this post; I guess I have to review my Drupal pages for search engine optimization purposes…..:)

  5. says

    Thanks a lot, I have read and implemented many robots.txt optimization ideas, but I found new things in yours.
    And I really appreciate that you are using an accessible “CAPTCHA”!

  6. says

MODx also seems to be very SEO friendly. A lot of these tools are making it much easier to work on / SEO your site, and most include SEO-friendly URLs, which make a big difference.

  7. says

I have been using the Joomla CMS for the past two years, and I can say Drupal has more search-engine-friendly options than any other CMS. Search engine friendly URLs, meta keywords and meta descriptions on every page, and lots of similar stuff definitely add up to your SEF factor. Besides, like Joomla, Drupal can be easily adapted and used.

  8. says

I’d be curious if anyone has done an analysis of which CMS platforms are better for SEO: Drupal, WordPress, or Joomla. I’m sure they each have advantages and disadvantages, but I am curious whether anyone has done an in-depth analysis.
