Building Multi-Million $ Web Sites from Scratch (Part 4 of …)

Content Development

Finally we get back to our series on building multi-million dollar web sites from scratch. We will continue to weave in regular postings about Google Custom Search Engine developments, but we will begin to write about this topic regularly as well. This series is a high-level blueprint for pursuing high-end results, and that calls for a high-end strategy. See below for a list of the prior posts on this topic.

This post is going to talk about content development. Let’s start with the role it plays in the process. You are going to need lots of content if you are planning to build a multi-million dollar web site. If your ability to generate reams of content is limited, you may well have to settle for a web site that generates a bit less in revenue, but the underlying principles still apply. Here are the major benefits of having significant quantities of quality site content:

  1. Listed number 1 for a reason: Makes your site more valuable to the visitors you receive
  2. Helps focus the spiders on keywords for each page
  3. Provides fodder for ranking on long tail terms
  4. Makes your site more attractive for other sites to link to

So now that we know some of the ways that content is important, how do we get it? We use a few methods for getting content, so let’s look at the four most important ones:

1. Government data is wonderful. There are reams and reams of it, on almost any topic imaginable. While remembering that your tax dollars pay for this may be stressful, it’s a wonderful thing from a web site development perspective. Much of this data is available for you to use, free of charge. Please check any government web site you plan to use data from for its particular policies on reuse of its content.

One major thing to be aware of when you use government data is that it leaves you with four problems to solve:

  1. Obtaining it in a form that you can process
  2. Processing it so that you have it in your own database (or equivalent)
  3. Presenting it in a manner that makes it unique content
  4. Rendering it onto web pages in a reasonably attractive and useful manner

Solving problems 1, 2, and 4 requires that you have programming capability. We use Perl to do our work here. It’s a very powerful string processing language, and you can develop programs that solve problems 1, 2, and 4 pretty easily. This is a critical issue, and we will talk about this more in our next post in this series.
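To make problems 1, 2, and 4 a little more concrete, here is a minimal sketch. It is written in Python rather than Perl, purely for illustration, and it is not the code behind any real site. The sample CSV text, its invented numbers, the `county_stats` table, and the function names are all assumptions made up for this example; a real government file will need its own parsing and cleanup.

```python
import csv
import html
import io
import sqlite3

# Hypothetical sample standing in for a downloaded government file.
# The counties and figures are invented for illustration only.
SAMPLE_CSV = """county,state,population
Alpha,XX,120000
Bravo,XX,45000
"""


def parse_rows(text):
    """Problem 1: turn raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def load_rows(conn, rows):
    """Problem 2: store the rows in your own database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS county_stats ("
        "county TEXT, state TEXT, population INTEGER)"
    )
    conn.executemany(
        "INSERT INTO county_stats VALUES (:county, :state, :population)",
        rows,
    )
    conn.commit()


def render_table(conn):
    """Problem 4: render the stored rows as a simple HTML table."""
    lines = [
        "<table>",
        "<tr><th>County</th><th>State</th><th>Population</th></tr>",
    ]
    query = "SELECT county, state, population FROM county_stats ORDER BY county"
    for county, state, population in conn.execute(query):
        lines.append(
            f"<tr><td>{html.escape(county)}</td>"
            f"<td>{html.escape(state)}</td>"
            f"<td>{population:,}</td></tr>"
        )
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_rows(conn, parse_rows(SAMPLE_CSV))
    print(render_table(conn))
```

Keeping the parse, load, and render steps in separate functions means you can swap out any one of them (a different source file, a different database, a different page template) without touching the others.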

As for the third issue, if you take the government data and simply throw it up on a web site, you will join a list of others who have already done it – and they probably did it years ago and have a head start on you. That is not likely to work, so you need to do something different. The good news is that there is a lot of headroom here.

Try analyzing the data. Plenty of people do this, including very large organizations such as the Brookings Institution, which does a tremendous amount of work that leverages public government data (while combining it with other research). You can do the same: by analyzing the reams and reams of publicly available data, you can produce new, unique data of your own.
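One simple form of analysis is to derive a figure that the raw data does not state directly, then rank or compare on it. The sketch below (again in Python, purely for illustration) computes a per-capita rate from two raw columns. The `County` class, the "libraries per 100,000 residents" metric, and the sample numbers are all invented assumptions, not real statistics.

```python
from dataclasses import dataclass


@dataclass
class County:
    name: str
    population: int
    libraries: int


def libraries_per_100k(county):
    """Derived metric: libraries per 100,000 residents."""
    return county.libraries / county.population * 100_000


def rank_counties(counties):
    """Order counties from the highest derived rate to the lowest."""
    return sorted(counties, key=libraries_per_100k, reverse=True)


# Invented sample data for illustration only.
SAMPLE = [
    County("Alpha", 120_000, 6),
    County("Bravo", 45_000, 9),
    County("Charlie", 300_000, 12),
]

if __name__ == "__main__":
    for county in rank_counties(SAMPLE):
        rate = libraries_per_100k(county)
        print(f"{county.name}: {rate:.1f} libraries per 100,000 residents")
```

The raw counts are the same ones everyone else has; the ranking on a derived rate is the part that could be new.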

2. Another thing you can do is license the content. There is a lot of content out there that has been assembled by others and is available for purchase at very low cost. In some cases, this is in fact government data that has been processed. You can find lots of this stuff, and it can give you real content for your site. The great thing about this content is that it is not as readily available to the public, so you may not need to change it as much before you publish it (because it may already be unique).

3. Look to other publicly available data that is available for reuse. There are libraries of this content that are available via free public license models (such as Wikipedia, which is available under the GNU Free Documentation License). These usually require that you allow people to take and create derivatives of your version of the content. And, of course, you will need to change it yourself before you publish it.

4. Write it! People love dealing with subject matter experts. Write endlessly on a subject you know a lot about (and make sure you stay focused on content quality). Since this is inherently a low volume activity (you can only write so fast, after all), see if you can supplement your writing with contract writers who can cover related topics.

If you do use contract writing resources, make sure you know who they are, and keep complete quality control over what they do. We do not use third party article services to write for us. We use people we know, for example people whose kids go to school with ours. You can get inexpensive help this way and still make sure the article quality is up to your standards. Even with several people on staff, article writing is best used as a supplement to your other content generation ideas.


So there are a few quick ideas for finding content. Three of the ideas hinge on being able to locate large sources of content and then produce something unique out of them, and the other idea requires a lot of manual work. If you can’t process this type of data on a large scale, then think a bit smaller in terms of your financial goals. The same type of approach will still work on a smaller scale, just with smaller results.

Next up

  1. Code architecture
  2. How to get links
  3. How to monitor results, and what to do about it

Already Published Articles in the Series

  1. Picking a Market and Content Strategy
  2. Using PPC to Enhance your Organic Traffic Strategy
  3. Site Hierarchy and Keyword Selection


  1. Tom K. says

    Pretty good article. I felt motivated from the content. A couple of demos would have been great. I would be blown away if you would send me some simple, basic, and easy to understand Perl code for the following:
    “Solving problems 1, 2, and 4 requires that you have programming capability. We use Perl to do our work here. It’s a very powerful string processing language, and you can develop programs that solve problems 1, 2, and 4 pretty easily.”

    Either way I appreciated the article.

  2. stonecold says

    Hi Tom,

    Unfortunately, the Perl required here is very dependent on the actual data you want to work with, and the way you want to render it. It really does break down into 3 different problems:

    1. Capturing it (i.e., downloading it from the government database).

    2. Moving it into a format you can easily process.

    3. Rendering the content onto web pages.

    Your task will be far easier if you treat these as separate steps.
