Building Multi-Million $ Web Sites from Scratch (Part 4 of …)

Content Development

Finally we get back to our series on building multi-million dollar web sites from scratch. We will continue to weave in regular postings about the Google Custom Search Engine developments, but we will begin to write about this topic regularly as well. Basically, this series is a high level blueprint about how to pursue high end results. For that reason we need a high end strategy. See below for a list of the prior posts on this topic.

This post is going to talk about content development. Let’s start by talking about the roles it plays in the process. You are going to need lots of it if you are planning to build a multi-million dollar web site. If your ability to generate reams of content is limited, you may well have to settle for a web site that generates a bit less in revenue, but the underlying principles still apply. Here are the major benefits of having significant quantities of quality site content:

  1. Listed number 1 for a reason: Makes your site more valuable to the visitors you receive
  2. Helps focus the spiders on keywords for each page
  3. Provides fodder for ranking on long tail terms
  4. Makes your site more attractive for other sites to link to

So now that we know some of the ways that content is important, how do we get it? This article will discuss a few ideas for how to do this. We use a few methods for getting content, so let’s look at the four most important ones:

1. Government data is wonderful. There are reams and reams of it, on almost any topic imaginable. While remembering that your tax dollars pay for this may be stressful, it’s a wonderful thing from a web site development perspective. Much of this data is accessible to you to use, free of charge. Please check any government web site you plan to use data from for its particular policies on reuse of its content.

One major thing you need to be concerned with when you use government data is that it leaves you with four problems to solve:

  1. Obtaining it in a form that you can process
  2. Processing it so that you have it in your own database (or equivalent)
  3. Presenting it in a manner so that it becomes unique content
  4. Rendering it onto web pages in a reasonably attractive and useful manner

Solving problems 1, 2, and 4 requires that you have programming capability. We use Perl to do our work here. It’s a very powerful string processing language, and you can develop programs that solve problems 1, 2, and 4 pretty easily. This is a critical issue, and we will talk about this more in our next post in this series.
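As a rough illustration of problems 1, 2, and 4, here is a minimal sketch. It is written in Python rather than the Perl we actually use, and everything in it is an assumption made for the example: the sample CSV, the `counties` table, and the function names are hypothetical, not taken from any real government data set.

```python
import csv
import html
import io
import sqlite3

# Hypothetical sample standing in for a downloaded government CSV file.
SAMPLE_CSV = """county,population,median_income
Adams,52000,48000
Baker,130000,61000
"""

def load_rows(csv_text):
    """Problem 1: parse the raw file into dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def store_rows(conn, rows):
    """Problem 2: copy the parsed rows into our own database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS counties "
        "(county TEXT PRIMARY KEY, population INTEGER, median_income INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO counties VALUES (?, ?, ?)",
        [(r["county"], int(r["population"]), int(r["median_income"])) for r in rows],
    )
    conn.commit()

def render_page(conn, county):
    """Problem 4: render one record as a simple HTML fragment."""
    row = conn.execute(
        "SELECT county, population, median_income FROM counties WHERE county = ?",
        (county,),
    ).fetchone()
    if row is None:
        return None
    name, population, income = row
    return (
        f"<h1>{html.escape(name)} County</h1>\n"
        f"<p>Population: {population:,}</p>\n"
        f"<p>Median income: ${income:,}</p>"
    )

conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
store_rows(conn, load_rows(SAMPLE_CSV))
page = render_page(conn, "Baker")
print(page)
```

A real data set will need more cleanup than this (missing values, odd encodings, inconsistent column names), but the three steps stay the same.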

As for the third issue, if you take the government data and throw it up on a web site, you will join a list of others who have already done it, and they probably did it years ago and have a head start on you. That is not likely to work, so you need to do something different. The good news is that there is a lot of head room here.

Try analyzing the data. Tons of people do this, including very large organizations: the Brookings Institution, for example, does a tremendous amount of work that leverages public government data (while combining it with other research). You too can produce new, unique data by analyzing the reams and reams of publicly available data.
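To make "analyzing the data" concrete, here is a small sketch in Python of one possible derived statistic: each county's income as a percentage of the group average. The records and the function name are hypothetical, and this is only one of many analyses you might run.

```python
from statistics import mean

# Hypothetical records, as they might look after loading public data.
records = [
    {"county": "Adams", "median_income": 48000},
    {"county": "Baker", "median_income": 61000},
    {"county": "Clark", "median_income": 53000},
]

def income_vs_average(rows):
    """Return each county's income as a percentage of the group average, highest first."""
    average = mean(r["median_income"] for r in rows)
    derived = [
        {
            "county": r["county"],
            "pct_of_average": round(100 * r["median_income"] / average, 1),
        }
        for r in rows
    ]
    return sorted(derived, key=lambda d: d["pct_of_average"], reverse=True)

ranking = income_vs_average(records)
for row in ranking:
    print(f'{row["county"]}: {row["pct_of_average"]}% of average')
```

The raw numbers are available to everyone; the ranking and the comparison are the part that is yours.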

2. Another thing you can do is license the content. There is a lot of content out there that has been assembled by others and is available for purchase at very low cost. In some cases, this is in fact government data that has been processed. You can find lots of this stuff, and it can give you real content for your site. The great thing about this content is that it is not as readily available to the public, so you may not need to change it as much to publish it (because it may already be unique).

3. Look to other publicly available data that is available for reuse. There are libraries of this content that are available via free public license models (such as Wikipedia, which is available under the GNU Free Documentation License). These usually require that you allow people to take and create derivatives of your version of the content. And, of course, you will need to change it yourself before you publish it.

4. Write it! People love dealing with subject matter experts. Write endlessly on a topic matter you know a lot about (make sure you are focusing on content quality). Since this is inherently a low volume activity (you can only write so fast, after all), see if you can supplement your writing with contract resources who you can get to write on related topics.

If you do use contract writing resources, make sure you know who they are, and have complete quality control over what they do. We do not use third party article services to write for us. We use people we know: people whose kids go to school with our kids, for example. We get inexpensive help this way and can make sure the article quality is up to our high standards. Even with several people on staff, article writing is best used as a supplement to your other content generation ideas.

Summary

So there are a few quick ideas for finding content. Three of the ideas hinge on having the ability to locate large sources of content and then produce something unique out of them, and the other idea requires a lot of manual work. If you are not able to process this type of data on a large scale, then think a bit smaller in terms of your financial goals. The same type of approach will still work on a smaller scale, just with smaller results.

Next up

  1. Code architecture
  2. How to get links
  3. How to monitor results, and what to do about it

Already Published Articles in the Series

  1. Picking a Market and Content Strategy
  2. Using PPC to Enhance your Organic Traffic Strategy
  3. Site Hierarchy and Keyword Selection


One Basic Problem with Algorithmic Search

A short while back, we had the opportunity to interview Google’s Shashi Seth. This interview started with a fascinating look at basic flaws in algorithmic search. In fact, it is these limitations that have led Google to implement the Google Co-Op program. Basically, this boils down to two major issues:

  1. Most user queries do not fully explain their context
  2. Even if the user queries do explain their context, most web pages do not present data indicating what context they intend to address

Since many of you are probably going “Huh?”, let me explain with an example. If a user searches on “diabetes”, the search engine has a few possibilities to deal with:

  1. You are looking for treatment information
  2. You are a doctor looking for research information
  3. You are a drug designer at a pharmaceutical company looking for drug trial data
  4. You are a medical authority looking for related regulations

What makes the problem worse is that even if you type in a more specific query, such as “diabetes information for patients”, the search engines have a hard time using this context data to find the best authoritative resource.

One of the major ways that search engines deal with this is by deliberately offering up a diverse set of results (if we don’t know if you are a doctor or a patient, let’s make sure both types of results are available high on the first page …). It’s a workable solution for now, but Google is looking to improve on this.

Their first effort was launched in May of 2006. Google invested heavily in their Topics and Subscribed Links programs. A simple search shows how Google Topics actively tries to address the concerns with search expressed above:

[Image: Coop Topics Picture]

The links between the sponsored links and the first search results are Google Topics in action. You can see how the Topics provided include specific contexts that will, in theory, make searching easier for the user. However, the program was not a success, because it relied on human editors to guide the output of the context filters, and the motivation for the human editors was unclear.

Thus was born the Custom Search Engine program. This program is still designed to solve the same problem. Google is looking for people that will design vertically oriented search engines for specific contexts.

The big difference is the AdSense revenue sharing. You can get paid for your work. While it remains unclear how much you will earn, we now have a much more promising value proposition for tweaking search results for different contexts.

You can, of course, ask whether or not the concept works. So let’s look at an example, comparing the results of a search engine designed to provide medical information to patients with those of a search engine designed to provide medical information to doctors:

[Images: CSE Health Patient Diabetes (left), CSE Health Doctor Diabetes (right)]

You can see how different the results are. The one on the left presents data targeted at patients, and the one on the right presents data targeted at doctors. This is the power of human editing, addressing the basic contextual problem of search. If you like, try the patient and doctor Custom Search Engines yourself to see how they work. We don’t claim that they offer perfect results (yet), but they do illustrate the concept.

Google Custom Search Engines – Who will be the Winners?

One of the more interesting things to consider in light of the Google Custom Search Engine (CSE) announcement is who will be the winners, and who will be the losers. We do view Google as one of the winners, but this post will focus more on the winners and losers among those who implement CSEs.

As always, the ability to generate an audience is a key factor in success. So there will be an aspect of the rich getting richer here. Sites with large audiences will be better equipped to develop large audiences for their CSEs. But fret not, small site owner: there is an opportunity for you too.

Users on the web are increasingly intolerant of selfish behavior. They greatly prefer to give things to people who give back (or who gave first). So people who implement CSEs that promote their own site, and cut out all their competitors, will be exposed over time.

Many site owners will be tempted to use this approach. But it is not the approach that provides users with the best search engine results. Users will figure that out, and stop using a CSE that is developed in this fashion.

The winners in the CSE game will ultimately be the ones who develop the highest quality results. That means including, and in fact, boosting the results of, some of their competitors. Yes, excluding “competitors” who run true spam sites will be OK – they are not helping users.

Developing a high quality CSE will require investment. While most businesses with a web presence study their competition at least somewhat, it’s a different thing to focus on tuning a search engine. Some very complicated markets will require a pretty substantial investment.

When disruptive events occur in a market, and we believe the advent of the CSEs is such an event, opportunities are created for those who see the long term landscape first. This happens over and over again in the industry.

After all, Google itself is an example of a company that emerged from nowhere by seeing the search landscape in a new way, and executing on it first. Well, the search landscape has changed again.

Now I am not saying that implementing a CSE is going to make anyone a billion dollar company. But it is an opportunity to steal a march on your competition.


Google Custom Search Engines and Social Media

When Google Co-op was launched in May, I saw this as Google’s move into the Social Media space. The announcement on October 24th of its Custom Search Engine (CSE) initiative continues this trend. As with the other parts of Google Co-op, CSEs are intended to encourage social participation in improving the quality of search.

CSEs are designed to provide subject matter experts (SMEs) with the means and the incentive to hand carve improved search engines specific to their area of expertise. The means comes in the form of a simple, yet powerful, form based approach to picking sites to include, sites to exclude, sites to promote or demote, and more. You can read a complete Custom Search Engine overview here.
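To give a feel for what include, exclude, promote, and demote rules do to a result list, here is a purely illustrative Python sketch. It is not Google's configuration format or API; the rule structure, the host names, and the function names are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical editorial rules; this is NOT Google's configuration format.
RULES = {
    "exclude": {"spam.example"},
    "promote": {"clinic.example"},
    "demote": {"forum.example"},
}

def host(url):
    """Return the host name of a URL, or an empty string if there is none."""
    return urlparse(url).hostname or ""

def apply_rules(results, rules):
    """Drop excluded hosts, then order promoted hosts first and demoted hosts last.

    `results` is a list of URLs in their original ranked order. sorted() is
    stable, so the original order is preserved within each group.
    """
    def bucket(url):
        h = host(url)
        if h in rules["promote"]:
            return 0
        if h in rules["demote"]:
            return 2
        return 1

    kept = [u for u in results if host(u) not in rules["exclude"]]
    return sorted(kept, key=bucket)

results = [
    "http://forum.example/thread/1",
    "http://spam.example/buy-now",
    "http://news.example/diabetes",
    "http://clinic.example/diabetes-basics",
]
reranked = apply_rules(results, RULES)
for url in reranked:
    print(url)
```

The point of the sketch is only that a handful of editorial decisions by someone who knows the subject can reshape what the user sees first.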

The motivation for SMEs comes in two forms:

  1. A share of ad revenue generated by the CSE
  2. The ability to generate an asset owned by the SME (or their employer, as the case may be)

So how does this relate to social media? There are a couple of ways that it does:

  1. Google is inviting all comers. Anyone can go ahead and create a CSE. This is a pretty bold move with potentially massive implications.
  2. CSE owners can decide to invite additional contributors to their CSE, either by invitation only, or by throwing their CSE open to the public. This means that CSE owners can make their CSE a social media project.

The fact that anyone can create a CSE has some interesting dynamics to it. You do need to worry about spammy CSEs. Some will create CSEs that are designed to validate their poor quality sites using the Google name. It will be interesting to see what Google does to combat that.

One approach Google could potentially use is to measure the use, and re-use, of CSEs. Each usage is a vote of trust by the user, especially if they re-use a particular CSE many times. While this is not something that they have announced any intention to do, Google could publish user generated ratings of CSEs.

Even without Google published ratings, we believe that the best CSEs will become known. In today’s web, people can share this information easily in forums, and more and more people know to look for this type of information. In addition, a company that I am involved in, Moving Traffic, Inc., has already launched a CSE Directory that provides editorial and user ratings.
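As a thought experiment only, here is how use and re-use could be tallied from a usage log. This Python sketch does not describe anything Google has announced or built; the log format, the CSE names, and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical usage log: one (user_id, cse_id) pair per search performed.
usage_log = [
    ("u1", "patients"), ("u1", "patients"), ("u1", "patients"),
    ("u2", "patients"), ("u2", "doctors"),
    ("u3", "doctors"),
]

def reuse_stats(log):
    """For each CSE, count searches, distinct users, and users who came back."""
    per_user = Counter(log)  # (user, cse) -> number of searches
    stats = {}
    for (user, cse), count in per_user.items():
        entry = stats.setdefault(cse, {"searches": 0, "users": 0, "repeat_users": 0})
        entry["searches"] += count
        entry["users"] += 1
        if count > 1:
            entry["repeat_users"] += 1
    return stats

stats = reuse_stats(usage_log)
for cse, entry in stats.items():
    print(cse, entry)
```

A CSE with many repeat users is collecting the kind of "votes of trust" described above, while one that people try once and abandon is not.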

This site will provide a method to find the best CSEs quickly and easily. I would also expect that you will be able to see plenty of commentary on the major forums, such as Search Engine Watch and Webmaster World.

By one means or another, users will vote. These votes will come in the form of usage and recommendations. Recommendations will drive the usage of others. These facts will put competitive pressure on the creators of CSEs, and this pressure will drive them to improve search quality.

Those of us in the biz love to talk about our search engines, yet in the past, we meant Google, Yahoo, MSN, Ask, etc. Now when we talk about our search engines, we will really mean “our”.


Strategic Value of Google Custom Search Engines For Site Owners

The new custom search engine (CSE) program announced by Google offers significant possibilities for web site owners. Implementing a CSE allows site owners to develop a new asset for their business, and provides a new revenue stream for their business as well.

Site owners can have a subject matter expert (SME) in their business design rules for their own CSE that provide a superior search experience in their area of expertise. A skilled SME should be able to steadily improve the quality of the search experience over time.

Since the site owner and their SME are knowledgeable in their vertical market area, and the area is by definition narrower than the entire web, the notion of human edited enhancements has a much better chance of working.

Equally important, the revenue sharing model means that the site owner will be highly motivated. The SME will be motivated because they have the opportunity to bring to their users a superior search experience.

Successful implementations should bring the users as well. Users will quickly recognize those CSEs that offer significantly higher quality. As a result, high quality CSEs will become known, and bring additional new users to the site.

You can also count on Google to promote the best CSEs. It’s in Google’s interest to do so. This provides an additional incentive to the site owner.

Many site owners will be tempted to force the results of their CSEs to emphasize their own sites, and exclude all of their competitors. However, this will not serve the best interests of their users, and will not serve the best interests of most businesses in the long term.

Users will gravitate to improved search results. Site owners with a long term vision will recognize this. They may end up sending some traffic indirectly to competitors in the short term, but the long term growth in the credibility of their business will more than make up for it.


Strategic Implications of Google’s Custom Search Engines

Google’s custom search engine (CSE) announcement has significant implications for the search engine industry and Google. This blog post will discuss some of these, but I am sure we will be learning about them for some time to come. Here are the major implications:

Combining Human Editing with Algorithmic Search

Google has recognized the inherent limitations in link based ranking systems. CSEs provide a method for incorporating expert human editor input into search results. There is a spark of brilliance in the way Google has done it.

Other human edited concepts, such as DMOZ, the Yahoo directory, and the various tagging sites that have emerged, have all run into serious limitations.

DMOZ has run into problems because of its all volunteer model. Staffing is inconsistent, so many categories are uncovered, and it’s known that many categories are covered by people with competitive interests in the content of their part of the directory.

The Yahoo directory has the benefit of being staffed by paid editors. This ensures their motivations are managed by their employer. But in real terms, no human edited directory can truly categorize the entire web.

Tagging sites, such as del.icio.us, provide a voting mechanism which allows users to vote on sites. They are a very effective mechanism for uncovering the hottest new sites. However, these tagging sites are no substitute for search.

CSEs provide a mechanism for subject matter experts (SMEs) to provide direction and guidance to their own personal version of Google search. Equally important, there is also a business model which provides economic motivation for the site owners / SMEs to do so.

Google is providing site owners with a revenue share of the ad revenue generated on their CSEs. In addition, the work done by the SMEs represents an asset which is owned by the site owner. This is a powerful combination.

The Distributed Search Model

Google’s CSE program is a major step in the direction of a distributed search model. As we said above, the notion of getting the human editorial input from economically motivated SMEs is a master stroke.

You can see that this may drive a whole new philosophy of search. End users may begin to migrate towards performing their search on CSEs implemented by SMEs that they trust. The long term winners will be those with the foresight to build CSEs with the purest editorial intent.

These types of CSEs will provide the highest possible quality results. Editors will remove spam sites, and tweak results to provide a high quality search result in their area of expertise.

As the model matures, CSE quality will improve significantly. More and more users will gravitate to the better versions of the search engines over time.

Improved Core Algorithms

So who will have access to all this wonderful data? Google. It is our understanding that they will not make use of the data in the short term, but in the longer term, one has to believe that they will begin to incorporate the wisdom of the market place into their core algorithms.

This could well be how they close the loop. Motivated, distributed human editors, driving the core algorithms. Sounds pretty sexy, doesn’t it?
