Wikipedia Launching a New Search Engine – Wikiasari

Quick Pre-Christmas post – Frank Watson at Search Engine Watch reports that Wikipedia and Amazon are teaming up to launch a search engine early next year. The story originally broke in the London Times where Wikipedia founder Jimmy Wales was quoted as saying “Google is very good at many types of search, but in many instances it produces nothing but spam and useless crap.”

Later information now indicates that Amazon does not have direct involvement, and that this is a Wikipedia-owned project. Evidently, social human input will be a major factor in determining search results. In addition, indications are that the focus of the search engine will be on producing quality results, even if it means sacrificing depth.

It will be interesting to see how this search engine does. People will want depth of results. I am also not sure that community-based human input will do a good enough job of producing good results. Certainly it will be hard to police this as the index grows larger.

My belief is that it is possible to do a better job at search, and someday someone will do it. Success will depend on a well built mixture of algorithmic processing power and human review. You can only go so far with pure algorithms, and human review is not at all scalable.

But perhaps it’s possible to build an infrastructure that flags things for human review – either through algorithmic detection, or a sufficient number of complaints from the community. There is no question such an approach would be expensive in comparison to (near) purely algorithmic search.
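
As a rough illustration of that flagging idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Result` fields, the two thresholds, and the idea that some detector produces a spam score between 0 and 1. It is not a description of how any real search engine works.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
SPAM_SCORE_THRESHOLD = 0.8
COMPLAINT_THRESHOLD = 5


@dataclass
class Result:
    url: str
    spam_score: float  # assumed output of an algorithmic detector, 0.0 to 1.0
    complaints: int    # number of community complaints received


def needs_human_review(result: Result) -> bool:
    """Flag a result if either signal crosses its threshold."""
    return (result.spam_score >= SPAM_SCORE_THRESHOLD
            or result.complaints >= COMPLAINT_THRESHOLD)


def review_queue(results):
    """Return only the flagged results, highest spam score first."""
    flagged = [r for r in results if needs_human_review(r)]
    return sorted(flagged, key=lambda r: (r.spam_score, r.complaints), reverse=True)
```

The point of the design is that human reviewers only ever see the flagged subset, which is what would keep the human side of the process from having to scale with the whole index.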

This might make it difficult for the current search engines to pursue. They are all public companies with operating profit goals that really can’t be changed without blowing a hole in their current valuations. But it would be fun to see someone try it.

Using a Site Map to Increase Traffic

Building a good site map can be challenging (for clarity’s sake, I need to point out that we are not talking about the Site Maps protocol here, but an on-site site map). I find that there is a lot of confusion out there about what makes a good one. People are locked in on the notion that they need to have one page on their site that has links to every other page on their site. There is also confusion about when one is needed – not all sites need one.

Let’s start with the objective: Minimize the number of clicks from your home page to your content. That’s it. Many sites have simple enough architectures, and really clean navigation systems so that adding a site map does not reduce the number of clicks to your content. If that’s the case with your site, then you don’t need to waste your time building a site map.

However, other sites are more complex in their very nature. You may have a site that has content that takes 4 or more clicks to reach through your standard navigation. If this is the case, you may be a candidate for a site map file.
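
If you want to check whether your site falls into this category, click depth is easy to measure once you have a list of which pages link to which. The sketch below is one way to do it with a breadth-first search; the `site` dictionary and its URLs are made up for the example, and the 3-click cutoff simply mirrors the "4 or more clicks" rule of thumb above.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search: fewest clicks from `home` to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def deep_pages(links, home, max_clicks=3):
    """Pages that take more than `max_clicks` clicks to reach."""
    return sorted(p for p, d in click_depths(links, home).items() if d > max_clicks)


# Hypothetical site: each key links to the pages in its list.
site = {
    "/": ["/widgets"],
    "/widgets": ["/widgets/blue"],
    "/widgets/blue": ["/widgets/blue/large"],
    "/widgets/blue/large": ["/widgets/blue/large/acme"],
}
print(deep_pages(site, "/"))  # ['/widgets/blue/large/acme']
```

If `deep_pages` comes back empty, a site map will not reduce your click depth. If it does not, linking a site map page from the home page puts each page it lists two clicks away.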

With all the “design for users, not search engines” discussion going on, I do have to note that a site map file is an example of where you actually do want to design something for the search engine. You can think of it as spider food. Done properly, it has never proved to be a problem in our experience.

There are pitfalls – I have seen tons of site maps with many hundreds, or even thousands, of links on them. Not going to fly! You still want to limit the number of links on the page to 200 or fewer (other people say it’s 100 – your mileage may vary).

The way to deal with this is to divide your site map into multiple files, on a topical basis. So if you have a site selling thousands of different kinds of “widgets”, you might end up with multiple site map files:

  1. Widgets by color
  2. Widgets by size
  3. Widgets by location
  4. Widgets by manufacturer

This gives you site map files that are topically relevant, and potentially valuable search tools for your users.
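
Here is a small Python sketch of how such a split could be generated from a product list. It reflects one interpretation only: one site map page per value of a topic (for example, one per color), with any page that would exceed the link limit broken into numbered parts. The product fields and the page naming scheme are hypothetical.

```python
from collections import defaultdict

MAX_LINKS_PER_PAGE = 200  # the limit suggested above; some people say 100


def build_site_map_pages(products, facet):
    """Group product URLs by one facet, then chunk each group to the link limit."""
    groups = defaultdict(list)
    for product in products:
        groups[product[facet]].append(product["url"])

    pages = {}
    for value, urls in sorted(groups.items()):
        for start in range(0, len(urls), MAX_LINKS_PER_PAGE):
            part = start // MAX_LINKS_PER_PAGE + 1
            name = f"widgets-by-{facet}-{value}-{part}"  # hypothetical page name
            pages[name] = urls[start:start + MAX_LINKS_PER_PAGE]
    return pages
```

Calling it once per topic (`"color"`, `"size"`, and so on) gives you each family of site map pages, and every page stays under the limit no matter how many widgets you sell.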

These types of site map files work. They were a major element of our Videomaker Site Redesign that resulted in a 60+% traffic gain in the first 3 months, and traffic appears to have more than doubled since then. Cool stuff, and it’s cheap and easy to do for most sites.

Weighing in on Link Buying (again!)

The debate continues to rage on about link buying. Rand posted yesterday that he disagrees with Danny and Google, and Jim Boykin adds his comments early this morning. Clearly there is a lot of confusion over this. So here we go again … in this post, I will try to put forth what I believe the Google position is. Note that I am not a Google representative, so this is simply my interpretation of what I have heard.

Google wants to value a link for its editorial value.

This is the essential point in all of their logic. Editorial input by third party web sites is at the heart of all ranking algorithms (I believe that this applies to the other search engines as well). There are certain types of links that do not pass editorial value:

  1. Purchased links
  2. Reciprocal links that are traded without regard to relevance
  3. Links from domains known to be untrustworthy (or perhaps even links from domains that are not yet trusted)

Danny, in his comments in Rand’s blog, argues, with some merit, that Google is trying to put a genie back in the bottle. People are out there buying links, and it’s a fact of life. It’s also true that there are links that you can buy that are “under the radar” that Google will not detect.

But let’s return to the essential value proposition of Google’s search engine. Their goal is to offer the highest quality search results on the planet, bar none. Links offered on an editorial basis are used as a voting system to find the sites most likely to answer a question relevant to a user’s search query. Note that an essential part of this process is the relevance of the links to the query, and the relevance of the content on the page to the user’s query. That said, links carrying editorial value are at the heart of their algorithm.

While the genie may be out of the bottle, don’t expect Google, or any other search engine for that matter, to relent on this point. It’s about the core value proposition of their search engines. Having the most relevant results can equate to increases in market share. A drop in relevance can lower their market share. Purchased links are seen as a threat to their business.

A follow-on note: links bought from trusted directories, such as Yahoo, Best of the Web, and Business.com, are, in our opinion (yes, it’s just an opinion), treated a bit differently.

These directories all have published policies where they can reject any submission, even though you have already paid them, and your money is non-refundable. In addition, they can take your listing and put it wherever they want, even if it differs from your suggested location. Thirdly, they are seen as businesses whose basic value proposition is centered around the quality of their editorial process.

This results, again in our opinion, in these cases being a scenario in which purchasing links is acceptable to the search engines. This is notably distinct from lower quality directories that have not earned the trust of the search engines for the quality of their editorial team (note: my list above is not meant to be a complete list of “trusted” directories).

I don’t think that the search engines will offer up any other scenarios that they consider acceptable. They are running multi-billion dollar businesses that depend on their algorithms, and purchased links just don’t fit into that.

Adam Lasnik Clarifies Google Stance on Duplicate Content

Google’s Adam Lasnik has offered up a post today on Dealing Deftly with Duplicate Content. In it, he offers up the official Google stance on the issue.

In this post, he addresses the following key issues:

  1. What is duplicate content?
  2. What isn’t duplicate content?
  3. Why does Google care about duplicate content?
  4. What does Google do about it?
  5. How can Webmasters proactively address duplicate content issues?

However, there are issues that are not addressed in Adam’s post (fyi – these are issues which it’s not really Adam’s job to point out …). Here are a couple of key ones:

  1. When you have pages on your site that are duplicates (or near duplicates) of one another, the crawlers spend time crawling them instead of other pages on your site. This can result in fewer indexed pages on your site.
  2. Since Google chooses which one of your pages to list, some pages are not listed. However, the page rank that was passed to those pages still gets passed to them. This means it’s wasted on pages that do not get indexed, instead of being allocated to pages that are indexed.

There are many other issues with duplicate content, but these are among the biggest. It can be hard to resolve duplicate content problems, particularly if they result from poor site architecture, or the implementation of your content management system. But if your web traffic is a key part of your business, it’s well worth the effort.
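
A first step in that effort is simply finding the duplicates on your own site. The Python sketch below shows one common approach: break each page’s text into overlapping three-word “shingles” and compare pages by how many shingles they share. To be clear, this is an illustration and not how Google detects duplicate content; the 0.8 threshold and the sample URLs are assumptions for the example.

```python
from itertools import combinations


def shingles(text, size=3):
    """Set of overlapping word n-grams ("shingles") from the text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    return len(a & b) / len(a | b)


def near_duplicates(pages, threshold=0.8):
    """Return URL pairs whose similarity meets the (hypothetical) threshold."""
    sets = {url: shingles(text) for url, text in pages.items()}
    return [(u, v) for u, v in combinations(sorted(sets), 2)
            if jaccard(sets[u], sets[v]) >= threshold]
```

Feed it a dictionary of URL to page text (with navigation and other template text stripped out, ideally) and it returns the pairs worth a closer look. Comparing every pair is fine for a few thousand pages; larger sites would need something smarter.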

Improve your Google web search box in 10 minutes or less

This post will give you an idea of how to use Google Custom Search Engines to improve the web search tool you offer on your site. It’s really cool, and it’s really simple. Follow the six steps listed below, and you will have the same web search you already have on your site, except your site’s rankings in that search engine will be increased.

Using this trick, you can still use Google’s web search functionality to pump cash into your AdSense account, but your site shows up higher in the organic results. Everyone who uses Google web search boxes on their site to drive revenue should do this. Here are the steps:

1. If you don’t have one, create a Google Co-Op account. Log in to the account.

2. Click on the link to create a Custom Search Engine page. Then click on the new search engine link.

3. Use the following image to fill out the first part of the form on this page:

Custom Search Engine Basic Info

Once you are done with this, fill out the next part of the form as follows:

Custom Search Engine Detailed Info

Note that you must have your domain in the “Sites to Search” box, and you must check the “Search the entire web but emphasize these sites.” radio button. This is the set of actions that ensures that your site will be moved up in the search results. Make sure that you check the box indicating that you agree to the terms of service, and then click the “Next” button.

4. Next, go test drive your Custom Search Engine. Check it out and make sure it’s working OK. If all is well, you will be getting the same search results you were getting before, except your site will rank higher than normal for relevant terms. Click the “Finish” button when you are done. You will be sent to a page that lists your Custom Search Engines. If this is the first one you have created, you will only see one listed on this page.

5. Click on “Control Panel” and then “Make Money” to get to a page where you enter in your AdSense account info. Enter in the info to make sure you get paid on any ads that get clicked on in your search engine.

6. Last, but not least, click on the “Code” menu item to get the Javascript code for your Custom Search Engine. Take the code and put it in the template for the pages of your site. Make sure to replace the Javascript code for the Google web search box you currently have on the site.

Then sit back and relax. You have now increased your site’s brand presence in the search engine on your site, and it only took a few minutes.

Note that Google Custom Search Engines do not allow you to have a radio button allowing users to choose web search or site search. To do that with the Google Custom Search Engine you would have to implement a second search box. Hopefully, radio button functionality is something that Google will offer in the future.

Google’s Stance on Reciprocal Links

Search Engine Watch put up a post about Google saying No to Reciprocal Linking. In this post, Chris Boggs does a good review of the statement by Google in their Webmaster blog, and discusses the nature of the ambiguity in Google’s policy, and suggests that Google clarify their stance.

My bet is that there is no clarification coming. The inherent problem with providing too much detail about what Google does is that it makes the job of spammers easier. By making it more difficult to determine what Google’s algorithms are, Google is effectively making the spammers’ job harder. It also helps them sustain competitive differentiation.

However, I personally don’t think the situation is that confusing. Our policy is that we only trade links with sites if we would be willing to give a link to that site even if they did not link back to us. And we never publish links pages. Outbound links always show up on topically relevant pages that are part of the core content of the site.