By Rob Pirozzi and Eric Enge
This article focuses on the annotations aspect of Google Co-Op Topics and Annotations. It provides a brief overview of Google Co-Op and then deals with annotating or labeling web content (URLs) in more depth.
In May 2006, Google announced the Google Co-Op program. This article follows an earlier one, “Google Co-Op Overview”, which gave a high-level overview of the program, and it covers the Topics component of Google Co-Op in more depth.
Google Co-Op is interesting to users for two main reasons:
- Google Co-Op allows users to contribute information that will help Google to improve search results for everyone.
- Google Co-Op allows an end-user to customize their own search experience. Users do this by “subscribing” to sites that they consider trusted, and then Google prioritizes the placement of these sites in relevant search results for that user.
The Google Co-Op program is currently in beta. All that is required to participate is a Google account. Google has publicly indicated that it intends to use Google Co-Op to improve its search results by drawing on social web and social search concepts. Google Co-Op consists of two things:
- Topics, which are simply a means of labeling web content
- Subscribed links, which are a means for users to subscribe to a particular web site’s content
Topics provide users and webmasters with the ability to:
- Create an entire categorization or labeling scheme
- Provide labels for web content, which Google calls annotations
The remainder of this article will focus on the annotations aspect of Google Topics.
Annotations to URLs
Annotating URLs is perhaps the easiest part of Google Co-Op to understand. It also requires the least technical expertise to implement. Topics are a labeling or categorization scheme that enables users to “tag” web pages with labels that have meaning for them. Through topics, users who are subject matter experts can share their knowledge by labeling web content related to their area of expertise.
Labels may be provided for an entire web site, portions of a web site, or even a specific web page. These “labels” give some indication of the topic or topics of a given web site or page. In essence, they give Google additional information about what the web site is all about. For a searcher who sees these labels in Google’s results, they provide a supplemental way of finding what they are searching for.
Google refers to the process of providing labels for web sites as “Annotating URLs”. An annotation is simply the association of a label, or multiple labels, with a URL. For example, a travel site might get the label “destination_guide”.
Users may use labels for topics that Google already has under development, which include: health, destination guides, autos, computer & video games, photo & video equipment, and stereo & home theater. Users may also develop their own labels for topics. For example, if a user has an interest in “wine” they may develop labels for the topic wine, which may include “wine_regions”, “wine_types”, etc. They can then use these labels to annotate sites that deal with wine.
An end user may submit their annotations to Google in one of two formats:
- A tab-delimited file, which can be created using Microsoft Excel or any other spreadsheet.
- An XML file.
Perhaps the easiest format for most users to deal with is the tab-delimited format. Users can use any spreadsheet to create an annotation file that they save in tab-delimited format. The spreadsheet file should be in the following format:
- Row 1 is used for column headings, where the first column heading is “URL” (without the quotes), followed by one or more “Label” headings, followed by an optional “Score” heading, an optional “Comments” heading, and, potentially, user-defined attributes headings, in the format “A=”.
- Rows 2 through n are used for annotation data.
Tab Delimited Annotation File Format
Please note: All headings are case sensitive.
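Because the headings are case sensitive and must appear in a fixed order, it can help to check a heading row before submitting a file. The following is a minimal sketch in Python, not an official validator. It assumes the column order described above, and it assumes that a user-defined attribute heading is the prefix “A=” followed by some name (the attribute name “updated” used below is purely hypothetical).

```python
import re

# Assumed layout, taken from the description above: URL, one or more Label
# columns, an optional Score, an optional Comments, then any number of
# user-defined attribute headings. Treating an attribute heading as "A="
# plus a name is an assumption made for this sketch.
HEADER_PATTERN = re.compile(
    r"URL(\tLabel)+(\tScore)?(\tComments)?(\tA=[^\t]+)*"
)


def is_valid_header(header_line):
    """Return True if a tab-delimited heading row follows the assumed layout.

    The comparison is case sensitive, so "url" or "label" is rejected.
    """
    return HEADER_PATTERN.fullmatch(header_line.rstrip("\r\n")) is not None


print(is_valid_header("URL\tLabel\tLabel\tScore\tComments"))  # True
print(is_valid_header("url\tLabel"))                          # False
```

A row with no “Label” column at all, such as “URL” followed directly by “Score”, is also rejected under these assumptions.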
A few examples will go a long way to illustrate annotating URLs. If I were using a tab-delimited file to annotate a travel related web site it might look something like this:
Annotation Example – Tab Delimited File
|http://www.travelsite.com/*|sightseeing|museums|shopping|1|Detailed destination information|
|http://www.travelsite.com/boston/*|suggested_itineraries|tours_day_trips|outdoor_activities|1|Detailed destination information about Boston|
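For readers who would rather generate the file than build it in a spreadsheet, the same two rows could be written with a short script. This is only a sketch using Python's standard csv module. The heading row follows the format described earlier, and the function name build_annotation_tsv is an illustrative choice, not part of any Google tool.

```python
import csv
import io

# Heading row as described in this article: URL, the labels, then the
# optional Score and Comments columns.
HEADINGS = ["URL", "Label", "Label", "Label", "Score", "Comments"]

# The two example rows from the table above.
ROWS = [
    ["http://www.travelsite.com/*", "sightseeing", "museums", "shopping",
     "1", "Detailed destination information"],
    ["http://www.travelsite.com/boston/*", "suggested_itineraries",
     "tours_day_trips", "outdoor_activities",
     "1", "Detailed destination information about Boston"],
]


def build_annotation_tsv(headings, rows):
    """Return the annotation data as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(headings)
    writer.writerows(rows)
    return buffer.getvalue()


tsv_text = build_annotation_tsv(HEADINGS, ROWS)
print(tsv_text)
```

The resulting text can be saved to a file of your choosing and submitted in place of a spreadsheet export.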
If I were to add user-defined attributes to the same annotation file, it would look like this:
Annotation Example – Tab Delimited File with User Defined Attributes
|http://www.travelsite.com/*|sightseeing|museums|shopping|1|Detailed destination information|20060627|
|http://www.travelsite.com/boston/*|suggested_itineraries|tours_day_trips|outdoor_activities|1|Detailed destination information about Boston|20060627|
Please note: User defined attributes may only be used in tab-delimited annotation files. They may not be used in XML formatted annotation files.
If I were using an XML file to annotate the same travel related web site it might look something like this:
Annotation Example – XML Annotation File
Conventions for Labels
There are some simple conventions that should be followed when labeling content. First it is important to understand that labels may be applied to URLs or wildcard URLs. Using wild cards makes it much easier to label a lot of content with a few statements. For example:
- Labels applied to www.mywebsite.com/ would only apply to that specific page of the web site
- Labels applied to www.mywebsite.com/* would apply to all URLs that start with the URL “www.mywebsite.com”
- Labels applied to www.mywebsite.com/*tips would apply to all URLs that start with the URL “www.mywebsite.com” and contain the word “tips”
Google provides excellent examples of using wild card URLs in their Topics Developer Guide.
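To get a feel for which pages a wildcard URL would cover before submitting it, the patterns can be tried out locally. The sketch below is an approximation only: it assumes that “*” stands for any run of characters and that everything else must match literally, which is not necessarily how Google evaluates patterns. The function name matches is hypothetical.

```python
import re


def matches(pattern, url):
    """Return True if url fits pattern, treating "*" as "any characters".

    This mirrors a simple glob-style reading of wildcard URLs. It is an
    assumption for illustration, not Google's documented matching rule.
    """
    # Escape every literal piece so characters such as "." or "?" are
    # matched as themselves, then join the pieces with ".*".
    pieces = [re.escape(piece) for piece in pattern.split("*")]
    return re.fullmatch(".*".join(pieces), url) is not None


print(matches("www.mywebsite.com/", "www.mywebsite.com/about"))           # False
print(matches("www.mywebsite.com/*", "www.mywebsite.com/boston/hotels"))  # True
print(matches("www.mywebsite.com/*tips", "www.mywebsite.com/travel-tips"))  # True
```

Under this reading, a pattern ending in “*tips” only covers URLs that end with “tips”. Covering “tips” anywhere in the path would need a pattern such as “www.mywebsite.com/*tips*”, so it is worth checking the Topics Developer Guide for the exact behavior.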
A single URL may have multiple labels. If using a tab-delimited file, each label must appear in its own column.
Labels should be all lower case, with all punctuation and conjunctions (and, or) removed and the remaining words joined by underscores. For example, “hardware and software” would become “hardware_software”. Column headings are case sensitive (URL, Label, Score, Comment).
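The label convention is mechanical enough to automate. Here is a minimal sketch, assuming that words are joined with underscores as in the examples in this article. The helper name to_label and the list of conjunctions are illustrative choices, not something Google provides.

```python
import re

# Conjunctions to drop, as named in the convention above.
CONJUNCTIONS = {"and", "or"}


def to_label(phrase):
    """Turn a free-text phrase into a lower-case, underscore-joined label."""
    # Lower-case the phrase and keep only runs of letters and digits,
    # which discards punctuation such as "&" or ",".
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    kept = [word for word in words if word not in CONJUNCTIONS]
    return "_".join(kept)


print(to_label("hardware and software"))   # hardware_software
print(to_label("Computer & Video Games"))  # computer_video_games
```

Note that this simple version splits on every punctuation mark, so a word containing an apostrophe would be broken in two. Labels produced this way should still be reviewed by hand for length and ambiguity.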
Labels should be as short as possible and as unambiguous as possible. Watch out for words that can mean multiple things.
There are many good places to find additional information. The first is the Google Co-Op site, where Google has posted a Topics Developer Guide. The Google Co-Op FAQ is also helpful. There is also a good article entitled “How to Use Google Co-op” at Google Blogoscoped.
Quick Links to Google Co-Op Information
Following is a collection of links to information referenced in this article for easy access:
- Google Co-Op Web Site
- Google Co-Op Overview
- Topics Developer Guide
- Topics FAQ
- Google topics templates for health and destination guides
- List of Active Google Topics
- How to Use Google Co-op
Why is Labeling Content Important?
The process of labeling content will benefit everyone in several ways. Labels will provide Google with a vast amount of information about web sites, potentially down to a very granular, or individual page level. If an individual’s annotations are found to improve the quality of the search results, they will be shown to everyone. In essence, over time, Google will use annotations and other aspects of Google Co-Op to improve search results.
Where do Labels Appear?
Labels are starting to appear in Google search results, underneath the individual results. For example, a Google search for Boston brings up a “Refine results for boston” box followed by the individual search results. Selecting “Attractions” in the “Refine results for boston” box yields a new set of results. You will notice that the first few results have the label “Labeled Sightseeing” appearing under them (see example below). This is an example of labels being used in search results and how they are displayed to the end user.
Labeled Google Search Content
Annotating URLs is a relatively low effort task for individuals that can reap benefits for everyone – better and more relevant search results. While still in its infancy, and going through the growing pains that are normal for services that are in beta test, Google Co-op clearly has a lot of promise to enable Google to provide much more powerful and relevant search results to users.