The following is the transcript of an interview about vertical search engines with Ask.com’s Gary Price.
Gary is a renowned expert in search, particularly structured data search, and a respected leader within the search, library and education communities.
In addition to his former position on the editorial team at Search Engine Watch, Price is the editor of ResourceShelf and Docuticker, online news sources for online researchers, librarians, journalists and educators. Price has been a frequent speaker at Search Engine Strategies, WebSearch University, Computers in Libraries, Internet Librarian and numerous other industry conferences.
Prior to joining Search Engine Watch and ResourceShelf, Price worked as a reference librarian at George Washington University in Washington, D.C. He co-authored the book The Invisible Web with Chris Sherman and has compiled several well-known web research tools, including Price’s List of Lists and Direct Search, a compilation of invisible Web databases. He has received numerous awards, including the Innovations in Technology award (2002) and the Agnes Henebry award (2004) from the Special Libraries Association and Alumnus of the Year (2004) from Wayne State University.
Gary’s role at Ask.com is to help make search better for users and be a resource to the Ask.com community.
Eric Enge: Let’s talk a bit about Vertical Search.
Gary Price: This is something I have been interested in for a long time. You may know that Chris Sherman (from Search Engine Land, http://www.searchengineland.com) and I wrote a book nearly six years ago called The Invisible Web. In essence, many of the databases that we were writing about back then would today be considered verticals.
That book is basically just about specialty search tools, those that are now called Vertical Search Engines. It’s a booming area, and I think it also shows that one search engine is never going to be the perfect search engine for everybody. The challenges for Vertical Search are (1) getting people to know about it, (2) getting them to try it and (3) after they try it, getting them to come back and use it again. I think the biggest challenge, though, is just getting people to know you have a Vertical Search offering, and getting them to look at it.
In reality, the Invisible Web (or deep web) of 2007 might be larger than what we wrote about in 2001. Why? Yes, more material is crawlable by the major engines (though there is still a lot that is not). However, for many searchers, if it’s not in the first 4 or 5 results on a web results page, forget it. In other words, more data is out there, and there is much more importance on getting into those first few results.
This is also why:
- Some knowledge of search goes a long way. In other words, information retrieval skills and critical info skills.
- Tools that an engine itself can provide to help the searcher narrow, expand and focus, perhaps with direct links to key databases (including verticals) and other sources, are valuable.
- Zeroing in with a vertical can also help.
I think it is much different now than when Chris and I wrote our book. Now you have to tell people that it is out there, and also tell people why they would want to use it and how it could be a benefit to them. This was not as much of an issue seven years ago.
Eric Enge: I agree. A user may come to your site and use your search box, and not even know that they used a Vertical Search Engine. In addition, even if they do know, they certainly are not going to sit there and compare it to some other search engine’s results. Right?
Gary Price: I would agree with that from the commercial end of things, but perhaps not as much on the research end of things, where you are doing it constantly and you’re tweaking your searches and that type of thing. Let me add that being compared on only one search (one time) is also a challenge verticals often face. Of course, this is not the best way to compare.
Eric Enge: Fair enough. Let’s talk a little bit about Vertical Search Engine platforms. By that I mean products such as: Eurekster, Google Custom Search Engines, Yahoo Search Builder, Microsoft’s Live Search Macros, and Rollyo.
Gary Price: These platforms are OK for two or three percent of users (aka search geeks). But I think that we need to ask whether tools like that are going to be used by the masses. And also there is the challenge of getting the masses to know that those tools exist in the first place. I’ll also add that most of these tools are based on the large databases from the major search players. So, while they can help focus, they still might be missing data.
Eric Enge: Exactly, and the average person certainly doesn’t want to spend three minutes reading an explanation as to why you should use this particular search box. They just want to get their answer.
Gary Price: Yes. It is one thing to have a service, whether it be Google’s or Ask’s or Rollyo, and it is another thing to get people to try it, and then another thing to get people to use it on a regular basis. It is not a field of dreams; building it doesn’t guarantee people will use it. You are going to have to figure out some way of getting people to (A) know about it, (B) understand it, (C) use it and (D) use it again.
Eric Enge: From a data point of view, you might be interested to know that Eurekster vertical search engines currently do more than half a million searches a day. So it is still small, but at least it is a real number.
Gary Price: One of the big questions is what it takes to get someone to switch their search engine. If I were to ask my sister to switch search engines, she is going to have to see something completely different and exciting to her, and not just another set of search results.
Eric Enge: I agree. Another vertical search scenario involves the companies that build an entirely new search, including their own custom databases, and perhaps even their own crawl of the web.
Gary Price: Right. That could be anything from Kayak, a travel search, to something like what we’re doing with AskCity, or what Microsoft is doing with LiveBookSearch. Heck, you might even think of Amazon’s Search Inside the Book as a vertical. In the case of SITB, not only do you see pages of book material, but they also offer value-added data.
Eric Enge: These are good examples of search tools that draw upon specialty databases, or special organization and optimization of such data.
Gary Price: Yes. One of the things that Ask.com is doing is to integrate specialty databases into the regular web results page we offer. We have been doing this for years. And we were one of the first general search engines to actually look at the context of the search. So if the person is searching for a person’s picture (e.g. Paul Revere’s picture), the web results page actually shows a few images, and then links you right into the image results. This is a great way to get people what they want in a short amount of time, and to let them know that there is an image search available.
Eric Enge: Right, one of the things that struck me from our last conversation was the Beatles search. Right in the search you offered some great disambiguation. For example, the results include a drop-down list box with each of the Beatles’ names in it, which you could use to refine your search right there.
Gary Price: In that case, we are using a bunch of different data sources, for example, allmusic.com and who2.com, which is the bio database. For some questions we can actually give you the actual information you want on the results page, not just a link to a page with the answer. So, if you type in “Academy Award Best Picture 1972”, you will see that we provide the answer directly on the results page.
So this is an example of Smart Answers, and, of course, we provide the user with links (as do others) to find out more, if they want more. Another example is what Ask.com offers for a search such as wedding registry (bride’s name). Here we connect with a meta database with six national wedding registries. It saves the searcher time, possible aggravation, and effort.
The other thing that keeps popping into my mind about Vertical Search is that the content is the message here, and I think that sometimes that gets lost. So what are the Vertical Search databases out there that provide either enhanced content or content you can’t find anywhere else?
So for example, if I am looking for an article from the New York Times from 1872, I can go to my local public library without physically having to leave the home, and get full text access to it by just typing in my library card number. And, I consider this an example of a Vertical Search tool as well.
Just about every public library in the United States has access to these types of services. So, if I am looking for an article for personal use, I don’t have to pay for it by going to newyorktimes.com. I actually have access to it for free for personal use from my local public library.
Eric Enge: It seems, as I look at your role at Ask and some of the examples that you have shown me, that Ask is trying to create access to all these types of Vertical Search assets through the Ask.com Smart Answers in your search engine.
Gary Price: Right, but we also offer Smart RSS feeds. These are not just blogs, but they are feeds. So if I search on “consumer products recalls”, I can sign up to a related RSS feed and get the data on an ongoing and near real-time basis. I think it’s an excellent way to leverage RSS (syndicated content). One thing we want to do to expand Smart RSS is to turn it into a teaching tool. So yes, people will then maybe learn that there is a database that focuses on this kind of content. Or if you type in “images of San Francisco”, you can learn that there is such a thing as an image search database. Next time you can go to this directly if you want.
Eric Enge: So, how do you see Vertical Search evolving over the next few years?
Gary Price: I see a huge growth in federated searching, what people traditionally referred to as meta searching. Six or seven years ago, when I was working at George Washington University, we licensed millions of dollars worth of specialty databases. If I was lucky enough to get in front of a class to tell them about the databases, it basically went in one ear and out the other.
Now a couple of things have happened in the last six or seven years. The native search interfaces have gotten much easier to use, and the improvements in federated or meta search technology have been very strong. So, if I am an MBA student, and I am doing a research project, there could be six or seven independent databases that I need to be using. Learning to use all their different interfaces can be overwhelming. The best thing is if you can take these independent sources and merge them all together using some type of federated search technology, so you have your own one-stop interface. Then you can even add personalization and database selection.
I think the idea of having your own personal information client in one form or another, one that will search multiple databases, take advantage of the controlled vocabularies, de-dupe the results, and merge them all into one place, is something that we will be seeing more and more of in the future.
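The federated search idea described above (query several selected databases, de-dupe, merge into one list) can be sketched roughly as follows. This is only an illustration under stated assumptions: the two backends, their names, the sample records and the relevance scores are all made up, and the sketch assumes scores are comparable across sources, which real systems cannot take for granted. It does not model controlled vocabularies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Result:
    title: str
    url: str
    source: str   # name of the database that returned the hit
    score: float  # source-reported relevance (assumed comparable across sources)


def normalize_url(url: str) -> str:
    """Crude de-duplication key: lowercase, drop the scheme and trailing slash."""
    key = url.lower()
    for prefix in ("https://", "http://"):
        if key.startswith(prefix):
            key = key[len(prefix):]
    return key.rstrip("/")


def federated_search(
    query: str,
    backends: Dict[str, Callable[[str], Iterable[Result]]],
    selected: List[str],
) -> List[Result]:
    """Query only the selected databases, de-dupe by URL, merge by score."""
    best: Dict[str, Result] = {}
    for name in selected:  # database selection: skip sources the user left out
        for hit in backends[name](query):
            key = normalize_url(hit.url)
            # Keep the highest-scoring copy of each duplicate.
            if key not in best or hit.score > best[key].score:
                best[key] = hit
    return sorted(best.values(), key=lambda r: r.score, reverse=True)


# Hypothetical in-memory "databases" standing in for licensed sources.
def business_news(query: str) -> List[Result]:
    return [
        Result("Acme annual report", "http://example.com/acme/", "business_news", 0.9),
        Result("Acme profile", "https://example.com/profile", "business_news", 0.4),
    ]


def market_research(query: str) -> List[Result]:
    return [
        Result("Acme Annual Report", "https://example.com/acme", "market_research", 0.7),
        Result("Widget market outlook", "https://example.org/widgets", "market_research", 0.8),
    ]


BACKENDS = {"business_news": business_news, "market_research": market_research}
merged = federated_search("acme", BACKENDS, ["business_news", "market_research"])
for hit in merged:
    print(f"{hit.score:.1f}  {hit.title}  [{hit.source}]")
```

The `selected` list is where personalization would plug in: dropping a name from it removes that database from the merged results without touching the rest.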
Eric Enge: So, for example, you can decide that I really don’t want answers from this database, but I do want to include this one instead. How does all of this affect the role of the librarian?
Gary Price: Some people believe that the librarian’s role is becoming smaller and smaller. I would argue that there is a huge and expanding role for information professionals, in addition to traditional jobs like building collections, organizing, etc.
Btw, I still believe that info pros have a huge and growing role organizing information, both using traditional skills and also helping developers build software products.
For example, it’s the role of the information professional to build the underlying databases, and place them in the underlying categories that would make it easier for an MBA student, or for my sister, to pick and choose what categories she is interested in, and then use all those databases within a search.
Another area which is growing is database selection. So, I have a hundred databases, and maybe thirty of them might be of interest to a particular searcher. Well, which are those thirty? I am not going to search all thirty of them at the same time. I mean that’s ridiculous. So, which of those thirty should I focus on and create my own little category? This kind of concept has been around for years.
Eric Enge: You mentioned Dialog in one of our earlier conversations.
Gary Price: Yes, Dialog is one of the largest information supermarkets on the web. It was started in the 1960s at Lockheed Martin. They have a common interface across maybe seven hundred different databases. So, you could technically search all seven hundred at the same time. But why would you do that? They also have tools to help you pick and choose which of those databases you should string together, if that’s what you are going to do. They use a common syntax across all these different databases.
Eric Enge: Right. So back to something you said earlier about information professionals, the point seems to be that somebody has to figure out which ones are of quality, and which ones are relevant to a given need. Otherwise people may treat non-authoritative resources, such as Wikipedia, as authoritative.
Gary Price: Right, I still think the jury is out on Wikipedia. We have been hearing from Wikipedia for years that they are going to have a review board. For popular topics that would be fine; I don’t know about long tail stuff. For example, I can tell you that I have a Wikipedia entry, and it wasn’t until about three weeks ago that I noticed that it said that I was the editor of a magazine that I have written for in the past, but was never the editor of. I also believe that the recent launch of Larry Sanger’s Citizendium is worth watching. Sanger left Wikipedia, where he was a co-founder with Jimmy Wales, to start this project.
One huge issue in K12 education is developing critical information skills. For example, what a reference librarian looks at are things like scope, currency, updatability, accuracy, authority of the publisher, authority of the author, who is saying what and why, those types of things.
Eric Enge: Of course, establishing authority is a challenge for general search as well.
Gary Price: Correct. The notion of citation analysis has been around for a long time.
Eric Enge: Right, and it depends on the people doing the citation not being aware that they are being observed.
Gary Price: The guy who is credited as the “father” of citation analysis, Eugene Garfield, was at the University of Pennsylvania. He also began the Institute for Scientific Information, and they were the first ones to do link analysis, or rather citation analysis. And now their web product is called Web of Knowledge. But they are also a closed system; that is, they have people who decide that these websites and journals are the ones that we are going to count. There is no way of somebody gaming the system to the degree that you can game an open web search engine.
Apostolos Gerasoulis, who we call AG, who is the founder of Teoma, has a lot more knowledge on the technical end of this than I do. I can also say that Teoma is really the next generation. The first search engine to really take into account link analysis was not Google, but was a product in the mid-nineties from IBM.
That product is called CLEVER. CLEVER was started by another very interesting person. His name is Jon Kleinberg. CLEVER was really the first major engine to take a look at link analysis and looked at hubs and authorities. After that Apostolos came along with his team at Rutgers and developed the Teoma technology which powers the Ask.com database right now.
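The "hubs and authorities" idea mentioned above is usually associated with Jon Kleinberg's HITS algorithm. A minimal sketch of that published iteration follows; it is not the actual CLEVER or Teoma implementation, and the tiny link graph and page names are invented for illustration. A page is a good authority if good hubs link to it, and a good hub if it links to good authorities.

```python
import math
from typing import Dict, List, Tuple


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to unit Euclidean length so they stay bounded."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {k: (v / norm if norm else 0.0) for k, v in scores.items()}


def hits(
    links: Dict[str, List[str]], iterations: int = 50
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (hub_scores, authority_scores) for a graph of page -> outlinks."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hubs = {p: 1.0 for p in pages}
    auths = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of every page linking in.
        new_auths = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auths[dst] += hubs[src]
        # Hub update: sum the authority scores of every page linked to.
        new_hubs = {
            p: sum(new_auths[dst] for dst in links.get(p, [])) for p in pages
        }
        auths = normalize(new_auths)
        hubs = normalize(new_hubs)
    return hubs, auths


# Hypothetical link graph: two pages that link out, two pages that are cited.
LINKS = {
    "directory": ["paper_a", "paper_b"],
    "blog": ["paper_a"],
    "paper_a": [],
    "paper_b": [],
}
hubs, auths = hits(LINKS)
print("best hub:", max(hubs, key=hubs.get))
print("best authority:", max(auths, key=auths.get))
```

In this toy graph the page cited by both linking pages comes out as the strongest authority, and the page that links to both cited pages comes out as the strongest hub.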
Eric Enge: Right, so to come back to your notion of federated search, you are really seeing an improved integration of authoritative specialty databases.
Gary Price: Right, I think you can see a little bit of that starting with the AskX.com prototype that we built. And I think you are seeing a lot from our competitors as well. But it’s important to remember that the point is the actual information itself. The quality, the authority, the scope, and the accuracy of the information are critical. No reference company, no reference book or database is perfect, but you have to take into account where the information is coming from. And I don’t think that evaluating the authority of a source and the scope of a source are being taught like they should be in the schools.
Maybe, it’s because the teachers who are teaching K12 have grown up with this whole web search phenomenon, and many times that data is good enough. But, if this is the information age then judging information should be a critical skill.
Eric Enge: I have young kids, and they go to search engines and get answers, and it is a very effective technique, but are the answers right?
Gary Price: It is even more of an issue when you get into the whole gaming of the search engines that goes on. I used to work in the newspaper business, and everybody wanted to have an advertisement on the upper right-hand page in the first twenty pages of the book, or the magazine, or the newspaper, and it’s the same way with organic search results. It’s critical when you consider that 87% of people never even click off of the first page. And most people are still using poorly selected search terms. It becomes a huge issue.
Btw, this is the way it works. Period. It’s not going to change. However, like I said before, a little knowledge and some basic skills can go a long way. In fact, the “researcher” needs to realize that search engines are also marketing and advertising tools.
Eric Enge: So let’s get back to the role that librarians can play in all this.
Gary Price: I hope librarianship takes more of a role. There are a lot of tools out there that people don’t even know about. One example is the Librarian’s Internet Index. It’s completely noncommercial. It’s one of the four major noncommercial web databases that are put together either by subject matter experts, or by librarians, in this case by lii.org.
Another organization that does this is ipl.org.
There is another one that is phenomenal. Intute.ac.uk is an incredible project of resources put together in the United Kingdom. If you look at the site you will see pre-built tutorials on how to do internet research for everything from chemistry to beauticians.
Eric Enge: Interesting.
Gary Price: Going back to the challenge of some of these more commercially oriented Verticals, such as a travel database, you can still have a federated search solution like Kayak. They are going out to the independent travel sites, such as Orbitz and Expedia, and then aggregating the results, removing the duplicates, and then letting you find the results that way. So many databases, so little time would be another way of putting it.
Eric Enge: So they are aggregating and pursuing data whose authority you can’t question, because Travelocity’s price is Travelocity’s price right? Which is really distinct from some of these other things that we talked about where establishing the authority of the data source is a really big piece of it.
Gary Price: Yes, that’s true, airline tickets are another story. Lately, I’ve been fascinated with the data mining technology that Farecast has developed.
Another example of meta search is simplyhired.com or Indeed.com, which do more than go to aggregators of jobs; they also go to the individual company websites. They could probably create another whole business in an area called Competitive Intelligence, because job listings are a great way to do business research.
Eric Enge: Yes, you can see who they are hiring.
Gary Price: You could probably create other revenue streams using all that data as well. Then there are companies that make great specialty search tools, such as Clusty and Vivisimo. There is no end to the number of companies working in the federated search space.
ClusterMed is interesting too, because they are using PubMed, which has controlled data fields, such as authors, subjects, and medical subjects, which you can cluster in various ways. You can cluster on the medical subject headings, you can cluster on the author’s name, and you can cluster on the author’s affiliation, that kind of thing.
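Clustering on controlled fields, as described above, amounts to grouping records by the values in one chosen field. A minimal sketch follows, assuming each record is a plain dictionary. The records, titles and field names here are invented for illustration and are not real PubMed data or ClusterMed's actual method.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical bibliographic records with controlled fields.
RECORDS = [
    {
        "title": "Study A",
        "authors": ["Lee J"],
        "subject_headings": ["Asthma", "Child"],
        "affiliation": "University X",
    },
    {
        "title": "Study B",
        "authors": ["Lee J", "Patel R"],
        "subject_headings": ["Diabetes Mellitus"],
        "affiliation": "University Y",
    },
    {
        "title": "Study C",
        "authors": ["Patel R"],
        "subject_headings": ["Asthma"],
        "affiliation": "University X",
    },
]


def cluster_by(records: List[dict], field: str) -> Dict[str, List[str]]:
    """Group record titles under each value found in the chosen field."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        values = record.get(field, [])
        if isinstance(values, str):  # single-valued fields such as affiliation
            values = [values]
        for value in values:
            clusters[value].append(record["title"])
    return dict(clusters)


for field in ("subject_headings", "authors", "affiliation"):
    print(field, cluster_by(RECORDS, field))
```

Because the field values come from a controlled vocabulary, exact matching is enough to form the groups; free-text fields would need some normalization first.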
Eric Enge: This becomes critical as the amount of information out there continues to grow.
Gary Price: Right. I think the amount of “information” that’s out there is just going to continue to explode. For example, we will have lots of audio data, such as podcasts, academic lectures, or CNBC. I would put that into another very useful set of verticals, and that’s multimedia search tools. Not just sites like YouTube, but things like TVIS, and Critical Mention, and FedNet, which was just purchased by Congressional Quarterly, which provides a near real-time search of every word spoken on all the major TV stations in the United States. So if they mention “Southborough Mass” in passing on CNBC at 6:33 in the morning, by 6:37 you will have an alert in your email box, and you can click on it and watch it. And then there is a service, Nexedia, where they are breaking the spoken word into phonetic sounds and indexing it that way.