Eric Enge interviews Udi Manber about Search Quality
Published: July 9, 2007
As a Vice President of Engineering, Udi is responsible for core search. Before joining Google early in 2006, Udi was CEO of A9.com, a Senior VP at Amazon.com, and Yahoo's Chief Scientist. He started working on search algorithms in 1989 with the invention of Suffix Arrays (with Gene Myers) while he was a professor at the University of Arizona, and he was a co-developer of several search packages, including Agrep, Glimpse, WebGlimpse, and Harvest. He started developing search and other software tools for the web two months after Mosaic was announced in 1993, and has continued ever since. While in academia, he also worked in the areas of theoretical computer science, computer security, distributed systems, and networks. He won a Presidential Young Investigator Award in 1985.
Udi holds a Ph.D. in Computer Science from the University of Washington.
Eric Enge: One thing I noticed in the recent New York Times article by Saul Hansell is that you've been at the University of Arizona, you were Yahoo's chief scientist for a while, and you were at Amazon. The article remarked that you were surprised at how far along Google was in terms of technology when you arrived at Google. Can you talk about that a little bit?
Udi Manber: I can tell you, if you are asking, that, compared with the University of Arizona, the food here is much better, but the parking problems are about the same.
Eric Enge: My one little side story on that is that when I came out for the Searchology event (the announcement of Universal Search), on the Tuesday night before, I was on the Google campus playing in a foosball tournament. Note that in 1984, I was actually a world champion in foosball.
Udi Manber: Oh really?
Eric Enge: And in 1985, I was a national champion, so my business partner, who is also an extremely good foosball player, and I were invited. But of course, you know, the treat was that 20 feet from the foosball table there was all this wonderful food.
Udi Manber: Kind of distracting.
Eric Enge: Yes, it was compelling, so we agree: Google is ahead in terms of food technology.
Udi Manber: And just about on par in terms of parking. Anyhow, I'm not going to talk about Yahoo or Amazon, but again, I can tell you that I was incredibly impressed with the quality of the team, the engineering team. It kind of makes you humble to be in the presence of such great engineers.
Eric Enge: There is clearly outstanding expertise in data management, data processing, and server management, and it seems to me that these are areas where Google is ahead of the industry in general.
Udi Manber: I can tell you that it's an incredible pleasure to work with my team. It makes every day wonderful to come to work.
Eric Enge: Can you talk a little bit about how ideas percolate up in Google and become programs that you pursue? For example, I understand that there is an environment where people are encouraged to bring forth new ideas and work on things that are of interest to them.
Udi Manber: Absolutely, so for the ranking team, many of the ideas come from the engineers, and they are free to pursue them. We have an incredible infrastructure that makes it easier to run experiments and then measure the results. The measurement is a very important part of it.
Then we meet once, twice, or three times a week to go over the results of the experiments and decide whether to launch them or not. We see many, many such experiments. We run literally thousands of experiments a year and pick the ones that score well.
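Udi does not describe how experiments are scored. Purely as a generic illustration of "measure, then decide whether to launch," here is a minimal sketch of one common statistical check: a two-proportion z-test comparing a hypothetical "good result" rate (for example, the share of results that raters judged good) between the current system and an experiment. The function names, the metric, and the conventional 1.96 cutoff are assumptions for this sketch, not Google's process.

```python
import math


def two_proportion_z(good_control: int, n_control: int,
                     good_experiment: int, n_experiment: int) -> float:
    """z statistic for the difference between two success rates (pooled variance)."""
    p_control = good_control / n_control
    p_experiment = good_experiment / n_experiment
    pooled = (good_control + good_experiment) / (n_control + n_experiment)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_experiment))
    if se == 0:
        return 0.0  # both arms all-good or all-bad: no measurable difference
    return (p_experiment - p_control) / se


def should_launch(good_control: int, n_control: int,
                  good_experiment: int, n_experiment: int,
                  z_threshold: float = 1.96) -> bool:
    """Hypothetical launch rule: ship only if the experiment is clearly better."""
    z = two_proportion_z(good_control, n_control, good_experiment, n_experiment)
    return z > z_threshold


if __name__ == "__main__":
    print(should_launch(500, 1000, 560, 1000))  # a 6-point gain on 1,000 judgments each
    print(should_launch(500, 1000, 510, 1000))  # a 1-point gain, within noise
```

A single metric and a single test are a deliberate simplification; as Udi says, the evaluation team looks at many data points, and a real launch decision would weigh several of them.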
Eric Enge: Right, and so that scoring is done based on how it performs on a live but backup copy of the web?
Udi Manber: I'm not going to go into details about the scoring. It actually involves several steps. We have a wonderful team of people whose sole purpose is to evaluate search results; that's what they do. They use hundreds and thousands of different data points, they measure things all the time, and they also evaluate those experiments and those new ideas. It's all based on what we believe is better for users. That's the goal: to change the algorithm so that, overall, users see better results.
Eric Enge: One of the things that I'm quite familiar with is the Enquiro Eye Tracking Study. One thing that really emerges from that study is that the perceived quality of Google's results is significantly higher, and you can see that because the golden triangle in the Enquiro Study is much smaller and tighter for Google than what you see with the other search engines they looked at. I wanted to understand if this is the kind of thing that Google actively measures.
Udi Manber: Absolutely. We measure this, and we measure hundreds of other things. The trick is to be able to assess the overall user experience and meet the overall goal of making things better. There is a tendency for people who are not that familiar with search to look at a few anecdotes and evaluate search based on them. As you probably know, it's much more complicated than that.
You have to look at thousands of different queries over a long period of time, understand the patterns, and that's what this evaluation team is doing all the time. For example, we do user studies all the time, and we even tap into visitors to the cafeteria. So if you come to the cafeteria, you will see signs all over the place, saying we will pay you a small fee to participate in the user study, before or after lunch. We figure that's a good way to get user studies.
Eric Enge: So you don't think that the fact that they are hungry at the time might skew your results?
Udi Manber: That's an interesting conjecture. We'll see if we can run the numbers, find out whether the sessions were before or after lunch, and see if it changes people's perception. I doubt it, but who knows?
Eric Enge: Yes indeed. I threw it out as a joke, of course, but if you actually did find something, you might interpret search results differently at 11 a.m. than at 1 p.m.
Udi Manber: I doubt it makes any difference.
Eric Enge: Oh, yeah, I doubt it too. So, the original algorithm was based on this whole notion of PageRank. Something that people in the industry talk about now is the notion of "trust rank." The idea here is that sites have to earn certain levels of trust over time as part of their ranking, either because of domain age, or links from authoritative sites, or other things that suggest they should be trusted. But also, a lot of decisions are made at the time of the search query. In my own head, I think of that as query-specific PageRank: evaluating PageRank against the actual query. In other words, weighting the relevant links and scoring them in some fashion. Can we talk about these notions a bit?
Udi Manber: I can't be specific, but we use more than a hundred different parameters. PageRank is still an important parameter, but it's just one parameter. There are all kinds of parameters, such as whether the word appears in the title, whether the two words are close together, and all the obvious traditional information retrieval parameters. There are many others that we invented, and then there is the combination of all of them, which is really where the hard work is done: figuring out when and how to put them all together, all of which, of course, is done in real time.
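Udi does not say how the parameters are combined, so the following is only a toy illustration of what "combining signals" can mean, assuming the simplest possible combiner: a fixed weighted sum. The three signals echo the examples he names (PageRank, title match, term proximity), but the `Doc` fields, their 0 to 1 scales, and the weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    url: str
    pagerank: float     # hypothetical link-based score, scaled to [0, 1]
    title_match: float  # 1.0 if the query words appear in the title, else 0.0
    proximity: float    # hypothetical closeness of query terms, in [0, 1]


# Hypothetical weights; in practice these would be tuned through experiments.
WEIGHTS = {"pagerank": 0.4, "title_match": 0.35, "proximity": 0.25}


def score(doc: Doc) -> float:
    """Combine several signals into a single score with a weighted sum."""
    return sum(weight * getattr(doc, name) for name, weight in WEIGHTS.items())


def rank(docs: list[Doc]) -> list[Doc]:
    """Return documents ordered by descending combined score."""
    return sorted(docs, key=score, reverse=True)


if __name__ == "__main__":
    docs = [
        Doc("a.example", pagerank=0.9, title_match=0.0, proximity=0.2),
        Doc("b.example", pagerank=0.5, title_match=1.0, proximity=0.8),
    ]
    for d in rank(docs):
        print(d.url, round(score(d), 3))
```

Here the page with the higher PageRank still ranks second, because the other signals outweigh it. A fixed set of weights cannot capture "when and how" to combine signals for different queries, which Udi identifies as the hard part.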
Eric Enge: At the Searchology event, I believe you said that 20% to 25% of the queries received every day are queries that Google is receiving for the first time. That's a fascinating statistic all by itself, but I think it's also indicative of how vertical this is all getting. The New York Times article covered a couple of examples, such as "teak patio Palo Alto," or the discussion of "query deserves freshness" to identify queries that are more freshness oriented (e.g., breaking news). And, as I think you alluded to earlier, there is the burgeoning complexity of it all. It's not something that you can take on in a single gulp. This is a very deep process. As you get deeper into it, is the rate at which you are verticalizing the algorithm accelerating?
Udi Manber: With regard to the New York Times article, some people misinterpreted those comments to mean that we are doing manual fixes for queries, and that's not the case. What happens is we see examples, those highlight for us certain weaknesses or certain places we can improve, and then we go and change the general algorithm. We don't change things for specific queries. We don't improve one query at a time. That does not scale.
In terms of complexity, we are very sensitive to this, and we have projects whose sole purpose is to reduce complexity. A team may go and work for two months on a new, simpler sub-algorithm. If it performs the same as the previous algorithm but is simpler, that is a big win, and people are encouraged to do that; some of the improvements we make over time are of that kind. Overall, we have to be very careful that the complexity of the algorithm does not exceed what we can maintain. That's definitely something we are thinking about.
Eric Enge: That's a general engineering problem of course. If you add 30% more lines of code, you are probably doubling your engineering staff.
Udi Manber: It's not just the lines of code; it's really the intuition and understanding of what it's doing. We want our engineers to have good intuition, so that when they try to make some change, they understand the impact intuitively. You can't understand everything, and you have to measure lots of things, but you want to have reasonable intuition about what the ramifications of every change will be.
Eric Enge: Right, when they are forced to make a decision right at that point in time, you want to improve the chances that their decision is accurate.
Udi Manber: That's right. You also want to direct their efforts toward things that are more likely to succeed. We can measure after you do something whether it works or not, but first you have to decide how to proceed. You have a hundred different options, and you can't measure all of them, so you have to make a lot of intelligent decisions along the way. Making the algorithm simpler makes it easier to do that.
Eric Enge: Yeah, that makes sense. Now, the "teak patio Palo Alto" example in the New York Times taught you to put more weight on local-area links related to Palo Alto, so that at an algorithmic level you get better matching for that kind of query. Are there other examples or anecdotes you can share of situations that led to algorithm tweaks or improvements?
Udi Manber: There are many such examples, but I am hesitant to go into specific ones, because that's our big advantage.
Eric Enge: All right, fair enough. So how far do you think you can go in terms of personalizing results?
Udi Manber: I would like to go as far as we can, but we have to do it carefully, and the key, again, is to measure correctly. Whenever we introduce a personalized feature, we have to make sure that it's actually better for users. So we run lots of studies, we measure it in many ways, and we launch it only when we see that it's actually better. I think personalization is very important, but I think over-personalization is probably not good. The key is to find the right balance. If we discover that you're interested in sports, and then you search for medical information, we shouldn't give you only sports medicine results. If we do it all the time, we can harm more than we can help. The trick is to do it at the right time and in the right way.
Eric Enge: Right, so in your example, you might have one or two results related to sports medicine on the off chance that's what they are looking for...
Udi Manber: We might, and we might actually even say well these two interests are so far apart that it's probably not useful. It depends obviously on the case, but that's the balancing point.
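To make the balancing idea concrete, here is a minimal sketch under stated assumptions: personalized candidates arrive as hypothetical `(url, affinity)` pairs, at most `max_personal` of them are allowed into the top `k`, weak matches below `min_affinity` are dropped (the "interests are so far apart" case), and the rest of the page is filled from the general ranking. The function, its parameters, and the URLs are illustrative inventions, not a description of how Google personalizes.

```python
def blend(general: list[str],
          personalized: list[tuple[str, float]],
          k: int = 10,
          max_personal: int = 2,
          min_affinity: float = 0.5) -> list[str]:
    """Build a top-k list that reserves at most max_personal slots for
    personalized results and fills the rest from the general ranking."""
    picked: list[str] = []
    limit = min(max_personal, k)
    # Personalized candidates are (url, affinity) pairs; skip weak matches,
    # e.g. a sports interest that has little to do with a medical query.
    for url, affinity in personalized:
        if len(picked) >= limit:
            break
        if affinity >= min_affinity and url not in picked:
            picked.append(url)
    # Fill the remaining slots from the non-personalized ranking, so other
    # interpretations of the query are still available as a backup.
    for url in general:
        if len(picked) >= k:
            break
        if url not in picked:
            picked.append(url)
    return picked


if __name__ == "__main__":
    general = ["dolphin-facts.example", "miami-dolphins.example", "aquarium.example"]
    personalized = [("miami-dolphins.example", 0.9),
                    ("dolphins-tickets.example", 0.8),
                    ("nfl-news.example", 0.7)]
    print(blend(general, personalized, k=4))
```

With these inputs, a football fan searching for "dolphins" gets two football results first, but the sea-creature pages still fill the remaining slots, which is the kind of backup discussed later in the interview.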
Eric Enge: Right, I think Matt Cutts wrote about how personalization would help make it increasingly difficult for web spammers to succeed at their tasks. Is that your perception as well?
Udi Manber: I'm looking at it in terms of overall quality. That's our main goal: we are doing it so that the overall quality of results is better and it's easier for people to find what they want. The effect on spam may be a nice side effect, but that's not why we are doing it; we are doing it to improve overall quality.
Eric Enge: Even as you do personalization, it seems that it remains important to percolate up the non-personalized results. If you know someone is a football fan living in Florida, and they type in Dolphins, it is a good bet that they want information on the Miami Dolphins, but sometimes, those people want information on the sea creatures too.
Udi Manber: Exactly. If we showed only football team results, there would be no backup for them; there would be nothing else they could do. We would have bombarded them with one area, or one guess.
Eric Enge: Right, so a big trick then, is striking that balance, and that's one reason for all the testing that you do.
Udi Manber: That's right.
Eric Enge: The same goes for Universal Search: striking the balance in deciding where that video, or that map, or those local results fit into the overall search results, and scoring those diverse kinds of inputs against each other.
Udi Manber: That's right.
Eric Enge: Did the Universal Search Project start a long time ago?
Udi Manber: It started well before I came here, I am not sure how many years, but it accelerated over the last year. We now have a dedicated team, a wonderful team to work on it and you can see the results of the first stage.
Eric Enge: There are more products coming out from that team in the future?
Udi Manber: Absolutely, this is just the first stage, we want to blend more results, improve the ranking, and in general get people what they want.
Eric Enge: Right. I have speculated on whether or not this would in fact increase the traffic to the image search engine, the local search engine, and things like that, simply by making them more familiar to a wider array of users.
Udi Manber: I don't know the answer, we will see, but the primary goal is not to direct more traffic to particular places. The primary goal is to get people what they want, so that's how we look at it. We try to optimize the quality of the results. If it turns out that this makes people discover things that they didn't discover before, and I think videos are the most obvious example, that's great.
If you searched for "I have a dream" five years ago, you were probably looking for the text of the speech. You wouldn't necessarily think that you could actually see that speech, so you wouldn't search for the video of it. If we show it to you, it's not necessarily what you said you were looking for, but there is a good chance that you want it. That's a nice discovery, and it will cause people to search for more videos, because now they realize that they can find them.
Eric Enge: Speaking personally, putting all the results together in a universal search format creates a different level of accessibility. We are all so attention- and time-challenged in this society that before, if I had wanted to show my kids the "I have a dream" speech, I would probably have hesitated, because I would have had to go to some other interface I was unfamiliar with, and I wasn't sure I could find the speech. But now it's immediately accessible.
Udi Manber: That's the magic of the web, all this information is available to everybody, all the time, and with Google, you can find it.
Eric Enge: What about the role of humans in search? There is another post by Matt Cutts where he talks a little bit about ways that Google is using human input. People can say they don't want to see a particular site in their results, or they can vote on it if they have the Google Toolbar installed, and that sort of thing. Do you see Google looking for more ways (scalable ways, obviously) to take advantage of human input?
Udi Manber: Absolutely. As Matt has said, we have done it from the beginning. If a website points to another website, that's a signal, that's a signal from a user and we use that signal. If somebody says they don't like a particular search result that's a signal. So, we've been using that for a long time, and we are working on new ways of using it.
Eric Enge: Do you picture, at any time, having any kind of editorial oversight in the form of Google employees looking over results from an editorial perspective?
Udi Manber: We have no plans of doing that in the foreseeable future.
Eric Enge: Right, and you did allude a little to this, but what about social media input, like the kinds of things you find on sites like del.icio.us? Is there a role for looking at those kinds of inputs?
Udi Manber: Sure. I obviously can't talk about specific or ongoing projects, but that's just one more signal, and we are looking for ways to integrate such signals.
Eric Enge: It seems to me that when you think about social media, there are limitations there too. You can have a mob mentality that produces an irrationally strong signal, like when Stephen Colbert was pushing on the White Elephant page at Wikipedia. Suddenly that page had a tremendous amount of interest for artificial reasons. I would think there would need to be some real dampening in that kind of approach.
Udi Manber: Sure, the key is to find the right balance, and that's true for almost everything we do. The key is to do the right measurements and to have the right goals and find out the right balance.
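Neither speaker describes an actual mechanism, but one generic way to "dampen" a mob-driven spike is to measure activity on a log scale, compare it against a long-run baseline, and cap how much a sudden surge can add. The sketch below assumes hypothetical daily vote or bookmark counts; the function name, inputs, and cap are illustrative only.

```python
import math


def dampened_signal(votes_today: int, baseline_daily: float, cap: float = 3.0) -> float:
    """Score a social signal so that a sudden burst adds only bounded weight.

    votes_today: hypothetical count of votes or bookmarks in the last day.
    baseline_daily: hypothetical long-run average daily count.
    cap: maximum extra weight a surge above the baseline can contribute.
    """
    base = math.log1p(baseline_daily)            # steady, long-term interest
    surge = math.log1p(votes_today) - base       # excess over the norm, in log space
    return base + min(max(surge, 0.0), cap)      # ignore dips, cap spikes


if __name__ == "__main__":
    print(dampened_signal(12, baseline_daily=10))       # ordinary day
    print(dampened_signal(100_000, baseline_daily=10))  # mob-driven spike, capped
```

The log scale keeps large raw counts from dominating, and the cap means a page whose activity jumps ten-thousandfold overnight gains no more than a fixed amount over its usual level. Choosing the cap is exactly the kind of balance Udi says has to be found by measurement.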
Eric Enge: Through this process, do you see the role that links play in determining a site's importance and relevance diminishing over time?
Udi Manber: It's hard to say, I mean it might happen. When I think about it, it's very hard to predict. I think they will play a major role for the foreseeable future, I don't think there are going to be significant differences, but there might be small differences.
Eric Enge: Right, just by tuning the algorithm and adding new factors, there has to be some impact, of course.
Udi Manber: That's right and we are doing it all the time.
Eric Enge: That's right. Thank you very much, Udi.
Udi Manber: Thank you, it was my pleasure, wonderful questions.
About the Author
Eric Enge is the Founder and President of Stone Temple Consulting (STC). STC offers Internet marketing optimization services, including SEO, Social Media and PPC optimization, and its web site can be found at: http://www.stonetemple.com.