Sep Kamvar is the engineering lead for personalization at Google, and a consulting professor of computational mathematics at Stanford University. Prior to joining Google, Sep was the Founder of Kaltix, a personalized search engine company that was acquired by Google in 2003.
Eric Enge: You and I spoke previously about Google Gadgets, and you also work on Personalization at Google. What other things are you involved in?
Sep Kamvar: I am the Engineering Lead of Personalization, which includes personalized search; iGoogle, which is our personalized homepage; recommendations; Google Alerts; and any new things that we do in Personalization. So, my work with Gadgets is in line with my work with iGoogle.
Eric Enge: Great. Why don’t we start with a definition of what personalization is?
Sep Kamvar: For me, a personalized product is one that can use information I’ve given it about myself in order to provide a more tailored, more individualized experience. Our two main focuses in personalization are personalized search and iGoogle, our personalized homepage. iGoogle is pretty clear: users give information about themselves by putting gadgets on their homepage that are of interest to them. In personalized search, you give information about yourself by opting into the search history or the web history product, which allows us to then tailor the future web results based on your past history.
Eric Enge: Suppose somebody gets different search results because they happen to be located in Seattle and type in the word “pizza,” versus what someone gets if they happen to be located in Portland, Maine. Do you consider that personalization, or is that something else?
Sep Kamvar: Yes, we consider that personalization as well. Location is one of our stronger signals in personalized search.
Eric Enge: Right. Some things can be done algorithmically. You are not relying on the user having told you something or looking at past search history or something like that.
Sep Kamvar: That’s correct.
Eric Enge: Right. What are some other kinds of signals that can be used for personalization? You don’t need to give me the algorithm; we can talk theoretically here.
Sep Kamvar: The two signals that we use right now are the search history and the location. We constantly experiment with other signals, but the two signals that have worked best for us are location and search history. We imagine that over time we’ll continue to add signals, but those will only be after a long period of experimentation, and once we are convinced that it’s a very useful signal.
Eric Enge: So, presumably, when you want to experiment with a new signal, you put it through a trial on a limited basis, and you measure and test the results to see if you think it gives you a better result or better experience?
Sep Kamvar: That’s correct, yes.
Eric Enge: Do you do that by looking at click through rates and user behavior patterns?
Sep Kamvar: Unfortunately we can’t say how we actually measure the quality of our search results, but we have several metrics. We will do an experiment, and then see how it performs on those metrics, and from that we’ll determine whether it’s a good signal or a bad signal.
It’s actually really interesting; some signals that you expect would be good signals, turn out not to be that good. So for example, we did one experiment with Orkut, and we tried to personalize search results based on the community that users had joined. It turns out that while people were interested in the Orkut communities, they didn’t necessarily search in line with those Orkut communities. It actually harkened back to another experiment that we did, where in our first data launch of personalized search we allowed everybody to just check off categories that were of interest to them. People did that, and people would check off categories like literature. Well, they were interested in literature, but they actually didn’t do any searching in literature. So, what we thought would be a very clean signal, actually turned out to be a noisy signal.
Eric Enge: Right. So, you would hope that if they tell you directly what their interests were, that would be helpful though. I can see, because most of the people who are making those decisions probably don’t fully understand the impact of what they are doing.
Sep Kamvar: Yes. When I think about what I am interested in, I don’t necessarily think about what I am interested in that I search for and what I am interested in that I don’t search for. That’s something that we found was better learned algorithmically rather than directly.
Eric Enge: Right. So, did you also use preferences that have been set in various Google accounts?
Sep Kamvar: We do use the search preferences, so for example, I can say I prefer results from this language, and so on. That’s been going on for a long time, since the early days of Google, so it’s a little bit different than our current focus. We just see this as a way for people to indicate what kind of pages they are interested in.
Eric Enge: Right. What about if they set in Google Maps their home location for example?
Sep Kamvar: Right. We had started doing that with home location. What we found was that it was a lot better coverage to do it based on IP address, and so that’s what we use at the moment.
Eric Enge: That’s interesting. We’ve done some experimentation with geo-locating things based on IP address here, and it’s got its problems too. When I am sitting in my office, which is in Marlborough, Mass., the Geo IP lookup gives me Boston. That’s a good forty-five-minute drive or more away. Then, there is a coworker of mine who lives in West Newton, which is ten minutes from Boston, and his Geo IP lookup says West Newton.
Sep Kamvar: I think you bring up a really good point. The location feature is something that is important to us. As we mature in this particular area, we are working to provide more transparency and more control, so that people would be able to change the location based on Geo IP.
Eric Enge: Right. You can just default to the Geo IP, and then if they specify something you can use that.
Sep Kamvar: Exactly.
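The default-then-override idea discussed above can be illustrated with a minimal Python sketch. This is only an illustration of the fallback logic, not Google’s implementation; the function name `resolve_location` and its parameters are hypothetical.

```python
from typing import Optional


def resolve_location(geo_ip_location: Optional[str],
                     user_location: Optional[str] = None) -> Optional[str]:
    """Pick the location to personalize on (hypothetical helper).

    A location the user has explicitly specified wins; otherwise
    fall back to whatever the Geo IP lookup returned.
    """
    if user_location:
        return user_location
    return geo_ip_location


# Geo IP says Boston, but the user has corrected it to Marlborough.
print(resolve_location("Boston, MA", "Marlborough, MA"))
# No correction given, so the Geo IP default is used.
print(resolve_location("Boston, MA"))
```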
Eric Enge: What about external sources of data? Things like the Google Toolbar for example where you can see direct surfing history.
Sep Kamvar: At the beginning of this year, we expanded our search history product to be web history, and with that, we included the option for people to add data from their Google Toolbar. What we found is that data is a little bit noisier than just the search history. It has been more useful in our recommendations product, where we give recommendations independent of search.
Eric Enge: Interesting. Why would it be noisy?
Sep Kamvar: Well, the most important thing here, and this is the same problem with checking off the categories that you are interested in, is that a signal should be very closely aligned with search and what you are searching for in order for it to be useful to personalizing search.
With your search history, there is less data, but it’s all very closely aligned to what you are searching for. Whereas with web history there is more data, but it’s a little bit less closely aligned. There is the trade off there: more data versus less closely aligned to search. And, that’s something that we’ve been playing with and experimenting with in order to see what is the right balance.
Eric Enge: Right. That’s really the starting definition of what makes a bad signal versus a good signal.
Sep Kamvar: Yes, exactly. I think that’s exactly the definition of what makes a good signal: how closely aligned it is to search itself. In addition, we’ve found that your more recent searches are much more important than searches from a long time ago. You are a lot more likely to search in areas related to your more recent searches than in areas related to searches from, say, a year ago.
Eric Enge: Right. I suppose there are a couple of phenomena there. One is that the searchers are just getting smarter and more advanced, but also their interests are moving around.
Sep Kamvar: I think that the latter is the primary one. People’s interests move around over time.
Eric Enge: Right, it makes a lot of sense. Another one of the key issues is the classic disambiguation problem. Say there is a Miami Dolphins fan who searches for “Dolphins.” Over and over again they mean the football team in Miami, but today they are actually looking for the sea creature. How do you deal with that?
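One common way to make recent searches count more than old ones is an exponential decay on the age of each search. The Python sketch below assumes that approach purely for illustration; the interview does not say how Google weights recency, and the names `recency_weight`, `topic_scores`, and the 30-day half-life are all hypothetical.

```python
def recency_weight(age_days: float, half_life_days: float = 30.0) -> float:
    """Weight of a past search; halves every `half_life_days` (assumed value)."""
    return 0.5 ** (age_days / half_life_days)


def topic_scores(history, half_life_days: float = 30.0):
    """Sum recency weights per topic.

    `history` is an iterable of (topic, age_in_days) pairs.
    """
    scores = {}
    for topic, age_days in history:
        weight = recency_weight(age_days, half_life_days)
        scores[topic] = scores.get(topic, 0.0) + weight
    return scores


history = [("football", 1), ("literature", 365), ("literature", 400)]
scores = topic_scores(history)
# One search from yesterday outweighs two searches from about a year ago.
print(scores["football"] > scores["literature"])
```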
Sep Kamvar: The primary way we address that is by ensuring diversity in the results. We don’t imagine that we would ever want all top ten of your results to be Miami Dolphins stuff. By ensuring diversity in the result set, we’ll allay that concern of personalization taking over. That’s why a lot of the changes that we make are subtle changes rather than really dramatic changes.
Eric Enge: Right. One way to approach it would be to recognize that Dolphin fits a whole bunch of different categories of answers. You can then allocate a portion of the results to Miami Dolphins, and a part to sea creatures. And then, things get ranked in there, so you may even have a page by some ranking algorithms that’s about the Miami Dolphins that is higher ranking than a page about the mammal, but it doesn’t show up because you have already used your Miami Dolphins quota for the football team results.
Sep Kamvar: That’s one conceivable implementation. Generally, what we do is we try to be a little bit on the subtle and non-aggressive side while we introduce personalization, and personalized results. That generally addresses the issue.
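The quota idea Eric describes, which Sep calls only “one conceivable implementation,” can be sketched in a few lines of Python. This is a hypothetical illustration, not a description of what Google does; the function `diversify`, the category labels, and the quota values are assumptions made for the example.

```python
def diversify(ranked, quotas, page_size=10):
    """Fill a results page while capping each category (hypothetical helper).

    `ranked` is a list of (url, category) pairs, best first.
    `quotas` maps a category to the maximum number of slots it may take;
    categories without a quota are limited only by `page_size`.
    """
    used = {}
    page = []
    for url, category in ranked:
        if len(page) == page_size:
            break
        limit = quotas.get(category, page_size)
        if used.get(category, 0) >= limit:
            # Quota spent: skip this page even though it ranks higher
            # than results from other categories further down.
            continue
        page.append(url)
        used[category] = used.get(category, 0) + 1
    return page


ranked = [
    ("nfl-1", "team"), ("nfl-2", "team"), ("nfl-3", "team"),
    ("sea-1", "animal"), ("sea-2", "animal"),
]
# "nfl-3" outranks both sea-creature pages but is dropped by the quota.
print(diversify(ranked, {"team": 2, "animal": 2}, page_size=4))
```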
Eric Enge: Right. You don’t go change ten results, you change a couple or something like that.
Sep Kamvar: Exactly.
Eric Enge: Right. So, that way they are still getting good basic core web search results with subtle tweaks to improve their experience.
Sep Kamvar: That’s correct.
Eric Enge: That really relates to my question. If you and I were both to do a search on a query now, we might see just two things or three things that would be different, if anything?
Sep Kamvar: I think there are two parts to that question. The first is aggressiveness, such as how aggressive would we be, and the answer is yes, only two or three results are likely to be affected in each personalized query. I think the second part to that question is how many queries are actually going to be different with personalization.
If you typed in Britney Spears and I typed in Britney Spears, should we really get different results? I think where you see the biggest impact in personalization is on those queries that are underspecified. In particular, a great example of a class of underspecified queries is one-word queries. It’s really, really difficult to know what that query means. Take the query “Johnny’s” for example. It’s very difficult to see what the result should be. But, if you know that somebody lives in Omaha, well then Johnny’s Cafe is a little restaurant in Omaha.
With queries that are less specified, the context becomes much more important. Those are the things that are going to be most affected by personalization. Over time we’ve been trained to do pretty specific queries, because if we’ve given underspecified queries, we haven’t gotten the results. With personalization, people are becoming a lot more comfortable giving shorter and much less specific queries, because they understand that the search engine will pick up the slack.
Eric Enge: Right. So, to put my own phrase to it, crapshoot queries.
Sep Kamvar: Yes, I would say that that’s a reasonable phrase.
Eric Enge: That makes sense. With Britney Spears it’s still somewhat specific, because there is massive data about what that means to people that you can use, so there is not really a lot of reason to show people different things.
Sep Kamvar: Right, exactly. Those queries that have multiple meanings are those that are most affected by personalization. Often they are not the classic type of query that you expect, like “jaguar” which is a very clear word that has two distinct meanings. Often times there are queries that probably have a thousand different meanings, and you can’t tell; you can’t even enumerate them. When you look at the query, you just don’t really know what it is they are looking for until you have the context.
Eric Enge: Right, understood. By the way, jaguar is also a guitar, so there are three meanings associated with that one. What would you say to the SEOs who constantly worry about their rankings, and how should they think about personalization impacting them? I am assuming we are talking about white hat SEOs here.
Sep Kamvar: Yeah. I think the best thing for SEOs to do is to continue to work on targeting their pages to the user rather than to the keyword. I think that this is a great opportunity for SEOs in general, because pages are meant to be read by users at the end of the day.
Personalization really aligns the goals of what’s going to happen at the end of the day, when that user reads the page. How do you work on getting the search results; how do you work on getting indexed by Google relatively well? Those pages that continue to be designed for the user are those that are going to fare well in personalization.
Eric Enge: Right. It’s interesting, we always give people advice that when they think about using keyword tools like Wordtracker and Keyword Discovery, we encourage it, but we encourage it with the context that by the way this is useful data, completely outside of the search context. Knowing the language that people use when they think about and talk about your products is incredibly valuable. You can imagine using such a tool, I don’t know where you would get the data, but, if you had such a tool and search engines weren’t around, it would still be useful.
Sep Kamvar: Yes, that’s true.
Eric Enge: So, does anyone who is not logged in get any personalization?
Sep Kamvar: The location based personalization is available for everybody. For search history based personalization, you need to be logged in, and opted into web history.
Eric Enge: Right. What’s the default; are you default opted in or out?
Sep Kamvar: When you sign up for an account, there is a checkbox that is checked by default, but it’s very prominent in the account setup, so you can uncheck it at the time of account creation.
Eric Enge: Right. Does it confuse things when you have people with multiple accounts?
Sep Kamvar: There is that case, and that tends to be an edge case. But, it all depends on how people use their different accounts. One of the things we’ve found with multiple accounts is that people generally tend to have a primary account that they do most of their activity on. Another case is that they have two accounts that they tend to share about equally, and what happens there is you get a reasonably equal sampling; and so both of those will personalize in the same way.
The third way of using multiple accounts is that people will often have one account that they use when they are at work, and another account that they use when they are at home, which is actually great. In the cases where there is different user behavior at work and at home, that captures it, and it also re-ranks the results at the right time.
But, those are edge cases. I mean most people have one account that they use for the majority of time.
Eric Enge: Right. So, ultimately that would be a small percentage of the total number of scenarios that you’d be dealing with.
Sep Kamvar: That’s correct.
Eric Enge: Do you have any sense on the scope of the benefit from the personalization efforts that you can share with me?
Sep Kamvar: Absolutely. Personalization has been a great benefit and both personalized search and iGoogle are key parts of our strategy. We’ll continue to invest in that, because of the gains so far.
Eric Enge: Is there more to come from all of this?
Sep Kamvar: Absolutely. Basically we’ll continue to focus on the fundamentals by continuing to work on the existing signals, doing experiments, and enhancing search results based on that. We also continue to explore new signals, and see what we can get out of adding new signals to the mix. One important thing for us is continuing to provide more transparency and control over personalized search results. I think you’ll see more and more of that as well. So, those are basically the three areas of focus for us in personalized search.
Eric Enge: Right. What do you think in general about the notion that when you have a set of noisy signals, the collection of signals might be less noisy when you look at them in the aggregate, and you can use them to balance each other?
Sep Kamvar: I think that’s a really good point. I think this is one of those things where you explore a little bit more as the product becomes a little bit more mature. At the moment, our concern is working and getting juice out of three or four strong signals. Over time, as we get those three or four strong signals, then we can look into the more noisy signals and see how in aggregate they’ll do better.
Eric Enge: It’d be interesting to think about the noisy signals too in the context of dealing with a wide range of edge cases. They might have better applicability there.
Sep Kamvar: That’s correct. I think you’ve made a good point there as well. I think at the moment, because it’s a relatively new product at Google, our focus is on the fundamentals: the three or four signals that are the strongest, and also the main use cases rather than the edge cases.
Eric Enge: Great, thanks Sep!
Sep Kamvar: Yes, thank you Eric!