Garry Wiseman is the group program manager leading the commerce search team within the Live Search division at Microsoft. His responsibilities include overseeing all of the product and classified search related developments, as well as the MSN Shopping and Windows Live Expo consumer destination websites.
Wiseman has held a variety of engineering and management positions during his ten years with Microsoft. Before his current assignment, he was responsible for creating and running the international MSN portal sites and their infrastructure. With Wiseman having overseen the creation of the portals and their content management tools, the sites have grown to more than 40,000 published Web pages in 28 markets covering 13 languages. Today MSN receives more than 500 million visitors worldwide each month.
Before coming to Microsoft, Wiseman worked as a software engineer at several startup companies, including one of the first e-commerce solution providers in the U.K. and an early Internet service provider (ISP).
Eric Enge: Hi Garry, I think the new shopping search is very, very interesting, and offers some compelling new value. I understand it is focused on consumer electronics at the moment. How has the response been to it so far?
Garry Wiseman: It’s been very positive although the features have only really been live to the general public for just over three and a half weeks now. We have been able to pull some preliminary data internally from the log files, how people are reacting to the different answers that we have. So, as you know there are a couple of different types of answers, there is the products category answer, which is a response to a query like “digital camera” and we also have a single product answer where we display different types of data about an individual product in line with the search results. So far the click through has been pretty positive and we’ve had a few ad hoc pieces of customer feedback that they were surprised to see the results directly in line.
Eric Enge: Alright. Is there anything you can share about how much the click through rates increased or any metrics like that?
Garry Wiseman: We don’t tend to normally talk about these sorts of numbers in public. All I can say is it’s definitely doing a lot better than the old shopping instant answers that we previously had there, as far as the click through rates are concerned, so people are definitely more engaged.
Eric Enge: Right, that’s good. How did you come up with the design?
Garry Wiseman: We started off with a set of ideas about the answers and the full page experience of the reviews and tested these. The way we tend to work is that we will create a bunch of different types of layouts and working prototypes and then run them through usability testing where we start with paper based visuals testing, and then we move on to prototype testing in the next run based on what consumers responded most favorably to. You’ll see that in the opinion index page, and the reviews page, that the interface metaphors are fairly common. For example, the UI metaphors, where we have the features with the sentiment ratings on the left hand side, and then the actual reviews on the right hand side. It’s a fairly traditional layout. That seems to resonate most with the users that we run it past.
For usability testing, we normally have anywhere between 10 and 12 members of the public with varying levels of technical expertise, so you’ll have beginners all the way up to people who see themselves as fairly advanced computer users. We then take the feedback and do multiple runs, but on top of that, once we have a set of designs that we are confident in and willing to test more broadly, we can do A/B testing. As far as the number of people who will see the A/B designs and that we can try different options with, it’s in the thousands.
Eric Enge: Alright. So, in typical A/B testing you try two different scenarios or maybe even multivariate type scenarios. So, if maybe a dozen people see one variant, and a dozen people see a different variant, you see how they respond, and what they like best.
Garry Wiseman: Correct. It helps us decide which is the most effective answer. We will get a lot of ad hoc responses from usability testing, and then this is just more of a statistical method of making sure that we are showing the type of answer or experience that users prefer overall.
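Editor's sketch: the "statistical method" mentioned above is not described in the interview. One common way to compare the click-through of two A/B variants is a two-proportion z-test, shown here as a minimal illustration only. The function name and the sample counts are hypothetical, and nothing here is confirmed to be what the Live Search team used.

```python
import math

def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    """Compare click-through rates of variants A and B.

    Returns (z, p_value) for a two-sided two-proportion z-test.
    Assumes both variants were shown at least once and that the
    pooled click-through rate is strictly between 0 and 1.
    """
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that A and B perform the same.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 2,000 users per variant.
z, p = two_proportion_z(clicks_a=100, views_a=2000, clicks_b=150, views_b=2000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With samples in the thousands, as described above, a difference of a couple of percentage points in click-through can already be distinguished from noise, which a 10 to 12 person usability panel cannot do.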
Eric Enge: Right, excellent. When you look at the product detail page for example, it appears that the ratings and categories are dynamically generated. Can you talk a bit about how you do that?
Garry Wiseman: Basically, the algorithm knows that each category of item will have a set of features that are important to users. But then, for each different type of item, users may have only commented on certain features in their reviews. For example, if for a certain camera no one has mentioned the zoom quality in any of their reviews, we won’t show that feature rating. But, if a user or multiple users talk about the screen or the display we will pick up on that, and then obviously create a sentiment rating based on that. There is a minimum number of comments that we need to see in order to generate a rating.
Eric Enge: Right. Did you manually decide on what the categories would be or was that algorithmically generated? To clarify, I understand what you just said is there is a collection of categories that you consider interesting for digital cameras, and then you see which ones each particular digital camera has enough comments on to be able to give a meaningful analysis. Did I get that right?
Garry Wiseman: Yes. So, we know from our internal data and specifications what the key features of a camera or digital cameras are in general. And then, we basically look for those or variations of those features when we are scanning the user reviews for a particular product. When I say variations, we recognize if someone talks about the screen that it’s also the same as when someone else has mentioned the word display. We connect those two together and we can aggregate it under a single feature.
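Editor's sketch: the steps described in the last two answers (a seed list of features per category, variations such as "screen" and "display" folded into one feature, and a minimum number of comments before a rating is shown) can be illustrated roughly as follows. The alias table, the threshold value, and all function names are hypothetical; the interview does not disclose the real lists or thresholds.

```python
from collections import Counter
import re

# Hypothetical seed list: canonical feature -> variations seen in reviews.
FEATURE_ALIASES = {
    "screen": {"screen", "display", "lcd"},
    "zoom": {"zoom"},
    "battery": {"battery", "batteries"},
}

# Hypothetical threshold; the interview only says that a minimum exists.
MIN_MENTIONS = 2

# Reverse lookup: variation -> canonical feature.
ALIAS_TO_FEATURE = {
    alias: feature
    for feature, aliases in FEATURE_ALIASES.items()
    for alias in aliases
}

def count_feature_mentions(reviews):
    """Count how many reviews mention each canonical feature."""
    counts = Counter()
    for review in reviews:
        words = set(re.findall(r"[a-z]+", review.lower()))
        # A review counts once per feature, however many aliases it uses.
        mentioned = {ALIAS_TO_FEATURE[w] for w in words if w in ALIAS_TO_FEATURE}
        counts.update(mentioned)
    return counts

def rateable_features(reviews, min_mentions=MIN_MENTIONS):
    """Return the features with enough mentions to justify a rating."""
    counts = count_feature_mentions(reviews)
    return sorted(f for f, n in counts.items() if n >= min_mentions)

reviews = [
    "The screen is bright.",
    "Great display, weak battery.",
    "Nice colors.",
]
print(rateable_features(reviews))
```

In this toy data "screen" and "display" are aggregated under one feature and reach the threshold, while "battery" (one mention) and "zoom" (none) would not be shown.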
Eric Enge: Right. So, you decided at the start of the exercise how many categories you are going to look for. Was that first step done by humans, or was it also done algorithmically somehow?
Garry Wiseman: It’s a combination of both. So, there is some machine learning there, but you always start off with an original list essentially.
Eric Enge: Right. So, a little bit of human input is part of the process to guide it, but the bulk of the work is done by the algorithms.
Garry Wiseman: Yes.
Eric Enge: Alright. It appears that what you end up doing is you scan all these reviews, and then for example, you’ve decided that construction is an interesting category. You scan all these reviews for words which are either the word construction or something similar to it. You then find out if you’ve got enough reviews that rate the construction of the camera so that you can give it a rating.
Garry Wiseman: Correct.
Eric Enge: Right. Then, the next step appears to be that you then scan the reviews, and you look for words, positive words and negative words associated with it. So, if someone says the camera is of good construction, versus someone saying that the construction of this camera is crap, the first comment is a positive vote, and the second one is a negative vote.
Garry Wiseman: Yes. One of the key differentiating parts of the technology here is that we do “sentiment extraction”. Finding the millions of reviews that we have is one step. The key part is the sentiment extraction, in order to work out how positive or negative people were about the different attributes or features of a particular product.
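Editor's sketch: the positive-vote and negative-vote idea from the exchange above can be illustrated with a toy word-list approach. This is a deliberately simple stand-in and not the actual engine, which the interview later describes as involving machine learning and Microsoft Research algorithms. The word lists and function names are hypothetical.

```python
import re

# Hypothetical toy lexicons; a production system would learn these.
POSITIVE = {"good", "great", "solid", "excellent"}
NEGATIVE = {"bad", "poor", "flimsy", "crap"}
FEATURE_WORDS = {"construction", "build"}

def sentence_vote(sentence):
    """Return +1, -1, or 0 for one sentence about the feature."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    if not words & FEATURE_WORDS:
        return 0  # the sentence does not mention the feature at all
    pos = len(words & POSITIVE)
    neg = len(words & NEGATIVE)
    return (pos > neg) - (pos < neg)

def feature_sentiment(reviews):
    """Share of positive votes among all non-neutral votes, or None."""
    votes = []
    for review in reviews:
        for sentence in re.split(r"[.!?]+", review):
            vote = sentence_vote(sentence)
            if vote != 0:
                votes.append(vote)
    if not votes:
        return None  # nothing to rate
    return sum(1 for v in votes if v > 0) / len(votes)

reviews = [
    "The camera is of good construction.",
    "The construction of this camera is crap.",
    "Good construction overall! Nice lens.",
]
print(feature_sentiment(reviews))
```

Voting per sentence rather than per review keeps a remark about the lens from being counted toward the construction rating. A word-list approach like this misses negation ("not good") and sarcasm, which is part of why real sentiment extraction is harder than it looks.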
Eric Enge: Right. Basically you have a sentiment extraction engine.
Garry Wiseman: Right.
Eric Enge: This is probably applicable to reviews of any kind of product as long as there is enough review data to feed the engine.
Garry Wiseman: Exactly.
Eric Enge: I was very interested in the whole thing and how it works too, because it does seem like you’ve put some new things out there. Is that your impression as well, that you are the first to deploy something like this?
Garry Wiseman: Yes, although the idea of aggregating reviews is something that is not incredibly unusual. There are a few sites that do that, but I have never seen any site that actually moves beyond a single rating or overall sentiment extracted from reviews, and drills down into the individual feature level to show users’ combined sentiment for that feature.
It’s a huge plus for users to have this sort of feature so that they can start doing better comparisons between products. You could end up comparing which MP3 player has the best battery life, if that’s one of the key things that you care about.
If you are not obsessed with the size, or the weight, or anything like that, you can actually start viewing the data across different attributes that previously you might not have been able to. Most of the comparison shopping engines are focused on price, or merchant ratings, or popularity when you are looking at products. But, that’s not always how you’d want to search through a set of products.
Eric Enge: Right. It’s a very difficult thing for a user to replicate, because they’d have to read hundreds and hundreds of reviews and take their own detailed notes on everything they are reading everywhere. I imagine that not very many users would sign up to do that.
Garry Wiseman: It’s very time consuming, and it was interesting at Searchification I was talking to journalists and bloggers afterwards. I was worried about explaining the concept, because when you talk about the sentiment extraction feature it sounds pretty complicated. But, I was pleased to hear that the journalists told me afterwards that they got it right away. At that point I felt great, they understood it, and they can see what kind of consumer problem we are trying to address here. We are going to basically save you a whole bunch of time in having to do all that review research. We can help you get to that confident purchase decision mode that you want to be in when you are looking at buying a new product.
Eric Enge: Right. So, it’s a methodology that works really well with products that have a lot of user reviews generated for them. Are there other types of sites that you can also look at potentially in the future to get similar data, or is the potential for raw website data to be biased too high?
To clarify, you know you can rely on review sites like dpreview.com for reviews, and you can cover a bunch of things with those sites, such as categories of products that have a high volume of reviews from reliable sites. The question is, is there a way that you can use the sentiment extraction notion by looking beyond review sites to other classes of sites?
Garry Wiseman: We are definitely keeping our options open as far as different types of categories. I am sure over the next six months to twelve months we will start investigating some of the other options as we expand beyond products. At the moment, we are going to focus primarily on making sure that for a product search, we are going to just do the best job we can as far as making sure that opinion index and the reviews are integrated properly. We want to grow the coverage, as well as also the number of reviews, so that we have even more data. But, I think there are definitely other categories you could apply it to.
Eric Enge: Right. So, when I type in digital cameras, I get four Canon cameras I believe. How were those chosen?
Garry Wiseman: It’s a popularity score that’s calculated algorithmically. We have different sources of data that we use to rank them. It’s essentially user click through and anonymous user behavior data that we are using there to do the popularity ranking.
Eric Enge: Right, so click through data based on historical data from the search engine.
Garry Wiseman: Yes, and a few other sources that we use.
Eric Enge: Of course you can draw on the MSN website too for example.
Garry Wiseman: Yes.
Eric Enge: Alright. That’s interesting. It must take quite some thought to get that right, because search results are inherently hierarchical. If a particular camera shows up in the first position when someone starts doing specific searches, it is naturally going to get more clicks than cameras further down in the list. So, you’d need to adjust your data to take that into account, right?
Garry Wiseman: Yes, we have weighting, so we will weight certain types of data higher than others as far as calculating the popularity. For example, a direct click on MSN shopping is a pretty good indication that the user had high interest in it.
At least for shopping, it’s a stage where if you’ve looked at a particular product, and then you’ve actually gone ahead and clicked through to a merchant, we can be fairly confident that you’ve gone through the discovery and research phase at this point.
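Editor's sketch: the weighting described above, where stronger signals such as a click through to a merchant count for more than a plain search click, can be illustrated as a weighted sum. The signal names, weight values, and product data are all hypothetical; the interview does not disclose the actual sources or weights, and this sketch does not attempt the position-bias adjustment raised in the question.

```python
# Hypothetical weights: stronger purchase intent gets a higher weight.
SIGNAL_WEIGHTS = {
    "search_click": 1.0,
    "shopping_click": 2.0,
    "merchant_click_through": 4.0,
}

def popularity_score(signal_counts, weights=SIGNAL_WEIGHTS):
    """Weighted sum of behavior counts; unknown signals are ignored."""
    return sum(
        weights.get(signal, 0.0) * count
        for signal, count in signal_counts.items()
    )

def rank_products(products, top_n=4):
    """Return the names of the top_n products by popularity score."""
    ranked = sorted(
        products.items(),
        key=lambda item: popularity_score(item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_n]]

# Hypothetical data for two cameras.
products = {
    "Camera A": {"search_click": 100},
    "Camera B": {"search_click": 20, "merchant_click_through": 30},
}
print(rank_products(products))
```

Here Camera B outranks Camera A despite far fewer raw clicks, because its merchant click-throughs carry four times the weight.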
Eric Enge: That makes sense. In general a shopper tends to start with a generic query if they are at early stages in the process, like I type in digital camera or digital cameras just to keep going with that example. How do you see the progression working from a user perspective? They start with their general query, and they get a certain amount of data, then they progress. What are the various stages that you see them going through?
Garry Wiseman: Most people start in the discovery mode, which means that they have literally no idea what kind of digital camera they want. Or, they may be in a research mode where they already know that Canon makes really good cameras.
They may start by doing that generic digital cameras query, or perhaps a brand of digital camera. What we are hoping to do is to display different types of information that we can garner about either the category or a brand of items, and then help users get to that point where they feel confident that they can actually choose a model. Today what we have there is that the most popular cameras pop up. We’re also trying to highlight links to guides and reviews, so if we have confidence that we have a selection of guides and reviews which are third party expert review sites around a particular category, we will highlight those in the instant answer as well.
People at Consumer Reports have great wizards that help you get down to questions like “are you going to go scuba diving with this camera”, and they will ask you those kinds of questions and they will end up suggesting a bunch of different types of cameras that are matched to your needs.
Eric Enge: Right, so you start matching up planned usage patterns and needs with particular products.
Garry Wiseman: Correct, and then the other product answers that we have are the product line or product name answers, where you may have already done some research such as a friend may have recommended a particular item to you. You are looking to find out a bit more about this item, and look at the reviews and ratings that people have given it.
I was reading a Nielsen piece of research the other day which was pretty interesting in that they did a survey across 47 different countries asking consumers, what types of forms of advertising do they trust? The top response (78%) was that they’ll trust and put faith in recommendations from other consumers above anything.
That’s why we were confident with our approach, because the reviews are from people who have owned these items perhaps for several months, and things may have gone wrong, they may have had to go to customer support or whatever. This information is better than one-off reviews that you might see, such as an expert review. These are the people who had the original iPods for a certain amount of time and realized that the battery life wasn’t that good. In a nutshell, these are the real life scenarios from people who have owned the items and have seen the ups and downs.
Eric Enge: What about offering some sort of complete list, such as your own guide to digital cameras. You could provide a very simple navigation way to drill down through all of the various digital cameras by category, and figure out whether you prefer Sony, Nikon, or Canon, or whatever, like a directory of all digital cameras.
Garry Wiseman: It’s definitely the kind of direction we’d like to head in. We will continue to work on the coverage, making sure that the answer is first or that you can refine it properly, and that we have some of the other categories covered. We will definitely start extending and expanding our offering to make sure we can get all the web data in one place for consumers to look at, be it research, prices, or specifications and those sorts of things.
We want to make it super easy, and have one place where you can do a query that will have all the information that you need. We want to provide a single point where you can research or compare items that you might want to buy.
Eric Enge: Right. What are the other categories of consumer products that you cover? I mean obviously there are digital cameras, but what are other kinds of things that are currently covered?
Garry Wiseman: At the moment we are really focused around consumer electronics, and that includes things like computers, MP3 players, software, cell phones, printers, computer components, etc. We do cover apparel a little bit, so there are some examples that you can find where we’ll have handbags, like Prada handbags or running shoes. We only want to show items that we’re very highly confident of having good relevance and coverage, and if we are not, we won’t show it.
Eric Enge: So I just tried Prada handbags. Actually, it just brought me straight to Sabines Boutique when I clicked on it, so it’s a bit different than the digital camera experience.
Garry Wiseman: That generally means that for that set of products, there weren’t enough reviews that we could go on from the web at the moment. You will find that for a set of categories where people just aren’t that motivated to leave reviews.
Eric Enge: Right. A lot of times you can get very opinionated stuff from people, in blog posts, for example. I would imagine though that the volume of data is quite a bit smaller than what you can pull from a consumer electronics review site.
Garry Wiseman: Yes. It’s actually fairly common across the industry such as home furnishing, or home and garden or apparel. It’s very hard to get original reviews, because a lot of the items are also unique or small in quantity, and particularly with furnishing, it’s difficult to get people to also respond back, partly because they don’t necessarily buy online, but then also partly because it can be such a unique item. Whereas with electronics there are so many features, and so many things you can comment on.
Eric Enge: Right. Well, what about other areas that you might release in the near future or are focusing on in terms of development?
Garry Wiseman: You will see from a feature perspective that we will be adding items here and there over the next few months and through the next six months. In that respect I’ll say “watch this space” as we will continue to enhance what we have there today, as this was essentially our V1 release. We are going to keep pushing!
Eric Enge: Right. Any insight as to new categories coming up?
Garry Wiseman: We’ll try and cover the most popular categories and focus heavily on electronics, computers and software. Then, we will try to figure out what we can do from an apparel perspective as that’s also a big category, to see if there is anything else we can provide users with, that would maybe help them research or discover particular items when it comes to apparel.
Eric Enge: Right, yeah. The other thing that I was curious about is how much of this is really being driven in some unique fashion from Microsoft’s Neural Net Approach, and is that something that would make it harder for competitors to replicate?
Garry Wiseman: This is obviously part of our secret sauce in the sentiment extraction engine, but as far as people replicating the functionality, everything can be cloned. At the end of the day, my main concern is simply that we continue building the best product results, and sets of research tools for the consumers, to make sure that we can answer those research queries that people tend to do a lot of. We are focused primarily on improving our own product, and like I said things will always be cloned.
Eric Enge: Sure. Indeed, it’s just very interesting when you start to draw advantages from the approach that you took to building search even at the core level, because the Neural Net Approach should give you some advantages in certain classes of analyses, right?
Garry Wiseman: Yes. It has certainly been a real investment to get this up and running. So, it’s definitely not that easy to replicate the feature to match the same kind of scale that we have today. It’s based on a lot of months of research, and algorithms that Microsoft Research has been working on and refining for several years. I am sure it will take a reasonable amount of effort for anyone to try and replicate.
Eric Enge: Right. Are there any stones I’ve left unturned here in talking about shopping search with you today, Garry?
Garry Wiseman: No, I was very happy when we had our Searchification event. You seem to understand that product research is such a huge category when it comes to product queries. People actually want to research products when they use the Internet, and you can tell that from the number of folks that research online, and then purchase offline. That is what they call cross-channel shopping. People still do that today, particularly for electronic appliances and other large items that they don’t want to have shipped. For example, with plasma TVs, people naturally get very nervous about having those shipped, so all they want to do online is research, and then they buy locally.
Eric Enge: Right, you have the corollary too, of course, which is that people are willing to buy online, but it takes multiple forays onto the web before they do it. Maybe one day they do some research, then they think about it a bit.
Maybe they even print out some stuff, and they come back and dig a little deeper. And, it’s that whole classic starting with a very general query, and then finally working down to a very specific query, and finally making a purchase.
Garry Wiseman: Yeah, it’s like a funnel where you start in the discovery phase, then you move down to research, and then eventually go ahead and purchase, and it’s very rare that that all happens in one session.
Eric Enge: Right, indeed. Well great, thanks Garry. I think that was very helpful.
Garry Wiseman: Yes, thank you as well!