Power AdWords Tools with Google’s Frederick Vallaeys

Frederick Vallaeys is a Product Evangelist for Google AdWords. In this role, he helps advertisers learn about which Google products can best solve their marketing needs. He also represents the needs of advertisers with the engineering and product management teams. His main product focus is on ads quality and bulk tools like the AdWords Editor and the AdWords API.

Prior to Google, Frederick was an engineer at Sapient and a part-time wedding photographer who found new customers through AdWords. He joined Google in 2002 to help bring AdWords to the Dutch and Belgian markets. He earned his B.S. degree in electrical engineering from Stanford University in 2000.

Key Points from Interview with Frederick Vallaeys

  1. ValueTrack is the AdWords feature that allows advertisers to tag their URLs with parameters. The resulting URL can then be used within the advertiser’s own tracking systems.
  2. Too many advertisers settle for global level reporting and do not look further. Even if your top level metrics are OK, you can still get great gains in overall campaign performance by digging into more detailed reports.
  3. Segmentation is the biggest power reporting feature that is not used by many advertisers.
  4. Types of segmentation can include times of day, days of week, device type, social signals, and more.
  5. (Fred): “… no matter from which channel the +1 comes in, it all aggregates at the URL level.”
  6. (Fred): “In the social segmentation, you can actually see what the impact is of having each of these different variations.”
  7. You can run multiple segments via the downloadable reports or the API.
  8. (Fred): “… at the end of 2011, half of American consumers had a Smartphone in their pocket.”
  9. Google has a site at howtogomo.com that you can use to see how your site renders on a mobile device.
  10. (Fred): “Google Analytics offers multichannel funnels, and what these allow you to do is see what touch points people have with your online campaigns before a conversion happens.”
  11. (Fred): “One tool that we have is the AdWords Campaign Experiments. That’s a great way for an advertiser to explore how to improve their ROI. They can send 10%, 20%, 30%, whatever percentage they want of their traffic to that experiment.”
  12. (Fred): “These (new ad formats) were a big thing for us in 2011, and will continue to be a big thing in 2012.”
  13. The Bid Simulator tool will show you what to expect for different types of increases (or decreases) in bids.
  14. The Ad Preview Tool allows you to see whether or not your ads are running. It also allows you to test geotargeting in areas other than your current location, or various types of mobile devices.
  15. Top of Page bid estimates show you what your bid would have to be to show up in the space above the organic results.
  16. Impression share is a way to see what percentage of the time your ads are running. Tuning your campaign to increase impression share can be one of the best ways to get additional traffic.
  17. Google Analytics is planning to expand its social reporting to include more than just the data from Google owned properties – i.e. data such as Facebook Likes.

Full Interview Transcript

Eric Enge: Can you tell me some great power reporting features in the AdWords interface that people rarely use?

Frederick Vallaeys: When you look at AdWords, there are three high-level types of reports that we make available for our customers. You can go into the campaign management interface and pull reports right there in your campaigns. Then, we also have Google Analytics which goes a little bit deeper into some of the data, for example, with real-time reports, social reports and cross-channel reports that look at how ALL your campaigns are contributing to your success and your ROI. The third one is making reporting available for people who prefer using APIs or building their own reporting systems using our URL tagging feature, ValueTrack.

That is a way for us to attach some additional information to each click that comes to your website so that your own reporting software can capture that and then process it. If you look specifically at what is available in the AdWords interface, it’s really gotten very sophisticated in terms of segmentation. And, I think one of the biggest mistakes that advertisers make is they look at their reports at too high a level.
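As a rough illustration of how an advertiser's own reporting software might capture that extra information, the sketch below assumes the destination URL was set up with ValueTrack placeholders such as {keyword}, {matchtype}, and {device}, for example `?kw={keyword}&mt={matchtype}&dev={device}`. The query keys `kw`, `mt`, and `dev`, the example URL, and the function name are hypothetical choices made for this example, not part of AdWords.

```python
from urllib.parse import urlparse, parse_qs


def extract_click_tags(landing_url, tag_names=("kw", "mt", "dev")):
    """Return the tracking tags found in a landing-page URL.

    The tag names are hypothetical; use whatever keys you chose
    when you set up your own URL template.
    """
    query = parse_qs(urlparse(landing_url).query)
    # parse_qs returns a list per key; keep the first value of each tag.
    return {name: query[name][0] for name in tag_names if name in query}


# Hypothetical click URL after the placeholders have been filled in.
clicked_url = "https://www.example.com/shoes?kw=running+shoes&mt=e&dev=m"
tags = extract_click_tags(clicked_url)
print(tags)
```

The resulting dictionary could then be stored alongside the visit in whatever tracking system the advertiser already runs.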

There are probably all of these micro-segments within your campaign where things are performing fantastically well, but you don’t know it because you looked at things at an aggregated level. On the flip side, you also have elements of your campaign that just aren’t working well. Examples of segments that you could be looking at are the specific time of the day and the specific day of the week. You may find that, for some reason, people just aren’t buying your product at certain times of day or on certain days of the week.
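To make the point about aggregated numbers concrete, here is a minimal sketch that computes conversion rate per hour of day next to the overall rate. The rows are made-up figures, and the layout (hour, clicks, conversions) is an assumption for the example rather than the format of any AdWords report.

```python
from collections import defaultdict

# Hypothetical rows: (hour_of_day, clicks, conversions).
rows = [
    (8, 200, 2),
    (8, 100, 1),
    (13, 150, 9),
    (20, 50, 8),
]


def conversion_rate_by_segment(rows):
    """Sum clicks and conversions per segment and return conversions / clicks."""
    totals = defaultdict(lambda: [0, 0])
    for segment, clicks, conversions in rows:
        totals[segment][0] += clicks
        totals[segment][1] += conversions
    return {seg: conv / clicks for seg, (clicks, conv) in totals.items() if clicks}


overall = sum(r[2] for r in rows) / sum(r[1] for r in rows)
by_hour = conversion_rate_by_segment(rows)
print(f"overall: {overall:.1%}")
for hour, rate in sorted(by_hour.items()):
    print(f"{hour:02d}:00  {rate:.1%}")
```

With these invented numbers the overall rate looks unremarkable, while the individual hours differ widely, which is the kind of gap the segmented view is meant to expose.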

AdWords Day Parting Report

Eric Enge: What are some of the other segments you offer?

Frederick Vallaeys: One of my favorite ones is the social segments. When we introduced Google+, part of that was the +1 button, and now the +1 button will show up next to ads. +1s are being collected from the ads, from the organic results, but also from having the +1 button on your own website.

Eric Enge: The +1 is associated with a web page, and not the ad or organic results, isn’t that right?

Frederick Vallaeys: Exactly, no matter from which channel the +1 comes in, it all aggregates at the URL level, and will show up in any channel where the consumer is then looking for your business. As an advertiser, when you put the +1 button on your site, you start getting some +1s, and now when somebody is looking for your business or your service, they see your ad as you would have seen it in the past but now there is also a count of +1s next to it.

It might indicate that Eric and five other friends have +1d this page, or, you have the more generic one where it just says 500 people have +1d this. In the social segmentation, you can actually see what the impact is of having each of these different variations.

As an advertiser, what you can start to see is that probably the most powerful results are the ones that have the personal recommendations, and you could start finding some URLs on your sites in your campaigns that you are advertising for, that don’t have a lot of these personal recommendations. If you knew that this could drive a lot more clicks, a lot more conversions, then you could make a bit of an effort within your company to get more +1s for those specific URLs that you have in your ad that need more of the personal +1 recommendations.

Eric Enge: Very cool. What are the sub-flavors that go into the social reports then?

Frederick Vallaeys: When we moved all of reporting into the campaign management interface, the whole notion that we had was to make it really easy for people to immediately act on the information that we show them. Just imagine your traditional campaign management page: it’s down to the ad group level, and you can see the results for that, over any time horizon that you want as well. Ryan, is there any way to run multiple segments at the same time?

Ryan Voccola: Not in the UI, but you can do it with a downloaded report, or with the API where you would have to add on additional segments that will come out in the CSV export.

Frederick Vallaeys: So you can see how social is affecting your performance, but then correlate that to how certain times of day, or the day of the week, is also impacting your results. That’s where you really get small micro segments where you could start figuring out some pretty interesting things. I think most advertisers will stay at that first level of segmentation, because that’s really going to give you some pretty good returns and that’s also where they have enough data to make statistically sound decisions.
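A sketch of what combining two segments from a downloaded CSV might look like follows. The column names (`device`, `day_of_week`, `clicks`, `conversions`), the sample rows, and the `MIN_CLICKS` cutoff are all assumptions made for illustration; a real export will have its own headers, and the right threshold for "enough data" depends on the account.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export; real column names depend on the report you download.
report_csv = """device,day_of_week,clicks,conversions
mobile,Monday,400,8
mobile,Monday,100,2
desktop,Monday,300,15
mobile,Saturday,20,3
"""

MIN_CLICKS = 100  # arbitrary cutoff for "enough data", chosen for illustration


def cross_segment(csv_text, keys=("device", "day_of_week")):
    """Aggregate clicks and conversions for each combination of segment values."""
    totals = defaultdict(lambda: {"clicks": 0, "conversions": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        bucket = totals[tuple(row[k] for k in keys)]
        bucket["clicks"] += int(row["clicks"])
        bucket["conversions"] += int(row["conversions"])
    return dict(totals)


segments = cross_segment(report_csv)
for combo, t in sorted(segments.items()):
    if t["clicks"] < MIN_CLICKS:
        print(combo, "too few clicks to judge")
    else:
        print(combo, f"{t['conversions'] / t['clicks']:.1%}")
```

The cutoff reflects the caution in the answer above: the more segments are crossed, the fewer clicks land in each bucket, and the less any single rate can be trusted.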

Eric Enge: You can also segment by device, right?

Frederick Vallaeys: Yes, that’s a huge one right now because at the end of 2011, half of American consumers had a Smartphone in their pocket. There is a lot of web usage occurring on these mobile devices. One interesting thing we see is that mobile device usage really spikes early in the morning and late at night, so literally the first thing people do in the morning is take out their mobile devices and check their email or research something and it is also the last thing they do before they go to sleep.

As that behavior becomes more common and the usage numbers go up dramatically, it’s really important for an advertiser to look at how they are performing differently on these different devices. Your performance could vary based on whether or not searchers are on a Smartphone, tablet device, desktop or laptop. Perhaps you should make a mobile website to drive up your conversions.

One really cool tool that we launched a little while ago that maybe not many people know about is howtogomo.com. On that site we do an evaluation of people’s websites and how they would render on a mobile device. It’s a really easy way for someone to see, if a visitor came to the site from a mobile phone, whether it would even make sense to them, whether they would be able to click the links or whether the links are too small, and how the page renders on these smaller-screen devices.

Eric Enge: Have you seen examples of people where there are drastic differences in time of day in terms of conversion rates?

Frederick Vallaeys: That is a little bit industry specific for the most part, but in the travel space in the morning people tend to do a lot of research, and then during their lunch break they call their spouse or significant other, and in the afternoon you might see a little bit more booking behavior. Obviously, you do have to be careful with this because those clicks and those visitors that were doing research may be just as important to getting the final conversion.

You probably look at Google Analytics to see the multiple steps that happen in your conversions. For the past 10 years, we have been looking at last click conversion in the industry, but a large percentage of conversions involve multiple clicks. It is important to understand the whole cycle for your site.

You might see really generic searches in the beginning, and then as people start to figure out exactly what they want, they get to a very specific search, and maybe the last search they do is a branded search. But all of the searches before that are often really important in convincing that customer that your company is a player in this space, someone they could trust to do business with.

Eric Enge: Can you talk a little bit about the problem of attribution?

Frederick Vallaeys: Google Analytics offers multichannel funnels, and what these allow you to do is see what touch points people have with your online campaigns before a conversion happens. Before we had this, we could tell you which keywords assisted in terms of search campaigns. This takes that one step further and tracks display campaigns and social media, so you can follow customers as they go through the funnel of conversion. Maybe you have three touch points through the display network and then you have two different searches happening, and then they bought something.

Conversions in Multiple Touches

Where it becomes challenging is you have to figure out how to assign value to each of these actions, as they are all involved in the conversion. You have to start modeling that for yourself, and you have to experiment with it to see what makes sense.

Eric Enge: For display ads you have this concept of a view through conversion, right?

Frederick Vallaeys: Yes, but what is more powerful here is we can start showing you how your typical person who converts saw your email marketing campaign first, then maybe they saw a tweet, then they saw your display ad three times, and then they did seven searches. You can actually see how all these events contribute to lead to that conversion. Maybe there are 500 people who took a path that was similar to that, and then there are other people who go directly to search because they know exactly what they want.

Now you have the data and now you can start figuring out why. If you were to cut out this list of keywords, would that have an impact on your campaign? In the past, if you just looked at last click conversion, you would eliminate these keywords because they had never given you a conversion. That could be a big mistake, because maybe that is the keyword everybody always ends up searching, one search before they do the final one that leads to the conversion. If you got rid of these searches, then people might not even realize your company existed or had this service available, and you wouldn’t get these last click conversions anymore.
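The difference between last-click credit and a model that spreads credit across the path can be sketched in a few lines. The two models below (all credit to the final click, and an even split across every click) are simple illustrative choices, not models that Google Analytics is stated to use, and the keyword paths are invented.

```python
from collections import defaultdict

# Hypothetical conversion paths: ordered keywords clicked before one conversion.
paths = [
    ["running shoes", "trail running shoes", "acme shoes"],
    ["running shoes", "acme shoes"],
    ["acme shoes"],
]


def last_click_credit(paths):
    """Give the full conversion to the final keyword in each path."""
    credit = defaultdict(float)
    for path in paths:
        credit[path[-1]] += 1.0
    return dict(credit)


def linear_credit(paths):
    """Split each conversion evenly across every keyword in its path."""
    credit = defaultdict(float)
    for path in paths:
        share = 1.0 / len(path)
        for keyword in path:
            credit[keyword] += share
    return dict(credit)


last = last_click_credit(paths)
linear = linear_credit(paths)
for keyword in sorted(linear):
    print(f"{keyword:22s} last-click={last.get(keyword, 0.0):.2f} linear={linear[keyword]:.2f}")
```

Under last-click, the generic keywords in this made-up data receive no credit at all and would look like candidates for removal, even though they appear in most of the converting paths.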

Eric Enge: Unfortunately, there really is no science to how you attribute value across multiple clicks or views.

Frederick Vallaeys: Exactly, at some point maybe we will have some more insight into that, but for now the point is to give advertisers the data, and then they can start making decisions off of that.

Eric Enge: Can you talk about the experiments segmentation?

Frederick Vallaeys: One tool that we have is the AdWords Campaign Experiments. That’s a great way for an advertiser to explore how to improve their ROI. They can send 10%, 20%, 30%, whatever percentage they want of their traffic to that experiment. This shows up in your campaign reports, so you can see how the experiment compares to the rest of your campaign. If the experiment is not working well then turn it off and try a different variation.

AdWords Campaign Experiments Setup

Eric Enge: It is an A/B test mode you can set up right within the interface.

Frederick Vallaeys: Exactly, in the past if you wanted to experiment, you would take two weeks of traffic and do one thing and then the next two weeks do something else. But the problem with that is, you are not comparing apples to apples because there might be outside factors during those two different periods that caused the numbers to change. With Campaign Experiments, you can actually split your traffic so all of the experiment is happening at the same time as the control and you get much more reliable data about how your changes impact your ROI.
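Once control and experiment have run side by side, one common way to judge whether a difference in conversion rate is more than noise is a two-proportion z-test. The interview does not say how AdWords itself evaluates experiments, so the sketch below is only an assumption about how an advertiser might check the numbers on their own; the click and conversion counts are invented.

```python
import math


def two_proportion_z_test(conv_a, clicks_a, conv_b, clicks_b):
    """Return (z, two-sided p-value) for the difference in conversion rate."""
    rate_a = conv_a / clicks_a
    rate_b = conv_b / clicks_b
    pooled = (conv_a + conv_b) / (clicks_a + clicks_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / clicks_a + 1 / clicks_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical split: 80% of clicks to control, 20% to the experiment.
z, p_value = two_proportion_z_test(conv_a=160, clicks_a=8000, conv_b=60, clicks_b=2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the experiment's rate differs from the control's by more than chance alone would explain; because both arms ran over the same period, outside factors affect them equally.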

Eric Enge: What about some of the new ad formats?

Frederick Vallaeys: These were a big thing for us in 2011, and will continue to be a big thing in 2012. For example, we have seen tremendous success with advertisers who run Sitelinks. These are the additional portal links that you can have in addition to your headline in your ad. In the reports, you can segment on that so you can see how many clicks you got from the headline and how many from your Sitelinks.

Zappos AdWords Sitelinks

This will prove the value for the majority of advertisers and we have seen that these Sitelinks actually do work and have good click through rates and good conversion rates. You can start seeing how much of an impact this is causing and for those campaigns where you’re not using it, how much you are potentially losing as a result.

Eric Enge: Can you talk a bit about the bid simulator?

Frederick Vallaeys: It takes historical auction data and asks: if you had bid x amount of dollars or y amount of dollars, where would you have come out in terms of the typical ad rank, and what would that have done for your CTR and the number of clicks that you would’ve gotten? Instead of having to do an experiment and changing your bids around to get that data, we take whatever new number you put in, run it against the past auction data, and model what would’ve happened in those cases. If you go from bidding a dollar for a click to a dollar fifty, is that going to give you a significant increase in the number of clicks?

AdWords Bid Simulator

What you can figure out from this is your incremental cost per click. Incremental cost per click is a number, by the way, that too few advertisers understand and leverage. Basically, the notion of incremental cost per click is simple: it is the cost of the incremental clicks I get by bidding higher. When you know this number, you can figure out whether the additional clicks that resulted from an increased bid cost more than they were worth or less. The problem is that most people, when they look at an AdWords account, only look at the big picture.

If you look at an average, what you are not seeing is how did that increase in my bids change the cost of individual clicks. So on average, you might still be under your desired cost per click to meet your ROI goal, but what you are not seeing in that average is the fact that your last ten clicks, the additional ten clicks that you got by bidding higher, actually cost you $2 per click, higher than the $1.50 average, and maybe $1.50 is the maximum you can afford to spend for a click to still be profitable.
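The arithmetic behind incremental cost per click can be worked through with numbers that match the example above. The cost and click figures are hypothetical, chosen so that the average after the bid increase is $1.50 while the ten extra clicks cost $2 each; the function name is likewise just for this sketch.

```python
def incremental_cpc(cost_before, clicks_before, cost_after, clicks_after):
    """Cost of each additional click gained by raising the bid."""
    extra_clicks = clicks_after - clicks_before
    if extra_clicks <= 0:
        raise ValueError("the higher bid produced no additional clicks")
    return (cost_after - cost_before) / extra_clicks


# Hypothetical bid simulator readings for one keyword.
cost_before, clicks_before = 145.00, 100   # at the lower bid
cost_after, clicks_after = 165.00, 110     # at the higher bid

average_cpc = cost_after / clicks_after
marginal_cpc = incremental_cpc(cost_before, clicks_before, cost_after, clicks_after)
print(f"average CPC after the increase: ${average_cpc:.2f}")
print(f"incremental CPC of the extra clicks: ${marginal_cpc:.2f}")
```

Here the average sits at $1.50, which may still look acceptable, yet each of the ten additional clicks cost $2.00, above the break-even figure used in the example.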

Eric Enge: Can you talk about the Ad Preview Tool?

Frederick Vallaeys: The Ad Preview Tool lets you find and click on your ads in a test mode without paying for them. For example, you can see if the ad you have for people in Milwaukee is going to the right page, and what someone from Milwaukee would see. You put in your keywords and the location you want to test, and you can see if your ad would’ve shown up in that case.

AdWords Ad Preview Tools

Eric Enge: The diagnosis part also allows you to get more visibility into why it is not showing, right?

Frederick Vallaeys: Exactly, so if it is not showing up it will give you some ideas why that might be.

Eric Enge: Can you talk about top of page bid estimates?

Frederick Vallaeys: In the past, we had first page bid estimates, which tell you how much you need to bid, on average, to be on the first page of search results. That’s the page where most people are going to click on ads, because most people do not go to the second page of results. What we do now is we also tell you how much you have to bid to show in the paid results above the organic results. We also offer segmentation in the reports between top ads and the side of the page ads.

Frederick Vallaeys: I did this in one of my test accounts yesterday and it was amazing. On the right hand side I was seeing a much lower click through rate than on the top of the page. That could be different for other people; but it tells you that this is a lot of potential clicks that I gave up by being on the right hand side as opposed to having bid a little bit more and showing up on the top of the page.

Eric Enge: What about impression share data?

Frederick Vallaeys: Impression share tells you what percentage of the available impressions your ads are being shown for. It tells you how many clicks you are missing out on by having bids too low, or by not having the right keywords. You may find that you can get 30% more traffic just by tuning your bids because you only have 70% impression share.
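A short worked example of the impression share arithmetic follows, using made-up figures for one campaign. One caveat, under the assumption that click-through rate stays the same: 30 points of missing share is measured against all available impressions, so relative to the impressions you already receive, the potential gain works out somewhat higher than 30%.

```python
def impression_share(impressions, eligible_impressions):
    """Fraction of the available impressions that the ads actually received."""
    return impressions / eligible_impressions


# Hypothetical figures for one campaign.
impressions = 7_000
eligible_impressions = 10_000

share = impression_share(impressions, eligible_impressions)
missed = eligible_impressions - impressions
# Extra impressions relative to what the campaign gets today.
potential_uplift = missed / impressions

print(f"impression share: {share:.0%}")
print(f"missed impressions: {missed}")
print(f"potential uplift over current impressions: {potential_uplift:.0%}")
```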

Eric Enge: I think few people realize that getting a hundred percent impression share is actually very hard, even for your brand terms. There are cases where people are leaving a significant amount of traffic on the table: they are busily trying to add new keywords with diminishing returns when there can actually be 20% and 30% gains from just going through and finding places where they are getting low impression share.

AdWords Impression Share Report

Frederick Vallaeys: That’s a great point. Where you should start is with your exact match impression share, because that is when somebody types in your exact keyword. You probably want to show up on a hundred percent of those. Sometimes your impression share could be lower because you are just not able to afford as high a bid. But even if you are in that situation, maybe it is a great time to go and work on your landing page. Somebody is apparently able to bid higher than you are in those instances and that is probably because they do a better job at converting the customer once they come to that site.

That’s where you can then connect to Google Analytics, take a look at its flow visualization tools, and see if there is some roadblock somewhere on your site that is causing a huge drop-off in terms of conversions. If you can fix these types of things, you may be able to afford to spend more for that click, and your 70% impression share goes up to 100%.

Eric Enge: This is particularly powerful when you start with your high ROI keywords, as it can be easy money. Can you also tell us about the social platform integration in Google Analytics?

Frederick Vallaeys: What people on the web are starting to realize is that a lot of activity around your website, around your content is actually not happening on your own website anymore, and it is happening through social platforms. We are working right now to include some of that data such as likes, and +1s, and thumbs up, and votes and all that stuff that you get on third-party sites and bring it into Google Analytics so then you will have an even better view into how people engage with your brand and your site on the internet today.

Eric Enge: This is an expansion beyond what you talked about before with the social reporting.

Frederick Vallaeys: Exactly, it is taking it beyond just the Google properties in these cases, so when it comes to +1s, we have all of that data, we can share it with our advertisers. There are a number of social properties that would be interesting to get some data about how people are interacting with your site and brand. We are building an API so that those other companies can plug into the Google tools and then hopefully they will be able to show the benefit of their platforms to advertisers, because those businesses will start seeing these metrics inside Google Analytics.

Eric Enge: Of the things we have discussed, what are the priorities, where do I start, what do I do first?

Frederick Vallaeys: I would definitely go to all of the segmentations that we have talked about, that is the number 1 thing, just look at those segmentations for your account and start looking for big differences. So, if you see there is a big discrepancy between your mobile performance and your desktop performance or your tablet performance, then that’s a good indicator that you need to focus on that.

Eric Enge: Great! Ryan, any extra thoughts from your side?

Ryan Voccola: One minor thing I did want to touch on: we talked briefly about Ad Diagnosis. There is a bulk ad diagnosis feature in the account, under the ‘More Actions’ button, which will allow you to bulk diagnose a set of keywords and gain insights without having to go to the Ad Preview Tool.

Eric Enge: Excellent. Thanks Fred and Ryan!

Frederick Vallaeys: Thank you!

Ryan Voccola: Yes, thanks Eric!

How Google Does Personalization with Jack Menzel

Jack Menzel is a Product Management Director for Google Search. Jack leads the teams developing new technologies used for personalization, question answering, web page summarization, and image search. Prior to joining Google, Jack worked as a Program Manager at Microsoft. Jack holds an MS in Computer Science from the University of Washington as well as a BS in Computer Science and Mathematical Economics from Brown University.

Key Points

One of the hot areas in search is personalization. Google recognizes that personalization is a way to offer people better search results. How this works has a big impact on SEO, so when the opportunity arose to speak with Jack Menzel, I jumped at it. Here are some of the key points from the discussion:

  1. People confuse context with personalization, and these are different things. Context includes factors such as language, location, and time of year.
  2. (Jack): “A lot of people assume personalization is amazingly pervasive”. In fact, only small changes are made to a results page based on personalization. Google recognizes the need for diverse query results.
  3. Past query history is used for personalization. If you search for “rome”, and then “hotels”, some of the results will be for hotels in Rome.
  4. Past click through history is a factor. If you show a clear preference for one site by clicking on it in the results, then it may be moved up in the results for you.
  5. The recommendations of friends are used in personalization.
  6. Google will look at your friend’s profile to see what networks they have included there, and then see what they recommend on those sites.
  7. (Jack): “When people are signed out, their search results are personalized based on past search information linked to their browser for up to 180 days using an anonymous cookie”.
  8. Appending &pws=0 to the end of a URL does work, but it only removes personalization, it does not remove context (language, location, time of year).
  9. There are ways to turn off all personalized results. Google’s position is that users own their data. However, context will still be taken into account.

Interview Transcript

Eric Enge: Sometimes people confuse the notion of context with personalization, right?

Jack Menzel: That’s right. Sometimes results that are really a result of context get misinterpreted by people as personalization. If I respond to your query in your language that is really about context, not personalization. Personalization is more about recognizing that I like Dominion the card game and you really like Dominion the power company, and someone else really likes a videogame called Dominion. Imagine you turned off personalization, and suddenly Google was responding to all of your queries in the wrong language, you would be like “oh come on”.

Eric Enge: Another example would be that you are in the US and Halloween is in the near future.

Jack Menzel: Correct, right before Thanksgiving there are a lot of searches about turkeys, and it often means people want turkey recipes.

Eric Enge: What are some of the other kinds of things that fit into the definition of context?

Jack Menzel: Let’s use a conversation-based example. If we are both in Mountain View and I am talking to you about catching a bus, I don’t have to remind you that I am talking about a bus in Mountain View, as opposed to one in Austin, Texas.

We take into account geography, language, and seasonality to a certain extent. The context of the previous queries is kind of on the borderline of what is personal and what isn’t.

Eric Enge: For example, if a person’s previous query was “Rome”, and then they search on “hotel”, there is going to be a tendency to show hotels in Rome.

Search results 3 to 5 for “hotels” when the prior search was for “rome”

Jack Menzel: Your example may work, but I would have to check to make sure. A lot of people assume that personalization is still amazingly pervasive. We believe we are able to do some really useful things with personalization, but we may not get all of these things exactly right.

Eric Enge: What are some good examples of personalization that you think are handled well at this point?

Jack Menzel: My interest in the card game Dominion is an example of this. I really don’t care about the power company at all. We refer to this as “pattern” analysis, and it is based on recognizing preferences. That’s an example of understanding the kind of topics that I am more interested in. Also, I do a lot of web programming, so when I talk about vectors, it will mean something very different than when a doctor talks about vectors.

Search results for “dominion” for someone with no related search history

We recognize patterns very well. If I keep going to visit my favorite scrabble dictionary over and over again I will see that the site that I tend to prefer will end up being boosted in the ranking because it makes it easier and faster for me. Pattern recognition is important because there is so much ambiguity in language.

Eric Enge: Jaguar is my favorite example because you have the guitar, the operating system, the animal, and the football team. I would probably get the football team a lot, because I am a football fan.

Jack Menzel: Right, exactly. If you tend to gravitate towards football sites as opposed to operating system sites then you would end up getting that.

Eric Enge: How about social data?

Jack Menzel: We leverage social data pretty well. If your friend likes a restaurant, they can indicate it in a way that we (Google) can see that (such as a +1). When you’re searching for a restaurant and you’re signed in, we may well boost that restaurant’s site in the rankings for you as well. We will also annotate the results, so that you can clearly see that this is content from your friend.

Search results for “reconsideration requests” with personalization on and off

Eric Enge: How do you determine what social properties people are on?

Jack Menzel: We look at people’s profiles and see what social profiles they have included in there, and we can then see what they share on those sites, provided that the information is public.

Eric Enge: If it’s not connected through your profile and your friend’s profiles then you are not going to use it to personalize results.

Jack Menzel: That’s correct.

Eric Enge: Do you need to be logged in to get personalized results?

Jack Menzel: Being logged in is the best way to get personalized results. We do a certain amount of personalization for people who are not logged in.

Eric Enge: Is that cookie based?

Jack Menzel: Yes, we do some cookie-based personalization, which applies to search sequences where the subsequent searches feel more like a conversation. If you take that away from people it tends to be kind of frustrating. And so, there are certain parts of personalization that we still do.

Eric Enge: What kinds of personalization do you still do when people are logged out?

Jack Menzel: When people are signed out, their search results are personalized based on past search information linked to their browser for up to 180 days using an anonymous cookie. But if you’re signed out, we have much less data to personalize your results with than if you’re signed in.

Eric Enge: For example, you wouldn’t be able to use the social information.

Jack Menzel: That’s right. We have no idea about any of the social information. We only have a very limited knowledge of what your previous actions may have been, but we try to save you from having to repeat every detail in every query. But, it’s not as personalized as a signed in version.

Eric Enge: There aren’t any issues in this approach with shared IP addresses, because you are dealing either with people who are logged in or have a cookie.

Jack Menzel: That’s right. However, you can still run into the problem of shared computers where things get a little muddled, for example if you are at an internet café and you are just doing a couple of searches to find out where the newest movie can be found. In general though, we don’t tend to have problems at the IP level because the system is based on cookies or being logged in.

Eric Enge: In the case of my machine at home, my 16-year-old daughter can come in and do some searches on it. That’s pretty hard to disambiguate I suspect.

Jack Menzel: That is very hard to do.

Eric Enge: In that environment if she does log me out and log herself in, is there a cookie involved at that point or does it just immediately switch to personalizing for her?

Jack Menzel: Yes. It doesn’t have much to do with your cookie. It’s completely associated with your sign in.

Eric Enge: Does appending &pws=0 to the end of a search result URL still turn off personalization as it used to?

Procedure for turning off personalization with &pws=0

Jack Menzel: Yes it does work. It turns off “personalization”. However it isn’t really useful because people assume that it then will show them what everyone else sees. That simply isn’t the case. There are a whole lot of contextual factors that make everyone’s results most relevant to them. This takes most of the wind out of the sails of these types of analysis.

If personalization is turned off, we will still take a lot of context into account, including things such as location, language, and time of year. Of course, you can also get rid of context most of the time by getting more specific about your query. For example, if you live in the US but want to learn about the UK tax code, you would search on something like “UK tax code” to make that clear. Or you can conduct the search at www.google.co.uk instead too.
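
For readers who script their own search checks, the parameter discussed here can be appended programmatically. The following is a minimal Python sketch, not anything Google publishes: the helper name `with_personalization_off` and the example URL are illustrative assumptions, and, as Jack notes, the results will still reflect context such as location and language.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def with_personalization_off(search_url: str) -> str:
    """Return search_url with pws=0 set in its query string."""
    parts = urlparse(search_url)
    # Keep the existing parameters, dropping any earlier pws value.
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != "pws"]
    params.append(("pws", "0"))
    return urlunparse(parts._replace(query=urlencode(params)))

print(with_personalization_off("https://www.google.com/search?q=uk+tax+code"))
# https://www.google.com/search?q=uk+tax+code&pws=0
```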

Eric Enge: When you use search history, I assume you need to accumulate enough data to achieve some level of significance?

Jack Menzel: Yes, of course. We don’t want you to have done one query out of curiosity, and then suddenly decide that you are really into macramé. We are looking for a meaningful pattern.

Eric Enge: The other area that I think people get concerned about is the potential loss of serendipity, but it’s not like you remap the entire results page around this.

Jack Menzel: It does make me kind of sad that when we talk about serendipity and search engines that we don’t point out the fact that search engines are the most amazing tool when it comes to discovering new things.

We have lowered the barrier and made it possible to research anything you could possibly imagine in the time it takes for you to type a query and hit enter. A fraction of a second later you’ve got some of the best results in the world for you to dig through. It makes it so easy. If we personalize the results page to the extent that we were only showing results tailored for you, that would be a bug for us. We would never want to do that.

We try to give people the most usable page, but on the other hand we also try to give people the most relevant page to them. So when we personalize a page the changes are pretty small, and we want to leave the other results untouched by personalization. Using your interest in football as an example, even though you love football, some of the time you may actually want information on the animal, the guitar, or the OS instead.

Eric Enge: There is this long standing notion that’s been out there called query deserves diversity.

Jack Menzel: That’s right.

Eric Enge: This is obviously something that Google has known for quite some time because it’s many years since I first heard about query deserves diversity. Since you are trying to get as close as possible to satisfying 100% of the people 100% of the time, over-personalizing would fail to do that.

Jack Menzel: That’s right. We really do want to show people a good representation of what the most relevant results would be, and people like that.

Eric Enge: Can you also discuss your approach to transparency and control?

Jack Menzel: We think that is really important in any conversation about personalization. Our position is that this is your data and you have control over your data. You do have control over your web history, and you have control over how your browser manages cookies. We take the privacy of people’s data, and how we manage data, and how people have control over that data really seriously.

At the feature-level we try to make it very transparent to people what it is we are doing. We really are trying our best to be the industry leaders in how people have control over the data.

Eric Enge: Are there any aspects of personalization that people can’t turn off?

Jack Menzel: There are ways to turn all aspects of personalization off. If you really do not want to have your queries tracked between searches, or if you don’t want to have your content tailored to you in any way, shape, or form, you can set your browser to not accept cookies, and then we think you are a brand new person every time. Also, bear in mind that we will still take into account context, such as the right language for your results, your location, and the time of year.

Eric Enge: Thanks Jack!

Other Recent Interviews

Google’s Peter Norvig, October 17, 2011
Google’s Mayuresh Saoji, October 10, 2011
Google’s Frederick Vallaeys, September 29, 2011
Bing’s Ping Jen, September 28, 2011
Bing’s Duane Forrester, September 6, 2011
Danny Sullivan, August 8, 2011
Bruce Clay, August 1, 2011
Google’s Tiffany Oberoi, July 27, 2011
Vanessa Fox, July 12, 2011
Jim Sterne, July 5, 2011
Stephan Spencer, June 20, 2011
SEOmoz’ Rand Fishkin, May 23, 2011
Bing’s Stefan Weitz, May 16, 2011
Bing’s Mikko Ollila, June 27, 2010
Yahoo’s Shashi Seth, June 20, 2010
Google’s Carter Maslan, May 6, 2010
Google’s Frederick Vallaeys, April 27, 2010
Matt Cutts, March 14, 2010

Search Algorithms with Google Director of Research Peter Norvig

Peter Norvig is a Fellow of the American Association for Artificial Intelligence and the Association for Computing Machinery. At Google Inc. he was Director of Search Quality, responsible for the core web search algorithms from 2002-2005, and has been Director of Research from 2005 on.

Previously he was the head of the Computational Sciences Division at NASA Ames Research Center, making him NASA’s senior computer scientist. He received the NASA Exceptional Achievement Award in 2001. He has served as an assistant professor at the University of Southern California and a research faculty member at the University of California at Berkeley Computer Science Department, from which he received a Ph.D. in 1986 and the distinguished alumni award in 2006. He has over fifty publications in Computer Science, concentrating on Artificial Intelligence, Natural Language Processing and Software Engineering, including the books Artificial Intelligence: A Modern Approach (the leading textbook in the field), Paradigms of AI Programming: Case Studies in Common Lisp, Verbmobil: A Translation System for Face-to-Face Dialog, and Intelligent Help Systems for UNIX. He is also the author of the Gettysburg Powerpoint Presentation and the world’s longest palindromic sentence.

Introduction

As you will see in the transcript below, this discussion focused on the use of artificial intelligence algorithms in search. Peter outlines for us the approach used by Google on a number of interesting search problems, and how they view search problems in general. This is fascinating reading for those of you who want to get a deeper understanding of how search is evolving and the technological approaches that are driving it. The types of things that are detailed in this interview include:

  1. The basic approach used to build Google Translate
  2. The process Google uses to test and implement algorithm updates
  3. How voice driven search works
  4. The methodology being used for image recognition
  5. How Google views speed in search
  6. How Google views the goals of search overall

Some of the particularly interesting tidbits include:

  1. Teaching automated translation systems vocabulary and grammar rules is not a viable approach. There are too many exceptions, and language changes and evolves rapidly. Google Translate uses a data-driven approach of finding millions of real-world translations on the web and learning from them.
  2. Chrome will auto translate foreign language websites for you on the fly (if you want it to).
  3. Google tests tens of thousands of algorithm changes per year, and makes one or two actual changes every day.
  4. Testing is layered: it starts with a panel of users comparing current and proposed results, may continue with a spin through the usability lab at Google, and ends with a live test with a small subset of actual Google users.
  5. Google Voice Search relies on around 230 billion words from real-world search queries to learn all the different ways that people articulate given words. So people no longer need to train their speech recognition for their own voice, as Google has enough real-world examples to make that step unnecessary.
  6. Google Image search allows you to drag and drop images onto the search box, and it will try to figure out what it is for you. I show a screen shot of an example of this for you below. I LOVE that feature!
  7. Google is obsessed with speed. As Peter says, “you want the answer almost before you’re done thinking of the question”. Expressed from a productivity perspective, if you don’t have the answer that soon your flow of thought will be interrupted.

Interview Transcript

Eric Enge: Can you outline at a layman’s level the basic approach that was used to allow Google engineers to build a translation system that handles 58 languages?

Peter Norvig: Sure — Google Translate uses a data-driven, machine learning approach to do automatic translation between languages. We learn from human examples of translation.

Google Translate

To explain what I mean by “data driven,” first I should explain how older machine translation systems worked. Programmers of those systems tried to teach the system vocabulary and grammar rules, like “This is a noun, this is a verb, and here’s how they fit together or conjugate in these two languages.”

But it turns out that approach didn’t really work well. There were two problems. First, the formalisms for writing rules were absolute: this sentence is grammatical, and this other sentence is ungrammatical. But language has shades of gray, not just absolutes. Second, it is true that languages have rules, but it turned out that the rules don’t cover enough — language is more complicated and full of exceptions than people assumed, and is changing all the time. New words like “LOL” or “pwn” or “iPad” appear. Old words combine in unique ways — you can’t know what a “debt ceiling” is just by knowing what “debt” and “ceiling” are. Even core grammatical rules are uncertain — is “they” okay to use as a gender-neutral pronoun? What is the grammatical structure of “the harder they come, the harder they fall,” and what else can you say with that structure? Language is so fluid that programmers can’t keep up with the millions of words in all these languages and the billions or trillions of possible combinations, and how they change over time. And there are too many languages to keep rewriting the rules for how each language translates into each of the other languages.

So the new approach is a data-driven approach. Recognizing that we’ll need lots of examples of how to handle exceptions, we make the leap of saying: what if we could learn everything — the exceptions and the rules — from examples? We program our computers to look on the web for millions of examples of real-world translations, and crunch all that data to find patterns for which phrases translate into which other phrases. We use machine learning to look for recurring patterns — “this phrase in French always seems to translate into this phrase in English, but only when it’s near this word.” It’s analogous to the way you can look over a Chinese menu with English translations — if you see the same character keeps recurring for chicken dishes, you can guess pretty confidently that that character translates to “chicken.”
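
The menu analogy can be made concrete with a toy example. The Python sketch below is an illustration only and is not how Google Translate works internally: the four-line "menu" is invented data, the function name `guess_translations` is hypothetical, and the Dice coefficient is a simple stand-in for the far more sophisticated statistics Peter alludes to. It guesses, for each source character, the English word it co-occurs with most consistently.

```python
from collections import Counter, defaultdict

# Invented toy "menu": (source-language tokens, English tokens) pairs.
menu = [
    (["鸡", "饭"], ["chicken", "rice"]),
    (["鸡", "汤"], ["chicken", "soup"]),
    (["牛", "饭"], ["beef", "rice"]),
    (["牛", "汤"], ["beef", "soup"]),
]

def guess_translations(pairs):
    """Map each source token to the English token it most reliably appears with."""
    together = defaultdict(Counter)  # together[src][tgt] = lines containing both
    src_lines = Counter()            # lines containing each source token
    tgt_lines = Counter()            # lines containing each English token
    for src, tgt in pairs:
        for t in set(tgt):
            tgt_lines[t] += 1
        for s in set(src):
            src_lines[s] += 1
            for t in set(tgt):
                together[s][t] += 1
    guesses = {}
    for s, counts in together.items():
        # Dice coefficient: high when the two tokens appear together
        # and rarely appear apart.
        guesses[s] = max(
            counts,
            key=lambda t: 2 * counts[t] / (src_lines[s] + tgt_lines[t]),
        )
    return guesses

print(guess_translations(menu))
```

With only four lines the pattern is already unambiguous; the point of the data-driven approach is that the same counting idea scales to millions of real translated examples.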

The basic idea is simple, but the details are complicated. We do some deep work on statistics and machine learning algorithms to be able to make the best use of our examples, and we were able to turn this technology into a world-leading consumer product. Google Research is a great place to come work if you want to tackle these kinds of problems in artificial intelligence.

We’re really pushing to have Translate available as a layer across lots of other products. You can always just go to translate.google.com, but it’s also built into our browser, Chrome. If you visit a website in Thai or French or Urdu, Chrome will detect it and ask if you want to translate it into your native language. It’ll automatically translate the whole page, and keep translating as you click on links. So you’re basically browsing the web in this other language. It’s very Star Trek.

There’s also a cool mobile app you should try — Google Translate for mobile is on Android and iPhone, and it does speech-to-text so you can speak and get translations.

Google Translate for Mobile

Eric Enge: How does Google manage the process of testing and qualifying algorithm updates?

Peter Norvig: Here’s how it works. Our engineers come up with some insight or technique and implement a change to the search ranking algorithm. They hope this will improve search results, but at this point it’s just a hypothesis. So how do we know if it’s a good change? First we have a panel of real users spread around the world try out the change, comparing it side by side against our unchanged algorithm. This is a blind test: they don’t know which is which. They rate the results, and from that we get a rough sense of whether the change is better than the original. If it isn’t, we go back to the drawing board. But if it looks good, we might next take it into our usability lab, a physical room where we can invite people in to try it out in person and give us more detailed feedback. Or we might run it live for a small percentage of actual Google users, and see whether the change is improving things for them. If all those experiments have positive results, we eventually roll out the change for everyone.
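
The blind side-by-side step can be sketched in a few lines of Python. This is an assumption-laden illustration, not a description of Google's tooling: the function names `blind_pair` and `tally`, the sample result lists, and the three invented votes are all hypothetical. The idea is only that raters see the two result sets in random left/right order, and votes are mapped back to the algorithms afterwards.

```python
import random
from collections import Counter

def blind_pair(current, proposed, rng):
    """Show the two result lists in random left/right order.

    Returns (shown, key): `shown` is what the rater sees, `key` is the
    hidden mapping from side to algorithm, used later to un-blind votes.
    """
    sides = [("current", current), ("proposed", proposed)]
    rng.shuffle(sides)
    shown = {"left": sides[0][1], "right": sides[1][1]}
    key = {"left": sides[0][0], "right": sides[1][0]}
    return shown, key

def tally(ratings):
    """Count wins per algorithm from (key, preferred_side) pairs."""
    return Counter(key[side] for key, side in ratings)

rng = random.Random(0)  # seeded only so the sketch is repeatable
shown, key = blind_pair(["result A1", "result A2"],
                        ["result B1", "result B2"], rng)
# Three invented votes: two raters prefer the left side, one the right.
ratings = [(key, "left"), (key, "left"), (key, "right")]
print(tally(ratings))
```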

We test tens of thousands of hypotheses each year, and make maybe one or two actual changes to the search algorithm per day. That’s a lot of ideas, and a lot of changes. It means the Google you’re using this year is improved quite a bit from the Google of last year, and the Google you’re using now is radically different from anything you used ten years ago.

I’d say the resulting technology — Google Search as a whole — is a form of A.I. If you define A.I. as providing a course of action in the face of uncertainty and ambiguity, based on learning from examples, that’s what our search algorithm is all about.

The search engine has to understand what’s out on the web in text and other forms like images, books, videos, and rapidly changing content like news, and how it all fits together. Then it has to try to infer what the user is looking for, sometimes from no more than a keystroke or two. And then it has to weigh hundreds of factors against each other — hundreds of signals, like the links between content, the correlations among phrases, the location of the search, and so on — and provide the user information that’s relevant to their query, with some degree of confidence for each piece. And finally it has to present that information in a coherent, useful way. And it has to be done potentially for each keystroke, since Google results update instantly now.

Every time people ask a question, you need this machine to automatically and instantly provide an answer that helps them out. It’s a deep A.I. problem, and I think we’re doing a good job at it today, but we’ve got lots of room to grow too. Search is far from solved, and we have plenty of room for experts in A.I., statistics, and other fields to jump on board and help us develop the next Google, and the Google after that.

Eric Enge: Voice driven search seems like a very interesting problem to me. Even if you are dealing with only one language you have a vast array of dialects, accents, pronunciations, and ways of phrasing things.

This used to be addressed by having the user “train the system” to their manner of speaking. Are we at the point where we are past that now? What are the basic methods (in layman’s terms) being used to make this possible? Will this expand to automatically transcribing videos?

Peter Norvig: Speech recognition is actually quite analogous to machine translation. In translation we learn from past examples of (English, Foreign) pairs how to translate a new sentence we haven’t seen before; in speech we learn from past examples of (Soundwave, Text) pairs how to find the text in a new soundwave.

Like you say, in old systems you’d have to sit there and train the thing for an hour before it would recognize your words. We wanted to build something anyone could pick up and just immediately start talking to, and have it understand them right away. So instead of relying on one person talking for a long time to train the system, we rely on lots of people saying lots of things to train the system. So in effect, our users are training the system en masse.

Google Voice Search

I can explain a little more how it actually works. There are three parts to our speech model. First, there’s the acoustic model, which maps out all the possible ways soundwaves can form phonemes, like “ah” or “mm” or “buh.” It’s tricky because acoustics vary a lot by what kind of mic you’re using, what background noise there is, how you’re holding the device, the gender and age of the speaker, and even what sounds come before or after the one you’re making. And like you say, there are lots of versions because accents and dialects vary so much. But with enough examples of speech, we can model what are the most likely ways of forming phonemes.

Then phonemes come together in our lexical model, which is basically a dictionary of how all the words in a language are pronounced. That also takes care of a lot of differences in accents — the model knows that there are multiple ways to pronounce things, and knows which are more or less likely. “Feb-yoo-ary” and “Feb-roo-ary” will both give you “February,” because the model sees both spoken a lot.

Finally, the words are strung together into a language model, which tells you which words are most likely to come after another word. There might be a soundwave that sounds like either “city” or “silly”, but if it follows the words “New York…” then the language model would tell us that “city” is more likely. We have a lot of text to train the system on — for Voice Search, where you speak your search to Google, we train this model on around 230 billion words from real-world search queries.
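
The “city” versus “silly” example can be illustrated with a tiny bigram model. This Python sketch is a toy under stated assumptions: the four-query corpus is invented, the function names `train_bigrams` and `pick` are hypothetical, and add-one smoothing is just one simple choice. A production language model is trained on vastly more text and is far more elaborate.

```python
from collections import Counter, defaultdict

# Invented toy corpus standing in for real search queries.
corpus = [
    "new york city hotels",
    "new york city weather",
    "york city map",
    "silly cat videos",
]

def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def pick(counts, prev, candidates):
    """Choose the candidate most likely to follow `prev`.

    Add-one smoothing gives unseen pairs a small nonzero score.
    """
    total = sum(counts[prev].values()) + len(candidates)
    return max(candidates, key=lambda w: (counts[prev][w] + 1) / total)

model = train_bigrams(corpus)
print(pick(model, "york", ["city", "silly"]))  # city
```

The acoustic model may be unable to tell the two words apart, but the word counts after “york” settle the question.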

It’s all anonymized, of course — we don’t keep any training examples that could be tied to an individual speaker; it is all combined into our big model. We do give you the choice to opt in to have us learn from your own voice over time. You can turn this on, and the model will start to learn how your voice varies from our baseline model — say, if you have a strong accent, or a really deep voice. The model works well even without having to train it yourself, but you have the option to make it even better.

You can try this out on an Android phone or on the Google Search app on iPhone or Blackberry. On Android you can search, of course, but you can also compose emails by voice, or for that matter speak into any app where you’d use the keyboard — we added it into the Android keyboard so you can speak pretty much anywhere you might type. It’s also on Google on the desktop if you use Chrome.

Eric Enge: How about the problem of image recognition? For example, can we train a computer to recognize an image of the Taj Mahal?

Peter Norvig: Yes. We do this on mobile phones and now on desktop Google Search too. You can actually use it to see where old vacation photos were taken (ones you scanned from back before digital cameras geo-tagged photos). If you took a photo of some cool-looking bridge you don’t recognize, you can drag and drop the image onto the Google Search box, and there’s a pretty good chance it’ll recognize the bridge, tell you what it is, and give you all kinds of relevant information on it.

Image Search Drag and Drop

As with speech and translation, image recognition is data-driven and relies on machine learning algorithms across lots of examples. Luckily for us, the web has lots and lots of images of things, and most of them have captions that identify them. The more popular, the more images, so the better a chance we have at our algorithms being able to recognize it.

Here’s how image recognition works in a nutshell. It starts with identifying points of interest in an image — the points, lines, and patterns that provide sharp contrasts or really stick out from a bland, featureless background. It’s similar in some ways to how the human eye picks out edges and points by keying off the places where there’s sharp contrast.

Then it looks at how these points are related to each other — the geometry of the whole set of points. You could picture it as looking like a constellation of stars, even though really it’s a more sophisticated mathematical model of these points of interest and how they relate.

Now it compares that model to all the other models in a huge database. Those other models come from images it has already analyzed from around the web. It looks for a matching model, but it doesn’t have to be a perfect match. In fact, it’s important that it be a bit flexible, so it doesn’t matter if it’s turned around, or shrunken, or twisted a bit. The Taj Mahal still has the basic geometry of the Taj Mahal even if you photograph it from a little bit of a different angle or photograph it lower in the frame. When Google recognizes that it matches that model best, it guesses it’s probably the Taj Mahal.
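
To make the “constellation” idea tangible, here is a deliberately simplified Python sketch. It is not the model Google uses: the point sets, the names `signature`, `difference`, and `best_match`, and the choice of normalized pairwise distances as the geometric description are all assumptions made for illustration. The signature is unchanged when the points are shifted, rotated, or uniformly scaled, which mirrors the flexibility Peter describes. For simplicity it assumes both shapes have the same number of points.

```python
import math
from itertools import combinations

def signature(points):
    """Sorted pairwise distances, divided by the longest one.

    Shifting, rotating, or uniformly scaling the points leaves this unchanged.
    """
    dists = sorted(math.dist(a, b) for a, b in combinations(points, 2))
    longest = dists[-1]
    return [d / longest for d in dists]

def difference(sig_a, sig_b):
    """Largest gap between corresponding entries of two signatures."""
    return max(abs(x - y) for x, y in zip(sig_a, sig_b))

def best_match(query_points, database):
    """Name of the stored shape whose signature is closest to the query's."""
    q = signature(query_points)
    return min(database, key=lambda name: difference(q, signature(database[name])))

# Invented "points of interest" for two known landmarks.
database = {
    "tower": [(0, 0), (1, 0), (0.5, 3), (0.5, 1)],
    "dome": [(0, 0), (2, 0), (1, 1), (1, 1.5)],
}
# The dome photographed rotated by 90 degrees, twice as large, and shifted.
query = [(5, 7), (5, 11), (3, 9), (2, 9)]
print(best_match(query, database))  # dome
```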

There’s something profound here about asking a “question” that’s actually just an image. We’ve moved beyond every query being a string of text. Now you can just present Google an image and expect relevant information. So it puts even more burden on the search engine to know what that’s supposed to mean. What’s the best answer to a question when the question is an image? We present some information we think is relevant today, but what exactly the interaction should be here is still ripe for research.

Eric Enge: For some of these tasks we currently must rely on batch processing instead of real time processing (e.g. the way that the Panda algorithm currently operates). How long before the processing power increases to the point where the Panda algorithm can be done in real time?

Peter Norvig: I wouldn’t separate out that one update from the rest of the Search algorithm that way; it was really just one improvement among many that we’ve made in the past year or so. But the question is certainly relevant to our Search algorithm overall.

Broadly speaking, you can think of the growth of the web and the growth of the computing power needed to instantly index it as a kind of arms race. The web keeps growing. There’s a misperception that the web has become established or matured, but in fact the growth curve is a nice smooth exponential that hasn’t shown signs of slowing down yet. We’re still in the middle of the information explosion.

So we keep up with it a few ways. It helps that processors and disks keep getting cheaper. Even new categories of technology, like solid-state disks, have helped. We’re also getting smarter about delineating which content needs to be updated instantly, and which can be updated more slowly; again, we learn how to do this from examples. A lot of the smarts you see in Google Instant, and the predictive input suggestions that keep guessing what word you might type next, are about anticipating what information is most likely to be needed, and queuing that up so it’s ready to go.

Google Code Articles on Speeding Up the Web

We’re really obsessed with speed at Google. Speed is a crucial feature of any information-intensive product. You never want your tools to slow you down or interrupt your flow of thought. There’s a cool feature we launched a little while ago called Instant Pages which takes Google Instant a step further: instead of just predicting what words you might type, and pre-loading the search results, if Google is really confident that the first result is the right one, it’ll start loading it in the background. So often by the time you click that result, it’s already loaded up — so the website appears to load instantly. It’s like a magic trick when it works well.

Eric Enge: Can you expound a little bit on the types of problems that AI can work on solving in the area of search over the next 5 years?

Peter Norvig: We’ll work more on speed. It used to be that a few seconds was really fast to learn what the height of the Eiffel Tower was — that’s a heck of a lot faster than a trip to the library to look it up in a reference book in the back shelves. But now even a few seconds feels slow, because again, it interrupts your flow of thought. So you want the answer almost before you’re done thinking of the question. We think we can offer that now most of the time.

But eventually this will stop being such a back-and-forth question-and-answer routine, and start just being a steady flow of relevant information. It should be right there when you need it, presented so it’s useful without being overwhelming. It’s going to take a lot of engineering and a really fine artistic touch to make that work the way we envision it.

And of course this gets to a deeper A.I. problem: not just understanding information and queries, but really understanding what the user needs and will find useful at a given moment, and serving it up in a way that’s perfectly digestible. It’s not just about human-computer interaction or information retrieval. It’s about how people learn and attain knowledge. We’re trying to move beyond just presenting information, and really focus on increasing people’s knowledge of the world. So Google needs to be “smart” in the sense of really understanding the user’s needs in order to help them build up their knowledge of the world.

Eric Enge: Thanks Peter!

Starting Up with Google Product Search, with Google’s Mayuresh Saoji

Mayuresh Saoji is a Senior Product Manager on the Google Commerce team. In this role, Mayuresh is responsible for leading efforts on Merchant Center, Content API and broad Google Product Search Policy issues. Previously, Mayuresh was a Product Manager on the Google Chrome team, and also led the Distribution efforts for products like ChromeOS, Google Toolbar, iGoogle and Chrome browser. Prior to Google, Mayuresh was a Product Manager at Microsoft where he worked on Go-to-Market for Sharepoint 2007. Mayuresh holds a Bachelor’s degree in engineering from the University of Bombay, India and an MBA from the Kellogg graduate school of Management.

Key Points

Google product search offers a rich array of opportunities for publishers to place their products in front of shoppers (there is a bulleted list of the opportunities right at the start of the interview). Mayuresh does a great job of spelling out the way to get started with Google Commerce in this interview. If you sell physical products this interview can act as a guide on how to get started and how to prioritize your efforts from an optimization perspective. Here are the key points:

  1. You must sell physical products online to participate.
  2. One opportunity is to place Google Commerce Search on your site. This provides visitors a way to search your product catalog using Google’s search technology. It is a paid product.
  3. The first step is to create a Merchant Center account.
  4. The second step is to verify that you are the owner of the website.
  5. The next step is to provide a data feed of all your products.
  6. Implement a test feed before going live, as this will allow you to find and remove errors upfront.
  7. The most important optimization step is good quality data. This is worth a lot of effort, as Google will lose faith in feeds that show errors.
  8. Make absolutely sure that the pricing data is accurate.
  9. Plan on having an ISBN code, UPC code, or EAN code (Europe) for all your products.
  10. Have images for all of your products. (Mayuresh): “it’s to your benefit to send us good images for every product”.
  11. Update your feed (Mayuresh): “at least as often as your website is updated”.
  12. The Content API is useful for large feeds where it may be desirable to make partial updates (e.g. change only the price for 200 products). However, you need programming expertise to use it.
  13. (Mayuresh): “Product reviews are important, and they provide a good signal to users about products”.

Interview transcript

Eric Enge: What are the benefits of participating in product search?

Mayuresh Saoji: Any merchant that sells physical products online is a good candidate for participating in Google product search. Participating in product search provides you with a forum for sending structured data on your products to Google. It allows merchants to show more rich data in many formats:

  • On Google.com
  • Google Shopping
  • Google Product Search
  • Product Ads and Product Extensions
  • Google Shopper in Mobile Search

If they are Google Commerce Search customers, which is a paid product, then that same data is leveraged to power the search and discovery experience on their e-commerce website or mobile application.

The end goal is to drive a lot of qualified traffic to publishers, and that’s the best reason for doing this.

Eric Enge: Basically, it is like a Custom Search Engine, but for products?

Mayuresh Saoji: It has some general similarities, but Google Commerce Search (GCS) is an e-commerce search solution designed specifically with online and multi-channel retailers in mind. GCS has several advanced features besides product recommendations to help retailers improve their conversion rates.

Eric Enge: Great, what’s the best way for someone to get started?

Mayuresh Saoji: First you create a Merchant Center account. This is where you tell us about your business, your store, and provide us with your URL. The second step is to verify that you are the owner of your website. This is still part of the signup flow, and once that’s done you have a valid Merchant Center account. That’s one part of the story.

The other part of the story is to start submitting your data to us. Google has published a product feed specification, and you need to adhere to that specification, and then you can submit data in one of a number of formats to us. You can submit it as a tab delimited (TSV) file, a flat file, XML file, or via the Content API. Many of our larger retailers use the Content API, and that allows them to easily submit hundreds of thousands of items (and much more), and also makes it easy to make very quick changes to specific attributes of those items.
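As a rough illustration of the tab-delimited option mentioned above, the following Python sketch writes a one-item feed. The column names and the sample item are assumptions chosen for illustration; the authoritative attribute list and value formats are defined by the product feed specification.

```python
import csv
import io

# Illustrative column names only; check the feed specification for the real list.
COLUMNS = ["id", "title", "link", "image_link", "price", "availability", "gtin"]


def write_feed(items, out):
    """Write product dicts as a tab-delimited feed with a header row."""
    writer = csv.DictWriter(
        out,
        fieldnames=COLUMNS,
        delimiter="\t",
        lineterminator="\n",
        extrasaction="raise",  # fail loudly on attributes not in COLUMNS
    )
    writer.writeheader()
    for item in items:
        writer.writerow(item)


# A made-up item for demonstration purposes.
items = [
    {
        "id": "SKU-001",
        "title": "Canvas tennis shoe",
        "link": "https://www.example.com/p/sku-001",
        "image_link": "https://www.example.com/img/sku-001.jpg",
        "price": "12.99 USD",
        "availability": "in stock",
        "gtin": "036000291452",
    }
]

buffer = io.StringIO()
write_feed(items, buffer)
feed_text = buffer.getvalue()
```

Writing to an in-memory buffer keeps the sketch self-contained; a real export would write to a file and upload it through the Merchant Center.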

I’d also recommend creating a Test Feed file first and submitting test data. This functionality can be found under the “Data Feed” tab in the Merchant Center (click on “New Test Data Feed”). The test feed is not indexed and displayed on Product Search, so it’s a perfectly safe environment. We also have great error reporting for the test feed, which will allow merchants to understand errors, iterate and quickly get a functional feed up and running.

Once you are done testing you can actually submit the data and we will ingest it, index it, and then show it on Google Product Search and some of these other properties.

So to summarize: Create your account, verify your website, create and submit a Test Feed, work out all the kinks, and then submit the actual data feed to us.

The Google Merchant Center is the hub for these interactions: It’s where you provide us information about your business, it’s where you submit your product data feed. It is also the place where you can go to see the status of your data, to see if there are any errors with your submission. We also provide you with reporting on clicks, etc. so you can see how your product listings are performing.

Eric Enge: I assume you need programming expertise to use the Content API?

Mayuresh Saoji: Yes. You do need programming experience because you have to make HTTP calls with the right parameters. Most of our merchants submit data to us in an XML file or a flat file today. The Content API is used by some of our largest merchants, who have that in-house IT expertise. It’s also used by merchants who need to change their data quickly and frequently.

Eric Enge: What determines the order in which you show products?

Mayuresh Saoji: There are some things that you can control, and the biggest thing is to give us good quality data. Make sure you adhere to the feed spec and make sure you fix problems as we report them in the Merchant Center. In the Merchant Center there is a data quality tab. For instance, if you submit 10,000 items and 300 of them don’t have images the Merchant Center will tell you that.

It gives you very concrete and specific feedback on the types of errors, and in many instances also provides actionable feedback on what you can do to fix those errors. Note that submitting a feed is sometimes an iterative process. You may have some errors at first, but the Test Feed can make it easy to figure out problems and get it right quickly, so I highly recommend using that tool from the Merchant Center.
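The kind of feedback described above (for example, 300 of 10,000 items lacking images) can be approximated locally before a feed is submitted. The sketch below is not how the Merchant Center works internally; the list of required attributes is an assumed subset and the items are generated test data.

```python
from collections import Counter

# Assumed subset of required attributes, for illustration only.
REQUIRED = ("id", "title", "price", "image_link")


def missing_attribute_report(items):
    """Count, per required attribute, how many items leave it empty or absent."""
    counts = Counter()
    for item in items:
        for attr in REQUIRED:
            value = item.get(attr)
            if value is None or not str(value).strip():
                counts[attr] += 1
    return dict(counts)


# Generated test data: 10,000 items, the first 300 without an image.
items = [
    {
        "id": f"SKU-{n}",
        "title": "Sample product",
        "price": "12.99 USD",
        "image_link": "" if n < 300 else f"https://www.example.com/img/{n}.jpg",
    }
    for n in range(10_000)
]

report = missing_attribute_report(items)
```

Running a check like this against a test export makes the first submission less likely to surface avoidable errors.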

Eric Enge: What are the best ways to optimize your feed?

Mayuresh Saoji: There are a few best practices to keep in mind. Whenever possible, each product should have a unique ID (there are rare exceptions for custom or one-off products). This is an important attribute that we look at, for matching products on the backend. This could be a UPC code, it could be an ISBN number for a book, or an EAN code if you are in Europe. Fundamentally it’s the unique fingerprint for each product.

Make sure that your price and availability information is accurate. For price, you should separate out tax and shipping. If you tell us an item costs $12.99 make sure that it is actually $12.99 on your website and not $13.99. A mismatch in price is a bad experience for the user. Moreover, the clicks you get are not going to convert to a sale on your site because you have a different price advertised. This generally leads to a bad taste in the mouth for everyone.

We provide you with a mechanism for giving us the base price, giving us the tax, and giving us the shipping separately, and we also show those separately on the search results page.
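One way to guard against the $12.99 versus $13.99 situation is to compare the base prices in the feed with the prices the website actually shows before each submission. The sketch below assumes both sets of prices are already available as dictionaries keyed by item ID; the function name and data are hypothetical.

```python
from decimal import Decimal


def price_mismatches(feed_prices, site_prices):
    """Return {item_id: (feed_price, site_price)} where the two prices differ.

    Items missing from the site data are reported with a site price of None.
    """
    mismatches = {}
    for item_id, feed_price in feed_prices.items():
        site_price = site_prices.get(item_id)
        if site_price is None or Decimal(feed_price) != Decimal(site_price):
            mismatches[item_id] = (feed_price, site_price)
    return mismatches


# Base prices only; tax and shipping are assumed to be tracked separately.
feed_prices = {"SKU-001": "12.99", "SKU-002": "5.00"}
site_prices = {"SKU-001": "13.99", "SKU-002": "5"}

problems = price_mismatches(feed_prices, site_prices)
```

Using `Decimal` rather than floats avoids rounding surprises, so "5.00" and "5" compare as equal while "12.99" and "13.99" do not.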

The other key thing would be images. In a nutshell, good quality images provide clear information to the consumer. So, it’s to your benefit to send us good images for every product that you sell. Note that each visually distinct variant does need its own image.

We also recently introduced some new attributes for better categorization of your items. Make sure that you send us that category code, and this is especially important for things like apparel and accessories like shoes, and jewelry.

I would summarize this by saying the top things merchants would care about would be unique IDs, price, availability, tax and shipping, and images. In addition, for Apparel and variants of products, there are some very specific requirements that are extremely important … you should read our feed spec for more details.

Eric Enge: Would items with different colors still need a separate UPC code or EAN code?

Mayuresh Saoji: In many cases they do have a separate UPC code or an EAN code, and in some cases they don’t. It depends on the product actually.
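Since UPC, EAN, and 13-digit ISBN codes all end in a check digit computed the same way, a feed can be screened for mistyped identifiers before submission. The sketch below implements the standard check-digit rule for 12- and 13-digit codes; it does not cover 10-digit ISBNs and does not confirm that a code is actually assigned to a product.

```python
def gtin_check_digit_ok(code: str) -> bool:
    """Return True if a 12-digit (UPC-A) or 13-digit (EAN-13, ISBN-13) code
    ends in the correct check digit."""
    if not (code.isascii() and code.isdigit()) or len(code) not in (12, 13):
        return False
    digits = [int(c) for c in code]
    body, check = digits[:-1], digits[-1]
    # Weights alternate 3, 1, 3, ... starting from the digit next to the check digit.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


upc_ok = gtin_check_digit_ok("036000291452")    # 12-digit UPC-A
ean_ok = gtin_check_digit_ok("4006381333931")   # 13-digit EAN
isbn_ok = gtin_check_digit_ok("9780306406157")  # 13-digit ISBN
typo = gtin_check_digit_ok("036000291453")      # last digit altered
```

A failed check almost always means a transcription error, which is worth catching because the identifier is what the interview calls the product's unique fingerprint.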

Eric Enge: How often should the feed be updated?

Mayuresh Saoji: At least as often as your website is updated. It’s very important to keep the data fresh. Many merchants set this up such that they have an automated process which will just go into the backend and send us a new feed every night. Some people send it to us multiple times a day because that’s how often their website varies. For some products, pricing can vary, and more importantly availability can vary from hour-to-hour.

Many people use the Content API for these kinds of scenarios because unlike the feed spec, the Content API allows you to make very, very quick changes and incremental changes to price and availability for specific products without reloading the whole feed.

Eric Enge: The Content API gives you a lot less latency in terms of turning that around.

Mayuresh Saoji: Absolutely. It also gives you a lot more control in being able to change certain specific attributes for certain specific products.

Eric Enge: What are the advantages of the Content API?

Mayuresh Saoji: With a flat file feed you have to give us all the attributes for every item you send. If you submit a thousand items total in your feed, and subsequently you need to update the price for fifty of them, you can submit a flat file with only those fifty items. However, you’ll need to submit each and every attribute for each of those fifty items, and it will then overwrite the whole thing. With the Content API you can submit only the parameters that are changing (in this instance, price) for each of those fifty products.
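The difference between a full resubmission and a partial update can be sketched by computing only the attributes that changed. The code below does not call the Content API; the function name and the dictionary shape are hypothetical and only illustrate what "submit only the parameters that are changing" means.

```python
def changed_attributes(old_item, new_item):
    """Return the attributes whose values differ, plus the id that identifies
    the item. Returns an empty dict when nothing changed."""
    delta = {k: v for k, v in new_item.items() if old_item.get(k) != v}
    if delta:
        delta["id"] = new_item["id"]
    return delta


old_item = {
    "id": "SKU-001",
    "title": "Canvas tennis shoe",
    "price": "12.99 USD",
    "availability": "in stock",
}
new_item = dict(old_item, price="11.99 USD")

partial_update = changed_attributes(old_item, new_item)  # id and price only
full_update = new_item                                    # every attribute again
```

For a fifty-item price change, the partial form carries two fields per item while the flat-file form repeats every attribute of every item.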

Eric Enge: What role do product reviews play?

Mayuresh Saoji: Product reviews are important, and they provide a good signal to users about products, so this is something that merchants should encourage their shoppers to do and should provide this information.

Eric Enge: Can you talk a bit about the changes you announced on September 2nd and the changes to the product search feed specification you announced in July?

Mayuresh Saoji: We want to get to a richer, more visual shopping experience, and we want to ensure that shoppers are getting the rich and detailed information they are looking for. I think this has benefits for everyone. It’s good for our merchants, because we can deliver more valuable, more qualified traffic to them.

In order to support this goal, we needed to get better (and higher quality) data from merchants. This was the impetus behind the new feed spec requirements we announced in July 2011.

  • For instance, we’ve required that merchants submit a high-quality image for all of their products. We’ve given merchants the ability to have alternate views of those products as well (although alternate views are not a required attribute).
  • We’ve gotten much stricter and more prescriptive about availability and how to define it.
  • We added the Google product category attribute, which allows us to better categorize and classify products using a standardized taxonomy. It also allows us to make sure we apply the right set of rules for certain products.

There were also a bunch of requirements around apparel that we had announced. There is this concept of variants in apparel. Typically the variant attributes are colors, size, material, and pattern. We’ve specified those things. One big change we’ve made is we’ve asked merchants to submit one distinct item per variant. So, if you have a shirt sold in three colors and two sizes, you would need to send us six separate items. That’s a high-level summary of the changes that we’ve made to the spec.
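The "three colors and two sizes means six items" rule is a Cartesian product of the variant attributes. The sketch below expands one shirt into its variants; the attribute names and ID scheme are illustrative assumptions, and the feed spec remains the reference for the real attribute names.

```python
from itertools import product


def expand_variants(base_id, title, colors, sizes):
    """Build one feed item per color and size combination."""
    return [
        {
            "id": f"{base_id}-{color}-{size}",
            "item_group_id": base_id,  # assumed name for the shared parent ID
            "title": f"{title} ({color}, {size})",
            "color": color,
            "size": size,
        }
        for color, size in product(colors, sizes)
    ]


variants = expand_variants(
    "SHIRT-42", "Oxford shirt", ["white", "blue", "pink"], ["M", "L"]
)
```

Adding a third variant attribute such as material or pattern multiplies the count again, which is why variant-heavy catalogs tend to generate their feeds automatically.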

Regarding the updates to the Google product search page we announced in September, the goal here was to help shoppers find new stuff. Again, it’s all about that richer experience. Our merchants have a better showcase for their products. It’s like walking into the mall and touching and feeling something. You want to get as close to that as possible.

We wanted to make it easier for our users to browse and discover new products, be aware of trends, etc. If you look at the new product search homepage you will see many interesting changes. We’ve got more of a curated feel to the page now. We show popular products, we showcase new trends, we may show relevant Google offers. It’s very fashion-focused and apparel-focused at this point.

It provides a more visual way to shop for dresses. We simplified the UI and removed much of the text around the images. We’ve increased the size of each image; we’ve emphasized the visual aspects of apparel shopping. People often shop by color or genre or size or silhouette of a dress, and we’ve taken those things into account.

In addition, from each product page you can see visually similar products. Shoppers will have the ability to view similar items, and there is the serendipity that takes over and allows them to very quickly browse, and meander, and discover. The goal for us was to help shoppers browse and discover new products and new trends in a fun and visually appealing environment.

Eric Enge: Thanks Mayuresh!

Real Time Quality Score Defined, with Google’s Frederick Vallaeys

Frederick Vallaeys is a Product Evangelist for Google AdWords. In this role, he helps advertisers learn which Google products can best solve their marketing needs. He also represents the needs of advertisers with the engineering and product management teams. His main product focus is on ads quality and bulk tools like the AdWords Editor and the AdWords API.

Prior to Google, Frederick was an engineer at Sapient and a part-time wedding photographer who found new customers through AdWords. He joined Google in 2002 to help bring AdWords to the Dutch and Belgian markets. He earned his B.S. degree in electrical engineering from Stanford University in 2000.

Key Points

Hoo boy! I went through this interview to try and extract the most important points made, and I will do the best I can here. However, if you are a serious AdWords professional, I’d suggest you read the entire interview from end to end.

The main thing you will get from this interview is that the Quality Score you see in your Google AdWords account differs significantly from the Real Time Quality Score that Google uses to determine how your ad ranks. There is definitely a strong correlation, so Quality Score is a useful metric, but an understanding of Real Time Quality Score can give you an extra edge in understanding what it is you need to do to make your optimization efforts as successful as possible.

Quality Score is the number you see in your Google AdWords account. It is a number between 1 and 10, where 1 is a horrible score, and 10 is an awesome score. Some key points about Quality Score are:

  1. It is mostly based on historical clickthrough rates of the keyword and ad text.
  2. Additional factors include landing page quality and load time of the page, but these are secondary factors.
  3. Quality Score (QS) is based on data from exact match only. Even if you bid on a broad match keyword, such as “cruises”, only exact matches with the keyword are used to determine the QS.
  4. The published number is the aggregate for all instances of that keyword in your account.
  5. When you first add keywords into a new account, Google will show the system wide average for that keyword as your Quality Score.
  6. If you have an existing account, and you add a new keyword, then the account history is a factor in the default Quality Score.

Real Time Quality Score is the number used by Google to help determine your ad rank. It has a lot in common with QS, but is calculated in real time and takes into account many additional factors. Some key points about Real Time Quality Score (RTQS) include:

  1. Specific query performance is taken into account. For example, if you bid on “tennis shoes” and someone searches on “discount tennis shoes”, but you sell only expensive tennis shoes, chances are that the resulting user interactions will end up in a low RTQS for this particular query.
  2. RTQS is personalized to the user based on query history. For example, a recent search on “Rome” followed by a search on “hotels” is more likely to show ads for hotels in Rome.
  3. RTQS personalization is session based. Once the session cookie is deleted the query history used for personalization is lost.
  4. Other personalization factors include location and time of day.
  5. The +1 button does not factor into RTQS … yet. However, it can impact QS and RTQS by increasing clickthrough rate.
  6. +1 is associated with the URL, regardless of whether or not it is clicked on in the ad, organic results, or on the web page.
  7. Site links drive CTR increases ranging from 17% to 30% and can also result in more qualified customers (higher conversion).
  8. CTR expectations are normalized by position. So if the number 1 position usually gets a 30% CTR and you are getting 20% that is a negative.
  9. RTQS is determined at the keyword-ad level. There are no ad group or campaign components to RTQS.
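Point 8 above can be made concrete with a small calculation. The sketch below expresses position normalization as a simple ratio of observed CTR to the CTR expected for that position; the ratio form is an assumption made for illustration, since the interview does not say how Google performs the normalization.

```python
def position_normalized_ctr(clicks, impressions, expected_ctr):
    """Observed CTR divided by the CTR expected for the ad's position.

    A value below 1.0 means the ad underperforms for its position.
    """
    if impressions <= 0 or expected_ctr <= 0:
        raise ValueError("impressions and expected_ctr must be positive")
    return (clicks / impressions) / expected_ctr


# The example from the list: position 1 usually gets 30%, this ad gets 20%.
ratio = position_normalized_ctr(clicks=200, impressions=1000, expected_ctr=0.30)
underperforming = ratio < 1.0
```

Under this reading, a 20% CTR would count as a negative in position 1 yet could be a strong result in a lower position where the expected CTR is smaller.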

That’s it for the summary points. However, in the body of the interview there is much more, including Frederick’s recommended process for optimizing your QS and RTQS, lots of examples, and why bidding your keywords high when you first launch them is a smart thing to do.

Full Interview Transcript

Eric Enge: Can you tell me how Quality Score is used?

Frederick Vallaeys: The Quality Score is Google’s way of ensuring that we show the most relevant ads to our users, and we deliver high quality leads to advertisers buying the clicks from us. The Quality Score obviously factors into the ad rank together with the advertiser’s bid.

It helps determine which advertiser has the highest position on that page. The Quality Score that you see in the account is determined by a number of factors and is mostly based on the historical click through rates of the keyword and the ad text.

The Quality Score is only based on data from results on exact match. That means the keyword the user types in has to be exactly the same as the keyword chosen by the advertiser. There has to be an exact match between those two regardless of which match type the advertiser selected. Also, we only use data from google.com, not display network traffic or traffic from our search partners.

That’s the data that builds up the Quality Score. We also have additional factors such as landing page quality and load time of the page, but those are secondary factors. The biggest thing we look at is the historical click through rates of the ad text with the keywords inside of the account.

Eric Enge: That’s specific to what we see published in AdWords, is that correct?

Frederick Vallaeys: Exactly. What you see published in AdWords is going to be a number between one and ten. A Quality Score of one out of ten is a terrible Quality Score, and a score of ten is a fantastic Quality Score. What you have to keep in mind is that the number we publish is the aggregate for that specific keyword. It reflects all the data we have on that keyword for your account.

The key point here is that this is an average, and an average is never great, which is why we also calculate a Real Time Quality Score internally. The average you see in the accounts is good for figuring out where you have an issue.

As an advertiser, if I have to prioritize which keywords to optimize, this is a good indication. Any Quality Score below a seven is a place where you might want to start looking. The lower that number the bigger an issue you have.

Eric Enge: When you open up a new account, and there isn’t any click through rate history, I’ve seen situations where the Quality Score is quite low but the numbers come up as the account ages.

Frederick Vallaeys: Right. What typically happens when you start up a new account, or you put a new keyword for the first time into an existing account, is we take a system-wide average based on advertisers who have run on that keyword in the past. What often happens is that the keyword may be fairly broad and may not be the best performing keyword system-wide.

As your account ages and you start getting impressions and clicks on that keyword, we can build a specific picture of how you, an advertiser with those specific ad texts, will do on that keyword. If you are a good advertiser that knows how to write a compelling ad text for all the keywords, your Quality Score will certainly increase at that point and become much better. It also becomes your own Quality Score as opposed to that starting point system-wide average.

Eric Enge: Can keywords with a bad history have a negative impact on another keyword’s quality score?

Frederick Vallaeys: In the absence of specific data about how a keyword performs with a specific ad, we rely on system wide data and account-level data. If an account has a set of keywords that in aggregate have a low QS, this can have a negative impact. Zero impression keywords do NOT matter because those contribute no CTR data.

Keywords with few impressions and few clicks could in aggregate have a large number of impressions with a low CTR and this could hurt the account. Keep in mind though that even if there is a negative impact on the account, this won’t matter as soon as we have enough data about how a keyword performs with a specific ad because we’d use that specific data for QS rather than the less specific account level data.
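The arithmetic behind these two points can be shown with an aggregate CTR calculation. The sketch below uses made-up numbers and a plain clicks-over-impressions aggregate; whether Google combines account history in exactly this way is not stated in the interview.

```python
def aggregate_ctr(keyword_stats):
    """Aggregate CTR over (clicks, impressions) pairs; None if nothing was shown."""
    clicks = sum(c for c, _ in keyword_stats)
    impressions = sum(i for _, i in keyword_stats)
    return clicks / impressions if impressions else None


# Made-up account: one strong keyword, 400 long-tail keywords that each got
# 25 impressions and no clicks, and 1,000 keywords that were never shown.
head = [(50, 1000)]
long_tail = [(0, 25)] * 400
never_shown = [(0, 0)] * 1000

head_only = aggregate_ctr(head)
with_tail = aggregate_ctr(head + long_tail)
with_zeros = aggregate_ctr(head + long_tail + never_shown)
```

Here the long tail adds 10,000 impressions without a single click and drags the aggregate well below the head keyword's 5%, while the never-shown keywords leave the aggregate unchanged.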

Real Time Quality Score

Eric Enge: Let’s say we have a keyword such as “tennis shoes.” How is Real Time Quality Score both displayed and calculated?

Frederick Vallaeys: Many people will type in “tennis shoes” but others may type in variations of that keyword such as “discount tennis shoes” or “Nike tennis shoes.” If you had that keyword in the broad match then your ad would have been eligible to show on these different variations.

For the Real Time Quality Score we calculate at the exact moment a user did the search and take into account what these variations are. If you sell expensive tennis shoes, and someone did a query for discount tennis shoes, we would show your ad and maybe that ad had an eight out of ten Quality Score. It’s a mismatch to what that specific user was looking for because they weren’t looking for expensive tennis shoes. In that case it would not be the best ad to show.

The real time system allows us, based on the additional data for this specific situation, to know this ad is not the best ad for that case, and to give preference to some of the other ads.

We think it’s a real positive for advertisers, because in the past we would aggregate and you would get clicks that maybe weren’t from the most qualified potential customers because we were looking at averages. Now we can look at how they formulate the query and how that impacts their likeliness of being interested in this advertiser’s ads.

Instead of an eight out of ten, the Real Time Quality Score might be a five out of ten telling us this ad is not a great ad for this query. This will affect the ad rank and, in some cases, the ad doesn’t show.

In the “tennis shoes” situation, when someone types in “discount tennis shoes” we are looking beyond exact match and you have a separate Real Time Quality Score calculated for the performance of the query “discount tennis shoes” against that keyword, that ad and that landing page. We could look at some interesting cases that would match really ambiguous keywords which are difficult to bid on.

For another example, consider the keyword “jobs.” You could be looking for Steve Jobs or you could be looking for jobs in San Francisco. How do we know? If you as an advertiser pick that relatively generic keyword, we can find a subset of queries that do well for what it is you are selling, whether it’s a blog about Steve Jobs’ company or whether it’s a blog or website for finding a job in San Francisco. Many years back, the AdWords system wasn’t quite as specific with its Quality Score. What it would do in these ambiguous cases is not run the advertiser’s ads because we would say, “okay, on average this is a pretty bad keyword, it doesn’t perform that well.” We would lose sight of the specific queries in which it actually did do well.

With the more sophisticated system we have today, if there is a small subset of queries that work well for you, we can find those and often show you in quite a high position even though all the other queries for that same keyword might not have done well for you.

Another example I like to use is “discount cruises.” If someone looks for discount cruises, it’s not ambiguous in terms of what they are looking for, but it could be ambiguous in terms of the destination they are looking for.

There are companies that offer Alaskan cruises and companies that offer Caribbean cruises. Generally, people are more interested in the Caribbean or warm weather cruises. With that generic keyword “discount cruises” you might do well on most queries because most people want to buy your Caribbean cruise.

In those few instances where someone is looking for an Alaskan cruise, it would be a poor decision to show your ad because you don’t sell that cruise. If we had gone on the average, we would have shown the ad because most people look for Caribbean cruises.

With the real time system we see that the user typed in the word “Alaskan” in addition to “discount cruises.” This is probably not the best time to show the ad, and it prevents the advertiser from showing an ad that’s unlikely to lead to a sale. This provides a better user experience because users aren’t seeing an ad for Caribbean cruises just because it happens to have a high overall Quality Score.
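The effect of a query-level score on ranking, as in the “tennis shoes” example earlier in this answer, can be sketched with a toy calculation. The sketch assumes the commonly cited simplification that ad rank is the bid multiplied by a 1 to 10 quality score; the interview only says the score factors into ad rank together with the bid, and all bids and scores below are hypothetical.

```python
def ad_rank(bid, quality_score):
    """Simplified ad rank: bid times a 1-10 quality score (an assumption)."""
    return bid * quality_score


# Hypothetical query-level scores for one broad match keyword, "tennis shoes".
query_scores = {"tennis shoes": 8, "discount tennis shoes": 5}
our_bid = 2.00
rival_bid, rival_score = 1.50, 8  # hypothetical competitor selling discount shoes

results = {}
for query, score in query_scores.items():
    ours = ad_rank(our_bid, score)
    rival = ad_rank(rival_bid, rival_score)
    results[query] = "our ad" if ours > rival else "rival ad"
```

With an averaged score of eight the higher bid wins both queries; with the query-level score of five, the rival ranks first for “discount tennis shoes” despite bidding less.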

Personalization and selection of Ads

Eric Enge: In the scenario above, the user provides more information by adding a qualifying word to the query, so for discount Alaskan cruises you don’t show the Florida or Caribbean cruises ad. Could you look at the user’s past query history and see that they recently read blogs about Alaska or things of that kind? Is there anything like that in play at this point?

Frederick Vallaeys: Yes. There is a personalization factor in place. This works by looking at previous queries the user has done. A good example of this is a user who came to Google, did a search for Rome, and then did a search for hotels. What Google knows is that they probably were thinking about hotels in Rome as opposed to hotels anywhere. Rather than show generic ads for hotels, we can look back at that session data and show more relevant ads based on that. That’s the extent of what we can do at this point.

I would like to note that when we talk about personalization it’s actually on an anonymous basis. It means we know what a certain cookie is doing, but we don’t know what a certain person is doing. We know that cookie ID 1234 searched for Rome before they searched for hotels, but we don’t know that the cookie is Frederick Vallaeys.

Eric Enge: You obviously have to avoid the privacy concerns. Does the cookie that allowed you to do this survive across sessions?

Frederick Vallaeys: No. We found that’s usually not a great thing to do because the correlations you start seeing actually go down quite a bit. Also, we don’t always combine the previous searches with the current searches because if there is a clear shift in topic that the user is searching for then it doesn’t make sense to look at that previous data.

Eric Enge: This personalization that we spoke about is a factor in Real Time Quality Score?

Frederick Vallaeys: The other mechanics we look at are the location of the searcher and the time and day. There are a number of other factors we don’t disclose, but we do evaluate many factors that could potentially have some impact. We look at CTR, and if there is a strong correlation between this factor and CTR, that’s a factor we could continue to use. Location and time are two good examples that do matter.

Eric Enge: If it’s November and somebody in Massachusetts typed in “discount cruises” are you more likely to show a Florida cruises ad than an Alaska cruises ad?

Frederick Vallaeys: Exactly. We might give preference to an ad on Florida and Caribbean cruises for people from a cold location.

Eric Enge: Correspondingly, if you have someone in California typing that, you might actually show a Hawaii cruise ad rather than a Caribbean cruise ad.

Frederick Vallaeys: Exactly.

Eric Enge: What are some of the correlations for time of day?

Frederick Vallaeys: You can think about differences in behavior even if they were searching for the same thing at different times of day. For example, there have been a number of studies in the travel industry that show in the morning people tend to research hotels they may stay at. At lunch they talk to their spouses to get approval to book a certain hotel. In the afternoon they may be more likely to book that hotel.

So, if we find a query in the morning for a certain type of item, we might give preference to more research-oriented ads, and in the afternoon we may focus on more transaction-oriented ads. That’s difficult, so the system depends on having enough statistically significant data to make those decisions.

Eric Enge: Right, because you don’t know if they went and talked to their spouse, but you do know they tended to click on review-oriented ads as opposed to book-it-now oriented ads.

Frederick Vallaeys: Exactly.

The role that the +1 button plays in the Quality Score

Eric Enge: What about the +1 button that you now see appearing on ads? Is that something you factor into a Quality Score at this point?

Frederick Vallaeys: It doesn’t factor into the ranking yet. However, what we typically see whenever a new ad format or a new feature of an ad is introduced, such as the +1 button, is that it sometimes increases click through rates. If the click through rate increases, that leads to a better Quality Score so there is definitely an indirect factor by having strong +1 recommendations and endorsements that more people could click on your ad.

+1 is essentially bringing social to the moment of relevance. If a user sees that five of his buddies have booked the same vacation or done business with the same cruise line that’s a pretty strong endorsement and that user is more likely to also click on the ad, check it out, and buy from them. If you as an advertiser can build that following of +1 clicks and get people to endorse you that should be positive for you. If that seems to be a useful thing to use in terms of Quality Score, we absolutely could start thinking about integrating that.

Eric Enge: Are people clicking on those +1 buttons in the ads in any volume? I could see +1’ing a great article, but I’m not sure what the proclivity would be of people to +1 an ad.

Frederick Vallaeys: That raises another good point, which is that if you are using +1 as a publisher, an advertiser, or a business, the +1 actually is associated with a certain URL. So, even if you don’t have a +1 next to your ad, but you get people to +1 your website, that all feeds into the same pool of data.

Later on when somebody searches and sees your ad, those recommendations will show up even if those +1’s were done from your website or the organic results. It’s a whole ecosystem that persists across all the different touch points you might have with that customer, whether it be through Google or through your own website.

As far as the volume of how many people have done this, I can’t talk about that. It’s still early stages for this, but we are pleased with the way people are using it at this point.

The power of using the new ad extensions

Eric Enge: One of our clients is using the seller rating ad extensions. That’s kind of a corollary, this whole business of including reviews and ratings into the whole process.

Frederick Vallaeys: Exactly. I think it fits into the bigger picture of new ad formats you see on Google, and it stems from the fact that we realize that sometimes the picture is worth a thousand words, and the ad doesn’t have to be purely text.

You can also answer with a map if it’s a local search. You can enhance with product prices and images if it was a product search. If it was a search for a new movie, then it might make sense to show the trailer right there. Positive seller ratings and reviews are a good thing to surface because they help build trust and bring in those clicks that an advertiser was looking for.

We’ve seen site links drive increases in CTR anywhere between 17% to 30%.

A specific example to look at is site links, which is probably the easiest of the new ad formats to implement because it’s literally going into your campaign and putting in up to ten links associated with each of your campaigns. We’ve seen these drive increases in CTR anywhere from 17% (search) to 30% (mobile). These are fantastic increases in CTR simply by showing more information that’s useful to users.

Eric Enge: Similar to the +1 button, it’s something the eye notices and attracts a little bit of mind space.

Frederick Vallaeys: Exactly. We want to be careful because people are drawn to new things, but we need to make sure that those new things are not just drawing clicks because they are different, but because they are actually useful. We are careful in terms of launching these new features and testing them and making sure there is actual user benefit in them.

On the flipside, when the user sees more it typically also means they are better qualified by the time they make the click and come to you as an advertiser, so you are more likely to convert that customer. A great example of this is again in the travel space.

Let’s say someone is looking for a destination and you have a travel site with car rentals, hotels, flights, and vacation packages. In the past you would have taken that user to your generic page where they could have done all four of those things. But, if you now show four site links to each of those different areas of your site, you’ve done two things.

You’ve told that user “hey, by the way you might not have realized it, but we also do car rentals.” The second thing is the user goes directly to that page for the thing they were looking for. Now you can take them to a page where, instead of cluttering it with the things they weren’t looking for, you actually put special offers and pitch the product they were looking for.

In the case of car rentals you show them what discounts are available in the space that you might have otherwise had to use to say “hey, you can also book flights here” which they weren’t looking to do at the time. It’s a positive thing for both the user and the advertiser.

[Image: Apple search result]

Eric Enge: I saw what Apple did with site links. They show their current hot offers. It’s a very, very smart way to use that feature.

Quality Score and Position Normalization

Eric Enge: Coming back to Quality Score and the click through rates. I assume you have some way of adjusting expectations based on positions, because obviously one would expect the first ad to get the most clicks. To put a strawman concept out there, if we thought the first ad was going to get 30% of the page search clicks, and the second was going to get 15% and so forth, then if the first ad gets 25% and the second ad gets 20%, that starts to be a sign that the second ad is the better ad. Am I interpreting that correctly?

Position normalization says that we have different expectations for CTR for the different ad positions.

Frederick Vallaeys: Yes, you are spot on with that. We call it Position Normalization, and it’s exactly as you described. Having a certain CTR, say 25%, could be a really good thing if we were expecting you to get 15% in the position that you were in. Your Quality Score could go up. Many advertisers look at the CTR in their accounts and try to judge everything on that. However, it’s important to look at both the CTR number as well as the Quality Score number in your account.
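
The idea can be sketched in a few lines of Python. This is only an illustration of the strawman numbers from the question above, not Google’s actual calculation: the EXPECTED_CTR table and the normalized_ctr function are hypothetical, and the real per-position expectations are not public.

```python
# Hypothetical expected CTR per ad position. The 30% and 15% values are the
# strawman figures from the interview; the third value is invented.
EXPECTED_CTR = {1: 0.30, 2: 0.15, 3: 0.08}


def normalized_ctr(clicks: int, impressions: int, position: int) -> float:
    """Return the actual CTR divided by the CTR expected at that position.

    A value above 1.0 means the ad beat the expectation for its position.
    """
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    actual_ctr = clicks / impressions
    return actual_ctr / EXPECTED_CTR[position]


# First ad: 25% CTR where 30% was expected.
first_ad = normalized_ctr(clicks=250, impressions=1000, position=1)
# Second ad: 20% CTR where 15% was expected.
second_ad = normalized_ctr(clicks=200, impressions=1000, position=2)

print(f"first ad:  {first_ad:.2f}")
print(f"second ad: {second_ad:.2f}")
```

Under these assumed numbers, the second ad scores higher than the first even though its raw CTR is lower, which is the point of judging CTR relative to position.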

Eric Enge: You want to look at them together as it’s a relative thing.

Frederick Vallaeys: Exactly. You look at them in combination, and the more important thing to look at in your account is the return on investments you’ve received from those ads. The Quality Score is a number we put in there to help you figure out where it is you could perform better and possibly decrease your cost and increase your position by having more relevance. If that is driving ROI, then that’s the only thing that matters to advertisers.

Eric Enge: You don’t want to lose sight of the end goal. The Quality Score is basically a tool to help you better get to that goal. The point you just made about the Position Normalization, is that you get to look at all the things together. I need to look at it in a holistic fashion so it can tell me where the opportunities are.

Frederick Vallaeys: Exactly, and a simple technique is to look at which of your keywords have a sub-par Quality Score, and that could be any number. That could be the lowest ones in your account, or it could be literally at a one level or a two level. Then you can look at your search query report.

From that you start seeing these different variations, and you can start figuring out why the keyword wasn’t performing well at the aggregate level, and how you can make your account more specific by building out new ad groups for the different search queries that are also triggering your ads.

Typically, when you do that, you increase your relevance because you are now taking more specific keywords and building ad text specifically for those, which helps you boost up your click through rate.
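
As a rough sketch of that search query review, the Python below takes the queries that triggered one keyword and lists the ones dragging down its aggregate click through rate. The QueryRow type, the split_candidates function, the sample rows, and the 100-impression cutoff are all assumptions made for illustration; they are not AdWords report fields or thresholds from the interview.

```python
from dataclasses import dataclass


@dataclass
class QueryRow:
    """One row of a (hypothetical) search query report for a single keyword."""
    query: str
    impressions: int
    clicks: int


def split_candidates(rows, min_impressions=100):
    """Return queries whose CTR is below the keyword's aggregate CTR.

    Queries with too few impressions are skipped. Results are ordered
    from the weakest CTR upward.
    """
    total_impressions = sum(r.impressions for r in rows)
    if total_impressions == 0:
        return []
    aggregate_ctr = sum(r.clicks for r in rows) / total_impressions
    weak = [
        r for r in rows
        if r.impressions >= min_impressions
        and r.clicks / r.impressions < aggregate_ctr
    ]
    return sorted(weak, key=lambda r: r.clicks / r.impressions)


rows = [
    QueryRow("car rental", 1000, 50),
    QueryRow("cheap car rental miami", 400, 4),
    QueryRow("car rental jobs", 300, 0),
    QueryRow("rent a car", 50, 0),  # too little data to judge
]
candidates = [r.query for r in split_candidates(rows)]
print(candidates)
```

Each query that comes back is a candidate for closer review, for example for its own ad group with more specific ad text.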

Tips for optimizing your AdWords account

Eric Enge: If you are a publisher that wants to do optimization on your account, what are the steps you recommend publishers should go through?

Frederick Vallaeys: I recommend that you look holistically at your accounts. Sort it on a keyword basis from lowest to highest Quality Score and apply some filter so you are not looking at anything that doesn’t have a lot of impressions yet.

I would say a thousand impressions and up. That’s the baseline where you would start looking at it, and then do a secondary sort on that. Look at which ones have the highest volume and not a great Quality Score. Go after the high volume first even if it’s not necessarily the absolute lowest Quality Score, but it’s still in that bucket where the Quality Score is not quite where you want it to be, and start optimizing on those.

Then try to figure out if you could write better ad text for that keyword as it stands now or do you need to break that keyword into more specific variations, build new ad groups around that to create ad text that’s more compelling and maybe lead it to a landing page that’s also more specific.
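
The prioritization described here can be sketched in Python as a filter followed by a sort. The thousand-impression baseline comes from the interview; the Keyword type, the optimization_queue function, the sample data, and the Quality Score threshold of 5 are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    """A keyword with its reported Quality Score and impression count."""
    text: str
    quality_score: int
    impressions: int


def optimization_queue(keywords, min_impressions=1000, qs_threshold=5):
    """Return low Quality Score keywords with enough data, highest volume first.

    min_impressions follows the baseline mentioned in the interview;
    qs_threshold is an assumed cutoff for "not where you want it to be".
    """
    bucket = [
        k for k in keywords
        if k.impressions >= min_impressions and k.quality_score <= qs_threshold
    ]
    # Primary sort: volume, descending. Ties go to the lower Quality Score.
    return sorted(bucket, key=lambda k: (-k.impressions, k.quality_score))


keywords = [
    Keyword("hotels", 4, 50000),
    Keyword("cheap hotels paris", 2, 3000),
    Keyword("hostel", 1, 200),             # too few impressions to judge
    Keyword("boutique hotels", 8, 20000),  # Quality Score already fine
]
queue = [k.text for k in optimization_queue(keywords)]
print(queue)
```

In this sample, "hotels" comes first even though its Quality Score is not the lowest, because it carries the most volume.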

Eric Enge: Is there an ad group or campaign level component to Quality Score?

Frederick Vallaeys: The QS is at a keyword-ad level. So the way you structure ad groups plays a large role in determining QS. However, there is no ad group or campaign QS component. That is, if you took the same keyword and ad and moved it to a different ad group or campaign, the Quality Score would remain the same.

Eric Enge: We did talk about Position Normalization earlier, but is there an argument in some situations for bidding higher? To drive history faster, or do things to try to help the Quality Score go up?

I think you hit the nail on the head with the statement that it (bidding higher) helps you build history faster in some cases.

Frederick Vallaeys: I think you hit the nail on the head with the statement that it helps you build history faster in some cases. Keep in mind when you bid higher it usually means you are going to get a higher position on the page.

In those higher positions if you go from being on page two to page one, that’s going to have a huge impact on how quickly you accrue impressions. It’s those impressions that will give Google the confidence to make a Quality Score judgment that’s specific to your account as opposed to the system-wide averages.

If you, as an advertiser, are doing much better than the system-wide average then it would benefit you to prove to us as quickly as you can because that will then decrease your costs in the long run.

It’s about building that volume, but not about anything else because there is Position Normalization. Bidding up to a higher position and getting that higher CTR isn’t a guarantee of getting a better Quality Score in the long run.

Eric Enge: Right, because presumably the Position Normalization is adjusted on a keyword basis. Position normalization for market expectations on one keyword might be different than the expectations on another keyword.

Frederick Vallaeys: Right.

Eric Enge: That eliminates any possibility that you could fool the Position Normalization algorithm with the bids. The only thing you gain is that you can accelerate the development of your own history.

Frederick Vallaeys: Exactly.

How the real time math helps advertisers

Eric Enge: In summary, the Quality Score we see in AdWords is actually a very valuable proxy basically for the real numbers because you can’t possibly handle the data for the real numbers as a human.

Frederick Vallaeys: Exactly. That brings up another good point. One thing I like to harp on is that Google has a lot of data, and we are very good at using that data to give the best results to advertisers. Conversion optimizer is actually a good example of this.

To the point that you just made, we at Google collect data on a query-by-query basis and can have an expectation of how that’s going to perform. The problem is that even if you had that as an advertiser, there would be no way for you to bid in real time based on those factors.

That’s where Google can actually do a good job for those advertisers, and that’s where conversion optimizer comes into play. That’s using all of Google’s power of crunching numbers to make sure that you are meeting your ROI targets, and let us handle all the heavy lifting of determining the right CPC.

Eric Enge: Thanks Fred!

Danny Sullivan on Google+, Facebook, Twitter, Social and Search

Published: August 8, 2011

Widely considered a leading “search engine guru,” Danny Sullivan has been helping webmasters, marketers and everyday web users understand how search engines work for 15 years. Danny’s expertise about search engines is often sought by the media, and he has been quoted in places like The Wall St. Journal, USA Today, The Los Angeles Times, Forbes, The New Yorker and Newsweek and ABC’s Nightline.

Danny began covering search engines in late 1995, when he undertook a study of how they indexed web pages. The results were published online as “A Webmaster’s Guide To Search Engines,” a pioneering effort to answer the many questions site designers and Internet publicists had about search engines.

Danny currently heads up Search Engine Land, which covers search marketing and search engine news. He produces the SMX: Search Marketing Expo conference series, writes a personal blog called Daggle (and maintains his disclosures page there). He can be found on Facebook, Google+ and microblogs on Twitter as @dannysullivan.

Key Interview Points

In this interview with Danny, we delve deeply into Google+, Facebook, Twitter, Social signals in search, and the value of a holistic approach to Internet marketing. There was just too much in this one for me to summarize everything in the key interview points below, so consider these a bit of a teaser for the good stuff you will find within this discussion. As always, many of the key points listed here are summaries of the conversation rather than quotes.

  1. Google+ offers really good support for threaded discussions.
  2. (Danny) “It’s disappointing that there is no way to allow the Hangout to be open for public viewing and no ability to record, maybe these things will come.”
  3. Google Buzz was crippled by automatically incorporating Gmail contacts. Google+ did not make this mistake.
  4. (Danny) “It’s easy to get lost in Google+, Twitter is more efficient with your time.”
  5. (Danny) “One of the exhausting things on Google+ is it’s easy to get lost reading through comments to see what people are saying and how they are reacting.”
  6. (Danny) “… even though the links themselves might be no-follow, they may still calculate they were a shared link and that might be giving you a signal.”
  7. (Danny) “Absolutely, I think it (Google+) has a chance of being significant in terms of Bing significant or in terms of being much closer to a rival. It’s similar to what’s going on with Bing and Google.”
  8. When you see a picture of a person to the right of a SERP, it is a result of the rel=author tag.
  9. (Danny) “I am finding (Google+) Circles to be exhausting. I created all these circles, and there is the decision: what Circle do I put them in?”
  10. (Danny) “When I look at the numbers, I find Bing seems to be gaining from Yahoo more than anything else.”
  11. (Danny) “Potentially, Google+ is a threat to Twitter, but there is a lot to be said for Twitter’s simplicity.”
  12. (Danny) “Many people say there is an exact formula to what you should do on Twitter but there is no one right formula.”
  13. (Danny) “The brands get excited about the customer service role (for Twitter), but I think they ought to be handling customer service through their regular channels so people don’t feel they have to yell out on Twitter.”
  14. (Danny) “Search has his cousin called discovery … (and) social is very strong at providing that.”

Overall impressions of Google+

Eric Enge: Other than the horrible name Google+, which you are on record stating your thoughts about, what do you think?

Most impressive is the amount of commenting a post can generate in Google+.

Danny Sullivan: I’ve been impressed with a lot of it. Most impressive is the amount of commenting a post can generate. It’s somewhat phenomenal that you can put something out and suddenly five, ten, twenty people jump in on it.

Compared to Twitter, you make a remark and you may get two or three tweets. Sometimes it might catch fire but it’s not consolidated in one place. Google+ generates a lot of activity and I wonder what will happen if more people get involved.

Eric Enge: I agree with you that the threading is much better. I think that’s a big plus.

Danny Sullivan: I think the commenting has been useful to see. For example, I have a large collection of people that I follow and joke that it is like going from middle school to high school. You recognize many of your friends but there are many new people that you haven’t seen before.

If you start over with the social network, you may get exposed to people that you hadn’t thought of before and you hadn’t connected with. It is a time consuming and painful process doing that over again.

Eric Enge: What about the Hangouts, have you done anything with that?

It’s disappointing that there is no way to allow the Hangout to be open for public viewing and no ability to record, maybe these things will come.

Danny Sullivan: I jumped into one when I saw Bradley Horowitz on. It was interesting because he said a couple of things that were news without having to schedule an interview. They get busy quickly so you have to move fast.

One of the disappointing things is that they don’t allow you to let the Hangout be open for other people to view. Also, there is no ability to record. I think both would be useful. Maybe these things will come.

Eric Enge: Obviously, it’s the first thing out of the gate. Now the question is, “are they going to follow-up on it?” This was missing from some of the other social initiatives put out there by Google.

Danny Sullivan: When you compare it to Buzz, there are a few crippling things that hit Buzz. First, Buzz rolled out in a way that made people feel it violated their privacy.

Eric Enge: Right, by automatically incorporating Gmail contacts.

Danny Sullivan: Yes, exactly. You had this thing thrust upon you, which you didn’t necessarily ask for, and that gave it a bad taste from the beginning. Then they allowed the ability to pull an RSS feed in fairly quickly. That may have been a mistake because I piped in my Twitter feed and a number of people said “whatever I am doing on Twitter, or another service, I could just pipe it into here.”

This turned Buzz into what FriendFeed was. It was nice if you wanted to see what everybody is saying, or you had friends across different social services. It didn’t necessarily inspire people to think “I should be doing original content here,” and it never seemed to take off.

They don’t have the RSS import now, so if you want to post on Google+ you have to go there and come up with something that you want to put there. It’s driving you back to the site each time.

Eric Enge: Are you thinking of the audiences differently? What do you post to Google+ versus Twitter?

It’s easy to get lost in Google+, Twitter is more efficient with your time.

Danny Sullivan: A little bit. I will post more “+” things to Google+ than to Twitter because it makes sense when to go there. On Twitter I might express a small gripe, such as my computer crashed, because that is more of a Twitter type thing. I’ve shared some things across both, but Google+ made it easier for me to share multiple pictures because of the way the Android App works. You can do it with Twitter but it’s harder and it depends on which app you are using. Some allow multiple pictures, some don’t.

There is a concern that some people may follow you in both places, and you don’t want them to see the same stuff. I am not worried about it and actually started posting out onto Facebook more than usual this week because I thought I should be doing stuff there. However, I wonder if I am going to have time.

One of the exhausting things on Google+ is it’s easy to get lost reading through comments to see what people are saying and how they are reacting. I feel Twitter is much more efficient because with one click I can see if anybody has sent me a reply. Even though Google has notifications that work well, it’s still easy to get locked into reading a discussion about what everybody is saying. This is useful in many ways but also very time consuming.

(Note: After Danny and I had this discussion, he wrote a post calling Google out for the way it has handled brands on Google+).

Google+ as a ranking signal

Eric Enge: What about Google+ as a ranking signal?

Danny Sullivan: I think they are using +1 for social search, but they haven’t said they’ve integrated it as a general ranking signal. I certainly think that will come.

It’s getting very confusing about what they use or what they say they don’t use. That’s why I wrote the article “What social signals do Bing and Google really count” last December. (Note: Since we did this interview we have gotten first confirmation that Google+ is influencing Google rankings).

I was trying to get them to be very clear about what they did. They said “well, we are using some limited things here, we are using some limited things there,” and we discovered for the first time that your Twitter links actually were follow links. They weren’t no-follow because they got a fire hose of data from Twitter via the API. Therefore, all those links actually carry credit because the fire hose didn’t have no-follow attached to it. Who knew?

Even though the Twitter links themselves might be no-follow, they may still calculate that they were a shared link and that might be giving you a signal.

Now the Google – Twitter fire hose deal has ended. So, supposedly, all those links on Twitter don’t count anymore, but Google told me recently we can still count up all the links and try to figure out how much something is being shared. So, potentially, if one of their ranking signals is how much something is being shared then, even though the links themselves might be no-follow, they may still calculate they were a shared link and that might be giving you a signal.

On the one hand, it’s maddening if you are trying to figure it out. On the flip side, I think it’s foolish to get that specific about it. When we did our Periodic Table of SEO Ranking Factors, one of the things I put on there was social as a factor.

You can get lost trying to decide whether or not you think that is Facebook share counting or not share counting and does this tweet count. The best way to look at it is ask yourself, “what are the major social networks, are you active on them and do you have a good reputation on them.”

Even if it doesn’t help you with search down the line, it’s probably a good traffic generator to do. That’s what link building was all about. Link building was an activity you did independently of search, and then the search engines began valuing links.

Even if that link you got never paid off for you in terms of search credit, it potentially paid off for you in terms of traffic. So, sometimes I think we can get too involved around search when we should be looking at the bigger picture.

Eric Enge: I think the search engine is trying to use as many signals as they can to get good quality data. Part of it is to make it more obscure, to make it harder for people to spend so much time figuring out exactly how it works so it can be gamed, and get people to focus on producing good stuff and promoting it.

Think of all the energy that goes into whether the tweet is close to this word or not, and the impact of that. All the energy you put into that could probably have gotten your link tweeted twenty more times.

Does Google+ have a chance?

Eric Enge: John Battelle recently put out a post that said he thinks Google+ is a legitimate threat to Facebook, but Facebook is still the one to beat, which is obviously true. Do you think Google+ has a chance? Not of beating Facebook necessarily, but of being significant?

Danny Sullivan: Absolutely, I think it has a chance of being significant in terms of Bing significant or in terms of being much closer to a rival. It’s similar to what’s going on with Bing and Google.

Bing has a significant search engine that people should consider and it has a lot to offer. However, it is far from being the market leader. It continues to play catch up and is well behind Google. It may never become the #1 choice simply because many people like Google.

With the Google-Facebook situation, a lot of people are happy with what they have on Facebook. They don’t particularly think Facebook needs to change, their friends are there and they are having their set. I think it’s a tough challenge for Google to try to unseat it. However, I think it has a chance to build itself up as a strong alternative.

The rel=author tag

Eric Enge: What is driving the pictures of people showing up in the Google SERPs, over on the right?

Danny Sullivan: That’s the Google rel=author tag that was rolled out. This is where you’ve identified yourself as an author. It’s a beta program, and I don’t think it’s happening automatically for everyone. I know they put me into it, so when you search you see my face staring out at you because I’ve got my author tag set up. Those should be pictures of people who are authors.

Social Networks are here to stay

Eric Enge: Data came out in May indicating a decline for Facebook in the US and Canada, suggesting that getting beyond 50% penetration was an obstacle. I look at my high school kids, and all the high school kids I know, and the penetration is 100%.

Every kid has a Facebook account that they use as their basic mode of communication. Do you think this little perturbation of dropping briefly is something that’s going to disappear over time?

Danny Sullivan: Social networks are a digital expansion of ourselves. If you look at search, it is a digital way for us to do what we always did, which was ask questions, just more efficiently. I think social is the same kind of thing. It has digitized us. It has allowed us to connect.

We are not going to put that genie back in the bottle. Personally, I love that I can connect with people I don’t know and am friends with them. People have had bulletin boards for ages so I don’t think that’s going to go away.

Eric Enge: How far do you think this penetration will go? I argue the penetration is vectoring towards 100%, or very close, over a period of decades.

Danny Sullivan: I don’t know that you ever get to 100% of anything, but I could see us having 90% or more people. The penetration is high, if you want to count email as a social network. It’s odd that we don’t think more about email. People kind of despise it, but if I was going to do a civil war documentary reading emails from soldiers, you probably would find it very touching.

Eric Enge: We’ve got email but kids don’t use email, unless they are forced to.

Danny Sullivan: Zuckerberg said when it came to Facebook messages they don’t need email. They have other ways of keeping in touch. I would argue that what they are doing is still email.

Someone might say, “I am giving up email for a month and won’t miss it.” If you still communicate with people through direct messages, Facebook messages, and Google+, you didn’t necessarily give up email the way you thought you would.

You’ve given up traditional email but not given up the concept of communicating with people digitally. It’s changing, and we may not have traditional email accounts. People will have other ways of connecting.

What will Facebook’s reaction be to Google+?

Eric Enge: Do you think Facebook will adopt Google+ features, like a better way to do Groups or Hangouts?

Danny Sullivan: Last week Zuckerberg basically said not to expect any changes quickly, and he tried to downplay it. He said when it came to Lists or Groups, only 5% of the people actually made use of them at Facebook. He doesn’t think that’s something many people want to use.

I think he is probably right. I wrote an article yesterday on how I am finding Circles to be exhausting. I created all these Circles, and there is the decision: what circle do I put them in?

I will end up with what I already have which is a small group of things that are for my family, maybe a group of things that are for real friends, and then everything else. It will be interesting to see what numbers we get from Google in terms of how much private sharing is going on.

Eric Enge: In Google’s introductory video for circles, they highlighted the indecision aspect. You see them dragging a face of a prospect around; Acquaintance? Friend? I am not sure how well having you spend more time making a decision will do as a feature.

When I first saw circles I didn’t think that’s the killer product.

Danny Sullivan: I am curious to see how it goes, but when I first saw circles I didn’t think that’s the killer product. I think some people will find it appealing that Google+ may allow them to reset or restart their social network. When I started with Facebook I accepted everybody as a friend. Now it’s not worth the time to drop a thousand people because I might want to share something more personal on Facebook.

It is much easier for me to say everything I do on Facebook is public. Other people may not feel the same way. They may want a venue to share privately, so maybe Google+ will resonate with them.

As for Hangouts, Facebook said people tend to do one-on-one communication. It is difficult for me to tell how much Hangout will turn into a compelling reason to be using Google+. Young kids may think it’s cool. If all their friends are going to be there, and they hang out with them on a regular basis, then it becomes much more compelling.

That could be the thing that brings people over from Facebook. If that happens, I think Facebook would quickly ramp up and come out with its own feature. It’s much easier to list all the things Facebook has that Google+ doesn’t have, including Like buttons that let you Like things into your stream, and brands that do not have to pretend not to be brands or hope that you don’t kill them.

The Bing-Facebook deal

Eric Enge: Do you think the Bing-Facebook deal is a significant advantage for Bing? Is it something that can help them make progress?

Danny Sullivan: Potentially. The big advantage they have is automatically personalizing your results. If those personal results are better, and your friends are showing up on it, you may like it more. That’s something Google can’t do. Even though they ramped it up big in the last two months, it hasn’t been that dramatic of a change.

Eric Enge: In Compete I saw numbers that said Bing has 14% and Yahoo has 16%, but website stats I look at do not show the cumulative total of Bing and Yahoo near 30%.

Danny Sullivan: When I look at the numbers, I find Bing seems to be gaining from Yahoo more than anything else. This is what I expected when Yahoo got out of the search game, which they won’t say they got out of search but to me they did.

Does Google+ Threaten Twitter?

Eric Enge: It seems to me that the Google+ stream is more of a threat to Twitter than to Facebook.

Danny Sullivan: Potentially. For me it’s sitting in the middle ground. On Google+ I can write a bit longer post if I want to. I also find it a little easier to share some photos. One of the things I find remarkable is that I check in on Google+. This is something I would never do on Twitter because it doesn’t allow you to check in. Nonetheless, when I see people check in using Foursquare and send it to Twitter, I get annoyed because it feels unnatural.

Google+ encourages me to check in as part of the native settings, and I enjoy doing it. No one has complained about it. It’s been the opposite with people commenting their interest, so I’ve made an effort to share something and check in.

Potentially, Google+ is a threat to Twitter, but there is a lot to be said for Twitter’s simplicity. The little short bursts make it easy to digest. It makes it easy to dive in and dive out without having to spend a huge amount of time. This is perhaps a disadvantage for Twitter in trying to make money, but it is a much bigger advantage if you are relatively short on time. It is something Twitter needs to pay attention to, but it’s difficult for them to change because they would have to change the core part of what Twitter is.

Eric Enge: It took me a while to understand that the compelling feature of Twitter was the ability to communicate with a relative lack of commitment to the communication. Twitter allows us to throw something out there, a simple brief comment, and then move on.

Danny Sullivan: Right.

Eric Enge: It seems a large percentage of Twitter messages are personal. I looked at seventeen of your posts and twelve of them were of a personal nature. People are showing more about themselves and seeming more human. It plays a role in building trust and relationships. Google+, for example, is much more centered on topical communications. Of course, that could change over time.

Danny Sullivan: It’s really hard to pin down. What I find predominantly being shared in my stream on Google+ is stuff about Google+. It’s easy then for me to conclude people only talk about Google+.

Recently, there has been more of a mixture, and people are trying to deliberately come up with other stuff. I don’t know if people necessarily know what it is they should be doing on Google+.

Eric Enge: Does this personal approach on Twitter, in your opinion, have much to offer in terms of building trust between a company and their audience?

Danny Sullivan: Many people say there is an exact formula to what you should do on Twitter, but there is no one right formula. SEOMoz does a lot of engagement, and we do virtually no engagement on Search Engine Land. Yet, our follower counts are about the same.

We viewed our accounts as a way for people to keep up with what we are publishing on the site and that seemed to work for people. They had more of a focus on a customer service role, and that seemed to work for them.

The brands get excited about the customer service role, but I think they ought to be handling customer service through their regular channels so people don’t feel they have to yell out on Twitter.

Eric Enge: I guess the problem is if you hadn’t screwed it up with them in the first place then they wouldn’t need it.

Danny Sullivan: Exactly. Maybe it does work as a final alert or a safety valve. Ultimately, how you act on Twitter will be how you think you should act and what feedback your followers give you about how you should act. I don’t think you have to put a lot of personal stuff because you are a brand. I think that’s an essential thing for me personally, but it might not work for some brands.

Search and Social

Eric Enge: Let’s talk a little more about the integration of search and social. I interviewed Stefan Weitz and he saw search becoming an integrating dashboard for certain types of queries. For example, if you search for a “romantic dinner Austin”, you would certainly see what restaurants your friends liked. In the future you might see who is there right now and if they’ve checked in, and you could book a reservation right there in search results.

Danny Sullivan: I think it’s a mixture of things. I think search has a cousin called discovery, which is showing you things that you didn’t necessarily know you wanted or needed, but you are happy to have come across. I think social is very strong at providing that.

You tap into Facebook and find an interesting news article which never occurred to you to search. That’s something search doesn’t do well because search is an on-demand activity. That’s why it makes sense for Google to play in the social space, so they can tap into this discovery process.

In social, your friends can give signals that potentially improve your search results in an age when links are becoming less and less important. Social shares from people you know are the new link building and are a more trusted signal.

It is easier to get people to do it because people share things more freely. There are signals that can be tapped into that are important for search to continue to improve, but, in terms of being a dashboard, I don’t know if it overtakes everything. I think it continues to be another signal that’s used, but not necessarily the secret key to magically improving everything.

Eric Enge: So, at a high level, we could think of the search interface remaining the on-demand interface, as opposed to a discovery-oriented interface.

Danny Sullivan: Two things: first, when a pipe breaks in your house do you go onto Facebook and ask friends who you should call or do you go to Google and search for a plumber? You search for a plumber. It’s an on demand need.

On the other hand, say you need a dentist. It’s not an emergency, and you want a good recommendation. Tapping into your friends is very powerful. How the search engines figure out a way to integrate that is the next big step.

I see people do the “Anyone knows” searches on Facebook and Twitter. They are looking for recommendations. Finding a way to integrate your search for a dentist on Google, and also making it clear to your social network that you are looking for help, is perhaps the holy grail.

At the SMX Advanced show, Rand Fishkin from SEOMoz talked about their latest study, in which Facebook Shares were highly correlated with Google rankings.

Eric Enge: But not causation.

Danny Sullivan: Not causation, but they are highly correlated. He diligently said he was not saying that a lot of Facebook Shares means you will rank well on Google but, rather, that there was an interesting connection that seemed to be occurring.

Independently of that, I think it’s a good idea to get a lot of Shares on Facebook. Facebook is a huge area and getting Facebook Shares means traffic. So, why wouldn’t you want that regardless of the search engines?

Of course, at the end of the same day at SMX Advanced, Matt Cutts said “well, it’s an interesting correlation, but we don’t use the Share data.”

Takeaways

Eric Enge: There are many people who, if they saw a strong correlation between Share data and rankings, would artificially create and manipulate things so they get more Shares without necessarily thinking, “oh well, why don’t I go behave in a way that causes people to want to Share my stuff.”

This more holistic approach is surprisingly effective. The approach I focus on is trying to publish good sites and promote them effectively.

Danny Sullivan: It works really, really well. The first time I saw a doorway page, which was around 1998, I didn’t really get it. It hadn’t occurred to me that someone will try to build content without actually having a content site. There are people who will chase for the algorithm and not have a content site behind it.

Without trying to pass judgment on it, that’s just not me. It’s not the audience I am trying to help. My assumption is if you want to do well in search engines in the long term it is good to have a good content site. That is what they seem to reward.

That brings me back to the Periodic Table of SEO Ranking Factors that we put together. I wanted to list these different factors that people should pay attention to. I wanted to do it in a way that they didn’t get lost in the forest. I did not want them to climb up one specific tree in that forest because it’s more important. I want them to be socially active. That seems to be a useful thing if you want to do well in search engines.

If you know what you are doing, then getting to more specifics can be helpful. There are too many people who don’t know what they are doing when it comes to Search Engine Optimization, and those kinds of specifics lose them and set them down the wrong trails.

It’s the same thing when it comes to social. They argue whether or not this Facebook Share is going to count versus a LinkedIn share versus a Twitter share, and if there is a no-follow or if there is not a no-follow. You can see that social media channels generate traffic. If that is traffic that’s converting for you then you should be social.

You can also see that the search engines are experimenting with how to use social signals. Even if you don’t know exactly how they are doing it, it behooves you to be active socially because chances are it’s going to increase some of those signals, and you are going to be giving out the right ones.

Eric Enge: If something is tweeted a million times, whether the link is followed or no-followed, a search engine is going to take notice. In the interview I did with Vanessa Fox, I ended up calling it A holistic view of Panda because we have a lot of this kind of discussion in there.

Danny Sullivan: There is a lack of holistic thinking out there. Maybe that will change. The reality is it’s not going to change anytime soon.

Eric Enge: Thanks Danny!

Other Recent Interviews

Bruce Clay, August 1, 2011
Google’s Tiffany Oberoi, July 27, 2011
Mona Elesseily, July 18, 2011
Vanessa Fox, July 12, 2011
Jim Sterne, July 5, 2011
Stephan Spencer, June 20, 2011
SEO by the Sea’s Bill Slawski, June 7, 2011
Elastic Path’s Linda Bustos, June 1, 2011
SEOmoz’ Rand Fishkin, May 23, 2011
Bing’s Stefan Weitz, May 16, 2011
Bing’s Mikko Ollila, June 27, 2010
Yahoo’s Shashi Seth, June 20, 2010
Google’s Carter Maslan, May 6, 2010
Google’s Frederick Vallaeys, April 27, 2010
Matt Cutts, March 14, 2010

New Clarity on Reconsideration Requests from Tiffany Oberoi

Tiffany Oberoi is a software engineer on Google’s Search Quality team. She joined Google in 2006 and focuses on webspam issues and webmaster communication. Prior to joining Google she worked as a software engineer at Computer Associates and as a high school math/engineering teacher in Harlem, New York. She earned her bachelor’s degree in Computer Science from the University of Virginia.

Key Interview Points

I am going to keep the key points summary short in today’s interview. Tiffany’s responses bring new clarity to the reconsideration request process. Here is what Matt Cutts Tweeted about the interview:

Matt Cutts Tweet about this post!

Read on and enjoy!

Interview Transcript

Eric Enge: Thanks for taking the time to address our questions!

Tiffany Oberoi: Sure! I know that reconsideration requests can be stressful. We want to do our best to clear up any misconceptions about the process.

Eric Enge: The reconsideration request process is an incredibly important tool for those whose sites have been impacted by a penalty.

Let’s start by understanding the types of penalties a bit better. The most extreme penalty is the banning of a site from the index. I usually think of this as something you can recognize by searching on the site’s brand name or domain name and not seeing the site in the results, or where a site: query shows no results. If you can tell me, are there other types of manual penalties that may be assessed?

Tiffany Oberoi: We do have a few different manual actions that we can take, depending on the type of spam violation. We would tend to handle a good site with one bad element differently from egregious webspam. For example, a site with obvious blackhat techniques might be removed completely from our index, while a site with less severe violations of our quality guidelines might just be demoted. Instead of doing a brand name search, I’d suggest a site: query on the domain as a sure way to tell if the site is in our index. But remember that there can be many other reasons for a site not being indexed, so not showing up isn’t an indication of a webspam issue.

Eric Enge: The other major type of penalty is an algorithmic penalty. The algorithms make some determination of a problem behavior and adjust the rankings in some fashion. Is that a reasonable short description?

Tiffany Oberoi: Spam algorithms are essentially computer programs that engineers have written to classify webspam. We try to take an algorithmic approach to tackling spam whenever possible because it’s more scalable to let our computers scour the Internet, fighting spam for us! Our rankings can automatically adjust based on what the algorithms find, so we can also react to new spam faster.

And just to be clear, we don’t really think of spam algorithms as “penalties” — Google’s rankings are the result of many algorithms working together to deliver the most relevant results for a particular query and spam algorithms are just a part of that system. In general, when we talk about “penalties” or, more precisely, “manual spam actions”, we are referring to cases where our manual spam team stepped in and took action on a site.

Eric Enge: Do reconsideration requests have any value in the case of algorithmic penalties? Or are they only valid for manual penalties?

Tiffany Oberoi: If a site is affected by an algorithmic change, submitting a reconsideration request will not have an impact. However, webmasters don’t generally know if it’s an algorithmic or manual action, so the most important thing is to clean up the spam violation and submit a reconsideration request to be sure. As we crawl and reindex the web, our spam classifiers reevaluate sites that have changed. Typically, some time after a spam site has been cleaned up, an algorithm will reprocess the site (even without a reconsideration request) and it would no longer be flagged as spam.

Eric Enge: As a related question, is a reconsideration request helpful after addressing possible Panda issues?

Tiffany Oberoi: Panda is an algorithmic ranking change targeted at promoting high quality sites over low quality sites. Because reconsideration requests will not change the way an algorithm sees your site, a reconsideration request won’t help in this case. We recommend focusing your efforts on improving your site so that it will be classified as high quality in the next Panda update. Amit Singhal had some great tips for how to improve your site in this post from Google’s Webmaster Central Blog.

Eric Enge: Does it ever happen that a reconsideration request gets accepted, but then the same penalty gets applied again (perhaps after a subsequent crawl)?

Tiffany Oberoi: This is definitely possible if the bad behavior comes back. For example, we see sites getting hacked repeatedly. The webmaster cleans up the hacked pages, but doesn’t close the security hole. They might even submit a successful reconsideration request, but if the security hole is still open it is likely to be exploited again.

Eric Enge: Is there a potential downside to making a reconsideration request for sites when they are not entirely sure if they’ve been penalized? In other words, could other issues be discovered in the process?

Tiffany Oberoi: While in theory it’s possible that spam could be uncovered while processing a reconsideration request, that’s not the goal. The people reviewing a reconsideration request are first and foremost interested in whether the violation of our quality guidelines has been fixed. I wouldn’t let this stop you from submitting a request if you think there is a chance that your site had a violation. But before submitting a reconsideration request, I do recommend a detailed review to make sure your site does not violate any of Google’s webmaster guidelines.

Eric Enge: What would you recommend the structure of a reconsideration request look like? In other words, what major issues should it address? Are there things to avoid?

Tiffany Oberoi: Here are a few tips:

1. Be specific. Carefully review Google’s webmaster guidelines and tell us what issues you found on your site and how you fixed them.

2. Avoid hiding information. This is the time to address the issues head on. For example, a reconsideration request that says, “My site adheres to the guidelines.” is not as useful as one that says, “I had some hidden text at the bottom of my homepage, but I have removed it now.” The second example makes it clear what the initial problem was and what has changed. The more detail you can provide, the better. It helps us assess the situation more fully.

3. We want to be assured that we aren’t going to see these spammy techniques again. It’s helpful if you can include details about steps you’ve taken to prevent it from happening again, policy changes, etc. The people who review these requests want to be confident that the spam techniques have been removed and are not likely to return.

4. Don’t mention how much you spend on ads. The team that handles reconsideration requests only cares about search quality. It’s irrelevant and doesn’t help your case to mention buying ads or being a partner or customer of other products.

5. We get a lot of reconsideration requests from webmasters that are not even affected by a spam issue, so my other advice is to explore other possible issues as well. For example, check Webmaster Tools for crawl errors. Make sure your robots.txt isn’t blocking Googlebot from accessing your site. Here’s an article with a detailed discussion of other possible ranking problems.
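As a rough illustration of the robots.txt check mentioned in tip 5, here is a minimal Python sketch that tests whether a robots.txt file would let Googlebot fetch a given URL. It uses the standard library's urllib.robotparser, which applies its own simplified matching rules and may not agree with Google's crawler in every edge case. The example robots.txt contents and the example.com URLs are hypothetical, and the helper name googlebot_allowed is made up for this sketch.

```python
from urllib.robotparser import RobotFileParser


def googlebot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if this robots.txt text lets the Googlebot user agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


# Hypothetical robots.txt that accidentally blocks the whole site for Googlebot.
BLOCKING = """\
User-agent: Googlebot
Disallow: /
"""

# Hypothetical robots.txt that only keeps crawlers out of one directory.
PERMISSIVE = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for name, text in (("blocking", BLOCKING), ("permissive", PERMISSIVE)):
        allowed = googlebot_allowed(text, "https://www.example.com/")
        print(f"{name}: homepage crawlable by Googlebot = {allowed}")
```

To check a live site instead of a string, RobotFileParser also offers set_url() and read(), which download the file before can_fetch() is called.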

Eric Enge: Should the person submitting the request expect to get a response? Does Google ever provide explicit feedback on the problem(s) found?

Tiffany Oberoi: We generally send a message to Webmaster Tools after the request is received and again after the request has been processed. In the past we’ve gotten a lot of feedback from webmasters who want to know what happened after we processed the request. We listened to that feedback and we are currently running an experiment to provide more specific information about the outcome of the request.

For example, in some cases we can communicate back to the webmaster that we were able to revoke a manual action based on their reconsideration request. Or sometimes we let them know that their site is still in violation of our guidelines. This might be a discouraging thing to hear, but it helps webmasters diagnose what’s going on if they know that there actually is a spam issue.

In the majority of cases, we’re able to let the webmaster know that they aren’t affected by any manual spam action at all. This allows the webmaster to focus their attention on other areas instead of submitting multiple reconsideration requests and wondering why they aren’t seeing results.

Eric Enge: If the person submitting the request does not hear anything and nothing changes, should they resubmit?

Tiffany Oberoi: You generally don’t need to resubmit. It can take us days to weeks to process requests, and then more time for changes to go into effect, especially if we need to recrawl and reprocess your site. We do send a confirmation after we receive your request, so as long as you got that message then your request is in the queue to be reviewed.

I don’t recommend sending multiple reconsideration requests in a very short period of time or submitting reconsideration requests for tons of sites all at once rather than one site at a time. We can take that as a sign of bad faith. But if you haven’t received a follow up message saying that your request has been processed after 2-3 months, it would be reasonable to submit another request at that point.

Over time, the reconsideration request process has improved substantially. We’ve made a lot of progress on making our assessments and the entire reconsideration review process more transparent. I’m excited that most webmasters can find out whether their site has been affected by a manual action, and that they’ll know the outcome of the reconsideration review.

Other Recent Interviews

Mona Elesseily, July 18, 2011
Vanessa Fox, July 12, 2011
Jim Sterne, July 5, 2011
Stephan Spencer, June 20, 2011
SEO by the Sea’s Bill Slawski, June 7, 2011
Elastic Path’s Linda Bustos, June 1, 2011
SEOmoz’ Rand Fishkin, May 23, 2011
Bing’s Stefan Weitz, May 16, 2011
Matt Mickiewicz, January 8, 2011
ex-Googler Adam Lewis, October 10, 2010
Bing’s Mikko Ollila, June 27, 2010
Yahoo’s Shashi Seth, June 20, 2010
Google’s Carter Maslan, May 6, 2010
Google’s Frederick Vallaeys, April 27, 2010
InfoGroup’s Pankaj Mathur, April 5, 2010
Matt Cutts, March 14, 2010

Google Ad Extensions with Mona Elesseily

For the last 12 years, Mona Elesseily has focused on paid search strategy and conversion improvement. In her career, she has significantly improved campaign performance for brands such as Capital One, CareerBuilder.com, Cathay Pacific and The Jimmy Pattison Group, to name a few. She regularly speaks at Canadian, US and international online marketing events on paid search, landing page optimization, ad copy creation and integrated online marketing strategy. Some of the events Mona presents at include Search Engine Strategies (SES), Search Marketing Expo (SMX), Ad:Tech, Shop.org, the International Marketing Conference (IMC), WebmasterWorld (PubCon) and the International Internet Marketing Association (IIMA).

Mona has written 2 books on Yahoo! and frequently writes for industry publications. She is a columnist for Search Engine Land and also writes for other publications including the Yahoo Search blog, iMedia Connection and the AMA (American Marketing Association). Her industry knowledge is regularly sought after by the business community, including Wall Street analysts; and she’s frequently quoted in respected publications. Mona has served on advisory boards for both Yahoo! Search Marketing and Acquisio Inc, and is the Vice President, Online Marketing Strategy for Page Zero Media.

Key Interview Points

As you can see from her bio, Mona has a great deal of expertise in pay per click marketing. We decided to chat about the various types of Ad Extensions for Google AdWords. What makes these interesting is that they can drive a significant increase in click through rates for your ads. Here are some of the important points that we discussed:

  1. (Mona) “All the extension products increase click through rate. Of course, keep in mind that driving ROI is a completely different story.”
  2. (Mona) “… you are not charged for clicks that expand the map, but you are charged for clicks that go from the information window to the website.”
  3. (Mona) “A common mistake people make with AdWords is lumping everything together.”
  4. (Mona) “With mobile, you want to make sure you are bidding higher and your queries are slightly shorter.”
  5. (Eric) “It (phone extensions) may not work well for products that people prefer to touch and feel before they buy.”
  6. (Mona) “Trends work really well for Sitelinks. It’s a way to highlight a new product”
  7. (Mona) “Compare a Product Extension ad with a normal paid search marketing ad. This attracts more attention.”
  8. (Mona) “Yes, this is very new and still in Beta. It’s called Communication Ad Extensions. In the ad, Google will display a way which you can communicate with the business.” What makes this interesting is that Google brokers the communication so the business does not get your contact info during the initial communication.
  9. (Mona) “The key is to test them (the Ad Extensions) as not everything is going to resonate with all audiences and all products.”

The Location Extension

Eric Enge: Can you tell us about the Location Extension?

Mona Elesseily: The Location Extension shows as an address underneath the ad units. If an ad is positioned at the top of the page, it will have a small plus button beside it that can expand to a map. If it’s one of the smaller units on the right hand side of the page, the address will be located underneath the display URL.

The larger ad unit shows the address, the map and the phone number. This information can be pulled from Google Places or advertisers can manually set it up in their Adwords campaign.

For example, say I’m searching for moving companies in Vancouver and find one called “Angel’s Moving Company.” Note that the address is under the ad and there is a plus button to click on.

Angel's Moving

When you click on the plus button it expands out and shows the map. Google will show you these types of results either based on where you are physically located, or if you include a location name in your search query.

Angel's Moving

Eric Enge: This is likely to cause a better click through rate.

Mona Elesseily: Yes, all the extension products increase click through rate. Of course, keep in mind that driving ROI is a completely different story. It’s one thing to get clicks, it’s completely another thing to make money from the clicks you’re getting. If you have two ads at the top of the page, and one has a couple of extension products and another one doesn’t, don’t you think the enhanced ad catches your eye more?

Eric Enge: Yes, because it looks a bit different.

Mona Elesseily: Also, you are not charged for clicks that expand the map, but you are charged for clicks that go from the information window to the website.

Phone extensions and Mobile Campaigns

Eric Enge: Would you talk about Phone extensions?

Mona Elesseily: With a Phone extension you will see a clickable phone number displayed on a phone that has a full browser. The advertiser pays the click cost when a searcher clicks on the phone number and makes the call.

Phone extensions

Eric Enge: This is interesting because I’ve seen data that says when people see a phone number they believe the business is more legit.

Mona Elesseily: Another element of that is testing 1-800 numbers versus local numbers. We do many of these tests in an attempt to increase conversion rates. For certain products people feel more comfortable with an advertiser who is in their backyard. For other products they are comfortable buying in another state or even halfway across the world.

If you are using the Phone extension in your AdWords campaign, my advice would be to put it into a separate campaign. A common mistake people make with AdWords is not breaking their AdWords into distinct campaigns. For example, they group their content campaigns, their search campaigns and their mobile campaigns all together. It’s better to have separate campaigns for each.

Eric Enge: Why is that?

Mona Elesseily: There are search differences between mobile and desktop. For example, say you are setting up a campaign and you want people surfing on their mobile devices to call you. On mobile the queries tend to be much shorter and there are a limited number of positions on the mobile search results page. Therefore, you want to make sure your queries are slightly shorter and you are bidding higher for terms. As you know, there are a very limited number of PPC ad spots on mobile devices.

Eric Enge: What do you think the right space is to target for mobile?

Mona Elesseily: That is an interesting conversation. My answer used to be mobile campaigns should be for products you wouldn’t think twice about buying, like a ring tone or movie ticket. However, I recently wrote an article for Search Engine Land on new trends in mobile advertising that questioned this.

People are now using multiple devices at any one time; for example, surfing to get more information on products they are introduced to on TV. It’s not necessarily to make a purchase, so that immediate gratification logic has been shattered. The Search Engine Land article has a lot of information on how people use their smartphones today. The space is really changing.

Eric Enge: However, if anyone has a phone-based way of ordering, they should consider an AdWords campaign using the Phone extension, correct?

Mona Elesseily: I’ve used it in some circumstances and it’s worked well but there are circumstances where it hasn’t worked well. Local tends to do well; for example, if you want to call to make an appointment with a chiropractor or order a pizza.

Eric Enge: It may not work well for products that people prefer to touch and feel before they buy.

Mona Elesseily: Like a car. People aren’t buying cars online.

Sitelinks

Eric Enge: Let’s move on to the Sitelinks extension.

Apple Ad

Mona Elesseily: Sitelinks are the small hyperlinks you see underneath the ad. The Apple ad above is a good example of this.

Eric Enge: Yes, I see a link for the white iPhone, iPad 2, MacBook Pro and even a Back to School offer. What’s interesting is they seem to be taking advantage of a hot trend like the white iPhone and an iPad with free engraving.

Trends work really well for Sitelinks. It’s a way to highlight a new product

Mona Elesseily: Sitelinks are a good way to highlight a new product or enhance your advertising related to a specific product offering or to drive traffic to your high conversion page or high margin products. Another good use is to segment users. For example, provide different links for business and consumer customers if your business targets them both. I cover this in depth in a Search Engine Land article on Supercharging Your Ads with Sitelinks.

Eric Enge: How does Google determine when and where to show your Sitelinks?

Mona Elesseily: There is a CTR requirement. You are compared against other people in your industry and then Google makes a call as to which ads will display Sitelinks. For the most part they show up for transactional searches. If you do some of these broader generic types of searches, you are not going to see the Sitelinks come up as much. At this point, Sitelinks only show up in the top ad spots and do not show up in ads on the right hand column.

Eric Enge: Obviously, it’s easier to have your Sitelinks show if you are a recognized brand because your CTR differential would usually be pretty substantial.

Mona Elesseily: Yes, in theory. In practice sometimes it’s slightly different.

Eric Enge: Is there anybody who shouldn’t use Sitelinks?

Mona Elesseily: Good question. I tried a Sitelink for the forum of a very successful website selling in a particular niche. It didn’t convert well. I also tried a Sitelink that was more of a help link than a product link and it didn’t convert well either.

You want to focus on something concrete, not supplemental, like the examples I gave above. Special offers, special promotions (like flowers on Valentine’s Day) are used to drive traffic to products or areas that tend to convert well in your business.

Eric Enge: Right. If you look at our Apple example, we found it on a generic search but the Sitelinks are for specific things you can buy now.

Mona Elesseily: Exactly.

Product Extensions

Eric Enge: Which leads us to Product extensions.

Mona Elesseily: Let’s look at our Sony example and note it’s showing a picture, short description and prices. This is a feed product and the feed info is pulled via Google Merchant Center. The Merchant Center account is then linked to AdWords in the campaign settings tab.

Sony Laptops Ad

The feed has to be in XML format and mandatory fields must be filled out. The mandatory fields are: product name/title, product description, landing page link, image link, product ID, the condition of the product (new, used, etc.) and price.

For advertising to trigger accurately we suggest adding an extra column, called adwords_label, and using keywords separated by commas. This helps Google get more specific in serving up specific products to specific ads in your AdWords account. It takes work to set up the feed but it’s worth it in the end.
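To make the feed requirements above more concrete, here is a minimal Python sketch that checks a product record for the mandatory fields Mona lists and then builds one XML entry with an optional adwords_label column of comma-separated keywords. The tag names (id, title, link, and so on), the item element, and the helper names are illustrative assumptions; they are not taken from the official Merchant Center specification, which should be consulted for the real attribute names and formats.

```python
import xml.etree.ElementTree as ET

# Mandatory fields as listed in the interview. The tag names are assumptions
# chosen for illustration, not official Merchant Center attribute names.
MANDATORY_FIELDS = [
    "id", "title", "description", "link", "image_link", "condition", "price",
]


def missing_fields(product: dict) -> list:
    """Return the mandatory fields that are absent or blank in product."""
    return [f for f in MANDATORY_FIELDS if not str(product.get(f, "")).strip()]


def build_item(product: dict, labels=()) -> ET.Element:
    """Build one <item> element, refusing records with missing mandatory fields."""
    missing = missing_fields(product)
    if missing:
        raise ValueError("missing mandatory fields: " + ", ".join(missing))
    item = ET.Element("item")  # hypothetical wrapper element
    for field in MANDATORY_FIELDS:
        ET.SubElement(item, field).text = str(product[field])
    if labels:
        # Extra column suggested in the interview: keywords separated by commas.
        ET.SubElement(item, "adwords_label").text = ",".join(labels)
    return item


if __name__ == "__main__":
    # Hypothetical product record, used only to exercise the helpers.
    laptop = {
        "id": "SKU-001",
        "title": "Example 15-inch laptop",
        "description": "A sample product entry.",
        "link": "https://www.example.com/laptops/sku-001",
        "image_link": "https://www.example.com/images/sku-001.jpg",
        "condition": "new",
        "price": "799.00 USD",
    }
    print(ET.tostring(build_item(laptop, ["laptop", "15 inch"]), encoding="unicode"))
```

Validating before building keeps incomplete records out of the feed file, which matches the point that the mandatory fields must be filled out.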

Eric Enge: Obviously, the picture of the product and the pricing information are awesome.

Compare a Product Extension ad with a normal paid search marketing ad. This attracts more attention.

Mona Elesseily: The larger format takes up more space and attracts more attention so there’s a better chance of visitors clicking on the ad unit. It also presents people with more information/options on what they are looking for right off the bat.

Eric Enge: I think it gets people more engaged in the buying process. They may look at the first one and decide they want a laptop that is more high end. You’ve gotten them thinking about what they really want which can be incredibly valuable.

Mona Elesseily: Again, we see how the use of these extensions can increase CTR. Once you get them in, hopefully you can continue to engage them and they’ll go on to make a purchase.

Communication Extensions

Eric Enge: You have a brand new extension to share with us today.

Mona Elesseily: Yes, this is very new and still in Beta. It’s called Communication Ad Extensions. In the ad, Google will display a way in which you can communicate with the business. It will appear as the second line in the ad copy and will say something like “be contacted by this business” and there will be a request-to-call hyperlink or an email hyperlink. Advertisers can choose one or both forms of contact from visitors.

With this advertising, Google is going to protect visitor info and advertisers will not be directly given contact information. Google will play the middleman with different email addresses and phone numbers.

Eric Enge: Google brokers this from a privacy perspective.

Mona Elesseily: For now it’s only showing up 10% of the time. Google is testing it to see what kind of traction they’ll get.

Eric Enge: And how do you get it?

Mona Elesseily: As it’s in Beta, Google is only offering it to specific advertisers. This product excites me in a couple of different ways. The first is that lead information allows advertisers to directly gauge the impact of their ads and campaigns without fancy tracking. I also like that Google is thinking about customers’ needs and setting about trying to help customers achieve their goals.

Eric Enge: If you have a business with a local aspect to take phone orders you’ve got five different things you’ve got to do to boost your campaign. If you have a business selling online, there are four things you may do.

Testing and Using the Extensions effectively

Eric Enge: Do you have any recommendations on how to test and implement these Extensions?

The key is to test them as not everything is going to resonate with all audiences and all products.

Mona Elesseily: Do not test too many of the extension products at the same time or you will not be able to effectively attribute gains to any one extension product. Test the different products separately to see what resonates well with your specific audience. For example, you may find that phone extensions work well but Sitelinks do not work so well for your company.

Eric Enge: Would you recommend setting up a new campaign to test the addition of an Extension, along the lines of an AB test?

Mona Elesseily: I’d set up a different campaign and test against similar ads in another campaign with no extension products running. Note: If campaigns are low volume, an A/B test could take a while to produce statistically significant data. If this is the case with a company, I drive all traffic to new ads and compare it to a prior period (same duration of time). It’s not the most scientific but provides good insights to work with. Just be sure not to run tests during peak periods, as the data will be very different (skewed) from the data you see in non-peak periods.
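To make “statistically significant” concrete, here is a minimal Python sketch of one common way to compare click-through rates between a campaign without an extension and one with it: a two-proportion z-test. This is a swapped-in textbook technique, not something described in the interview, and the function name, the click and impression counts, and the 0.05 threshold are all assumptions for illustration.

```python
import math


def ctr_z_test(clicks_a, impressions_a, clicks_b, impressions_b):
    """Two-proportion z-test on CTR (clicks / impressions).

    Returns (z, p_value), where p_value is two-sided.
    """
    ctr_a = clicks_a / impressions_a
    ctr_b = clicks_b / impressions_b

    # Pooled CTR under the null hypothesis that both campaigns perform the same.
    pooled = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    std_err = math.sqrt(
        pooled * (1 - pooled) * (1 / impressions_a + 1 / impressions_b)
    )
    if std_err == 0:
        # No variation at all (for example, zero clicks in both campaigns).
        return 0.0, 1.0

    z = (ctr_b - ctr_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: campaign A has no extension, campaign B has one.
z, p = ctr_z_test(clicks_a=200, impressions_a=10_000,
                  clicks_b=260, impressions_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
print("significant at 0.05" if p < 0.05 else "not significant yet")
```

With low-volume campaigns the same CTR gap yields a much larger p-value, which is why a low-traffic A/B test can take a long time to reach a conclusion.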

Eric Enge: Thanks Mona!

Other Recent Interviews

Vanessa Fox, July 12, 2011
Jim Sterne, July 5, 2011
Stephan Spencer, June 20, 2011
SEO by the Sea’s Bill Slawski, June 7, 2011
Elastic Path’s Linda Bustos, June 1, 2011
SEOmoz’ Rand Fishkin, May 23, 2011
Bing’s Stefan Weitz, May 16, 2011
Matt Mickiewicz, January 8, 2011
ex-Googler Adam Lewis, October 10, 2010
Wordtracker’s Ken McGaffin, August 16, 2010
Bing’s Mikko Ollila, June 27, 2010
Yahoo’s Shashi Seth, June 20, 2010
Majestic SEO Briefing, June 14, 2010
SEOmoz Briefing, June 9, 2010
Localeze Briefing, June 2, 2010
Google’s Carter Maslan, May 6, 2010
Google’s Frederick Vallaeys, April 27, 2010
InfoGroup’s Pankaj Mathur, April 5, 2010
Matt Cutts, March 14, 2010

A Holistic Look at Panda with Vanessa Fox

Vanessa Fox, called a cyberspace visionary by Seattle Business Monthly, is an expert in understanding customer acquisition from organic search. She shares her perspective on how this impacts marketing and user experience and how all business silos (including developers and marketers) can work together towards greater search visibility at ninebyblue.com. She’s also an entrepreneur-in-residence with Ignition Partners, Contributing Editor at Search Engine Land, and host of the weekly podcast Office Hours. She previously created Google’s Webmaster Central, which provides both tools and community to help website owners improve their sites to gain more customers from search and was instrumental in the sitemaps.org alliance of Google, Yahoo!, and Microsoft Live Search. She was named one of Seattle’s 2008 top 25 innovators and entrepreneurs. Her book, Marketing in the Age of Google, provides a blueprint for incorporating search into organizations of all levels.

Key Interview Points

I really enjoy speaking with Vanessa about search because of her perspective about how to do things. As readers of mine know, I am a fan of the trite old way of doing it – producing a great web site, making it search friendly, and then promoting it well. Vanessa is truly an industry leader in promoting this type of thinking.

This is a great interview for you to read if you want to get a strong feeling for the philosophy that drove the Panda algorithm, and the implications of that philosophy going forward. Here are some of the major elements that I extracted (and paraphrased except in those situations which are quoted) from the discussion we had:

  1. Like any business, Google seeks to maximize its profitability. However, Google believes that this is best done by providing maximum value to end users, as this helps them maintain and grow market share. They make more money this way than trying to squeeze extra CPM out of their web pages at the cost of user experience.
  2. The AdWords team does not have access to the organic search team, and as a result the engineers working on organic search are free to focus on delivering the best quality results possible.
  3. (Vanessa) “Panda isn’t simply an algorithm update. It’s a platform for new ways to understand the web and understand user experience”.
  4. Panda is updated on a periodic basis, as opposed to in real time. This is similar to updates to the PageRank displayed on the Google Toolbar, except it is a whole lot more important!
  5. It is easier to reliably detect social spam than link spam.
  6. (Eric) “If you’ve got twelve different signals and someone games two of them and the other ten don’t agree, that’s a flag.”
  7. Don’t focus on artificial aspects of SEO. If it seems like a hokey reason for a web page to rank higher, it probably isn’t true. If by some chance it is true, first it is most likely a coincidence, and second and more importantly, you can’t count on it staying that way.
  8. (Vanessa) “I suggest you get an objective observer to provide you feedback and determine if there are any blind spots you’re not seeing.”
  9. (Vanessa) “The question then becomes if someone lands on your site and they like that page, but they want to engage with your site further and click around your site, does the experience become degraded or does it continue to be a good experience?”
  10. Added value is key. Search engines are looking more and more for the best possible answer to users’ questions. Even if your article is original, if it covers the exact same points as hundreds of other articles (or even 5 other articles) there is no added value to it.
  11. Reviews can be a great way to improve web page content provided that they are contextually relevant and useful.
  12. Crowd sourced content is also potentially useful, but must also be relevant and valuable.
  13. One of the challenges facing both UGC and Crowd Sourcing is the editorial challenge of making sure it is useful and relevant.
  14. Branding can be very helpful too, as it helps people trust the content more. Search engines recognize this as a differentiator as well.
  15. (Vanessa) “I think social media levels that playing field a bit. In the past, you had to hire a publicist, do press releases, have relationships with reporters, and get on Good Morning America, or something on that order, to get your name recognized.”
  16. SEO is still important! Making sites that are easily understood by search engines is still something you need to do. Effective promotion of your web site remains critical too.
  17. Unfortunately, for many sites that have been hit by Panda, there is no quick fix. There are exceptions, of course, but they will be relatively rare.

Motivations of Google

Eric Enge: Let’s talk about what Panda was from a Google perspective and what they were trying to accomplish rather than the mechanics of what they did.

Vanessa Fox: I like that you addressed it that way because many people simply want to know mechanically what they did.

This update took many people by surprise and, certainly, there are things to be worked out. However, Google has never been secretive about what it’s trying to accomplish and, specifically, what it’s trying to accomplish with Panda.

Ever since Google launched, its primary goal has been to figure out what searchers want and give them that. This encompasses a lot of things. It encompasses answering their question as quickly and as comprehensively as possible. It involves all the things you think about in terms of making the searcher happy and providing a good user experience.

In the early days of the web, the only way Google knew if people found something valuable was if there was a link to it. Today, the web is more sophisticated and Google has much more information available to it. The bottom line is that Google is trying to provide the best results for searchers and, for them, Panda was a major step forward in accomplishing this.

Eric Enge: Yes, some people believe that Google made these changes because it favors their advertisers and their objective is to make more money in the short term. I don’t believe this. To me, the value of market share far outweighs the impact you could get by jacking up your effective CPM by a few percent on your pages.

Vanessa Fox: That’s absolutely right. It is short term and shortsighted to think Google is focused on improving CPMs or is trying to drive people, who lost ranking in the organic results, to advertise via AdWords. Google is looking for long term market share which is the best way for them to maximize profitability.

The root of their market share is the fact that they get so many people searching all the time. The best monetary decision for the company is to ensure that searchers experience excellent search results. That’s the core that’s going to help Google maintain their market share which, in turn, is what will help them grow.

Eric Enge: I’ll paraphrase it simply and say they are totally selfish and they are being selfish by working on their market share.

Vanessa Fox: That is exactly right. Many people don’t believe that there is a wall between the organic search people and everything else at Google. If they didn’t have such a wall you would have a situation where someone on the AdWords team would be approached by a large advertiser saying “I am having problems with the organic results, can you help me?”

Of course, that person would want to help the advertiser. By having that wall, the AdWords person doesn’t have access to the organic search people. There is this protectiveness around organic search, which enables those engineers to focus on the search experience. They don’t have to think about AdWords, they don’t have to think about how Google is making money, or what the CPMs are. They don’t have to think about any of those things and are able to concentrate on making the best search experience.

The whole environment was built that way which is unlike many other companies. In other companies, no matter what part of the organization you work in, you have to always think about how does this impact our revenue. At Google this is not part of the search engineers’ focus, which is great. Another reason is that many of the search engineers have been at Google since the beginning. They don’t have to work there anymore.

Eric Enge: At this point they could easily retire and buy an island.

Vanessa Fox: They continue to work there because they love data and love working with large amounts of data and improving things. I think if someone said to them, “I know you work on organic search, but we’ve decided it’s really important to either give advertisers preference or hold advertisers down. Could you tweak the algorithms?” They would probably say, “I am going to buy my island now, see you later.”

That’s not why they are at Google. They are there because they get to do cool things with large pieces of data. I think these two big factors make it basically impossible for anything other than a search experience to infiltrate what’s going on there.

Think of Panda as a Platform

Eric Enge: What is Panda?

Vanessa Fox: Panda isn’t simply an algorithm update. It’s a platform for new ways to understand the web and understand user experience. There are about four to five hundred algorithm updates a year based on all the signals they have. Panda updates will occur less frequently.

Eric Enge: Right. In the long run it will probably be seen as significant as the advent of a PageRank update.

Vanessa Fox: Yes, absolutely.

Eric Enge: At SMX Munich Rand Fishkin heard from Stefan Weitz and Maile Ohye that it’s a lot easier to recognize gaming of social signals than it is to recognize link spam.

Vanessa Fox: The social signals have more patterns and footprints around them. Also, the code that search engines use has gotten more sophisticated, and they have access to more data.

Eric Enge: Another thing I hear people talking about is that over time Google is looking to supplant links with other signals. My take on this is that links are still going to be a good signal, but they are not going to be the only signal.

Vanessa Fox: Google has been saying that for years. I don’t think the value of links will ever go away. They’ll continue to be augmented with more data, which will make the value of links less important because there are many other signals now in the mix.

Google never intended to be built solely on links. We didn’t have social media and Facebook like buttons, and all these things in the past. We only had links. Google was based on how can we build an infrastructure that algorithmically tells us what content people are finding most valuable on the web.

Google and Bing as black boxes

Eric Enge: I think another key component of this story is that Google and Bing are increasing the obscurity of the details of the algorithm. That’s not perfect phrasing, but I think you know what I mean.

Vanessa Fox: I think it becomes harder to reverse engineer for a number of reasons. There are so many moving parts that it’s hard to isolate. People who have systems that attempt to reverse engineer different parts of the algorithm for different signals may come to conclusions that are, or are not, accurate. This is because it’s impossible to isolate things down to a single signal.

You find cases where people think they have but, in reality, it’s the tip of an iceberg because you can’t see everything that’s under the surface. By having more signals and knowing so much more about the web the artificial stuff becomes more obvious.

Eric Enge: Absolutely. If you’ve got twelve different signals and someone games two of them and the other ten don’t agree, that’s a flag.

Vanessa Fox: Right. Which is why it’s so disheartening to me to see that some SEOs continue to react to this by saying, “okay, how can we figure out the algorithmic signals for Panda so we can cause our pages to have a footprint that matches a good quality site.” This is very short term thinking because the current signals are in use only during this snapshot in time.

At this point it’s going to be as difficult to create a footprint of a site with a good user experience as it would be to just create a site with a good user experience. This, of course, is not only a better long term perspective and more valuable, but it will result in a better rate of conversion for most businesses.

I’ve heard some people say things like, they’ve done some analysis and found that you have to vary the length of your articles on pages, so make sure that all of your articles are variable in length. And this is craziness. Even if it works this minute, next week it won’t work and then they will say the sky is falling again.

I read an article where a person said Seth Godin writes really short blogposts so he is going to be impacted by Panda, and how does Google know that if an article is short, it’s not valuable. But Google’s algorithms are not as simplistic as that. Seth Godin has not said he’s lost ranking because of Panda.

I commented on the post, and said this is not true. Google isn’t saying that a short article is not a valuable article. Publishers should make blog posts or articles as short or long as they need to be.

Eric Enge: There will be plenty of cases where the best article is a short article.

Vanessa Fox: Absolutely and those will continue to rank.

How Publishers should think about Panda

Eric Enge: What would you say to a publisher if they believe they were unfairly affected by Panda? This is a tough question because 98% of the people affected by Panda will say they are in this category. They believe they were a drive by victim rather than something that fell out of the algorithm.

Vanessa Fox: That is a complicated question. I will not dispute, and I don’t think Google would dispute, that any algorithmic change from any search engine has the potential to cause some collateral damage. If what you are doing as a search engine is asking, “Are the search results better?” then even if the search results are better overall, that doesn’t mean that a site with good content doesn’t accidentally end up lower.

That’s going to be the case with any change a search engine makes. From a content-owner perspective that is not good, which we’ll talk about in a second. However, I talked to many people affected by this and 75% to 80% of the time they said, “I’ve been hit and I shouldn’t have been hit.” There have been only a few occasions where people say, “Yeah, I’ve gotten away with it for a long time and they cut me off.”

Eric Enge: You appreciate their honesty, don’t you?

Vanessa Fox: Oh, absolutely. But most of the time people say I shouldn’t have been hit. If you’ve been working on a site for a long time, you may not see the areas it can be improved. I suggest you get an objective observer to provide you feedback and determine if there are any blind spots you’re not seeing. I think that would be a good first step.

Essentially, this has become a holistic thing. It’s not one signal that’s been used. You need to determine does this page answer the question, does this help someone accomplish something?

As a business you have to make money. You also have to understand that if a site is optimized for making as much money per visitor from ads as possible, as opposed to being optimized at being useful to the searcher, this site is probably not what a search engine wants to show as the best search results.

You have to balance that. Does it answer a searcher’s question, but also does it answer that question better than any other site and is the answer easy to find? Look at the quality of what’s being said versus the quality of the other pages that are ranking. Is it better or worse? Then you have to determine if the content is awesome and whether that is obvious to the searcher.

From a user experience perspective, when they land on that page is the content they need buried? The user experience becomes important because Google wants the searcher to be happy and easily find their answer.

Let’s say the content and the user experience are good for that page. Then you run into the issue of quality ratio of the whole site. The question then becomes if someone lands on your site and they like that page, but they want to engage with your site further and click around your site, does the experience become degraded or does it continue to be a good experience?

For example, last year Google had this emphasis on speed, because their studies found that people are happier when pages load faster and abandon sites that load slowly. I’ve worked with companies whose pages take fifteen seconds before they load. No one will wait around anymore for fifteen seconds to load a page.

I don’t think this is a big part of Panda, it is just for illustration purposes.

If you isolate that as a signal you can have the best content in the world and the best user experience in the world. However, if someone does a search and lands on your page but it takes fifteen seconds for anything to appear, they’ve had a bad experience and they are going to bounce off.

You have to look holistically at everything that’s going on in your site. This is what you should be doing, as if search engines didn’t exist.

Eric Enge: Right. There is another element I want to get your reaction to which I refer to as the “sameness” factor. You may have a great user experience. You may have a solid set of articles that cover hundreds of different topics, and they may all be in fact original. However, it’s the same hundred topics that are covered by a hundred other sites and the basic points are the same, even though it’s original, there is nothing new.

Vanessa Fox: Right. I think that’s where added value comes into play. It’s important to look and see what other sites are ranking for. What are you offering that is better than other sites? If you don’t have anything new or valuable to say then take a look at your current content game plan.

Eric Enge: So, saying the same thing in different words is not the goal. I like to illustrate this by having people imagine the searcher who goes to the search results, clicks on the first result and reads through it. They don’t get what they want so they go back to the search engine, they click on a second result and it’s a different article, but it makes the same points with different words.

They still didn’t find what they want so they go back to the search engine, they click on the third result and that doesn’t say anything new either. For the search engine it is as bad as overt duplicate content.

Vanessa Fox: That’s absolutely right.

Eric Enge: It may not be a duplicate content filter per se, which is a different conversation than this one, but the impact is the same. It’s almost like an expansion of query deserves diversity, right?

Vanessa Fox: Right. These concepts have all been around for a long time, but we are seeing them perhaps played out with different sets of signals, but they are not anything new. The search engines have always said they want to show unique results, diverse results, valuable results, all these things.

Adding Diversity to your site with User Generated Content

Eric Enge: One thing I hear people talk a lot about regarding diversity is doing things with user-generated content. In my mind that can be a useful component provided it is contextually relevant and has something useful to say. Do you have some thoughts on that?

Vanessa Fox: Yes. I agree with you, it could go either way. Since Google’s goal is to provide useful, valuable results then you can certainly find pages where user-generated content provides that. If you look at TripAdvisor, which may have its faults, one benefit is that there are numerous first person accounts of hotels and other experiences.

Any hotel or vacation destination you are thinking of going to, you will find authentic, real information from people who’ve actually gone there.

Forums are another example where user-generated content is great. For instance on Stack Overflow people are interested in answering questions and having discussions and that’s valuable content. You might have other forums where people aren’t saying anything or are there to spam and put their links.

I think it depends on both the topic and how much you are moderating things, how much time you are spending in curation, how much time you are spending organizing things in a useful way so it’s easy to find.

For instance, let’s say you have a recipe site and people tag their recipes with different variations. If you have a curation process that cultivates that and groups it into topics, so that people can land on a landing page and see all of the recipes about a particular topic, that will be more useful than things scattered everywhere with random tag pages.

I think there can still be work involved in UGC, although it can be useful and valuable. When you begin looking at health information, for instance, it might become harder. If it’s a site about sharing your experience about an illness, that’s one thing.

If it’s a site about diagnosing people and telling them what they should do to fix their illness, that’s another thing. If it is a group of people as opposed to doctors, you get into this authoritative issue and how do you know it is credible.

Crowd Sourced Content

Eric Enge: There is a related topic that has a different place in the picture, which is the notion of crowd sourced content. Essentially, using crowd sourced data to draw a conclusion, for example, with surveys and polls.

Vanessa Fox: This boils down to the same thing. Is it useful, valuable, credible, authoritative, and comprehensive? Is it all the things people are looking for and does it answer their question better than anything else out there on the web? We can look to TripAdvisor as an example of a site that’s been able to create valuable content on a large scale.

At a larger scale you have to move towards automated processes and, at that point, the curation process becomes harder. Wikipedia has editors that are aggressive towards making sure the content is accurate. However, not all sites have that.

When you do surveys it can be fine, but if you are not manually reviewing the results, because of the large volume of data, that’s when something can potentially go awry, so you have to be careful with it.

The same thing can happen with aggregating data from different sources. If you look at something like Walk Score, they’ve been able to aggregate data on how close schools, bars, and other facilities are to your house. Of course, you see other examples where it goes poorly, and you look at the page and it doesn’t make any sense.

Eric Enge: Right. It’s a matter of the context, the effort, and the level at which you are trying to do it.

Vanessa Fox: Yes. I think ultimately there will be a fair amount of work involved with running a business that adds value for people. With this age of technology, you see many cases where people say, “look at all the cool things I can do with technology and it’s very little work on my part.” This is sort of the four-hour work week syndrome.

Often, that does not produce the most valuable results. For instance, if we examine travel and look at a site like Oyster, which was started by Eytan Seidman who used to work on the search team at Microsoft, they pay full-time staff writers with a travel background to travel to hotels, write reviews, and take pictures. They aren’t in every city in the world, and they don’t have every hotel in the world.

That’s a corporate example, but there are travel bloggers, and food bloggers, and other people who only write ten blog posts. However, those ten posts are very comprehensive on the topic.

At a large scale, if you attempt to cover every topic in the world, you are not necessarily going to be able to compete with someone who has written something manually, gone there, and spent time editing their article. It wouldn’t make sense that your automated content would outrank them.

Eric Enge: Absolutely. It reminds me of another thread which I am not sure fits in the interview, but I am going to say it anyway. When I grew up I watched the news with Walter Cronkite. He was completely trusted and authoritative. Today we have Fox News, which is entertainment.

That’s the design of Fox News and more power to them; however, you have to imagine that as a culture we are going to have a drive towards getting news from a source that you can trust.

Vanessa Fox: Right. Google did a blog post recently where they talked about the trust element. They said it is certainly one of the questions you should ask yourself when you are evaluating a site. Can you trust it?

Eric Enge: Right. Will you give it your credit card or will you trust it for medical advice?

Vanessa Fox: Would you follow the instructions to save your life? This is where brand comes in. I don’t think it has to be a huge brand, but brand does help the trust factor. Building a brand that people see over and over makes a difference.

This is a major reason why I do not recommend microsites. I know many people who want to do a bunch of micro sites but lack of a brand is one reason I tell them it’s probably not a good idea.

It’s hard to build a brand with a bunch of micro sites that aren’t branded in a unified way. If you build one site under one brand you can build brand engagement; however, you can’t do that with a bunch of micro sites that are branded separately.

Social Media and Branding

Eric Enge: Do you think an effective tactic for beginning to build the brand would involve social media?

Vanessa Fox: It depends on the topic and audience. Where is your audience, are they on social media? If you can engage that audience and build up authority with them that is great. I think social media levels that playing field a bit. In the past, you had to hire a publicist, do press releases, have relationships with reporters, and get on Good Morning America, or something on that order, to get your name recognized.

It still takes work but you can go out on social media, see where people are talking about your topic area, answer their questions, and be that authoritative source. I think it can be great but it doesn’t fit every situation.

SEO still matters

Eric Enge: One last question since we’ve been talking about holistic marketing. The search engines still have mechanical limitations because of how they crawl web pages. So being search engine savvy is still important?

Vanessa Fox: Absolutely. Search engines crawl the web and they index the web. Technical aspects, such as how the server responds, how the page URLs are built, and what the redirects are, make a huge impact. You can have the best content in the world but if search engines can’t access that content it’s never going to be indexed to rank. So, absolutely, all that stuff is vitally important.

Eric Enge: The other component is promotion: going out and implementing programs, including social media campaigns, that make people aware of your site and draw links to it.

Vanessa Fox: Yes. That’s absolutely the case. I think it goes with the idea you’ve heard from the search engines for a long time which is what would you do if search engines didn’t exist? You need to build your business and part of that is building awareness about your business.

I think the web makes it easier but you need to raise awareness so people know that it’s there. Whether it is through social media or other types of PR, there are many things you can do. You can’t think of your audience engagement strategy as simply SEO. All these other components help SEO, but they are things you would need to do in business even if you weren’t doing them for SEO.

The Scope of Panda

Eric Enge: Any last thoughts on Panda?

Vanessa Fox: I talk to many people who have sites that have been hit and I certainly sympathize with their plight. However, there is no quick fix in these cases.

I talked to a site owner two weeks ago who said, “maybe if we change our URLs so that they are closer to the root of the site instead of having folders in them that will get us back in.” This is the wrong way of looking at it.

Eric Enge: Yes. That’s a clear “no”. For sites that have been hit by Panda, I don’t think, for the most part, there is a quick fix.

Most sites will not be lucky enough to have one section of their site that is a total boat anchor that they can just not index and be done with it. Most sites probably have a real process to go through.

Vanessa Fox: Yes. It’s hard to hear because this is affecting people’s businesses. I think it is going to be a lot of work to figure out who your audience is, what they are looking for, whether you are engaging them well, and whether you are providing value beyond all the stuff that we talked about. It is a process.

Eric Enge: Thanks Vanessa!

The Mechanics of Panda

The Panda algorithm hit the SEO world in a big way back on February 23rd/24th, 2011. Here is the general update history of Panda:

  1. Panda 1.0: February 23/24, 2011 – The initial launch.
  2. Panda 2.0: April 11, 2011 – added Chrome Blocklist Extension data to impact eHow, plus global English coverage.
  3. Panda 2.1: May 10, 2011 – general algorithm tweaks.
  4. Panda 2.2: June 16, 2011 – improved scraper site detection, probably to reduce the incidence of scraper sites outranking source sites that got hit by Panda.
  5. Panda 2.3: July 23, 2011 – some sites recover due to algo changes in Panda.
  6. Panda 2.4: August 12, 2011 – Panda rolled out internationally.
  7. Panda 2.5: September 28, 2011 – Appears to have affected many sites, including sites with lower levels of traffic.
  8. Panda 2.5.1: October 9, 2011 – minor update.
  9. Panda 2.5.2: October 13, 2011 – minor update.
  10. Panda 3.0: October 19/20, 2011 – a major update that let many sites recover. Evidently, this was intended to help those who had been unfairly hit by Panda back in the game.
  11. Panda 3.1: minor update.

Today I will present a visualization of the basic structure of how this works. I am basing this on the many hours of reading I have done on the topic, Google’s statements that Panda is a document classifier, and the indications by Matt Cutts that it is a process that is run periodically.

First though, a disclaimer. I am not a machine learning expert, and this should be used as a basic conceptualization of the workflow. Major elements are likely to differ from what you see here. However, I believe that this visualization is accurate enough to help you develop a solid mental model for how the algorithm is being applied.

Possible Panda Workflow

As a first step, Google is likely to have defined an initial test set of sites. These sites would then have been classified manually by human raters. The process would look something like this:

[Figure: Manual Site Classification]

This would give Google a strong test database of manually rated sites whose ratings are very likely to be correct, perhaps with 99% accuracy. As you can see, sites would have been separated into buckets, such as “Good Sites” and “Bad Sites”.

As a next step, Google may have then spent time analyzing these sites to profile the characteristics of the Good Sites, and also of the Bad Sites, as follows:

[Figure: Extracting Ranking Parameters]

The idea is to develop a model for both types of sites. Of course, you can also have a continuous scale of Goodness, from Bad, Not so Bad, OK, Pretty Good, Very Good, and so forth. Once you have a model for Goodness vs. Badness, you can then step back and analyze what types of parameters you can evaluate algorithmically to get the same results as your human raters did during their evaluation.

One key factor in this is the noisiness of the signal. In other words, is there enough data available on all the sites you want to test for the data to be statistically significant? In addition, is it possible that the signal can be ambiguous? For example, does a high bounce rate always mean it is a bad site? Or are there scenarios where a high bounce rate is an indicator of quality? Consider a reference site where a faster bounce might mean that the person got their answer faster.
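To make the question of noisiness concrete, here is a minimal Python sketch. It is my own illustration, not anything Google has described: it puts a Wilson score interval around a measured bounce rate and treats the signal as usable only when that interval is narrow. The function names, the sample numbers, and the 0.10 width cutoff are all assumptions chosen for the example.

```python
import math

def wilson_interval(bounces, visits, z=1.96):
    """Approximate 95% confidence interval for a bounce rate (Wilson score)."""
    if visits <= 0:
        raise ValueError("visits must be positive")
    p = bounces / visits
    denom = 1 + z * z / visits
    center = (p + z * z / (2 * visits)) / denom
    half = z * math.sqrt(p * (1 - p) / visits + z * z / (4 * visits * visits)) / denom
    return max(0.0, center - half), min(1.0, center + half)

def is_signal_usable(bounces, visits, max_width=0.10):
    """Hypothetical rule: trust the signal only if the interval is narrow."""
    low, high = wilson_interval(bounces, visits)
    return (high - low) <= max_width

# Both sites show a 60% bounce rate, but with very different amounts of data.
small_site = wilson_interval(6, 10)
large_site = wilson_interval(6000, 10000)
```

The two sites have the same measured bounce rate, yet only the larger one has enough data for the number to mean much. This check says nothing about ambiguity: a precise bounce rate on a reference site can still be a sign of quality.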

There are lots of signals you could consider. Here are just a few examples:

User Behavior          | Content Attributes   | Searcher Ratings
-----------------------|----------------------|---------------------------
Brand Searches         | Reading Level        | Chrome Blocklist Extension
Site Preview           | Editing Level        | +1
Ad CTR                 | Misspellings/Grammar | Blocked Search results
Bounce Rate            | Something new to say |
Time on Site           | Large Globs of Text  |
Page Views Per Visitor | High Ad Density      |
Return Visitor Rate    | Keyword Stuffing     |
Scroll Bar Usage       | Lack of Synonyms     |
Pages Printed          |                      |

Of course, the correlations between good sites and bad sites may use even more obscure signals. A machine learning algorithm may determine that articles that use the word “oxymoron” more than 5 times are inherently poor quality (note to algorithm, this article uses oxymoron only once … oops twice). I personally think that Google would try to constrain the breadth of signals used, but it is certainly possible that the algorithm came up with some unusual correlations.

Once the signals have been decided upon, the algorithm can then set out to test the performance of those parameters with a variety of weights, and can also vary the signals used:

[Figure: Running the Panda Algorithm]
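As a rough illustration of what running signals with a set of weights could mean, the Python sketch below scores a site as a weighted sum of a few normalized signals and compares the result to a threshold. The signal names, weights, threshold, and sample sites are all invented for the example. Nothing here reflects the signals or the formula Google actually uses.

```python
def score_site(signals, weights):
    """Weighted sum over the signals that have a weight; missing signals count as 0."""
    return sum(weights[name] * signals.get(name, 0.0) for name in weights)

def classify(signals, weights, threshold=0.5):
    """Hypothetical rule: a site is 'good' when its score reaches the threshold."""
    return "good" if score_site(signals, weights) >= threshold else "bad"

# Invented weights: positive for signals assumed to indicate quality,
# negative for signals assumed to indicate the opposite.
weights = {"return_visitor_rate": 0.6, "time_on_site": 0.4, "ad_density": -0.5}

# Invented sites, with every signal already scaled to the range 0..1.
site_a = {"return_visitor_rate": 0.8, "time_on_site": 0.7, "ad_density": 0.1}
site_b = {"return_visitor_rate": 0.2, "time_on_site": 0.3, "ad_density": 0.9}

label_a = classify(site_a, weights)
label_b = classify(site_b, weights)
```

Varying the signals used amounts to adding or removing keys in `weights`, and varying the weights amounts to changing the numbers.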

That is the first step. But how did the algorithm do? The next step is to score the results:

[Figure: Scoring Panda]
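One simple way to picture the scoring step is to compare the algorithm's labels with the human raters' labels site by site. The Python sketch below does that under my own assumptions: the site names and labels are made up, and plain accuracy plus two lists of disagreements is only one of many possible scoring schemes.

```python
def score_against_raters(predicted, manual):
    """Compare algorithmic labels to manual labels for the same set of sites."""
    if not manual or predicted.keys() != manual.keys():
        raise ValueError("both label sets must cover the same non-empty set of sites")
    correct = sum(1 for site in manual if predicted[site] == manual[site])
    # Sites the raters liked but the algorithm flagged.
    false_bad = sorted(s for s in manual if manual[s] == "good" and predicted[s] == "bad")
    # Sites the raters disliked but the algorithm passed.
    false_good = sorted(s for s in manual if manual[s] == "bad" and predicted[s] == "good")
    return {
        "accuracy": correct / len(manual),
        "false_bad": false_bad,
        "false_good": false_good,
    }

# Invented labels for four hypothetical sites.
manual = {"a.example": "good", "b.example": "good", "c.example": "bad", "d.example": "bad"}
predicted = {"a.example": "good", "b.example": "bad", "c.example": "bad", "d.example": "good"}

report = score_against_raters(predicted, manual)
```

Keeping the two kinds of error separate matters because they have different costs: a good site wrongly flagged is the case that site owners hit by Panda complain about.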

Once you have your score, the algorithm can try to work out which tweaks to make, both to the parameters used and to the weighting of each one, to create a better match between the manual classification of the sites and the algorithmic output. You can also test the results on a larger data set using your validating signals. This allows you to look beyond the limited test set you worked on manually. Together, these comparisons lead to a feedback loop:

[Figure: Tuning Panda]

To finish the process, the machine learning engine would simply repeat the tuning loop until the results were of acceptable quality.
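The tuning loop can be sketched in a few lines of Python. This version uses a classic perceptron update as a stand-in for whatever learning method Google really uses, which has not been made public. The signal names, sample sites, learning rate, and stopping target are all assumptions made for the illustration.

```python
def predict(weights, bias, signals):
    """Return 1 (good) if the weighted score is non-negative, else 0 (bad)."""
    score = bias + sum(weights[k] * signals.get(k, 0.0) for k in weights)
    return 1 if score >= 0 else 0

def tune(sites, labels, signal_names, target=1.0, rate=0.1, max_rounds=100):
    """Repeat score-and-adjust rounds until accuracy reaches the target."""
    weights = {k: 0.0 for k in signal_names}
    bias = 0.0
    accuracy = 0.0
    for _ in range(max_rounds):
        for name, signals in sites.items():
            error = labels[name] - predict(weights, bias, signals)
            if error:
                # Nudge each weight toward the manual rating (perceptron rule).
                for k in weights:
                    weights[k] += rate * error * signals.get(k, 0.0)
                bias += rate * error
        hits = sum(predict(weights, bias, s) == labels[n] for n, s in sites.items())
        accuracy = hits / len(sites)
        if accuracy >= target:
            break
    return weights, bias, accuracy

# Invented training set: 1 = rated good by humans, 0 = rated bad.
sites = {
    "a.example": {"return_visitor_rate": 0.9, "ad_density": 0.1},
    "b.example": {"return_visitor_rate": 0.8, "ad_density": 0.2},
    "c.example": {"return_visitor_rate": 0.2, "ad_density": 0.9},
    "d.example": {"return_visitor_rate": 0.1, "ad_density": 0.8},
}
labels = {"a.example": 1, "b.example": 1, "c.example": 0, "d.example": 0}

weights, bias, accuracy = tune(sites, labels, ["return_visitor_rate", "ad_density"])
```

On this tiny, cleanly separated data set the loop stops after a couple of rounds. Real data would be far noisier, which is why the stopping rule is acceptable quality and not a perfect match.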

Summary

As mentioned above, this is just my mental model for what took place, and it is likely that the exact course of events was somewhat different.

Ultimately, the key lesson is that publishers need to focus the great majority of their efforts on building sites which offer deep, unique, rich user experiences. The search engines want to offer these types of experiences to their users, and Google and Bing are battling for market share. Focus on giving them what they want in the long run because this battle for market share will surely make roadkill of those that don’t.

The algorithm will certainly be tuned more and more over time, so don’t get too wrapped up in trying to find out the specific factors in use by Google. Even if you succeed in finding them and artificially manipulating your site to score well on those factors, the next set of factors that gets applied may be entirely different. It is simpler to just focus on producing high quality content that is not only non-duplicate, but also differentiated, and then promoting that effectively through a variety of channels.