Following college graduation, Maile worked as a system integrator for content management applications. She believes that beauty can be fostered through collaboration and accessible information, and that it’s important for data to be organized and usable.
Her corporate CMS experience led her to become a Technical Solutions Engineer supporting the Google Search Appliance. After over a year with Google Enterprise (a team she loves!), she joined another wonderful Google family in Webmaster Central.
Eric Enge: Can you talk a bit about the Webmaster Tools API that you announced in June?
Maile Ohye: Yes, I’d love to. For us, Webmaster Tools really allowed us to reach webmasters of all levels. The API allows for site verification and sitemap submission, and it allows us to work with hosting programs. Through these hosting programs, we can reach webmasters who might not have heard of Webmaster Tools at all, and help them build better sites. The API also allows us to reach the more advanced audience.
So, developers who’ve already used the Google Data Protocol for other services can now do this in an automated fashion. They can add and verify a large number of websites, as well as submit sitemaps for them, by using this API.
Eric Enge: Right. And, as I recall, the original announcement was limited in terms of the number of things that you could pull down, right?
Maile Ohye: Yes. We have a lot of really great releases coming up, and we are trying to get them out to users as quickly as possible so that we can get feedback. We think the sooner we can get you the great tools, the sooner you can start to build better sites and get that information.
So, rolling this out was a necessary step. It’s a really big bang for our first release, because it also helps us reach webmasters through the hosting companies.
Eric Enge: What about being able to extract the link data?
Maile Ohye: This is definitely on our radar. This is a great first step for us in June, but we are looking to see how else we can help webmasters.
We see huge possibilities out there for the advanced SEOs who could get more of this data. Think about the ways they could use it to do analytics to figure out more about websites.
We see this as a big opening, and so much creativity in the community could be applied to it. So, it’s definitely on our radar.
Eric Enge: Okay. So, one thing that I noticed recently is that there is the ability now to designate an RSS feed as a sitemap file. And, if you go into the interface, it will actually tell you what your RSS feeds are and ask you if you want to make them a sitemap. Can you talk about that a little bit?
Maile Ohye: Sure. This is part of our effort to reach more and more webmasters, and to make it simpler for you. The RSS sitemap is already rich with a convenient format that we can extract links from. So, if you’ve already got one we can detect that and let you know that we’d like to take some of that data to help us understand your site better.
When you submit a sitemap, you can also get more indexing stats about your site. The URLs are extracted from your sitemap, and it will tell you the percentage of those URLs that are in the index and things like that.
Eric Enge: Yes, interesting. Discovering a URL in a sitemap doesn’t mean you are going to index it. I think most people understand that, but does it improve the chances of it being indexed?
Maile Ohye: I wouldn’t say it improves the chances of it being indexed. I would say that the chances of it being indexed are based on a lot of factors, like the content that’s on the page and the importance that we place on people linking to it. What it does help with is discovery of the URLs.
Eric Enge: Sure. So, if you have a URL that wasn’t being discovered, it’s great. But, if it’s already been discovered then you are probably already in good shape anyway.
Maile Ohye: Yes, more likely. But, you can also provide extra information in a sitemap file. You can give us information about your canonical version and also its change frequency. So, it’s still really good information for us, and the more we understand and trust your site for this information, the more we can use it. So, this is something that’s improving in functionality within Google over time as we get more and more signals.
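[Editor's illustration: a minimal sketch of a sitemap file that carries the kind of extra information described above. The element names (`urlset`, `url`, `loc`, `changefreq`) and the namespace come from the public sitemaps.org protocol. The function name `build_sitemap` and the example.com URLs are hypothetical and chosen only for illustration. Listing one preferred URL per page is assumed here to be how a sitemap conveys the canonical version.]

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Change-frequency values permitted by the protocol.
ALLOWED_CHANGEFREQ = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def build_sitemap(entries):
    """Return sitemap XML for an iterable of (url, changefreq) pairs.

    Each URL should be the preferred (canonical) address of a page.
    This helper is hypothetical; it is not part of any Google API.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq in entries:
        if changefreq not in ALLOWED_CHANGEFREQ:
            raise ValueError(f"unsupported changefreq: {changefreq!r}")
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        ("http://www.example.com/", "daily"),
        ("http://www.example.com/about", "monthly"),
    ]))
```

As the interview notes, the change frequency is a hint rather than a command, so a file like this helps with discovery and context but does not guarantee indexing.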
Eric Enge: Right. So, back to the RSS thing for the moment; if you do agree to have an RSS feed be a sitemap file, does that tie into the whole notion of query deserves freshness at all?
I ask this because RSS feeds imply time based data. I was just wondering if it might increase the speed in which something is indexed when it’s delivered through an RSS feed rather than as a static page in a sitemap.
Maile Ohye: Thinking about how our pipeline works, I would say that it’s not going to improve that speed. There is no added weight because it came from an RSS source.
Eric Enge: Okay. So, let’s switch gears a little bit. The crawl rate setting is a great way to help people dial back Googlebot from hitting them too hard. If you up your crawl rate, does that potentially increase the number of pages you might crawl?
Maile Ohye: Yes. I wouldn’t say that we necessarily come with any exact number of pages that we are going to crawl. But, I do think the crawl rate setting is useful for us, because it tells us what type of load you can handle. It just eliminates one barrier that might have kept your site from being crawled as fast as possible.
The option to set it faster isn’t always available, but if it is, it’s not that we are necessarily going to crawl more of your pages because we can crawl a thousand in a shorter time, or anything like that. Through our algorithms, we might have determined that you couldn’t handle that load, but now we think you can, because you have said that you can. It eliminates a barrier that we might have had in crawling your site.
So, I see it as a good opportunity to make sure all your ducks are in a row when it comes to Googlebot. Just maximizing all of your chances.
Eric Enge: Right. I understand. Okay, so with spam reporting one of the new things I noticed is that you provide confirmation of spam report submissions in Webmaster Tools. I am not sure how long that’s been around, but when did that start?
Maile Ohye: Sure. The confirmation of spam reports, I believe, started earlier this month. We want to get the spam reports, but at the same time we keep a lot of this information confidential internally. We want to acknowledge that we are receiving them. We are looking to give more helpful messages in the Message Center in general. Another one we rolled out that wasn’t widely announced was the Chilling Effects messages. You know how you can file a DMCA request to have content taken down from a certain webpage?
When that request is approved, we will take down that content. But, because we have verified website owners for some of these sites, if that DMCA request affects your site and you are verified in Webmaster Tools, you can also get a message there notifying you of what the request was and what we’ve done in response to it.
Eric Enge: Hopefully that will inspire people to remedy the problem.
Maile Ohye: Yes, exactly. It also just shows you the impact of what’s happening. We want to give webmasters more visibility as to what’s going on, so that they can help their site. This is another one of those opportunities for us.
Eric Enge: Right, so the next question is about paid links that get reported in Webmaster Tools. I think it’s fairly well established that the primary thing you do with a report is use it to improve your algorithms. But, if you do confirm a link is a paid link when you get the report, is it normally disabled from passing PageRank?
Maile Ohye: Yes, we do disable such links from passing PageRank.
Eric Enge: Next up, I want to talk about a technique for generating large sites. For example, there are all kinds of different ways to generate large sites that are database driven. It’s really easy to create, in the process, a site that has absolutely no value at all.
But, of course, it’s also possible to approach that quite differently, and still have a data driven site that creates a lot of value. I am just curious as to how you look at that, or how you try to interpret those things, because the difference between the two scenarios could be pretty difficult to determine?
Maile Ohye: I think again there is going to be a large focus on the user. So, if you have structured data or anything that’s coming from a database, that can be good information. It’s something that we can index, so there is certainly no case for us to not index any of it. When you have this much data, think about the users who are searching. They are going to want context behind it.
So, as you are saying, what is the value add for it? If you are talking about getting a lot of data and just making a large website, this needs to be in context for the users for it to be useful for them. So, I would look to make it value added as you are saying.
Eric Enge: Yes, absolutely. But, sometimes the value add can be pretty subtle. You may have unique data where essentially a lot of what’s being presented is tables of numbers about various things.
The value is there, but it’s really hard for an algorithm to tease out that there is actually something different and of value here. How do you try to evaluate that?
Maile Ohye: Yes. I would go back to the point about putting it in context. If it has real value, many people will link to you for it, and more people will come to that search result. But, a lot can also be determined by the audience that you are garnering for that information.
Eric Enge: Indeed. Alright, let’s talk a little bit about Flash. You’ve recently announced some changes designed to make Flash indexing better?
Maile Ohye: Yes, we have. Again, this comes from our focus on helping users. There are a lot of Flash sites out there, and Flash embedded within sites. And so, we really want to focus on getting more of this quality information to our users. We have two engineers who’ve worked really hard on this, Ron and Janis, who have actually worked with Adobe as well.
Now for Flash, we are executing the Flash application as a user would, and we have to go through there and find the important states of when something is happening, or when there is a URL being shown, or when it’s something that the user can click on. We are taking that text, and it’s being added to the page and being shown as search results. If we are finding links within the Flash, we are actually pushing that back into our crawling pipeline. So, it’s a full integration that really results in the user getting more accurate results and better titles and descriptions as well.
Eric Enge: So, will a link embedded in Flash pass PageRank?
Maile Ohye: Yes, it functions as a regular link.
Eric Enge: Excellent. So, as part of this update, you added support for SWFObject?
Maile Ohye: Yes. There is a lot of Flash loaded through SWFObject.
Eric Enge: Right. When you use SWFObject you have the ability to present text which is supposed to be a representation of what’s in the Flash. But, it’s not physically tied to what’s in the Flash, correct, so the publisher could put something different in there.
With sIFR, the text that the crawler sees is exactly what’s used to render the font in a direct sense, whereas with SWFObject there is the ability to make some difference.
Maile Ohye: Yes. We do want all text to be accurate to the image itself, so that Googlebot will see text that is either identical to or a representative subset of what a user would see with the Flash application.
The same rules apply whether you use SWFObject or some other technique.
Eric Enge: Right. So, the one scenario which is an interesting one to think about is if you create a synopsis of what is in the Flash. The words may not be identical, but it is an accurate synopsis. Are you on okay ground then?
Maile Ohye: Yes. I think the important thing is in a lot of these cases, if it’s creating accessibility and it’s accurate to that content, then you are fine. But, I want people to understand that it needs to be identical or very reflective of the content itself.
Eric Enge: Right. Obviously the further and further you drift away, the riskier it gets. Because, at the end of the day, whether it’s a human or an algorithm judging it, the algorithm is still created by a human. You might have done something that’s perfectly decent and clean, but it’s conceivable that you can get into trouble just because you diverged in some way from what was in the Flash itself.
Maile Ohye: Yes. Our intent is not to penalize people, but to show them that they are putting themselves in a riskier position.
Eric Enge: Yes, I understand. You take risks as you do things that are harder for the program, or even a human reviewer, to understand.
Maile Ohye: Yes, thank you. Most of the time, we don’t think of them as doing the wrong thing, but we need them to know that there is a risk involved when it’s not identical.
Eric Enge: So, since these recent announcements improve indexing, should everyone go build all Flash sites?
Maile Ohye: Flash can be very spectacular, but I think that’s something that you have emphasized, and we hear a lot about this. But, we still have key concepts of accessibility. So, while Google is now able to better index your Flash content for users, there are other major search engines out there. There are also people using mobile devices that may not be able to see your all-Flash website, so it is still important to think about what we call progressive enhancement.
That’s the industry term for having great content and navigation built with just text and static HTML. And then, from there you can add and embed things like a Flash application, Ajax, or different things that users really like. While we’re happy that we are helping users with Flash content, we don’t see it as a green light for everyone to be eager to do this; accessibility will still apply.
I believe you have as well, and we do appreciate that. So, while we are excited about these improvements, at the same time we also want to make the web better for everyone. And so, we are helping some users, but accessibility still helps everyone.
Eric Enge: Back to Webmaster Tools for a moment. I see you have the ability to disable a Sitelink. If you disable a Sitelink, does it get replaced with another one?
Maile Ohye: I don’t think so.
Eric Enge: It is still a useful thing. If you have something that is inappropriate to be presenting at that level, then you can take it out of there, so that’s a good thing.
But, you could envision taking suggestions from publishers about what they would like their Sitelinks to be. And then, Google evaluating the merit of those suggestions, and taking them if it makes sense and not taking them if it doesn’t, right? It gets a bit more interactive at that level, and the webmaster begins to be able to impact the search experience as it relates to their site.
Maile Ohye: Yes. We are very interested in this two-way communication. Right now our focus is still on the users. Sitelinks have been useful, and they’re formed algorithmically, because we’ve found that it helps users this way. But, over time, yes, I definitely think serving users by getting more of that communication from webmasters would be terrific.
Eric Enge: Alright. Can you talk a little bit about the Geographic Target Option and what it does?
Maile Ohye: Sure. The Geographic Target option is especially useful if you have a top-level domain like a .com, and you would like to target a verified subdirectory or subdomain to a particular region. So, if I have example.com/Canada, then I might have region-specific content there, within that subdirectory.
I can actually target that to the geographic location of Canada. Once we have that information, we can use that to provide better results for searchers who want more Canadian information, whether it’s based on the fact that they are in Canada, or they are searching for something specific to Canada.
That’s a great area where getting webmaster input allows us to improve search quality.
Eric Enge: Right. So, if I am in Massachusetts, but I do a search for Toronto pizza places, I would still potentially see the content. It’s just an input basically as to what the page is about.
Maile Ohye: Yeah, it’s an additional signal where that geography actually matters.
Eric Enge: How granular can you get with the targeting?
Maile Ohye: At this point, it’s limited to specifying a region.
Eric Enge: Okay. So, if you had a product line that you were offering through Europe, you could target the information on your site about Germany to German users and so on?
Maile Ohye: Yes. I am not sure about listing each country. But, webmasters often target through languages or through geographic location. In the case of languages, French can be spoken in different areas, so you might not want to geographically target that content. But, if it really does make sense for you to target your users in a specific geographic location, then that’s what the geo-location feature can do, and it helps us to serve better results.
Eric Enge: Right. Anything else that you can tell us that is up and coming or new in Webmaster Tools?
Maile Ohye: I would say that we are always making constant improvements, and we are always excited about the stuff that’s going on here.
Eric Enge: Thanks Maile!
Maile Ohye: Thanks a lot, Eric!