Nathan Buggia is the Lead Program Manager for the Live Search Webmaster Center, Microsoft’s suite of tools designed to help web publishers get better results from Live Search. Buggia has overall responsibility for all the Webmaster Center Tools, Community and evangelism within the search marketing industry.
Previously, Buggia spent six years in Microsoft’s Server and Tools division, most recently as business manager for the Solution Accelerator Group. The division builds end-to-end IT solutions for enterprise customers in Security, Management, Systems Architecture and Interoperability.
Nathan has been working in various aspects of web technology since his first web dev job in 1997, working in Perl and CGI. Since then he has worked in Java, C++/CGI, ASP.NET, Systems Administration, Systems Architecture, and Bio-informatics.
Eric Enge: Can you start with an overview of what’s new in Webmaster Tools?
Nathan Buggia: The short answer to what’s new in the Webmaster Tools is pretty much everything. This is a significant update to what we shipped last November that provides a lot of really interesting data and resources for webmasters. The data and feedback we provide them is generally search engine agnostic, so it should be applicable to all the major search engines.
Within the Webmaster Center, we have two features. The first one is an online community, a set of blogs and forums where we provide direct technical support to publishers as well as best practices and news around the community.
The second is the Webmaster Tools, and the goal of the Webmaster Tools is to provide a self-service toolset to all seventy-two million active publishers to help them understand how Live Search is crawling and indexing their website as well as how their website might be ranking.
Eric Enge: Certainly your goal is to help webmasters do better in Live Search, but people can use the information in Webmaster Tools to help them with all search engines.
Nathan Buggia: Yes, that’s correct. Our goal is to help provide support to publishers. We realize that search is becoming mission critical to publishers these days, both from the standpoint of just traffic going to their website as well as brand recognition. So how people are starting to navigate the web is changing based on search.
We want to provide support to publishers to make sure that if there is an issue that their site is having with a search engine, we can help them fix it. That way we get the best index of their site possible, so they get the best results possible.
Eric Enge: Can you talk a little bit about what the Page Score is that you show for the web pages listed inside of Webmaster Tools?
Nathan Buggia: Page Score is a measurement for publishers to see generally how authoritative Live Search believes your content is. It’s not an exact metric, it’s a general metric. The way a webmaster should look at it is, if they have five green boxes that means they are probably doing well, they are probably in the top echelon somewhere. It doesn’t necessarily mean you are at the level of Amazon’s or eBay’s homepage, but you are generally doing pretty well and you should feel good about that. If your score is low, maybe five empty boxes, then that page or that domain has more work to do to gain authority.
Eric Enge: Right. Now, if a site gets penalized for some reason, does that affect its Page Score or is that independent?
Nathan Buggia: It’s possible. There are a lot of different penalties that can happen between a website and a search engine. Penalties are another area where we try to provide some transparency to the publishers as well. If you notice on the summary page of the Webmaster Tools, we have a feature called Blocked. What we are surfacing here are quality-based penalties, at both the domain and page levels. So, if your website has experienced any of these, you will see a ‘yes’ there, or in the table below you may see a ‘yes’ aligned with specific pages. And if you do have a penalty, you can just click the hyperlink and it will take you to a form to request reevaluation. However, before you request reevaluation we highly recommend you go and take a look at our Live Search Webmaster Guidelines, which are linked off of webmasters.live.com, and make sure that everything you are doing in your site adheres to those guidelines.
Eric Enge: Right. The Blocked indicator is actually quite interesting, because the one above the list of URLs, I assume, is for the site overall, and the one next to each URL identifies blocks on a URL-by-URL basis.
Nathan Buggia: Yes, that’s correct.
Eric Enge: I assume that there is probably some latency when a new web page is created, you don’t know what its Page Score is yet until you have crawled it? Or is it something that you can determine as soon as you encounter the page?
Nathan Buggia: We hear a lot of that from webmasters. They’ll come and ask us “Hey, why is my page ranking so low?” And then, we go out and take a look at it, and it’s a brand new website that doesn’t have a whole lot of traffic coming to it yet. It may not have a lot of links yet, or it may not have a full set of content yet. Those are all things that we take a look at for the authority score.
The domain or Page Score is really based on some other factors. If it looks really small initially, then what you want to do is go out and market the site, go out and build the best content, talk to different people in the industry, and figure out what they need, what would get them to link to your website. And just make sure you have good, unique content.
Eric Enge: Let’s talk a little bit about the crawl issues.
Nathan Buggia: So the crawl issues tool is one of my favorite features, and one that we’ve spent quite a bit of time working on. What this feature does is it gives you access to a set of reports that show a set of issues that Live Search may have encountered while crawling your website. So, the first issue that we provide information on is 404 errors.
Anytime we encounter a page with an HTTP status code of 404 we stop indexing the page; we don’t look at the content on this page. This can happen for a variety of reasons. Someone could have misspelled the URL on their blog when they were linking to your website, you could have a broken link in your website, or you could have a content management system that is not returning the correct status code.
This has actually happened on microsoft.com on our MVP profile section. And we were able to use this tool to find the issue and resolve it. But there are a lot of reasons why this happens, and the downloadable reports allow you to hand it off to your IT department or your technical folks, and help them scope the problems so they know where to start.
Eric Enge: I really like the 404 report, because in the scenario you talked about, you can use that report and then generate a 301 redirect from the broken page, the one returning the 404, to the page that was intended.
That will pick you up a link, because the link to the 404 page doesn’t bring any value to you. And I am looking at one here right now from my site and evidently someone has linked to the analytics study we did in ’07, and it’s a bad URL. So a simple 301 redirect could pick up the link.
Nathan Buggia: That’s exactly right, all those customers that might be coming to that link, wherever it was linked from, will now get the right article and you’ll get the credit and everything is great.
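The fix discussed here, permanently redirecting a mistyped inbound URL to the page that was intended, can be sketched in a few lines of Python. This is only an illustration, not how any particular web server or Live Search handles it: the `REDIRECTS` table, the `KNOWN_PAGES` set, the example paths, and the `resolve` helper are all hypothetical.

```python
# Hypothetical map from broken inbound paths (as found in a 404 report)
# to the pages the linking sites meant to reach.
REDIRECTS = {
    "/analytics-study-07": "/articles/analytics-study-2007.html",
}

# Hypothetical set of paths that really exist on the site.
KNOWN_PAGES = {"/", "/articles/analytics-study-2007.html"}


def resolve(path):
    """Return (status_code, headers) for a requested path."""
    if path in KNOWN_PAGES:
        return 200, {}
    if path in REDIRECTS:
        # 301 signals a permanent move to browsers and crawlers alike.
        return 301, {"Location": REDIRECTS[path]}
    return 404, {}
```

In practice the same mapping would usually live in the web server or content management system configuration; the point is simply that each entry in the 404 report can become one entry in a redirect table.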
The next report is about URLs blocked by the Robots Exclusion Protocol. Now, the Robots Exclusion Protocol (REP), as you know, is not a standard that was designed from the ground up; it is something that’s really evolved over the past ten years. It is complicated, and not always well understood by the industry. The most comprehensive article I have seen is out on Jane and Robot.
It really gives a good overview of all of the different functionality that the REP provides, how to use it, and what the support is on the different engines. So what we have done here is provide publishers with a list of all the URLs on their website that we are blocked from crawling based on the protocol.
What this does is it allows publishers to go and do an audit of their website. They can take a look and make sure that all the content that they want to be indexed is being indexed, and all the content that they don’t want to be indexed is appropriately blocked.
Eric Enge: Right, so the scenario here is, they have a section of their site that they don’t want crawled, so they use robots.txt to tell the crawler not to crawl that section. But perhaps the way they specify the rule actually ends up shutting down access to pages that they do want crawled. By seeing the itemized list, that will become apparent to them very quickly.
Nathan Buggia: Yes, exactly.
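The scenario described above, a rule that blocks more than intended, can be reproduced with Python's standard `urllib.robotparser`. This is a rough sketch under stated assumptions: the robots.txt content, the example URLs, and the `blocked_urls` helper are invented for illustration, and the standard-library parser does simple prefix matching that may not match any given search engine's behavior exactly.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule meant to hide the /print/ section only. Because
# matching is by path prefix, it also catches /printers/...
ROBOTS_TXT = """\
User-agent: *
Disallow: /print
"""

# Hypothetical URLs to audit.
SITE_URLS = [
    "http://example.com/print/invoice-1",
    "http://example.com/printers/laser.html",
    "http://example.com/products/camera.html",
]


def blocked_urls(robots_txt, urls, agent="msnbot"):
    """List the URLs that the given robots.txt text blocks for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]
```

Running `blocked_urls(ROBOTS_TXT, SITE_URLS)` lists the `/printers/` page alongside the `/print/` page, which is the kind of surprise an itemized report makes visible.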
Eric Enge: Alright, excellent; so let’s talk about long, dynamic URLs.
Nathan Buggia: This is also one of my favorite ones, and this is a report that no other engine offers. And really what long, dynamic URLs are, are all the URLs we have identified on your website that have too many parameters.
Eric Enge: And how do you define too many parameters?
Nathan Buggia: That definition may change over time. We change our algorithms, we may expand that out, or if we see some examples that we have too many numbers we may dial back.
What this report does is it allows you to always know what we think is too many parameters, without hard coding a rule into your system. So, the problem with long, dynamic URLs is that when you get all of those different parameters on the URL stream, the number of different combinations of those parameters tends to be very large.
For example, if you have an ecommerce site, and that ecommerce site uses parameters to determine what the sort order of products is on the page, those could appear in any order on the URL, and still produce the same valid page. That could create a potentially infinite number of pages or infinite number of URLs that all resolve to valid pages.
That is a dangerous thing for a search engine crawler, because that could have us spending a lot of time crawling the exact same page with different URLs in your website. And that’s something you’d want to avoid.
Eric Enge: Right. And then there is the inverse of that, because if you are not going to let that happen to your crawler, it means you might not be crawled that well if you have these problems.
Nathan Buggia: Exactly. That is exactly how we recommend using this report, which is to just identify the URLs in your site that you might want to go take a look at and see if you can find a shorter, more economical version of the URL.
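One common way to tame the parameter-ordering problem is to normalize URLs so that every ordering of the same parameters collapses to a single form. The sketch below uses Python's standard `urllib.parse`; the `canonicalize` and `has_too_many_params` helpers are hypothetical, and `MAX_PARAMS` is an arbitrary number chosen for the example, since the interview makes clear that the real threshold is not fixed.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Arbitrary illustrative threshold; the real one is not published.
MAX_PARAMS = 3


def canonicalize(url):
    """Sort query parameters so equivalent URLs compare equal."""
    parts = urlsplit(url)
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(params)))


def has_too_many_params(url, limit=MAX_PARAMS):
    """Flag URLs whose query string carries more than `limit` parameters."""
    query = urlsplit(url).query
    return len(parse_qsl(query, keep_blank_values=True)) > limit
```

With this, `http://example.com/list?sort=price&color=red&page=2` and `http://example.com/list?page=2&color=red&sort=price` both canonicalize to the same URL, so a site can link to (or redirect to) one form instead of exposing every permutation.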
Eric Enge: What about unsupported content types?
Nathan Buggia: So Live Search has a wide list of content types that we will expose to users. And a content type is defined in the HTTP header of every page that gets downloaded by a search engine. What that content type does is it tells a search engine what is on the page. It will tell if there is just text/html, or if it is a Flash page, or a binary application that’s being downloaded.
So, in the interest of providing our users exactly what they are looking for, we generally only want to provide them things that they would expect to get from clicking on a link in search results. Unlike web pages and images, we don’t want to link directly to applications, Flash files, or things like that.
What this will do is give you a list of all of the pages in your website that either don’t specify a content type explicitly or specify a content type that we don’t support in our search results. Both of those scenarios will mean that we are not indexing the page. So, it is another potential audit for webmasters to just go and take a look and see if there is anything funny going on.
Eric Enge: What are some of the more common things that you run into that you are not supporting that people are implementing?
Nathan Buggia: So, a great example is, I was just doing some research on microsoft.com, and there is this one team that built a little dynamic image generating tool – I think they were building charts and graphs. On that tool they forgot to specify the content type of the image, and it just worked in their browsers. It worked in Firefox, it worked in IE and in Opera.
So they thought that they were done. But the problem is, search engines weren’t able to crawl those charts and graphs, and index them in their results. They weren’t specifying the content type in the header, so we didn’t know what we were downloading, so we couldn’t index it. So, that is a great example of how you could use it to identify potential problems.
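The two failure cases described here, a missing content type and an unsupported one, can be checked with a small helper. This is a minimal sketch: the `SUPPORTED_TYPES` set is an assumption made up for the example (the actual list Live Search supports is not given in this interview), and `classify_content_type` is a hypothetical function that expects a plain dictionary of response headers.

```python
# Assumed set of indexable media types, for illustration only.
SUPPORTED_TYPES = {
    "text/html",
    "text/plain",
    "image/png",
    "image/jpeg",
    "image/gif",
}


def classify_content_type(headers):
    """Return 'missing', 'unsupported', or 'ok' for a response's headers."""
    raw = headers.get("Content-Type", "").strip()
    if not raw:
        return "missing"
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = raw.split(";", 1)[0].strip().lower()
    return "ok" if media_type in SUPPORTED_TYPES else "unsupported"
```

A chart generator like the one in the anecdote would show up as `"missing"`, even though browsers render the image without complaint.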
Eric Enge: Let’s talk about back links.
Nathan Buggia: About a year and a half ago we removed the LinkDomain operator, much to the chagrin of the webmaster community. We promised we would bring it back. Well, a couple of weeks ago, we actually did bring it back in most of its full functionality.
So, what you can do is see all of the sites linking into your website, and then filter that based on the incoming domain. And that lets you slice and dice those inbound links to get a better understanding of linking patterns, such as who is linking to you, and that can be useful in defining your future link building campaigns.
Eric Enge: There are a few ways I could see that being useful. One is certainly just allowing you to see who is linking to you more efficiently. But also if you discover that data and you are able to leverage it. You can communicate with others and say, “Hey, do you know the New York Times linked to us?” It is very beneficial to be able to say that.
Nathan Buggia: Yes. I was talking to a publisher at SES and they were mentioning that. What’s interesting about this is, they found it wasn’t just the New York Times that was linking to them. They looked all the way at the end of the links, exploring it all using the filtering tool, and they discovered the different sections of the New York Times that were linking to them.
It turned out, it was just a couple of the bloggers on the New York Times really seemed to like them and linked to them quite a bit. And that can be some really valuable information, because if you know who your fans are on the Internet, you can use that to garner more future links.
So, if you are going to give somebody an exclusive story, you might want to go through and find the people who have written about you well in the past and provide them an exclusive, knowing that you are more likely to get the good links back.
Eric Enge: Right. Now, if I want to download this, I can do that, but I think the download is limited to a thousand links?
Nathan Buggia: Yes. With the Webmaster Tools, we internally use an API that limits the result sets that we can get to a thousand. What we have done to work around this is, we have implemented the advanced filtering functionality that you see on pretty much all of our reports. What that filtering functionality does is it lets you zoom in to just the pages that you want within your website, and then download up to the first thousand of those pages.
Then you can analyze them and look at them in Excel or whatever data management system you might have built. That filtering functionality supports two levels of sub domains and two levels of the sub folders. So, between those four different levels, there is quite a bit of depth that you can scan into within your website.
The back link feature and the outbound link feature filter on the domains that are linking in, or that you are linking out to. For example, if you are Microsoft.com, you could go and take a look at digg.com and delicious.com and see which links are pointing back to you from those different sites.
Or you could even just look at dot com, dot mil, dot gov, co.uk, dot fr, and all the way down to the very top level domains. For the crawl issues report, the filtering allows you to zoom in on a specific portion of your website to pull the errors just for that portion of your website.
Eric Enge: Is there a plan to remove the thousand item limitation in the future?
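For anyone working with a downloaded list of back links, the same kind of domain filtering can be approximated offline. The sketch below is an assumption-laden illustration, not the tool's own logic: `links_from` is a hypothetical helper, and it treats the filter as a host-name suffix so that it works for a full domain such as `digg.com` as well as a top level domain such as `mil`.

```python
from urllib.parse import urlsplit


def links_from(backlinks, domain_filter):
    """Keep only the linking URLs whose host falls under `domain_filter`."""
    suffix = domain_filter.lower().lstrip(".")
    matches = []
    for url in backlinks:
        host = (urlsplit(url).hostname or "").lower()
        # Match the domain itself or any subdomain of it, so that
        # "digg.com" does not accidentally match "notdigg.com".
        if host == suffix or host.endswith("." + suffix):
            matches.append(url)
    return matches
```

Given a list of linking URLs, `links_from(urls, "digg.com")` returns only the links from that site, while `links_from(urls, "mil")` returns every link from a dot mil host.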
Nathan Buggia: We are always working on providing more, deeper access to the information. We can’t say exactly when we will be able to go beyond that limitation, but it is definitely something that we think about quite a bit.
Eric Enge: Can you comment more on outbound links and how publishers should use that as a weapon in their arsenal?
Nathan Buggia: The outbound links are really taking advantage of the link-from-domain operator that we used to support in Live Search, and giving you access to all of the URLs on your website that we found you are linking out to. Web publishers can just take a look at this and do a basic audit and say okay, are there sites here that I wouldn’t want to link to, or is this representative of my website or not?
Eric Enge: Well, one thing that strikes me that they would be able to do is look at it and try to find all those things with a blank page score they are linking to, to see if they are linking to something they don’t want to be linking to.
Nathan Buggia: Webmasters would definitely take a look at their outbound links and make sure that they are representative of the content on their website. For example, if you were peta.org (People for the Ethical Treatment of Animals), you may want to take a look and make sure that in any of your UGC (user generated content) areas, people aren’t linking out to NRA.org or other sites that your users might not be excited about.
Eric Enge: Right. Let’s talk a little bit about keywords.
Nathan Buggia: So, the keyword tool is a really simple tool that allows you to, for any given keyword, find out which pages in your website rank best for that keyword. MSN.com, for example, is an enormous website that has everything underneath it from a full news magazine, to shopping, to Hotmail, to a portal and all that custom information. If you were to type in a term like digital camera, you would probably want the top pages ranking on your website for digital camera to be within the shopping portion of your website, not within the news portion of your website. So, what this allows you to do is to see within your own content, which pages are best performing for certain keywords within Live Search.
Eric Enge: That’s another example of a feature that could provide a lot of insight, which is search engine agnostic. Crawlers obviously vary from engine-to-engine, but there has got to be lot in common at how they look at pages.
Nathan Buggia: Right. What this really does is give you insight into our dynamic ranking: how we translate an expressed customer need, such as digital camera, into a page that we think will satisfy that need. It gives you access to our information about a whole different type of ranking from Live Search.
In addition, if you take a look at the result, we also give you a good amount of metadata. So, when was the last date and time that we crawled that page, what is the relative score of that page, and is that page blocked, or are there any penalties levied against that page?
Eric Enge: Right. So, in this context, is the Page Score that we see on the keyword page relative to that keyword search or is it just a generic score?
Nathan Buggia: That is a base authority score that we give to every page in our index. And, it does not change based on whatever you type in the query box.
Eric Enge: Right. So you address the relevance type issues at another level?
Nathan Buggia: Yes.
Eric Enge: Can you talk a little bit about any example you may have about really interesting and novel ways that people have used Webmaster Tools.
Nathan Buggia: Okay. The first reason we see people using our tools is because they seem to be interested in the ranking features. Most of the people will go in and try and understand how their site is valued, or where we think authority is within their website.
The second thing that we see people do with our tools is drill into the information we give around crawling to try and understand the different issues with their websites. Now, all major websites have issues, whether they are 404 issues, or a page is blocked by the REP. So really what people want to do is get a handle on understanding what the issues are across their website.
If you look at microsoft.com for example, they claim to have one billion unique pages. So if there are about twenty billion pages in our index, they have one billion and they want us to index them all and that’s a lot of pages. Of course we don’t index that much, but we do index a couple of hundred million of their pages.
So just going through and understanding any issues with a couple of hundred million pages is pretty onerous. So they will go through and use the filtering functionality to go and scan their most valuable sub-domains, support.microsoft.com, or TechNet or MSDN, for example.
They will use that information to uncover any big issues with their most valuable sections, and then they will download that data into Excel and they will use that as a scorecard, month to month. So, they will get a list every month of the 404 errors, the different REP issues, and the different long, dynamic URLs. They use that as a way to understand the progress that they are making in addressing those issues.
Eric Enge: Right. They probably uncover one layer of problems, they deal with it, and then next time around they get to the next layer.
Nathan Buggia: Yes.
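The month-to-month scorecard idea comes down to comparing two downloaded reports. A minimal sketch, assuming each report has already been reduced to a list of URLs (the `scorecard` function and its category names are invented for this example):

```python
def scorecard(last_month, this_month):
    """Compare two monthly issue lists (for example, 404 URLs)."""
    previous, current = set(last_month), set(this_month)
    return {
        "fixed": sorted(previous - current),       # gone since last month
        "new": sorted(current - previous),         # first seen this month
        "persisting": sorted(previous & current),  # still unresolved
    }
```

For example, `scorecard(["/a", "/b"], ["/b", "/c"])` reports `/a` as fixed, `/c` as new, and `/b` as persisting, which is the kind of progress tracking described above.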
Eric Enge: Right. So, is there anything you can say about upcoming plans?
Nathan Buggia: Yes, absolutely. We are continuing to ship new features. You’ll probably see us ship more frequently than we have in the past as we build momentum. There are a couple of themes that we are going to continue building features against.
The first theme is really about providing transparency to how we rank and crawl our customers’ websites. So, you will see us continue to add functionality that helps you understand even more of the issues that we know about on customers’ websites. We will expose that in a way that is actionable for publishers so they know exactly what to do to make their website better.
The next theme that we are going to work on is around content management. One of the pillars of the Webmaster Tools is to help empower publishers to really manage their content within Live Search. So, we are going to continue the work on different features that allow them to better understand what content we have indexed.
It will allow them to automatically have content removed that is either copyright-infringing or content that they had not intended to be indexed, as well as give them more ability to provide structured data and manage that data within our system. We have a couple of other themes, but they are top secret projects at this point that we are not ready to talk about. You will hear more about that in the future, I am sure.
Eric Enge: That’s great. And what is the best way for people to make suggestions for Webmaster Tools?
Nathan Buggia: We created a feedback forum about a month ago that customers can use to submit feature requests, and we review those every week. We respond to every single comment that gets posted, and take the feedback seriously. That is very important to us.
Eric Enge: Right. Then you can see what seems to be most in demand and consider that against what you think is most useful, and how difficult it is to implement.
Nathan Buggia: Exactly.
Eric Enge: It seems to me that the focus on Webmaster Tools has grown quite a bit within the Live Search team of late. Is that a fair assessment?
Nathan Buggia: I think what you have seen is the Webmaster team continuing to build momentum. Back in November, we were just assembling the team and getting a Version 1 out, and what you’ve seen since then is us building a lot of infrastructure behind the scenes. We are very close to being fully staffed now, and we have just built the momentum of a development cycle. So, you will probably see faster innovation, and you will see more features continue to come out.
Eric Enge: Excellent. What are your thoughts about an authenticated way to report spam for Webmaster Tools?
Nathan Buggia: That is something that we have on our backlog of features that we would like to implement; we are very interested in doing that. So yes, you will probably see that at some point, although I can’t comment on exactly when.
We definitely want to make it easier and easier for the community to provide us feedback, and provide us information. We are also at most of the big tradeshows. We are currently at almost all of the ones in the US, and over the course of the next year we are really pushing hard to be at all of the tradeshows worldwide.
If you don’t want to provide feedback on our forum, just come and grab one of us at one of the events. Even if you don’t talk to me, it will get back to me. It is a very tight community inside Microsoft.
Eric Enge: Thanks Nate!
Nathan Buggia: My pleasure!