The following is the transcript of an interview of Mark Lucovsky, a Technical Director of Engineering at Mountain View, California-based Google, Inc. Mark has held this position at Google since November 2004.
Before joining Google, Mark was a Distinguished Engineer at Redmond, Washington-based Microsoft Corporation. While at Microsoft, Mark was a founding member and principal architect of the 32-bit Windows team, where he designed and coded major portions of the Windows operating system.
Prior to Microsoft, Mark designed and coded operating system internals for a variety of UNIX and proprietary OS-based minicomputers, minisupercomputers, mainframes, PCs and workstations.
Eric Enge: Let’s talk about the new Google Ajax Feed API.
Mark Lucovsky: We’re trying to bring our APIs down a level so that they can impact more people. We’ve had great success doing that with the Search API, and doing it with feeds is a totally natural extension, with virtually identical programming models and a virtually identical customer skill set. The only thing different, really, is that the data isn’t just coming from search results. It can also come from RSS feeds, but other than that it’s just the same basic model.
Eric Enge: So the data isn’t even necessarily from Google.
Mark Lucovsky: Yes, exactly. The way that I like to think of it is that our Search API is delivering feeds of search results. We dynamically deliver it as a bundle of results at the right time. With feeds, other publishers have already decided this is the information they want to make public. It could be earthquake-magnitude data, it could be top songs on iTunes, it could be their blog, it could be weather data, finance data, news stories, whatever. They have decided to make it public, and they have published it in a variety of formats, and our job is to help deliver that data easily to anybody who wants to consume it.
Eric Enge: Right, and, make it easier for them to mash it up.
Mark Lucovsky: Yes, it’s the idea of bringing mashup programming within reach of everybody, from the hobbyist to the professional. Previously you had to be fluent in XML, you had to understand what an XML namespace was, you had to be able to deal with server-side programming to proxy the data, and if you were a high-volume site, you had to be aware of what you might be doing downstream to the feed providers, the guys who are providing you with data. You really had to be a professional programmer if you were going to succeed in doing a mashup based on feeds. With the Feeds API, we have taken a lot of that heavy lifting off of your plate: we’ll go access the feeds from the feed provider, we’ll do it through our cache on a schedule that’s acceptable to them, we’ll normalize the data, we’ll clean the data, and we’ll deliver it in a way where you can easily make use of it.
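[Editor's note: the idea of fetching through a cache "on a schedule that’s acceptable" to the provider can be illustrated with a small sketch. This is not Google's implementation. The class name `FeedCache`, the `fetch` callable, and the `min_interval_seconds` parameter are all invented here for illustration, and the sketch simply refuses to re-fetch a URL more often than a fixed interval.]

```python
import time


class FeedCache:
    """Hypothetical cache that limits how often each feed URL is re-fetched."""

    def __init__(self, fetch, min_interval_seconds, clock=time.monotonic):
        self._fetch = fetch                  # callable: url -> feed body
        self._min_interval = min_interval_seconds
        self._clock = clock                  # injectable so the sketch is testable
        self._store = {}                     # url -> (fetched_at, body)

    def get(self, url):
        now = self._clock()
        cached = self._store.get(url)
        # Serve the cached copy while it is younger than the minimum interval,
        # so the feed provider is not hit on every page view.
        if cached is not None and now - cached[0] < self._min_interval:
            return cached[1]
        body = self._fetch(url)
        self._store[url] = (now, body)
        return body
```

[With a 60-second interval, any number of calls to `get` for the same URL within that minute would reach the provider only once.]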
Eric Enge: Before this API you had to be able to parse the XML for the RSS feed, probably do a substantial amount of string processing, figure out what was in the content, and then manipulate it from there. And you might be dealing with quite different feeds, right?
Mark Lucovsky: Yes. I think you hit the nail on the head by saying they have to look at the RSS feed. What if it is an Atom feed? What if it is an Atom 0.3 feed? What if it is an RSS 2 feed versus an RSS 1 feed, or RSS 0.94 feed? We make dealing with this easy for everybody. For instance, if you want to access blogs, and you want to basically display blog snippets and blog titles and hyperlinks to blog articles on a page, you can do that on anybody’s blog, and you don’t have to worry about what type of feed is available. What we are saying is if you want to integrate blog snippets on your page, if you use this API you can be ignorant about the feed format that the different blog services deliver.
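[Editor's note: to make the format problem concrete, here is a minimal sketch of what normalizing two feed formats into one shape might look like. It is an illustration only, not the Feeds API itself. The function name `normalize_feed` and the sample documents are invented for this note, and the sketch handles only RSS 2.0 and Atom 1.0; the older formats mentioned above would need further branches.]

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 namespace


def normalize_feed(xml_text):
    """Return a list of {'title', 'link'} dicts from an RSS 2.0 or Atom 1.0 document."""
    root = ET.fromstring(xml_text)
    entries = []
    if root.tag == "rss":
        # RSS 2.0: items are un-namespaced and the link is element text.
        for item in root.iter("item"):
            entries.append({
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
            })
    elif root.tag == ATOM_NS + "feed":
        # Atom 1.0: entries are namespaced and the link is an href attribute.
        for entry in root.iter(ATOM_NS + "entry"):
            link = entry.find(ATOM_NS + "link")
            entries.append({
                "title": entry.findtext(ATOM_NS + "title", default=""),
                "link": link.get("href", "") if link is not None else "",
            })
    else:
        raise ValueError("unsupported feed format: " + root.tag)
    return entries


# Invented sample documents carrying the same post in two formats.
RSS_SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>First post</title><link>http://example.com/1</link></item>
</channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title>
<entry><title>First post</title><link href="http://example.com/1"/></entry>
</feed>"""
```

[Calling `normalize_feed` on either sample yields the same list of entries, which is the point being made: the caller never needs to know which format the publisher chose.]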
Eric Enge: You are virtualizing all the various levels of all the various feed protocols, so you just don’t have to cope with any of that.
Mark Lucovsky: Yes, but we are also making it very easy to extract popular elements like the title, the link, the snippets, the content, the author, and the date. We’ve normalized those to make it very easy to access. But at the same time, if you want more specialized information, like the number of Digg votes that a particular article has had, or if it has a piece of embedded media like a podcast, you could still get at that property using normal XML techniques. So, we have made the easy things very, very easy, and we’ve made the stuff that was previously hard very approachable and accessible to somebody who has incrementally more skills.
Have you used tools like our Video Bar Wizard?
Eric Enge: Yes.
Mark Lucovsky: For feeds, we have a whole program in place for things like that. So, you might have seen the iTunes control that shows the top ten songs. We have built cut-and-paste controls for selected feeds. We’ll continue to do that to increase the reach of this API in very targeted areas in the same way we did the Video Bar and the Map Search control for the Search API.
Eric Enge: Let me give you an example. Actually it pertains to something that I do on the Stone Temple Consulting site right now. We have a blog at www.stonetemple.com/blog, and on the homepage there is a display of the four latest blog posts.
Mark Lucovsky: If you visit our AJAX Feeds API main documentation page (http://code.google.com/api/ajaxfeeds), notice the center column is exactly what you are talking about: it’s recent articles from our blog, integrated into that documentation page. That’s done using the Feeds API.
Eric Enge: Yes, I see that you have got a little summary of your five latest posts.
Eric Enge: Then I just need to substitute in the correct feed name.
Mark Lucovsky: The next step for us is to take that and turn it into a wizard, the same way that we’ve done with the News Bar and the Video Bar. That will be a wizard where you enter the blog URLs that you want to watch, and we will generate a snippet of code that you can put anywhere on any site, on any page. We look at that as addressing the needs of the hobbyist, the webmasters, and the professional developers, all in one. The professional developers might take our generated code and say, well, that’s a starting point for me, thanks for doing this.
That’s kind of our model for feeds, for search, and for everything that we do: take those common scenarios and make sure that the API makes all of that stuff very accessible.
Eric Enge: When we talked last time about the Ajax API, one of the things that you didn’t want people to do was to reorganize the results, because you obviously didn’t want your search results to be messed up.
Mark Lucovsky: Yes. We don’t want a website to misrepresent data that we provide. We wouldn’t want a site to change the order without telling its visitors. We are also trying to prevent malicious reordering where someone is trying to trick users into thinking something is what it isn’t.
Eric Enge: You don’t want them making representations on Google’s behalf.
Mark Lucovsky: Right. So the typical ordering of a feed is reverse chronological order by publication date. It’s kind of the expected norm that when you visit a feed, that’s the order that you are going to see, and we try to preserve that. That order is really a function of the publisher of the feed. Sometimes the time stamp is not updated properly, and sometimes the order is something that we can’t even determine, and we don’t even know what the correct order is. For Apple’s iTunes feeds, some of them are ordered by rank, and rank is a public element inside the iTunes data. So I don’t think that reordering feeds is as problematic as it is in search results. But I think it is a function of the feed.
Eric Enge: Yes, for example you could look at a dozen feeds, and you could create a mashup on a contextual basis. You could cross-reference data from different feeds and create new compound items by taking one from feed A, and two from feed B, and one from feed C, and none from feed D, perhaps because there is a simple keyword or key phrase match across all the feeds you are monitoring.
Mark Lucovsky: Sure. But I don’t think our Feeds API is really designed with that scenario in mind. I mean our API is really designed to consume a feed and do something with it on your page.
Eric Enge: I noticed every example you provide has specifically put up four items. Is that a limit?
Mark Lucovsky: No, that’s programmable.
Eric Enge: The documentation makes a special point of talking about providing results in JSON or XML format. Can you talk a little bit about the significance of that?
Eric Enge: Right, so there are seven elements that you can get while staying out of the XML parsing game altogether.
Mark Lucovsky: The blog example we talked about before is a great example, where JSON is all you need. If you were trying to do something like the iTunes example that we built, there we are showing the Apple iTunes cover art extensions–those are available in the XML properties, so we are picking up those pieces directly out of the XML. We are doing kind of a hybrid form in that iTunes sample.
The goal is to make the easy things very easy and the hard things still easy and very possible.
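[Editor's note: the hybrid approach described here, common fields read the easy way plus a publisher-specific extension pulled from the XML, can be sketched as follows. This is an illustration under stated assumptions, not the iTunes sample itself. The namespace `http://example.com/ns/votes`, the element `voteCount`, and the function name `entries_with_votes` are hypothetical and invented for this note.]

```python
import xml.etree.ElementTree as ET

# Hypothetical extension namespace, invented for this sketch.
NS = {"ex": "http://example.com/ns/votes"}

# Invented sample feed: one item carries the extension element, one does not.
SAMPLE = """<rss version="2.0" xmlns:ex="http://example.com/ns/votes"><channel>
<item><title>Popular story</title><link>http://example.com/a</link><ex:voteCount>128</ex:voteCount></item>
<item><title>Quiet story</title><link>http://example.com/b</link></item>
</channel></rss>"""


def entries_with_votes(xml_text):
    """Read the common title field plus an optional namespaced extension value."""
    root = ET.fromstring(xml_text)
    results = []
    for item in root.iter("item"):
        # The extension lives in its own XML namespace, so it needs a prefix map.
        votes = item.findtext("ex:voteCount", namespaces=NS)
        results.append({
            "title": item.findtext("title", default=""),
            "votes": int(votes) if votes is not None else None,
        })
    return results
```

[Items that lack the extension simply report `None` for it, so a page built on the common fields keeps working whether or not a given publisher supplies the extra data.]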
Eric Enge: You want to have this useable by people who are not really professional programmers, and they get what they need to get done simply, and when it gets more complicated you can basically extend into those areas.
Mark Lucovsky: Right, the documentation has three samples that are designed to show you pure JSON mode, pure XML mode, and mixed mode.
And we’ve even done some cross-browser XML normalization to make it very easy to deal with in a cross-browser environment. We compensated for the fact that Internet Explorer doesn’t implement the W3C DOM by building a helper function that makes it work well on Internet Explorer as well.
Eric Enge: That’s very cool. So, are there other aspects of this that I haven’t covered with my questions that we should make people aware of?
Mark Lucovsky: What we are really trying to do is help everybody create better web applications. With search, one of the things that people are trying to do is deeper integration of search data within their applications and pages. Feeds are another big repository of data out there that was previously kind of hard to access, hard to mash up onto your site, and hard to integrate into your site. This was definitely the next step for us, to deliver that same ease of use for feeds that we have already delivered for search.
Eric Enge: Well, that’s excellent. I think this is pretty neat stuff. Thanks for taking the time to speak to us today.
Mark Lucovsky: Thank you!