Josh Cohen is the Senior Business Product Manager for Google News. He is responsible for global product strategy, marketing and publisher outreach for Google News, which is currently available in 26 languages and more than 50 countries. Prior to joining Google, Josh was Vice President of Business Development for Reuters Media, the world’s largest news agency. While there, he led business development for Reuters’ Consumer Media team, including all activities with major strategic partners. He was responsible for agreements with AOL, Google, MSN, Yahoo! and numerous media companies around the world for content distribution, revenue generation and strategic investments.
Before joining Reuters, Josh was Director of Business Development for SmartMoney.com, a joint venture between Dow Jones and Hearst, where he led business development and licensing activities for the site. Cohen holds degrees from the University of Michigan and Columbia Business School, where he graduated as a member of Beta Gamma Sigma.
Eric Enge: Can you tell me what your responsibility is within Google?
Josh Cohen: I am the business product manager for Google News. I work with other folks on the news team to figure out our roadmap: what features we are working on and what we want to do with the product over the next 6 months, 12 months, 18 months, and so on.
A big focus of my job is working with people outside of Google: talking to publishers, talking to people in the media and at conferences, and generally putting a face on Google News and trying to demystify it as much as possible. I also work with many of the cross-functional teams that interact with publishers on a day-to-day basis and try to tie those efforts together a little better.
Eric Enge: Tell us what Google News is and what it does, and who uses it.
Josh Cohen: Google News was launched in beta back in 2002. The idea behind Google News is really similar to what we are trying to do in search. Not to throw the company mantra at you, but it really is about organizing all the news information out there and making it even more accessible and useful for users.
We are trying to do this in every country and in every language. We want as many different sources as possible, so that when people are looking for information, they can find it. Interest in news overall is probably higher than it has ever been, and more and more people are getting their news online, so the challenge is helping them find that information and providing some context and organization. So we really are operating as a search engine specifically for news.
Eric Enge: How do you define news versus other types of content?
Josh Cohen: We really try to keep it as black and white as possible, and we don’t get into qualitative discussions about the nature of a news site. We don’t include any hate speech or pornography. What we look for is whether the site is covering current events: is it specifically covering the topics of the day, is there some evidence of an editorial organization, and is there at least some editorial review process before something actually gets published? But our bias is really toward inclusion.
Eric Enge: Right. So you try to be as broad as possible and include as many different sources as you can. Are you looking for content that is unique, rather than somebody just republishing stuff off a news wire?
Josh Cohen: Absolutely. We don’t include sites that are just pure aggregators; there needs to be some original content on the site.
Eric Enge: That makes a lot of sense. What is the process that people go through when they want to have their site or some portion of their site considered for Google News?
Josh Cohen: It is actually pretty straightforward. Google News has a whole help center for users that explains how it works. A whole portion of it is dedicated specifically to publishers, explaining how Google News works and how to submit their content. Ultimately, they simply submit their site, or the portion of it that they’d like reviewed for inclusion, and then we take a look at it.
Eric Enge: There is a form people can use?
Josh Cohen: Yes. It is located here.
Eric Enge: What type of questions are covered in the form?
Josh Cohen: There are a few basic questions about the organization itself. We do not make editorial judgments about the nature of the site. It is really up to the user at the end of the day to make those decisions about whether or not they think it is a site that adds value to them. So in the form, we are looking for objective information about their site, and we are not looking for them to make a pitch about their site.
Eric Enge: Evaluating whether it is unique news content is something that your reviewers just do.
Josh Cohen: Yes, there is a support team that reviews those sites as they come in. There is not a single editor or journalist working on Google News. Once a site is included in Google News and in our index, there is no manual intervention in the rankings. It is all done algorithmically.
Eric Enge: Right. Yes, but the people who review the site check to make sure that it is unique content as opposed to duplicated.
Josh Cohen: Yes, they ensure that it meets those criteria. A lot of that can be done algorithmically: we understand duplicate content, and we can do a full-text analysis. But yes, there needs to be original content.
Eric Enge: Right. Suppose that, for some reason, something goes wrong in the process and the site gets turned down, but the publisher thinks it is a good fit and really believes it should be reconsidered. Is there a process you would suggest for that?
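Editor’s Note: Josh doesn’t describe how Google’s full-text analysis works. One common textbook technique for spotting republished wire copy is to compare overlapping word windows (“shingles”) between two texts and measure their Jaccard similarity. The sketch below assumes that technique; the function names and the 0.8 cutoff are hypothetical illustrations, not anything Google has published.

```python
import re


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity len(a & b) / len(a | b); 0.0 if both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical cutoff chosen for illustration, not a Google value.
DUPLICATE_THRESHOLD = 0.8


def looks_duplicated(article: str, wire_copy: str, k: int = 5) -> bool:
    """Flag an article whose shingles mostly overlap those of a wire story."""
    return jaccard(shingles(article, k), shingles(wire_copy, k)) >= DUPLICATE_THRESHOLD
```

Raising `k` makes the comparison stricter about word order; lowering the threshold also flags lighter rewrites of the same copy.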
Josh Cohen: Our bias is towards inclusion, so if there are things that we miss, we certainly want to be able to understand the site better.
Eric Enge: I know of one site that got turned down, and it turned out that the person who reviewed it had not looked at the news portion of the site.
Josh Cohen: That is really why we try to ask for as much information about a site as possible, because the webmaster, site owner, or publisher obviously knows a lot more about it and understands the details. We are looking at thousands of different sites, and this review is the one real manual part of Google News, so the more information we can get about a site during the submission process, the better.
Eric Enge: We have heard things about other kinds of requirements, like there needs to be a certain volume of news for example.
Josh Cohen: No. There is no volume requirement in terms of the number of articles published per day or anything like that. Volume can certainly have an impact on rankings, but not on inclusion. We have sites that publish hundreds of articles daily, and others that publish longer analytical or investigative pieces, just a handful a week. So there is really a pretty wide range.
Eric Enge: There is also the notion that the URL needs to have a 3-digit code on it.
Josh Cohen: That is correct. There are certain technical requirements, which have nothing to do with the nature of the site but with the ways in which we can pick up its content. The 3-digit identifier is one of the ways we pick up the news content on a site. As you mentioned, some sites have a section devoted to news while the rest of their content is inappropriate for Google News. On those sites, the 3-digit identifier is often how we pick out the specific news content, so it is a requirement for crawling that content.
However, once sites are included in Google News, they can submit a News Sitemap. If you submit your content via a News Sitemap, the 3-digit URL requirement no longer applies and you can ignore it, because we can pick up the content that way.
Editor’s Note: Since this interview took place, Google News Sitemaps have been updated to a new format.
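Editor’s Note: As a rough pre-submission check, the hypothetical helper below tests whether an article URL carries a numeric identifier. It assumes the rule meant “three or more consecutive digits somewhere in the URL path”; the interview doesn’t spell out the exact rule, so treat this as an approximation.

```python
import re
from urllib.parse import urlparse

# Assumption: the requirement is modeled as "the URL path contains a run of at
# least three consecutive digits". The exact historical rule may have differed.
_DIGIT_RUN = re.compile(r"\d{3,}")


def has_three_digit_id(url: str) -> bool:
    """Return True if the URL's path contains three or more consecutive digits."""
    path = urlparse(url).path  # digits in the hostname are deliberately ignored
    return _DIGIT_RUN.search(path) is not None
```

Only the path is examined, so a URL such as `https://example123.com/about` does not pass on the strength of its hostname.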
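Editor’s Note: For reference, here is a minimal sketch of generating a sitemap in the current Google News format, assuming the documented `sitemap-news/0.9` namespace with its `publication` (`name`, `language`), `publication_date`, and `title` elements. This is the newer format, not necessarily the one Josh was describing; `NewsArticle` and `build_news_sitemap` are hypothetical names, and Google’s current documentation should be checked before relying on it.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


@dataclass
class NewsArticle:
    loc: str               # canonical article URL
    title: str
    publication_date: str  # W3C date or datetime, e.g. "2024-05-01"


def build_news_sitemap(articles: list[NewsArticle], pub_name: str, language: str) -> str:
    """Serialize articles into a News Sitemap XML document (as a string)."""
    # Emit the sitemap namespace as the default and "news:" for the extension.
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("news", NEWS_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for art in articles:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = art.loc
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        pub = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(pub, f"{{{NEWS_NS}}}name").text = pub_name
        ET.SubElement(pub, f"{{{NEWS_NS}}}language").text = language
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = art.publication_date
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = art.title
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```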
Eric Enge: Do the sitemaps bring any other kind of specific advantages?
Josh Cohen: Yes. It doesn’t change the ranking; there is no bias toward a site that submits a sitemap versus one that doesn’t. The real benefits of submitting a sitemap are that it gives you greater control over which of your articles appear on Google News, and it allows specific metadata to be communicated about each individual article.
Right now it is fairly limited, but we are certainly looking to expand what we do within sitemaps, because the more information we have about a publisher’s site, the better. For individual articles there can be basic stuff like attribution, and bylines, and location, and so forth. Ultimately, sitemaps are a really good way to clearly identify the information that you want to get crawled.
Most questions a publisher has about the ranking of their content on Google News boil down to a technical issue: we didn’t pick up an article, or when we tried to crawl it, it failed the extraction process. Sitemaps are a really good way to ensure that we are crawling that content, and they also let you address any of those issues proactively, because you can go right in and see when we are having problems crawling your site, whether the technical issue is on our side or yours. I won’t say sitemaps eliminate all technical issues, but they can certainly limit the impact of some of them and give you a better way of monitoring them.
Eric Enge: So sitemaps reduce errors and don’t affect the ranking of included stories, but they can affect whether a story is included at all.
Josh Cohen: Yes, exactly. And, that is a pretty big difference.
Eric Enge: Yes, it is. Are there other technical issues that people need to be concerned with to make sure that their news articles are friendly to the Google News crawler?
Josh Cohen: There are definitely challenges with images, so there are certain best practices we encourage publishers to follow. Larger images with good aspect ratios are always easier for us to pick up. More descriptive captions always help, as does placing images near the title and keeping them inline and non-clickable. And for the most part, we prefer JPEGs.
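Editor’s Note: The image advice above can be turned into a simple pre-publication checklist. In the sketch below, the pixel and aspect-ratio thresholds are hypothetical placeholders (the interview gives no numbers), the `ArticleImage` fields are assumed inputs from your CMS, and the advice about placing images near the title isn’t checked.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration; the interview gives no numbers.
MIN_WIDTH_PX = 300
MIN_HEIGHT_PX = 200
MAX_ASPECT_RATIO = 2.0  # longer side divided by shorter side


@dataclass
class ArticleImage:
    src: str
    width: int
    height: int
    caption: str
    is_linked: bool  # True if the <img> is wrapped in an <a> element


def image_warnings(img: ArticleImage) -> list[str]:
    """Return human-readable warnings for an image, based on the advice above."""
    warnings = []
    if img.width < MIN_WIDTH_PX or img.height < MIN_HEIGHT_PX:
        warnings.append("image may be too small")
    long_side, short_side = max(img.width, img.height), min(img.width, img.height)
    if short_side == 0 or long_side / short_side > MAX_ASPECT_RATIO:
        warnings.append("extreme aspect ratio")
    if not img.caption.strip():
        warnings.append("missing caption")
    if img.is_linked:
        warnings.append("image is clickable; inline, non-clickable is preferred")
    # Naive extension check: URLs with query strings would need parsing first.
    if not img.src.lower().endswith((".jpg", ".jpeg")):
        warnings.append("not a JPEG")
    return warnings
```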
Another thing is to have relevant and useful titles that are going to help the readers and to help our crawler know what your page is about.
Try not to break the body of the article into multiple pieces (Editor: more advice on the treatment of the body can be found here). Also, include the date between the title and the body on a separate line of HTML, as this makes it easiest for our crawler to determine when the article was published. These tips are not specific to Google News, but they certainly help with Google News.
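Editor’s Note: A minimal template following the date advice might look like the sketch below: the title, then the date on its own line of HTML, then the full body on a single page. The `render_article` helper, the `dateline` class, and the `<time>` element are illustrative choices, not stated requirements.

```python
from datetime import datetime, timezone
from html import escape


def render_article(title: str, published: datetime, paragraphs: list[str]) -> str:
    """Render a single-page article: title, then a date line, then the full body."""
    body = "\n".join(f"  <p>{escape(p)}</p>" for p in paragraphs)
    return (
        "<article>\n"
        f"  <h1>{escape(title)}</h1>\n"
        # The date sits on its own line of HTML between the title and the body.
        f'  <p class="dateline"><time datetime="{published.isoformat()}">'
        f"{published.strftime('%B %d, %Y')}</time></p>\n"
        f"{body}\n"
        "</article>"
    )


page = render_article(
    "Council Approves Budget",
    datetime(2009, 11, 2, 9, 30, tzinfo=timezone.utc),
    ["First paragraph.", "Second & final."],
)
```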
Eric Enge: These things can also influence click-through.
Josh Cohen: Absolutely.
Eric Enge: Who are the people who consume Google News?
Josh Cohen: The focus of Google News, and I think one of its real appeals, is offering as many different perspectives as possible on a given story. That can mean a different political perspective or a different geographical perspective; some people want to understand a story from all its different angles and really delve into it. That is why we cluster articles not by source but by story. The people who click on a number of these different links are, by and large, the ones who get the most value from Google News, because they get that diversity.
Eric Enge: From our experience, that certainly includes reporters and editors from a variety of sites.
Josh Cohen: They are certainly heavy users of Google News. There are also those who come to the front page because they like the fact that we aggregate the top stories on the web; they browse those top stories, see what is there, click on them, and go read them on the publisher’s site. That is not dramatically different from going directly to a publisher’s site to look for the top stories.
They may just be looking to see what is out there across the web, from both their favorite sources and sources they don’t know. Then there is the other half of our users, who use us pretty specifically as a search engine. They have heard about a news story, whether in the office, on the web, or in an email someone sent them, and they want to learn more about it, so they type in a name or a few keywords and use us much more like a search engine.
Eric Enge: People also set up news alerts, right?
Josh Cohen: Absolutely. They can set up alerts or use our RSS feeds, so there are a number of different ways to keep on top of stories. We see our role not as a destination site but as a starting point. Our goal, very similar to what we are trying to do with web search, is to help people find what they are looking for and then send them on their way.
Eric Enge: One of the subtleties here is that you obviously want a title that entices a click-through. But you also want that title to match whatever search terms the editors you want to reach are using.
Josh Cohen: To be clear, having a clean title matters, and the placement of that title on your page matters, but there are several different elements we look at in trying to pick up the correct story. The title certainly matters, but so do the URL and, most importantly, the text of the article itself. If your URL is somewhat unclear, or the information in the body of the article is not that clear, then the title takes on more weight.
These are all components that we look at, so if your URL is informative and the text is very clear to us, then I would say the title is no more important than the others.
Eric Enge: Are there other things that go into ranking news stories?
Josh Cohen: Yes. There are two separate ranking processes. One is story ranking: what is the top sports story of the day, the top entertainment story, the top science and technology story, and so on. A number of different factors go into that, but the easiest way to think about it is that we are really relying on what editors think the most important stories are. What is the aggregate editorial interest in a given story? That is, how many sources are covering it, and where are they placing it on their pages? These factors do not affect an individual source’s results, but they do influence which storylines we consider most important. So that is the story ranking.
For article ranking, there are a number of signals we try to use: Is it original content? Is it timely? Is it relevant? Is this a local story, and is there a local source reporting original content on it? Again, that is not relevant to every single story, but it is something else we look for. We also ask whether the article is novel, or just a rehash of an earlier article: a story that somebody else broke that you just happened to publish later. These things are hard to measure, but we are increasingly trying to include them in our rankings.
Then there are also source-specific signals we try to use. This is where volume comes in: what is a source’s volume of original content in a given category? The example I like to use is the business category: the Wall Street Journal, Bloomberg, and Reuters are each probably publishing hundreds of original business stories on any given day. By itself, that is a decent signal that they are quality sources in that category.
Compare that with their volume of original content in the sports category, where you are probably not going to see much, if any, original publication.
I would say another really important signal for us in recent quarters has been user behavior, which has become a really helpful way of determining the trusted quality of a given source. In a given cluster, the first link gets the most clicks, the second gets fewer, and the third, the fourth, and so on get fewer and fewer. But suppose a user comes in and, instead of clicking on the first link, which is what they were “supposed to do,” clicks on the fourth link. That is a very strong signal about both the source they clicked on and the three sources above it that they didn’t click on, even though they were “supposed to” click on them.
Over time, as you aggregate that information and normalize it for the different click positions, you can look at it section by section to get a sense of which sources users feel are the best in given categories. Again, sticking with the business example, if some random source is the #1 link in Google News and Reuters is the #3 link, somebody may come along and say, “Wait a second, this is a business story. I want to see what Reuters has to say. I am clicking on that link in the third spot.”
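Editor’s Note: Josh describes story ranking in terms of two inputs: how many sources cover a story and how prominently they place it. The toy sketch below combines them under stated assumptions: each source counts once per story at its most prominent placement, and placement is weighted by a hypothetical 1/(1 + slot). It illustrates the idea only and is not Google’s formula; `Coverage` and `rank_stories` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Coverage:
    story_id: str
    source: str
    page_slot: int  # 0 = lead story on the source's front page, 1 = next, ...


def placement_weight(slot: int) -> float:
    """Hypothetical weight: more prominent placement counts more (1, 1/2, 1/3, ...)."""
    return 1.0 / (1 + slot)


def rank_stories(coverage: list[Coverage]) -> list[tuple[str, float]]:
    """Score each story by how many sources cover it and how prominently."""
    # Keep only each source's best placement, so one source counts once per story.
    best: dict[tuple[str, str], float] = {}
    for c in coverage:
        key = (c.story_id, c.source)
        best[key] = max(best.get(key, 0.0), placement_weight(c.page_slot))
    scores: defaultdict[str, float] = defaultdict(float)
    for (story, _source), weight in best.items():
        scores[story] += weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```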
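Editor’s Note: The volume signal can be pictured as counting original articles per source and category. This sketch assumes each item already carries an `is_original` flag (for example, from a duplicate check) and reports a source’s share of a category’s original output; `Item`, `original_volume`, and `category_share` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Item:
    source: str
    category: str
    is_original: bool  # assumed to come from an upstream duplicate check


def original_volume(items: list[Item]) -> Counter:
    """Count original articles per (source, category) pair."""
    return Counter((i.source, i.category) for i in items if i.is_original)


def category_share(items: list[Item], source: str, category: str) -> float:
    """Fraction of a category's original articles that came from `source`."""
    volume = original_volume(items)
    total = sum(n for (_src, cat), n in volume.items() if cat == category)
    return volume[(source, category)] / total if total else 0.0
```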
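Editor’s Note: One standard way to “normalize for click positions” is to compare a source’s actual clicks with the clicks its positions would be expected to earn on average. The sketch below assumes that observed-to-expected ratio; Josh doesn’t say how Google does it, and the `Impression` record is a hypothetical log format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Impression:
    source: str
    category: str
    position: int  # 1 = top link in the cluster
    clicked: bool


def position_ctr(log: list[Impression]) -> dict[int, float]:
    """Average click-through rate at each position, across the whole log."""
    shown: defaultdict[int, int] = defaultdict(int)
    clicks: defaultdict[int, int] = defaultdict(int)
    for imp in log:
        shown[imp.position] += 1
        clicks[imp.position] += int(imp.clicked)
    return {pos: clicks[pos] / shown[pos] for pos in shown}


def click_lift(log: list[Impression], category: str) -> dict[str, float]:
    """Observed clicks / position-expected clicks, per source within one category.

    Values above 1.0 mean users picked the source more often than its
    positions alone would predict; values below 1.0 mean they skipped it.
    """
    ctr = position_ctr(log)
    observed: defaultdict[str, float] = defaultdict(float)
    expected: defaultdict[str, float] = defaultdict(float)
    for imp in log:
        if imp.category != category:
            continue
        observed[imp.source] += int(imp.clicked)
        expected[imp.source] += ctr[imp.position]
    return {s: observed[s] / expected[s] for s in expected if expected[s] > 0}
```

A lift well above 1.0 corresponds to the Reuters example: users reaching past higher-ranked links to pick that source.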
That type of behavior takes place again and again, and it has become another important signal. Now, that doesn’t trump everything else; all these other scores and factors still matter, but all things being equal, we certainly want to take a look at some of the qualitative aspects of a source. We try to algorithmically determine the qualitative nature of a source in addition to the story-variable signals.
Eric Enge: Are inbound links a factor?
Josh Cohen: Not really. Links are obviously a signal on the search side; with PageRank, as you know, they are certainly an important factor. But on the news side, because of the nature of news and how quickly information comes out, building up links over time just isn’t all that applicable.
Eric Enge: What about social media signals, such as Twitter?
Josh Cohen: There is nothing specific I can say about those, but I think it is safe to say that we are always looking at new signals. We will keep working on this, because it remains imperfect. We will test certain signals and evaluate them, as we did with user click behavior.
Eric Enge: Anything you can say about plans for Google News?
Josh Cohen: We are experimenting in a number of different ways. For example, we launched Fast Flip two months ago.
With Fast Flip, we tried to introduce the element of serendipity you get in the offline world. When you pick up a paper to see the top stories, you may spot an article at the bottom of the page. It is something you would never think to read or really look for, but you read it because you spotted it.
How do you bring some of that quality into the online experience? Fast Flip is an attempt to do that. Another key component is the speed with which you can browse those pages. If a page takes five to ten seconds to load, you are not going to want to explore different types of content. Through both its visual presentation and the speed with which it loads, Fast Flip tries to bring some of the best of the offline experience online. That is a good example of the things we are experimenting with, and we want to keep innovating and finding ways to help our users and work with our partners.
Eric Enge: From my perspective, for a publisher looking to get exposure for what they are doing, implementing a quality, relevant news feed and working with Google News is an outstanding opportunity. You get visibility that a lot of people would die for. Of course, there is an expense in implementing such a news feed. You have to do a quality job, because you don’t want to get in front of people and then have them say this is crap.
Josh Cohen: I think that is well-said. The way that we look at it is that it is a real partnership with the publishers that we have. We are a search index, we are focused on news; but we don’t have any content, we don’t have editors, we don’t have any journalists, and we don’t create any information. We get that from the publishers. For publishers, we think that we bring value in helping them get found and driving the traffic to them. In a given month, Google News sends almost a billion clicks to publishers worldwide.
Eric Enge: Better still, a significant percentage of that traffic is from news editors and bloggers. So not only are you getting traffic from Google News, but you also have the chance of being written about in other news outlets.
Josh Cohen: Sure, getting written about by others in the market is interesting, but we also help publishers gain loyal users, who may come for the aggregation qualities of Google News but then discover the publisher’s content and like it.
Eric Enge: Thanks so much for taking the time to speak with me today, Josh.
Josh Cohen: Thank you!