Just over a year ago I did an interview with Pankaj Mathur, the VP of Sales for InfoGroup. I enjoyed the initial interview tremendously, so I really looked forward to the opportunity to do it again. As with the first interview, our conversation focused almost entirely on the local search data problem: what makes it hard, and some of the solutions that InfoGroup has in place to deal with it. If you are interested in understanding the complexities of extracting and presenting accurate local data, then the first interview and this year’s interview with Pankaj Mathur are for you.
For those of you who want the CliffsNotes version, here are some of the major points that emerged from the interview:
- Pankaj Mathur: (in our data) “we have approximately 15 million businesses in the U.S., 1.4 million in Canada, and between 2.5 and 3 million in the UK” … “In the North American market last year, we made over 25 million phone calls” … “It’s all operator-driven, although we do use a smart dialer.”
My Comment: Right away we get a sense of the business problem. InfoGroup, which as a company prides itself on the quality of its data, feels it is necessary to make 25 million phone calls to validate the data in the U.S. and Canada. This is a far from trivial expense, and one which you can assume they undertake because of the inherent difficulty in building an accurate local search data set.
- Pankaj Mathur: “We get approximately 6,000 phonebooks every year, but we do not necessarily compile each phonebook every year.”
My Comment: As you will see later, Pankaj does not believe that having businesses submit their own listings is a great solution, because of the inconsistencies in how they maintain that data. But chances are that they keep their phonebook advertising up to date. Better still, if they go out of business, they probably cancel the ad.
- (Note: InfoGroup evaluates data using four metrics, and this is the first one) Pankaj Mathur: “Completeness – is a measure of the total number of listings and in some sense reflects coverage.”
My Comment: An important metric (of course). If you are in Saginaw today, and you are looking for a dry cleaner, you want to know that the search you are performing will return the one closest to you.
- Pankaj Mathur: “Infogroup has around 15 million companies in the U.S., the IRS claims about 20 million, and the Chamber of Commerce claims about 24 million” … “from a Chamber of Commerce perspective, if a license was filed back in 1959, it is considered a valid business in 2010 as long as the owner is still alive and has not filed for bankruptcy” … “Infogroup has defined a business as a brick-and-mortar store having a phone number and location address”, and later: “Google, or any search engine, follows a much broader definition of business or point of interest.”
My Comment: This is one of the key problems with local search. What do you define as a business? Here are some related additional points from the interview:
- Pankaj Mathur: “A restaurant may have a drive-in, bar and an ATM on premises”
- Eric Enge: “Say a business only has a P.O. Box for an address; is that something that you would count as a valid business?” Pankaj Mathur: “Yes, we do. Such a scenario can occur for a financial advisor or a tax consultant working from home.”
- Eric Enge: “What about a kiosk-based location like an ordering terminal in a shopping mall or at an airport?” Pankaj Mathur: “When you look at the corporate list, they will tell you that there is a Dunkin’ Donuts or Baskin-Robbins at a particular address, which may actually be a retailer. In this case, what we usually do is make a decision on a case-by-case basis. For the example above, there probably isn’t enough evidence to necessarily route somebody looking for Dunkin’ Donuts to a grocery store just because the grocery store has a shelf where you could pick donuts of certain brands. … There are cases like an ATM location inside a bank that is still considered a line of business, and we will compile it.”
- Eric Enge: “If someone is working as a plumber out of their house and they use their house as the brick-and-mortar address, are they counted?” Pankaj Mathur: “Yes, that will work.”
My Comment: These are just some sample scenarios. We have covered here how InfoGroup handles them, but each local business search player needs to make its own decisions about these things, and different players are likely to decide differently. Is your head hurting yet?
- Pankaj Mathur: (this is the second metric) “Conformance – in some sense this implies standardization or adherence to structure” … “we may come across a Hilton listed under ‘Banquet Halls’ and no mention of it under the ‘Hotels’ category. We have quality control rules and audits in place that help ensure that all Hilton locations are assigned to ‘Hotels’ as a primary line of business.”
My Comment: Another layer of complexity. A given business may consider itself to be relevant to many different categories of business. A restaurant may offer catering services, for example.
- Pankaj Mathur: “it is possible that a Hilton shows up as a golf course … We can call this Hilton and verify objectively if there is a golf course attached to it, and then assign the appropriate categories to the record”
My Comment: Expanding on the prior point, not only is it important to make sure that you have identified the primary line of business, but you also want to categorize the alternative lines of business. Someone looking for a golf course might want to know about the one at the Hilton, for example.
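To make the conformance idea concrete, here is a minimal sketch of what a rule like "all Hilton locations get ‘Hotels’ as the primary line of business" might look like in code. This is my own illustration, not InfoGroup's system: the `Listing` record, the `PRIMARY_CATEGORY_BY_BRAND` table, and the `conform` function are all hypothetical names, and the sketch assumes the secondary categories have already been verified (by phone, in InfoGroup's case).

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical rule table: brand -> category that must be the primary one.
PRIMARY_CATEGORY_BY_BRAND = {"Hilton": "Hotels"}


@dataclass
class Listing:
    name: str
    brand: str
    primary_category: str
    # Assumed to be already verified, e.g. by a phone call.
    secondary_categories: List[str] = field(default_factory=list)


def conform(listing: Listing) -> Listing:
    """Return a listing whose primary category follows the brand rule."""
    expected = PRIMARY_CATEGORY_BY_BRAND.get(listing.brand)
    if expected is None or listing.primary_category == expected:
        return listing  # no rule for this brand, or already conformant
    # Keep the submitted primary category, but demote it to a secondary one.
    secondary = [c for c in listing.secondary_categories if c != expected]
    if listing.primary_category not in secondary:
        secondary.insert(0, listing.primary_category)
    return Listing(listing.name, listing.brand, expected, secondary)


submitted = Listing("Hilton Downtown", "Hilton", "Banquet Halls", ["Golf Courses"])
fixed = conform(submitted)
print(fixed.primary_category)      # Hotels
print(fixed.secondary_categories)  # ['Banquet Halls', 'Golf Courses']
```

The point of the sketch is that the submitted category is not thrown away. It is kept as an alternative line of business, so a search for banquet halls or golf courses can still surface the Hilton.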
- Pankaj Mathur: (this is the third metric) “Accuracy. This is probably the easiest of all four to understand, because it is factual, but Accuracy is also the most expensive aspect for data compilation … we use phone validation to ensure reliability of listing information and Accuracy automatically follows from it!”
My Comment: The biggest problem with accuracy is the rate at which the data set changes. Businesses close, move, change names, or get acquired, and new businesses open. When a business closes, you can pretty much guarantee that its owners are not going to contact all the data providers out there to tell them that. Even when something like a brand name changes, it is unlikely that the business will update all the places on the web where the old brand appears. Another thing that can happen is that the person providing the information simply enters it incorrectly.
- Pankaj Mathur: (this is the fourth metric) “Relevancy can be best correlated to intent. So if I am searching for a McDonald’s, the information on John Doe LLC who owns the location is irrelevant (even if it is accurate).”
My Comment: The searcher’s intent is a critical piece of the puzzle as well. More on this in the next point.
- Pankaj Mathur: “The intent is different when I am searching in front of a desktop than when I am searching on my smart phone at 10 o’clock at night. Due to this evolution of LBS, there are additional attributes that are coming to the forefront, like opening-closing hours, credit cards accepted, ratings, reviews, and coupons and so on.”
My Comment: Mobile search brings a whole new layer of complexity to the problem, because the availability of a whole new level of data becomes critical.
- Eric Enge: “You recently wrote an article about how merchant-submitted listings are not the solution to the local search problem.” Pankaj Mathur: “The intent of the article is to highlight the fact that data coming from corporate chains may not necessarily comply with the four guidelines, namely completeness, conformance, accuracy and relevance … If you are a big chain corporation like McDonald’s or KFC, managing data on over 10,000 locations can be quite a daunting task … even if a particular store location opened or closed, there is some lag time when these lists get updated.”
My Comment: An example of this was provided above, with the Hilton potentially representing itself as a Banquet Hall. And, as Pankaj suggests here, for large chains, keeping track of all their locations can be problematic all by itself. This suggests that a layer of human interpretation may be crucial to this process in the long term.
- Pankaj Mathur: “Usually there is a perception, largely amongst data compilers who do not invest as much in compilation efforts, that merchant submitted listings are ‘gold’ so take it for its face value. My personal opinion is that merchant submitted listings are at best ‘okay’; there is a lot of crap in there that needs to be cleansed to make it valuable.”
My Comment: His overall conclusion on merchant listings is clear. Decent source of data, but NOT authoritative. This is consistent with what I have heard from Google.