Transcript of Podcast with John Marshall
Podcast Date: March 8, 2007
The following is a written transcript of the March 8, 2007 podcast between Eric Enge and John Marshall:
Eric Enge: Hi, I'm Eric Enge, the president of Stone Temple Consulting. You can see our website at www.stonetemple.com. We are here today with John Marshall, the CEO of ClickTracks, and we planned to talk about click fraud. You can see the ClickTracks website at www.clicktracks.com. John speaks regularly on the topic of click fraud at industry conferences and ClickTracks is well known for the click fraud detection capabilities of their analytics software. So, let's get started. John, it would be great if you could start us with a definition of the types of click fraud.
John Marshall: Sure. And, I think it's important, because there are two fundamentally different mechanisms that result in click fraud. Most people who are hearing this will want to know: is this something that could be happening to me? You need to understand each of the two fundamental types of click fraud. You need to follow the money. You need to think about where the money ends up, and what purpose the money serves, and that will help you work out which one, if either, is happening.
The first type of click fraud is the case that most people think of when they think about their own ad budget. It's the idea that competitors click on your ads. To an extent, people believe it happens just because competitors want to irritate you. But, more importantly, they also want to cost you advertising money so that your ad budget runs out earlier in the day than it otherwise would. And, therefore your ads go dark, because your ads disappear once you run out of budget. So, you need to understand that for your competitors, if it's happening at all, that's their motivation: they want your ads to go dark. That is actually much more rare than most people assume. A lot of advertisers think that it happens often, and it does happen to an extent, but it's much more rare than I think people popularly believe.
The second form of click fraud is where ads are clicked, most specifically on content sites; these are partner sites, not necessarily the search engine results page, but sites that are carrying AdSense ads. I'm using the Google content targeted ad program just for convenience; click fraud is not limited to Google AdSense of course, all kinds of different ad networks have this ... So for example, let's talk about a site that carries advertising, which is going to be targeted through the text in the site towards buyers of computer hardware. When somebody clicks on those ads, the advertiser pays of course; the ad network, Google or whoever, receives a percentage of that revenue, and a percentage of the revenue is shared with the owner of the site, the publisher of the site.
Now, going back to my original point that you want to follow the money; that does create a financial incentive for publishers to get clicks on those ads on their sites. I don't want to suggest that all publishers are therefore illegally, or through other means, causing these clicks to happen. But, there is a financial incentive there for clicks to happen; so if it is going to happen anywhere, that's probably the place where click fraud starts.
Eric Enge: I have seen an example in the past where somebody I knew had their site shut off, because they had clicked on a couple of the AdSense ads on their site. And, it was actually ignorance on their part; they were just testing the setup to make sure that it was working. And, it took actually very few clicks, because they didn't have a lot of clicks happening at the time, and they actually lost their right to carry AdSense. So, it's interesting to see that that can happen. But, if I were to draw a conclusion from what you said, if I had a PPC campaign, and I was not enabling my ads to show up on third party sites, then my exposure to click fraud is potentially quite small.
John Marshall: Yes, that's correct. Or for that matter plugging in a much lower price for AdSense ads. Many people don't want to turn off AdSense, because you do get a certain amount of valuable traffic through that mechanism. But, if you turn off AdSense or lower the price, you can fairly significantly rein in the risk; at least in the case of Google, you can. In other networks, it's not nearly so easy to do those things; it's more prevalent in other networks, you know. There is definitely some careful thinking that you have to do, but in the case of Google that can in fact be a good strategy to limit the exposure that you have.
Eric Enge: And, your sense is that, it's not as easy to do in Yahoo!, for example?
John Marshall: No, I think Yahoo is okay. Maybe segueing to a topic that is related; you do get significantly different quality of clicks and prevalence of problems in networks other than Google and Yahoo. Google and Yahoo are not free of the problem, but they certainly do a better job of reining it in, using some of the mechanisms that you've just described. Your friend's site was blocked pretty quickly.
There is still this fairly common driver of click fraud, which is that a site can be built that contains a whole bunch of templated content; it can immediately carry AdSense ads, and it can perpetrate a huge amount of click fraud on certain topics. You are going to picture here that a site doing this intends to be targeted around the specific sets of search keywords of a certain topic, and a certain taxonomy of terms. And, that site can be built very quickly, because the software has been written to build these templated sites. And, that software often was originally developed for search engine optimization purposes to build highly optimized sites very quickly. But, the same code can be used to construct a site that carries AdSense ads, gets a whole bunch of traffic, generates a whole bunch of clicks through them, and then disappears again within a few days. So, the idea that this can be effectively blocked by the advertising network alone is difficult to conceive, because there are just so many moving parts in the whole thing.
Eric Enge: Yes. And, there isn't enough time to get real data and take real action. So, what percentage of paid clicks would you estimate that click fraud represents?
John Marshall: It's very difficult for me to answer the question, because it varies so much by demographic. It's just really complicated. In some industries, it's going to be close to zero, and in other industries it's a lot. For example, in the mortgage and refinancing business it's going to be a lot, because those types of ads use AdSense fairly extensively; they need the clicks. For a site selling industrial machine tools, it's just going to be a lot less, because there isn't so much competition there for the words. More competitive keywords tend to be keywords that are more expensive. Those are more likely to have higher rates of click fraud. So, you could end up with twenty percent of clicks being fraudulent or even higher; on the other side it could be as low as one percent.
Eric Enge: Is there a certain amount of randomness to the distribution of who gets hit? For example, within an industry, maybe one site is getting picked on for some period of time for some reason, but another one isn't. Is there some of that, too?
John Marshall: We may be exploring the territory of what you can use to detect click fraud, because what you've asked there is actually background to that question. There are some things that you can do about detecting click fraud that you think would work, but in fact they don't work very well. And then it turns out that the things that you can do to detect click fraud are quite simple, and it's sort of easy for software to do them, but it gets a little more difficult for humans to do by hand.
One of the things that you think would happen with click fraud is lots of requests from the same IP address, or lots of requests that have a robot as a user agent. In the world of analyzing a website, looking at these bits of data is a fairly common practice. Or you could look at the country of origin of the click, or the fact that the requesting entity doesn't set any cookies, or doesn't accept cookies. So, all these things seem like good ideas but in fact they don't work very well. At least they don't work well any more for picking up click fraud. The whole thing has just become more sophisticated than those checks would allow you to detect.
The whole thing is an arms race. As detection methods get better, the ability of the perpetrators of the fraud to fly beneath the radar gets better at the same time. It's certainly becoming a more complex problem. There are some traits that you can look for, however, from the point of view of being a piece of software looking at the data coming in. Now, some of these things can be done by humans, but they are quite difficult to do, and it's something that automated software can do much more easily. One thing that you should always look for is the way that campaigns change over time.
That's really a bit of a nebulous thing to say, but I think some of this can't be more specific. Exactly what to look for is quite difficult to define. It depends on the type of campaigns you have, and it depends on the typical behavior of people on your site. But, if you suddenly see lots of clicks on a particular campaign, rather than across all your campaigns, and that campaign is not something that could have suddenly gotten lots of attention in the press, that can be a good indicator that click fraud is happening.
Now going back to your original question; click fraud tends to come and go on a relatively small number of campaigns. It isn't this sort of background hum across all your campaigns; it tends to spike on a particular campaign and then go away. The reason that happens is that it's very difficult for the people committing the fraud to know the entire universe of keywords that you are buying, and also where those keywords are running, and on what ad networks those keywords are actually running. They tend to choose a particular keyword, and it isn't necessarily targeted directly at your business. It just happened to be one that you are buying, and that's the one that they chose to target for activity.
Perhaps, they built a site optimized around showing that particular type of content. They then tend to make the activity come and go very quickly over a period of just a few days, because that makes it harder to detect; and it also makes it less likely that the advertiser will make a fuss about it. If it was just a constant barrage, it would be far easier to detect. When it's just one ad, it may represent only twenty percent of your ad budget for a given day. A lot of people wouldn't notice. So, you can actually exploit the fact that click fraud tends to come and go, and we exploit that in the algorithms which we use to detect it.
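Editor's note: the idea described above, a short spike on one campaign while the others stay flat, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm ClickTracks uses. The function name `flag_spiking_campaigns`, the thresholds `ratio` and `min_clicks`, and the sample campaign data are all hypothetical.

```python
from statistics import median


def flag_spiking_campaigns(daily_clicks, ratio=3.0, min_clicks=20):
    """Flag campaigns whose latest day is far above their own baseline
    while the other campaigns are not surging.

    daily_clicks maps a campaign name to a list of daily click counts,
    oldest first; the last entry is the day being checked.
    The thresholds are illustrative assumptions, not tuned values.
    """
    # Latest-day clicks for each campaign, relative to its own median history.
    surge = {}
    for name, counts in daily_clicks.items():
        history, latest = counts[:-1], counts[-1]
        baseline = max(median(history), 1)  # guard against a zero baseline
        surge[name] = (latest, latest / baseline)

    flagged = []
    for name, (latest, factor) in surge.items():
        others = [f for other, (_, f) in surge.items() if other != name]
        # If the other campaigns surged too, it is more likely a site-wide
        # event (press coverage, seasonality) than a targeted burst.
        others_quiet = not others or median(others) < ratio / 2
        if latest >= min_clicks and factor >= ratio and others_quiet:
            flagged.append(name)
    return sorted(flagged)


# Made-up data: one campaign spikes on the last day, the others stay flat.
clicks = {
    "refinance": [40, 38, 45, 41, 160],
    "home-loans": [55, 60, 52, 58, 57],
    "fixed-rate": [12, 15, 11, 14, 13],
}
print(flag_spiking_campaigns(clicks))  # ['refinance']
```

A flag from a check like this is only a prompt to investigate; it cannot by itself tell fraud apart from a legitimate surge of interest in one campaign.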
Eric Enge: Right. So, if you and I, both had our own mortgage sites, it might well be that Eric's mortgage site gets hammered at one particular moment, and John's does not.
John Marshall: That's right.
Eric Enge: And, then within that, I might have had one campaign and maybe even one keyword in particular that got hammered.
John Marshall: That's right.
Eric Enge: As you said it's the mechanics of how they pick keyword targets, and set up what they are trying to do on their end.
John Marshall: Yes, I think that's right.
Eric Enge: Okay. There is also a different kind of problem, which I refer to as invalid clicks. One example I have of that is with a company with some developers over in India, and I don't mean to pick on India, it can be any overseas country, and, you are running your ad campaign, and you mean it only for US users, but these overseas users come floating in through a US network, because they are part of a larger company. Maybe they work for IBM or something, and they are clicking on your ad.
Now, I'm distinguishing that, because I don't think that this scenario meets either of the two criteria you spoke of before; they are not trying to make revenue for their site there, not trying to drive up your cost. They might actually, genuinely be researching something. But you really didn't want that click, because they are not going to become a customer. Is that a scenario that you see as well?
John Marshall: Yes, it is. There are many situations where that happens. To expand upon your example, let's say you have an ad running for fresh fruits. You might have set up the ad with Geo targeting for US users only. A lot of people make mistakes with how they set the campaign up; they don't necessarily limit the Geo targeting to just the US. They may well get clicks from people all over. And, of course, nobody from India is going to buy fresh fruit from a US company. It's just going to be too expensive to ship, and the locally grown stuff is better anyway.
So, any click that you get from a customer in India is, in that scenario, going to be invalid. But, it's not invalid from the Search Engine's point of view. From the Search Engine's point of view, everything is correct; it's invalid because the customer is never going to do business with you. That particular scenario would tend to not show up as click fraud in the detection mechanisms which ClickTracks uses. Because, that would tend to be evenly spread among all your keywords, and because it goes across all your keywords, you would have approximately the same mix of people from India and the US on each one. Our algorithm would not trigger that as being possible click fraud. That's probably the right thing to do. On the other hand, you can use your web analytics tool to tell you that you've got a lot of clicks coming from India. Your web analytics tool could do the Geo IP lookup, and you get that data and you go fix it.
A more common scenario than that is, people do apply the Geo targeting in general, across their ads, but they forget a couple of them. Then there may be one or two ads that accidentally do show to visitors who are based in India. Our detection algorithm would highlight that. That's a very interesting example of where too much hype over click fraud has created too much scaremongering really. Because, we have to be pretty careful in our software about not attempting to panic people. We will highlight that particular campaign, saying this campaign has got clicks from India, and none of the other campaigns do. That's the important thing; it's not that you've got a campaign that's got clicks from India, that doesn't necessarily matter. The important thing is that lots of the other ones don't. So, now we would highlight that campaign and say, this was going on with this campaign, and this is why we are highlighting it. But, the advertiser at that point has got to investigate it, and figure out what's going on. The advertiser needs to make a judgment call. Is this a problem, or is this just a badly targeted ad?
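Editor's note: the comparison described here, one campaign drawing clicks from a country while the others draw almost none, can be illustrated with a short Python sketch. It is a minimal example under stated assumptions and is not ClickTracks' implementation. The function name `campaigns_with_unusual_country`, the cutoffs `min_share` and `max_other_share`, and the campaign data are hypothetical.

```python
def campaigns_with_unusual_country(clicks_by_campaign, country,
                                   min_share=0.10, max_other_share=0.02):
    """Flag campaigns with a notable share of clicks from `country`
    when most other campaigns have almost none from it.

    clicks_by_campaign maps campaign name -> {country_code: click_count}.
    The share cutoffs are illustrative assumptions.
    """
    def share(counts):
        total = sum(counts.values())
        return counts.get(country, 0) / total if total else 0.0

    shares = {name: share(counts) for name, counts in clicks_by_campaign.items()}

    flagged = []
    for name, own_share in shares.items():
        others = [s for other, s in shares.items() if other != name]
        quiet = sum(1 for s in others if s <= max_other_share)
        # What matters is the contrast: this campaign has the clicks,
        # and most of the other campaigns do not.
        if own_share >= min_share and others and quiet > len(others) / 2:
            flagged.append(name)
    return sorted(flagged)


# Made-up data: Geo targeting was forgotten on one campaign only.
campaigns = {
    "apples": {"US": 95, "IN": 1},
    "oranges": {"US": 80},
    "gift-baskets": {"US": 60, "IN": 40},
}
print(campaigns_with_unusual_country(campaigns, "IN"))  # ['gift-baskets']
```

If every campaign had a similar share of clicks from the same country, nothing would be flagged, which matches the evenly spread case discussed earlier in the answer.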
That is in the realm of the somewhat complex area of false positives. And, if you want to know more about false positives, I will explain to you why false positives are a good thing.
Eric Enge: Okay.
John Marshall: I will do an analogy for you, and this example works very well as a comparison to click fraud. When you go through the airport metal detector, you will see the airport metal detector is a very simple system. It has a threshold: when a certain amount of ferrous metal is detected, it goes beep, and with less than that amount, it doesn't go beep. And, many people want click fraud detection to work like that. They want something that just simply goes beep when there is a problem, and otherwise it doesn't go beep. However, anybody who's been to the airport will have experienced what is known as a false positive. The metal detector is really attempting to discover weapons that are being taken on board.
But, many other things that go through contain enough ferrous metal to trigger it. And, at that point a human being has to get involved, and has to investigate what's really going on. Now, an obvious solution to avoid having a human get involved is to raise the threshold for the amount of ferrous metal. You simply don't make it go beep on a set of keys, and you make it so that it only goes beep if more ferrous metal is detected. At that point, you'll have far fewer false positives, and much less inconvenience, and everybody will be happy. The problem with that is that if you tune a detection system to have very few false positives, you will inherently get a larger number of false negatives. A false negative means a weapon was in somebody's pocket and the metal detector didn't go beep.
False negatives are much more expensive to society than false positives. Hence, you always want your detection system tuned to have no false negatives, and that means you will have a certain number of false positives. And, in the end, that's what the detection algorithms in ClickTracks do, which is a little bit unique. We simply say, this ad over here has got problems; and it's got problems with lots of invalid clicks.
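Editor's note: the tradeoff in the metal detector analogy can be shown with a small Python sketch. The scores and labels below are invented purely for illustration, and `count_errors` is a hypothetical helper, not part of any product. Raising the alarm threshold removes false positives but lets real problems through.

```python
def count_errors(scored_items, threshold):
    """Return (false_positives, false_negatives) for an alarm that
    fires whenever score >= threshold.

    scored_items is a list of (score, is_real_problem) pairs.
    """
    false_positives = sum(
        1 for score, bad in scored_items if score >= threshold and not bad
    )
    false_negatives = sum(
        1 for score, bad in scored_items if score < threshold and bad
    )
    return false_positives, false_negatives


# Invented suspicion scores: higher means more suspicious.
items = [
    (0.1, False), (0.2, False), (0.4, False), (0.5, True),
    (0.6, False), (0.7, True), (0.9, True),
]

for threshold in (0.3, 0.55, 0.8):
    print(threshold, count_errors(items, threshold))
# 0.3 (2, 0)
# 0.55 (1, 1)
# 0.8 (0, 2)
```

With the lowest threshold a person has to review two harmless items, but no real problem is missed; with the highest threshold nobody is bothered, and two real problems slip through.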
Eric Enge: Then you have to use the tools you provide to find out whether or not it's just an invalid click as we talked about before, or an actual case of click fraud.
John Marshall: That's right. I am of the opinion that software cannot do a good enough job of going through that full process for you. I know that some people disagree with that, but I am very skeptical that automated software can completely take care of it, because I think if that type of problem could be completely taken care of by software, that problem would have been solved at airport metal detectors by now, and that hasn't happened. It surely is not going to happen to something as trivial as click fraud.
Eric Enge: What do the search engines do about click fraud?
John Marshall: The search engines, actually I think are doing a very good job; the tier I engines are. I think it is basically the size of the company; when it gets smaller, in my experience, they do a less good job. But, Google and Yahoo, they do a pretty good job. They are looking at the data statistically across all the clicks that are happening. They have a very large scale view of the activity on the network. They can use very effective statistical sampling techniques to examine activities. They don't need to look at every click; they only need to sample the data, because they have so much data, to get a really good picture. I like the way the search engines handle this. But I think that it is not possible, and not realistic, to rely on the search engines to detect every possible type of fraud.
It's incumbent upon the advertiser to take some level of responsibility to know what's going on with the clicks on their ads, and to use some kind of a web analytics tool, preferably something with some sort of click fraud detection built in. For example, the algorithms that I've just described to you with regard to how ClickTracks works would have highlighted this problem that we discussed where you got invalid clicks from India, because you didn't set up your Geo targeting properly on the ad. Now, that is the situation of invalid clicks. I can tell you that the search engines themselves would not detect that as click fraud, and they would not rein in that particular ad.
I think the engines do a pretty good job of catching large scale endemic fraud, and preventing people getting charged for that, but there is enough other stuff, such as invalid clicks or small scale fraud that people are subject to, that if they are not actively looking for those kinds of problems, and reining them in, they are wasting money.
Eric Enge: Right. So, you refer to small scale fraud, which may be small scale to Google, but it isn't by any means small scale to you, as the advertiser.
John Marshall: That's right, yes. I think if ten percent of what an advertiser spends is going to fraud, most advertisers would get pretty uptight about that. I don't think anybody would say, oh ten percent, yeah sure, I can afford to waste that, not a big deal.
Eric Enge: I would assume though that the search engines are going to be more likely to tune their algorithm to get false negatives, because as a practical matter that's kind of what they need to do. Does that make sense?
John Marshall: Yes.
Eric Enge: That's a little bit of what you are looking for with your own analytics package. So, you do think that it's correct, and normal that the webmaster needs to do some things on their own, to watch after their own PPC programs.
John Marshall: Yes, I do. I think it's unrealistic to expect the engine to do everything for them. Because, the same algorithms that detect click fraud, at least in the case of ClickTracks, also just by their design detect really underperforming ads. And, one of the problems that you have got with measuring the performance of ads is that most people that are sort of new to web analytics would say, well, I'm looking at my ROI, and it's just obvious whether the ad is performing or not.
The problem with that is that, when you look at the web analytics data, you often have campaigns that get such small amounts of traffic that they effectively have zero ROI. There are no sales that can be attributed back to that campaign; and yet the campaign gets clicks. And, given the lag time that often exists between the click and the final purchase, a couple of weeks, a couple of months, you can actually have a campaign that is running, and that suddenly surges with a whole lot of traffic. A whole bunch of fraud comes in, and then goes away two or three days later. And, it never would show up in the ROI calculations, because the conversions under normal circumstances wouldn't happen for at least a couple of weeks anyway. So, you can't just put all of this stuff on autopilot, and rely on these sort of very basic ROI types of things to do the job for you. Because, if fraud happens, just by its very nature you are not going to be able to detect it using trivial methods. You've got to get in there, and the advertiser has a high degree of responsibility to know where their dollars are going.
Eric Enge: Once you've found that you have click fraud, how should a site owner communicate with the search engines about it?
John Marshall: I'm going to somewhat defend search engines here. I think that there is so much discussion around click fraud, and they receive so many emails about fraudulent activity, that if you just send an email to the search engine saying I've got fraud, I can prove it, give me my money back, you are just not going to get a response, because I'm going to guess they get thousands of those emails every day. I would say in the majority of cases, perhaps sixty percent of the time, it's not fraud. For example, it's an ad that was badly designed, or badly targeted; the wrong keywords were chosen; there are all kinds of reasons why this ad was never going to produce results. From the engine's point of view they are getting bombarded with what I think are often bogus requests. So, if you want to be one of the people that they pay attention to, you need to have data in your hands that supports your point of view. The data that you need to provide is, at a minimum: here is what my campaign used to do, and over this time period here is what it did. I didn't change anything about this campaign. And, if possible, you want to actually provide the forensic data. Note that the click fraud detection suite in ClickTracks includes the ability to output the forensics. And, what that boils down to, is every request as originally seen with the IP addresses, and the dates under referral in each session, and broken out by the pages viewed, and all that stuff. So, that data then comes in the form of a huge great long list, which is a big Excel document, and you send that over to the engine.
And, it's got the IP addresses of the requesting entity; it's got all the strings that the browser is sending back and forth. So, that data in our experience is very helpful. We certainly have been able to get a credit or refund many times using that. If you can't get that, at a minimum you need times and dates and campaign IDs and proof that this campaign used to be okay, and it was not okay here. And, basically argue your case upfront in an email rather than just saying in an email I've got fraud, please call me.
Eric Enge: Right. So, you need to provide them details, because it's the only way they are going to have a chance of coping with it.
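Editor's note: for readers without a tool that outputs forensics, a per-request list like the one described can be assembled from your own request logs. The Python sketch below is only an illustration under stated assumptions. It is not the ClickTracks export format; the column names in `FIELDS`, the function `export_forensics`, and the sample log records are hypothetical, and the IP addresses come from ranges reserved for documentation.

```python
import csv
import io

# Hypothetical column layout; adapt it to whatever your own logs record.
FIELDS = ["campaign_id", "session_id", "timestamp", "ip_address",
          "referrer", "user_agent", "page"]


def export_forensics(requests, campaign_id, start, end):
    """Return CSV text listing every logged request for one campaign
    between start and end (inclusive), grouped by session.

    Timestamps are assumed to be ISO 8601 strings in one consistent
    format, so comparing them as text orders them correctly.
    """
    rows = [
        r for r in requests
        if r["campaign_id"] == campaign_id and start <= r["timestamp"] <= end
    ]
    rows.sort(key=lambda r: (r["session_id"], r["timestamp"]))

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up log records for illustration.
log = [
    {"campaign_id": "refi-01", "session_id": "s2",
     "timestamp": "2007-03-02T10:15:00", "ip_address": "192.0.2.10",
     "referrer": "http://example.com/a", "user_agent": "ExampleBrowser/1.0",
     "page": "/landing"},
    {"campaign_id": "refi-01", "session_id": "s1",
     "timestamp": "2007-03-01T09:00:00", "ip_address": "203.0.113.5",
     "referrer": "http://example.org/b", "user_agent": "ExampleBrowser/2.0",
     "page": "/landing"},
    {"campaign_id": "other", "session_id": "s3",
     "timestamp": "2007-03-01T12:00:00", "ip_address": "198.51.100.7",
     "referrer": "", "user_agent": "ExampleBrowser/1.0", "page": "/home"},
]

csv_text = export_forensics(log, "refi-01",
                            "2007-03-01T00:00:00", "2007-03-03T23:59:59")
print(csv_text)
```

The resulting text can be saved with a `.csv` extension and opened in a spreadsheet, alongside the before-and-after campaign figures described above.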
John Marshall: Yes.
Eric Enge: Alright, great. Well, thanks for taking the time to speak with us today. I hope the audience enjoyed it, I know I did. And, we will look forward to talking to you soon.
John Marshall: Eric, thank you. It has been great fun.
About the Author
Eric Enge is the Founder and President of Stone Temple Consulting (STC). STC offers Internet marketing optimization services, including SEO, Social Media and PPC optimization, and its web site can be found at: http://www.stonetemple.com.