This may seem like a very specific question, but understanding how Google discovers new content is exactly the kind of thing the geeks on the IMEC board love to dig into!
TL;DR: For those of you who don’t want to read the whole story, the short answer is NO: we found no evidence that Google monitors the clipboards of phones, either iPhone or Android, to discover URLs. However, some other bots did visit the pages. Find out which ones below!
More on What We Specifically Tested
This all started with an observation Dan Petrovic (@dejanseo) made on Twitter on February 25, 2016: the Facebook iPhone app can now pop up a notice that you recently copied a link to your phone’s clipboard and ask whether you’d like to share it.
Since the link in the clipboard had not yet been pasted into anything, it was clear that the Facebook app was monitoring the clipboard. This led Mark Traphagen to observe that this is “normal,” as any iPhone or Android app must have access to the clipboard for inter-app copying and pasting to work.
Next, Dan wondered whether Google might discover links via this method. Hence, this test was born!
Here is how we did it:
- We created two pieces of content containing NO Google code: no Google Analytics, no Tag Manager, no Google Plus code, and no Google code of any other type whatsoever.
- These two pieces of content were uploaded onto stonetemple.com via direct FTP. WordPress is the publishing platform used on stonetemple.com, but it was NOT used to upload the content. The uploads took place on March 7, 2016.
- No links of any kind were created pointing to the content.
- The content has never been loaded in any browser of any kind. No Chrome, no Google app, no Firefox, no Internet Explorer, no Safari, NADA.
- The URLs for the two pages were sent to a select group of seven people via SMS text message. We did it this way so that there would be no opportunity for Google to sniff the URLs out of an email.
- Test participants copied the URLs one at a time into their smartphone clipboard. Neither URL was pasted into anything. The participants then copied an arbitrary third URL into their clipboard to overwrite the second URL.
- All of the above was executed between March 7, 2016 and March 9, 2016.
Beginning March 7, 2016, we monitored the log files of stonetemple.com daily to see whether Googlebot visited either of the two pages. As a final safety check, we also searched Google for quoted text strings from both test pages to make sure they hadn’t ended up in the Google index by some other means (they hadn’t).
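For readers who want to run a similar check, the daily log review can be approximated with a short script. This is a minimal sketch, not the tooling we actually used. It assumes the server writes access logs in the common Apache/Nginx “combined” format, and the test page paths (`/test-page-1/`, `/test-page-2/`) are hypothetical placeholders, not the real test URLs.

```python
import re
from collections import defaultdict

# Apache/Nginx "combined" log format (assumed):
# host ident user [time] "METHOD path PROTOCOL" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical placeholders for the two test page paths.
TEST_PATHS = {"/test-page-1/", "/test-page-2/"}


def visits_to_test_pages(lines, test_paths=TEST_PATHS):
    """Return {path: [user-agent, ...]} for every logged request to a test path."""
    visits = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # not a combined-format line; skip it
        path = match.group("path").split("?", 1)[0]  # drop any query string
        if path in test_paths:
            visits[path].append(match.group("agent"))
    return dict(visits)


def claims_googlebot(visits):
    """True if any visiting user-agent string *claims* to be Googlebot."""
    return any(
        "googlebot" in agent.lower()
        for agents in visits.values()
        for agent in agents
    )
```

Because a file object yields one line at a time, something like `visits_to_test_pages(open("access.log"))` would work on a real log. Keep in mind that a user-agent string only shows what a client claims to be; Google documents verifying genuine Googlebot traffic with a reverse DNS lookup on the requesting IP (expecting a googlebot.com or google.com host name), followed by a forward lookup that resolves back to the same IP.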
As noted above, Googlebot never came to the pages, but several other user agents did. To understand what unfolded, let’s start with the different devices used to copy the URLs into the clipboard:
As you can see, we have four iPhone 6s phones, and three different types of Android phones. Now let’s take a look at what user agents paid the pages a visit:
Perhaps the most interesting visitor of all is the first one. One of the two pages was visited by a Facebook user agent. The details on that user agent are provided here. Here is what that page says about it:
Facebook allows its users to send links to interesting web content to other Facebook users. Part of how this works on the Facebook system involves the temporary display of certain images or details related to the web content, such as the title of the webpage or the embed tag of a video. Our system retrieves this information only after a user provides us with a link. You may have found this page because a Facebook user sent a link from your website to other Facebook users. If you have any questions or concerns about any links or content sent by one of our users, please contact us at email@example.com.
Note that each phone copied two URLs to the clipboard, not just one, so it’s interesting that Facebook chose to visit only one of the two web pages.
One of the user agent strings stands out as a real anomaly:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36
None of our seven testers used a Macintosh computer to access the URLs sent by SMS, as they were sent only to the testers’ smartphones. So, to be honest, I have no explanation for this one. It could be considered an impurity in the test, but since Googlebot never visited either URL, it doesn’t invalidate the results (had Googlebot visited, we couldn’t have been sure the results were valid).
The other user agents are easy to understand and explain, as laid out in the user agent graphic above. Note that behavior across the iPhones was quite consistent, whereas the Android phones behaved quite differently from one another. This fits with Android being open source, since device makers are free to customize it. On the LG-H811, the Chrome browser fetched the target URLs, but this didn’t happen on the other two Android devices; both of those accessed the URLs with the Apache-HttpClient user agent.
I found a detailed description of what this is on StackExchange:
The user agent belongs to Apache HTTPComponents, which is a Java library that handles HTTP requests. For example: It could be an Android app that is using the library to send POST requests to your login script. The UNAVAILABLE part is typically where the version number is located. As far as I know, this user agent is used as the default user agent for requests (i.e. the developer failed to set a custom user agent while setting up their client).
Credit for this detailed answer goes to wexford.
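To sort visitors like these into buckets, a simple substring classifier is enough for a test this small. The sketch below is illustrative only: the category labels and matching rules are our own assumptions. It also assumes the Facebook visit carried the `facebookexternalhit` token that Facebook’s link-preview crawler normally sends, since the exact string isn’t reproduced in this post.

```python
# Ordered (token, label) rules: the first case-insensitive substring match wins.
# Googlebot comes first because its smartphone variant also mentions Android.
UA_RULES = [
    ("googlebot", "Googlebot"),
    ("facebookexternalhit", "Facebook link-preview crawler"),
    ("apache-httpclient", "Apache HttpClient (library default UA)"),
    ("macintosh", "Desktop browser on macOS"),
    ("android", "Android device"),
    ("iphone", "iPhone"),
]


def classify_user_agent(agent: str) -> str:
    """Return a coarse, human-readable label for a user-agent string."""
    lowered = agent.lower()
    for token, label in UA_RULES:
        if token in lowered:
            return label
    return "Unclassified"
```

Rule order is the main design choice here: more specific bot tokens are checked before generic platform tokens, so a crawler that imitates a phone browser is still labeled as a crawler.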
To repeat the bottom line, Googlebot never came, and that suggests that Google isn’t aggressively sniffing the clipboards of smartphone devices to detect new URLs to crawl. Of course, we tested with seven devices, not thousands, but I still think this is a valid test.
We did see Facebook visit one of the URLs, and that was pretty interesting. However, we don’t know why that happened. It also appears that many of the phones themselves pinged the URLs.
But let’s step back and consider why Google doesn’t appear to use this method for URL discovery. Here’s what I think:
- The web is now so vast that Google can’t possibly crawl all of it, so it prioritizes what to crawl based on signals, such as links, that indicate which content is most important.
- URLs copied into a smartphone clipboard carry very little signal for Google. If Google doesn’t find a page by other means, such as links, it isn’t going to crawl it anyway. Given limited resources, it makes no sense to invest in a URL discovery method built on what would be treated as a very low-significance signal.
If you want more on how Google discovers URLs, you may also be interested in the test we did on whether Google sniffs Gmail for URLs.
Thanks, as always, to the IMEC board (Rand Fishkin, Mark Traphagen, Annie Cushing, Dan Petrovic, and David Minchala) and the entire group of IMEC participants! And, for completeness, here is my Twitter handle.
In many of our projects, we make use of volunteer help to take on tasks for us as part of our testing. If you’re interested in participating, you can apply to join the IMEC Labs Volunteer Group here.