In this study we explored the way that Google indexes tweets. The reason we embarked on this was to determine the likelihood that Google might use signals from Twitter for ranking purposes, but we found lots of other interesting information in the process. Spoiler alert:
- Google indexes only a very small percentage of tweets.
- The tweets it does index are heavily skewed toward accounts with 1 million or more followers.
- Even for those high authority accounts, indexing is not particularly fast!
Read on for the details of what we found!
Basic Twitter Indexing Information
In its IPO filing, Twitter reported handling more than 500 million tweets per day on average. The following image shows a pair of Google search queries we used to estimate how many Twitter pages Google has in its index:
Between these two queries, we see fewer than 1.5 billion pages, which is a pretty small number when you consider that there are 500 million tweets per day: it’s less than three days’ worth. However, the result counts from these two queries are not necessarily accurate, so we decided it was worth breaking this down further to see how many Twitter pages Google was indexing per month.
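As a rough check on that comparison, dividing the indexed page count by the daily tweet volume gives the number of days of tweets the index could hold. This is simple arithmetic on the two figures above, treating both as exact values even though they are really bounds.

```python
# Figures quoted above, treated as exact values for a back-of-the-envelope check.
TWEETS_PER_DAY = 500_000_000           # "more than 500 million tweets per day" (IPO filing)
INDEXED_TWITTER_PAGES = 1_500_000_000  # "fewer than 1.5 billion pages" (site: queries)

# How many days of tweets would fit in the indexed page count?
days_of_tweets = INDEXED_TWITTER_PAGES / TWEETS_PER_DAY
print(f"{days_of_tweets:.1f} days of tweets")  # 3.0 days of tweets
```

Since the real tweet volume is above 500 million per day and the page count is below 1.5 billion, actual coverage is somewhat less than three days.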
To do that, use Google’s search tools to restrict a site: query to a specific date range, as shown in this graphic.
First click on “Search tools,” then “Any time,” and then “Custom range.” Then use the calendar feature to pick a range of dates. We did this on a month-by-month basis for every month from January 2012 through June 2014. Here are the results we got:
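If you want to repeat the month-by-month exercise, the 30 date ranges can be generated rather than worked out by hand. This is a minimal sketch: the `site:twitter.com` query string is an assumption (the exact queries appear only in the graphic), and the dates would still be entered through the “Custom range” calendar.

```python
import calendar
from datetime import date


def month_ranges(start_year, start_month, end_year, end_month):
    """Yield (first_day, last_day) for every month in the inclusive range."""
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        last_day = calendar.monthrange(year, month)[1]  # handles leap years
        yield date(year, month, 1), date(year, month, last_day)
        month += 1
        if month > 12:
            year, month = year + 1, 1


ranges = list(month_ranges(2012, 1, 2014, 6))
for first, last in ranges[:2]:
    # Assumed query text; pair it with these dates in the Custom range calendar.
    print(f"site:twitter.com  {first:%m/%d/%Y} - {last:%m/%d/%Y}")
```

This prints the ranges for January and February 2012 (the latter ending on the 29th, since 2012 was a leap year); the full list holds 30 months.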
First, the disclaimer: the site: query is known to be rather imprecise. However, even allowing for a large margin of error, this data suggests that the indexing rate of tweets is quite low. That alone says a lot about how much value Google places on the information in the average tweet (i.e., fairly close to zero).
As a further note, you may be wondering how many tweets are actually retweets, since retweets could be much less valuable to index separately. That’s debatable, of course: a retweet of your tweet is an indicator of greater value, and retweets essentially behave like the link graph in the world of Twitter (we can call it the “Retweet Graph”). In any event, according to Dan Zarrella’s analysis of 5 million tweets, retweets make up about 1.4 percent of all tweets.
Detailed Research on Indexing of Tweets
Back in December, we published a study on the potential impact of Facebook on SEO, in which we examined how Google indexes content from highly prominent Facebook profiles. It showed that only about 59 percent of updates from influential Facebook profiles were getting indexed. Today we are reporting on a similar study for Twitter.
In this part of the study, we analyzed the indexation of posts from 963 different Twitter accounts. We used the Twitter and Google APIs to pull the last 20 tweets from each of these accounts and tracked their indexation levels in a number of different ways. The accounts in the study were broken into these follower-count categories:
- More than 5M followers – 26 accounts
- 3M to 5M followers – 9 accounts
- 1M to 3M followers – 23 accounts
- 500K to 1M followers – 20 accounts
- 100K to 500K followers – 71 accounts
- 10K to 100K followers – 199 accounts
- Fewer than 10K followers – 615 accounts
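As a quick arithmetic check on the sample, the counts for the categories with 10,000 or more followers can be summed and compared with the 963-account total. This sketch assumes every account falls into exactly one category; the `bucket_for` helper is hypothetical and only illustrates how an account would be assigned by follower count.

```python
# Account counts per follower category, keyed by each category's lower bound
# (from the list above; assumes the categories do not overlap).
ACCOUNTS_BY_MIN_FOLLOWERS = {
    5_000_000: 26,
    3_000_000: 9,
    1_000_000: 23,
    500_000: 20,
    100_000: 71,
    10_000: 199,
}
TOTAL_ACCOUNTS = 963  # total accounts analyzed in the study


def bucket_for(followers: int) -> int:
    """Return the lower bound of an account's category, or 0 if under 10K."""
    for lower in sorted(ACCOUNTS_BY_MIN_FOLLOWERS, reverse=True):
        if followers >= lower:
            return lower
    return 0


at_least_10k = sum(ACCOUNTS_BY_MIN_FOLLOWERS.values())
under_10k = TOTAL_ACCOUNTS - at_least_10k
share_at_least_10k = 100 * at_least_10k / TOTAL_ACCOUNTS

print(at_least_10k, under_10k, round(share_at_least_10k, 1))  # 348 615 36.1
```

Under that assumption, 348 accounts (about 36.1 percent) have 10,000 or more followers, leaving 615 accounts (about 63.9 percent) below 10,000.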
Aggregate Indexing of Tweets Over Time
The first look we took at the data was to see what percentage of tweets were being indexed, without regard to the number of followers. Here is what we saw over the first seven days:
In aggregate, there were 10,453 tweets that we saw within the last seven days, and 326 of them were indexed, for an indexation level of 3.12 percent. This is actually pretty consistent with what we saw in the first part of our study where we used simple site: queries to check indexation levels over time.
We also looked at the indexation levels for tweets that were more than one week old:
Once again, you see that the indexation level is relatively low. There were 19,389 total tweets checked, with 701 of them being indexed, for an indexation level of 3.62 percent. Total indexation in our data peaked at about week four. Given the depth of our data, I’d conclude that indexation of tweets increases over time and peaks between two and four weeks, and then it starts to decline after that. It may not be as low as the 0.1 percent levels we saw with the site: query tests we did, but at best it’s a small percentage of total tweets.
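The indexation levels quoted for both time windows are simply indexed tweets divided by tweets checked. A minimal sketch reproducing the two percentages from the counts above:

```python
def indexation_rate(indexed: int, checked: int) -> float:
    """Percentage of checked tweets that were found in Google's index."""
    if checked <= 0:
        raise ValueError("checked must be positive")
    return 100 * indexed / checked


first_week = indexation_rate(326, 10_453)  # tweets from the last seven days
older = indexation_rate(701, 19_389)       # tweets more than one week old
print(f"{first_week:.2f}% {older:.2f}%")   # 3.12% 3.62%
```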
Breakout of Indexation Levels by Follower Count
We also broke out the data by follower count. The results were very interesting, as shown here:
As you can see, the indexation level of tweets for people with 1 million or more followers is actually quite high. As soon as you drop below 1 million followers though, it plummets. This decline continues, and for accounts under 10,000 followers, the indexation rate is only 0.22 percent, a level that is pretty consistent with the data we found with our site: queries.
Given how we identified our Twitter accounts, our sample was clearly biased toward larger accounts. In our test, 36.1 percent of the accounts we tested (348 of 963) had 10,000 followers or more, and those are very lofty numbers. The overwhelming majority of accounts have far fewer than 10,000 followers, which also suggests that our site: test data is not that far off.
Indexation of Major Influencer Tweets Over Time
We looked at the indexing of tweets from very influential accounts over time in a more detailed manner. When we ran our test program, we noted when we researched the tweet, and the time of the tweet. This allowed us to see indexing levels of the tweets over time on a day-by-day basis. Here is what we saw:
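One way to turn those two timestamps into day-by-day indexing levels is to compute, for each time window, the share of tweets first found indexed within that window. The sketch below assumes a hypothetical record format of (tweet time, time first found indexed or `None`); the sample records are invented for illustration and are not study data.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple

# Hypothetical record: (time of the tweet, time we first found it indexed, or None).
Record = Tuple[datetime, Optional[datetime]]


def share_indexed_within(records: Sequence[Record], hours: int) -> float:
    """Percentage of tweets first found indexed within `hours` of being posted."""
    if not records:
        return 0.0
    window = timedelta(hours=hours)
    hits = sum(
        1
        for tweeted_at, indexed_at in records
        if indexed_at is not None and indexed_at - tweeted_at <= window
    )
    return 100 * hits / len(records)


# Invented example: one tweet indexed after 20 hours, one after 40, two never.
t0 = datetime(2014, 6, 1, 12, 0)
sample = [
    (t0, t0 + timedelta(hours=20)),
    (t0, t0 + timedelta(hours=40)),
    (t0, None),
    (t0, None),
]
print(share_indexed_within(sample, 24), share_indexed_within(sample, 48))  # 25.0 50.0
```

Because indexing can only be observed when a check runs, the “first found indexed” time is an upper bound on how long indexing actually took.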
What is really interesting is that tweets from even these very high-profile accounts are not indexed particularly quickly. It has long been believed that Google uses Twitter for news discovery, but this data suggests that Google is not particularly fast at indexing tweets, even from the most influential profiles.
What Causes Tweets to Get Indexed?
We broke the tweets down into a variety of categories to see how that might impact indexation. For purposes of this analysis, we concentrated on the five Twitter profiles with the largest number of followers, and the five Twitter profiles that had the most inbound links. We broke it down this way so we could see if high follower count had more of an impact on a profile’s indexation rate than the profile having a large number of inbound links. Please note that the sample size for this test was small; a total of 92 tweets were checked at this level of detail.
For the five profiles with the highest follower counts, 80 percent of the tweets we checked were indexed; for the five profiles with the strongest link profiles, only 20 percent were.
We then went a little further to examine all the indexed tweets to see what types of tweets they were. For example, for the five profiles with the most followers, we found that 20.3 percent of the indexed tweets were newsy or very topical in nature, and 43.2 percent of the indexed tweets had a link in them. Inbound links to the tweet seemed to enhance the probability of being indexed as well, as 71.6 percent of the indexed tweets had inbound links to them.
We also looked to see what percentage of the indexed tweets were news oriented OR had a link in them OR had a link pointing to them, and that aggregated total was 86.5 percent. Here is the chart showing that data in a bit more detail:
Note that in this data an indexed tweet may have a link in it, an image, AND have links to it, so there can be real overlap between categories. We repeated the investigation for all the non-indexed tweets. Among the five profiles with the highest follower counts, only 16.7 percent of the non-indexed tweets were news oriented, had a link out, or had an inbound link pointing to them; among the five profiles with the most inbound links, that number dropped to 13 percent.
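Because the categories overlap, the “news oriented OR link out OR inbound link” figure cannot be obtained by adding the individual percentages. A minimal sketch, assuming hypothetical per-tweet boolean labels (the field names and sample tweets are illustrative, not from the study):

```python
from dataclasses import dataclass


@dataclass
class TweetFlags:
    """Hypothetical per-tweet labels like the categories used in the analysis."""
    newsy: bool
    link_out: bool
    inbound_links: bool


def pct(count: int, total: int) -> float:
    return 100 * count / total if total else 0.0


def category_shares(tweets):
    total = len(tweets)
    return {
        "newsy": pct(sum(t.newsy for t in tweets), total),
        "link_out": pct(sum(t.link_out for t in tweets), total),
        "inbound_links": pct(sum(t.inbound_links for t in tweets), total),
        # A tweet counts once here even if it matches several categories.
        "any": pct(sum(t.newsy or t.link_out or t.inbound_links for t in tweets), total),
    }


sample = [
    TweetFlags(newsy=True, link_out=True, inbound_links=True),
    TweetFlags(newsy=False, link_out=True, inbound_links=False),
    TweetFlags(newsy=False, link_out=False, inbound_links=True),
    TweetFlags(newsy=False, link_out=False, inbound_links=False),
]
print(category_shares(sample))
# {'newsy': 25.0, 'link_out': 50.0, 'inbound_links': 50.0, 'any': 75.0}
```

In this toy sample the individual shares add up to 125 percent, while the union is 75 percent, which is why the aggregated figure has to be counted per tweet.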
We took one last slice at this data, which was to focus on it by category. In other words, among the five profiles with the highest follower counts, and the five profiles with the highest inbound link counts, what percentage of news oriented tweets were indexed? Interestingly enough, it was 100 percent. It actually looked like 100 percent of image tweets were indexed as well, but the sample size for that was exceedingly small.
I need to emphasize that the data in this section is based on a small total sampling of only about 152 tweets from very high profile accounts. As a result, I would not try to draw any deep conclusions from it and offer it purely to fuel speculation, and perhaps to provide grist for a future, more in-depth study on the topic.
Twitter Links are NoFollowed
We also looked at the source code of a tweet that contains a link. As with Facebook, the link is NoFollowed, so no PageRank is passed through it:
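To check this yourself, you can scan a page’s HTML for anchor tags whose rel attribute contains nofollow. A minimal sketch using Python’s standard-library HTML parser; the sample markup is hypothetical and is not Twitter’s actual tweet HTML.

```python
from html.parser import HTMLParser


class LinkRelCollector(HTMLParser):
    """Collect (href, rel tokens) for every <a> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)  # attribute names arrive lowercased
        rel_tokens = (attributes.get("rel") or "").lower().split()
        self.links.append((attributes.get("href"), rel_tokens))


def nofollow_links(html: str):
    """Return the hrefs of links marked rel="nofollow"."""
    parser = LinkRelCollector()
    parser.feed(html)
    return [href for href, rel in parser.links if "nofollow" in rel]


# Hypothetical tweet-like markup for illustration only.
sample = (
    '<p>Read this <a href="https://example.com/article" rel="nofollow">'
    'example.com/article</a> and <a href="/about">about</a></p>'
)
print(nofollow_links(sample))  # ['https://example.com/article']
```

Splitting the rel value into tokens matters because it can hold several values at once, such as `rel="noopener nofollow"`.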
This is common on social media networks, largely because the content is user generated, and this makes the value of that “endorsement” suspect. Google’s John Mueller had this to say about these types of links:
I think it’s always a bit tricky for us when we can recognize that it’s a user generated content site and we’re not really sure how to trust those links within there.
Taken together, these charts show that Google’s overall indexing of tweet content is quite low, but that it does index a reasonably high percentage of tweets from more influential accounts (up to 50 percent). However, indexation is not as rapid as we expected. Given the conventional belief that Google might use links shared on Twitter as an indicator of a hot news event, we would have expected indexation to be faster.
The data does not necessarily support this, however. Even for accounts with more than 5 million followers, only six percent of tweets are indexed within the first 24 hours, and this climbs to only 15 percent by the end of 48 hours. That is nothing to write home about! That said, Google could be crawling tweets without indexing them and still using the URLs it finds as a means of discovery. We just don’t know.
In summary, to me the evidence suggests that Google does not currently use activity in Twitter as a ranking signal. If I am wrong about that, and they do extract some ranking signals from Twitter, then the evidence suggests that they are doing that primarily from accounts with 1 million or more followers – i.e., the absolute cream of the crop.