How Does Google Index Tweets?

In this study we explored the way that Google indexes tweets. The reason we embarked on this was to determine the likelihood that Google might use signals from Twitter for ranking purposes, but we found lots of other interesting information in the process. Spoiler alert:

  1. Google does not index a particularly significant percentage of tweets at all.
  2. The tweets it indexes are highly biased to people who have 1 million followers or more.
  3. Even for those high authority accounts, indexing is not particularly fast!

Read on for the details of what we found!

Basic Twitter Indexing Information

In Twitter’s IPO filing, it was reported that Twitter is handling more than 500 million tweets per day on average. The following image shows a pair of Google search queries we used to attempt to find out how many Twitter pages Google has in its index:

How Many Tweets Does Google Have Indexed?

Between these two queries, we see fewer than 1.5 billion pages, which is a pretty small number when you consider that there are 500 million tweets per day – it’s less than three days’ worth. However, the data from these two queries is not necessarily that accurate, so we decided it was worth breaking this down further to see how many Twitter pages Google was indexing per month.
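As a quick check on that comparison, here is a minimal sketch of the arithmetic. The two constants come from the figures quoted above; the function name is only illustrative.

```python
# Figures quoted in the text above.
TWEETS_PER_DAY = 500_000_000    # average daily tweets, per Twitter's IPO filing
INDEXED_PAGES = 1_500_000_000   # upper bound suggested by the two site: queries


def days_of_tweets(indexed_pages: int, tweets_per_day: int) -> float:
    """Return how many days of tweet volume the indexed page count represents."""
    return indexed_pages / tweets_per_day


# The query totals are below the upper bound, so the real figure is under this.
print(days_of_tweets(INDEXED_PAGES, TWEETS_PER_DAY))  # 3.0
```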

To accomplish that, all you need to do is utilize the advanced search query operators as shown in this graphic.

How to Query a Date Range

First click on “Search tools,” then “Any time,” and then “Custom range.” Then you can use the calendar feature to pick a range of dates. We did this on a month-by-month basis for each month January 2012 through June 2014. Here are the results we got:

Twitter Indexation By Month

First, the disclaimer. The site: query is known to be rather imprecise. However, even allowing for a large degree of error, this data suggests that the indexing rate of tweets is actually quite low. This already makes a pretty strong statement about the value Google places on the information in the average tweet (i.e., fairly close to zero).
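For anyone who wants to repeat the month-by-month queries, the date ranges can be enumerated programmatically rather than picked one at a time in the calendar. This is only a sketch of generating the ranges for January 2012 through June 2014. It does not run any searches, and the function name is an assumption for illustration.

```python
import calendar
from datetime import date


def month_ranges(start_year, start_month, end_year, end_month):
    """Return (first_day, last_day) date pairs for each month in the span, inclusive."""
    ranges = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        # monthrange returns (weekday of the 1st, number of days in the month).
        last_day = calendar.monthrange(year, month)[1]
        ranges.append((date(year, month, 1), date(year, month, last_day)))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return ranges


ranges = month_ranges(2012, 1, 2014, 6)
print(len(ranges))  # 30 monthly ranges
```

Each pair can then be entered as the start and end of a "Custom range" search.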

As a further note, you may be wondering how many tweets are actually retweets, which could potentially make them a lot less valuable to separately index. That’s debatable, of course, as a retweet of your tweet would be an indicator of greater value and essentially behaves like the link graph in the world of Twitter (we can call it the “Retweet Graph”). But, in any event, according to Dan Zarrella’s analysis of 5 million tweets, retweets make up about 1.4 percent of the total number of tweets.

Detailed Research on Indexing of Tweets

Back in December, we published a study on the potential impact of Facebook on SEO, and in it we studied how Google indexes content from highly prominent Facebook profiles. It showed the updates from influential Facebook profiles only getting indexed at around 59 percent. Today we are reporting on a similar study for Twitter.

In this part of the study, we included an analysis of the indexation of posts for 963 different Twitter accounts. We used the Twitter and Google APIs to pull the last 20 tweets from each of these accounts and tracked their indexation levels in a number of different ways. The follower numbers of the accounts included in the study were broken into these categories:

  • More than 5M followers – 26 accounts
  • 3M to 5M followers – 9 accounts
  • 1M to 3M followers – 23 accounts
  • 500K to 1M followers – 20 accounts
  • 100K to 500K followers – 71 accounts
  • 10K to 100K followers – 199 accounts
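The grouping above can be sketched as a simple lookup. This is an illustration, not the program used in the study: it assumes the ranges are meant to be non-overlapping (so the third range runs from 1M to 3M), that a value exactly on a boundary falls into the higher bucket, and that everything else falls into an "Under 10K" bucket.

```python
# (lower bound, label) pairs, ordered from largest to smallest.
# Boundary handling and the "Under 10K" fallback are assumptions.
BUCKETS = [
    (5_000_000, "More than 5M"),
    (3_000_000, "3M to 5M"),
    (1_000_000, "1M to 3M"),
    (500_000, "500K to 1M"),
    (100_000, "100K to 500K"),
    (10_000, "10K to 100K"),
]


def follower_bucket(followers: int) -> str:
    """Return the label of the first bucket whose lower bound the count reaches."""
    for lower_bound, label in BUCKETS:
        if followers >= lower_bound:
            return label
    return "Under 10K"


print(follower_bucket(750_000))  # 500K to 1M
```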

Aggregate Indexing of Tweets Over Time

The first look we took at the data was to see what percentage of tweets were being indexed, without regard to the number of followers. Here is what we saw over the first seven days:

Tweet Indexation by Day

In aggregate, there were 10,453 tweets that we saw within the last seven days, and 326 of them were indexed, for an indexation level of 3.12 percent. This is actually pretty consistent with what we saw in the first part of our study where we used simple site: queries to check indexation levels over time.

We also looked at the indexation levels for tweets that were more than one week old:

Tweet Indexation by Week

Once again, you see that the indexation level is relatively low. There were 19,389 total tweets checked, with 701 of them being indexed, for an indexation level of 3.62 percent. Total indexation in our data peaked at about week four. Given the depth of our data, I’d conclude that indexation of tweets increases over time and peaks between two and four weeks, and then it starts to decline after that. It may not be as low as the 0.1 percent levels we saw with the site: query tests we did, but at best it’s a small percentage of total tweets.
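The two indexation levels quoted above follow directly from the counts. A minimal sketch of the calculation, with an illustrative function name:

```python
def indexation_rate(indexed: int, total: int) -> float:
    """Return the percentage of checked tweets that were indexed, to two decimals."""
    return round(100 * indexed / total, 2)


print(indexation_rate(326, 10_453))  # 3.12 (tweets from the last seven days)
print(indexation_rate(701, 19_389))  # 3.62 (tweets more than one week old)
```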

Breakout of Indexation Levels by Follower Count

We also broke out the data by follower count. The results were very interesting, as shown here:

Tweet Indexation by Follower Count

As you can see, the indexation level of tweets for people with 1 million or more followers is actually quite high. As soon as you drop below 1 million followers though, it plummets. This decline continues, and for accounts under 10,000 followers, the indexation rate is only 0.22 percent, a level that is pretty consistent with the data we found with our site: queries.

Given the nature of how we identified our Twitter accounts, it’s clear that we were heavily biased toward larger accounts. In our test, 36.1 percent of what we tested (348 of the 963 accounts) had 10,000 followers or more, and those are very lofty numbers. The overwhelming majority of accounts have far fewer than 10,000 followers, and that also suggests that our site: test data is not that far off.

Indexation of Major Influencer Tweets Over Time

We looked at the indexing of tweets from very influential accounts over time in a more detailed manner. When we ran our test program, we noted when we researched the tweet, and the time of the tweet. This allowed us to see indexing levels of the tweets over time on a day-by-day basis. Here is what we saw:

Tweet Indexation Over Time for High Follower Profiles

What is really interesting about this is that the tweets from these very high profiles are not indexed particularly quickly. It has long been believed that Twitter is used by Google for news discovery, but this data suggests that Google is not particularly fast at indexing tweets even from the most influential profiles.
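The day-by-day view described above depends on two timestamps per tweet: when it was posted and when it was checked. Here is a rough sketch of how such records could be grouped by age in days. The record layout and the sample values are hypothetical and are not data from the study.

```python
from collections import defaultdict
from datetime import datetime


def indexation_by_age(observations):
    """Group (tweeted_at, checked_at, is_indexed) records by tweet age in whole days.

    Returns a dict mapping age in days to the percentage of tweets indexed.
    """
    totals = defaultdict(int)
    indexed = defaultdict(int)
    for tweeted_at, checked_at, is_indexed in observations:
        age_days = (checked_at - tweeted_at).days
        totals[age_days] += 1
        if is_indexed:
            indexed[age_days] += 1
    return {day: 100 * indexed[day] / totals[day] for day in sorted(totals)}


# Hypothetical sample records, for illustration only.
sample = [
    (datetime(2014, 6, 1, 9, 0), datetime(2014, 6, 1, 18, 0), False),
    (datetime(2014, 6, 1, 9, 0), datetime(2014, 6, 1, 20, 0), True),
    (datetime(2014, 6, 1, 9, 0), datetime(2014, 6, 2, 15, 0), True),
]
print(indexation_by_age(sample))  # {0: 50.0, 1: 100.0}
```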

What Causes Tweets to Get Indexed?

We broke the tweets down into a variety of categories to see how that might impact indexation. For purposes of this analysis, we concentrated on the five Twitter profiles with the largest number of followers, and the five Twitter profiles that had the most inbound links. We broke it down this way so we could see if high follower count had more of an impact on a profile’s indexation rate than the profile having a large number of inbound links. Please note that the sample size for this test was small; a total of about 152 tweets were checked at this level of detail.

For the five profiles with the highest follower counts, we found that 80 percent were indexed, and for the five profiles with the strongest link profiles, we found that only 20 percent were indexed.

We then went a little further to examine all the indexed tweets to see what types of tweets they were. For example, for the five profiles with the most followers, we found that 20.3 percent of the indexed tweets were newsy or very topical in nature, and 43.2 percent of the indexed tweets had a link in them. Inbound links to the tweet seemed to enhance the probability of being indexed as well, as 71.6 percent of the indexed tweets had inbound links to them.

We also looked to see what percentage of the indexed tweets were news oriented OR had a link in them OR had a link pointing to them, and that aggregated total was 86.5 percent. Here is the chart showing that data in a bit more detail:

Indexed Tweets by Category

Note that in this data an indexed tweet may contain a link, contain an image, AND have links pointing to it – there can be some real overlap. We repeated the investigation by examining the makeup of all the non-indexed tweets. Among the five profiles with the highest follower count, only 16.7 percent of the non-indexed tweets were news oriented, had a link out, or had an inbound link pointing to them, and that number dropped to 13 percent among the five profiles that had the most inbound links.

Non-Indexed Tweets By Category
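Because of the overlap between categories, the combined "news OR link OR inbound link" figure cannot be found by adding the individual percentages (20.3 + 43.2 + 71.6 would exceed 100). Each tweet has to be counted once if it matches any category. A small sketch of that calculation follows. The `Tweet` fields and the sample tweets are hypothetical and are not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    is_news: bool           # newsy or very topical
    has_link: bool          # contains an outbound link
    has_inbound_link: bool  # has at least one link pointing to it


def category_shares(tweets):
    """Return the percentage of tweets in each category and in any category."""
    n = len(tweets)

    def pct(count):
        return round(100 * count / n, 1)

    return {
        "news": pct(sum(t.is_news for t in tweets)),
        "has_link": pct(sum(t.has_link for t in tweets)),
        "inbound_link": pct(sum(t.has_inbound_link for t in tweets)),
        # A tweet matching several categories is still counted only once here.
        "any": pct(sum(t.is_news or t.has_link or t.has_inbound_link for t in tweets)),
    }


# Hypothetical sample, for illustration only.
sample_tweets = [
    Tweet(True, True, True),
    Tweet(False, True, True),
    Tweet(False, False, True),
    Tweet(False, False, False),
]
print(category_shares(sample_tweets))
```

In this sample the individual shares add up to 150 percent, while the "any" share is 75 percent.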

We took one last slice at this data, which was to focus on it by category. In other words, among the five profiles with the highest follower counts, and the five profiles with the highest inbound link counts, what percentage of news oriented tweets were indexed? Interestingly enough, it was 100 percent. It actually looked like 100 percent of image tweets were indexed as well, but the sample size for that was exceedingly small.

By Category, Percent Tweets Indexed

I need to emphasize that the data in this section is based on a small total sampling of only about 152 tweets from very high profile accounts. As a result, I would not try to draw any deep conclusions from it and offer it purely to fuel speculation, and perhaps to provide grist for a future, more in-depth study on the topic.

Twitter Links are NoFollowed

We also looked at the source code for a tweet containing a link. As with Facebook, the link is NoFollowed, so no PageRank is passed by it:

Twitter Links are NoFollowed

This is common on social media networks, largely because the content is user generated, and this makes the value of that “endorsement” suspect. Google’s John Mueller had this to say about these types of links:

I think it’s always a bit tricky for us when we can recognize that it’s a user generated content site and we’re not really sure how to trust those links within there.

Conclusions

In summary, these charts show us that overall indexing of content of tweets by Google is quite low, but they do in fact index a reasonably high percentage of tweets from more influential accounts (up to 50 percent). However, the indexation is not as rapid as we would have expected. Given the conventional belief that Google might use shared links in Twitter as a potential indicator of a hot news event, we would have expected that indexation rate would be more rapid.

However, the data does not necessarily support this. Even for accounts with more than 5 million followers, only six percent of tweets are indexed within the first 24 hours, and this only climbs to 15 percent by the end of 48 hours. This is nothing to write home about! However, Google could actually be crawling the tweets, not indexing them, but still using URLs it finds as a means of discovery. We just don’t know.

Overall, to me the evidence suggests that Google does not currently use activity in Twitter as a ranking signal. If I am wrong about that, and they do extract some ranking signals from Twitter, then the evidence suggests that they are doing that primarily from accounts with 1 million or more followers – i.e., the absolute cream of the crop.

Comments

  1. says

    Eric, you say in the end “they do in fact index a reasonably high percentage of tweets from more influential accounts (up to 50 percent)”

    Now, does that mean “influential” or does that mean “most followed.”

    i.e. Could I go buy 5 million followers to get more of my tweets indexed? (not suggesting that as an actual option I am thinking about – more interested in the hypothetical)

    • says

      Then again, I’m sure the accounts you monitored with high follower counts were all upstanding and influential accounts. I’d be curious to see if the same happens when they are clearly spam accounts!

    • Eric Enge says

      Andrew – it does mean most followed, which we are using here as a proxy for influential. Hopefully, this post does not inspire people to go buy followers! As you suggest, I am sure that the way Google makes that decision is more sophisticated than all of that.

      • says

        Coming from the perspective of someone optimizing a website, conventional wisdom implies Google needs a certain amount of information to determine what a page is actually about in order to rank it properly. In other words, you need to have a reasonable amount of text.

        It could well be that Google don’t index that many tweets, because the 140 character limit restricts the ‘quality’ of the pages that are created.

        Given that you need one heck of a lot of followers before you start to see even some of your tweets being indexed, aside from the obvious benefits of reaching a lot of people quickly, it does seem that now is the time that we can turn round to people and say, “No, Twitter does not affect SEO.”

        Definitely not a reason to go and buy followers anyway.

      • Jon says

        Well… buying more followers is essentially buying more internal links to your account, even if they’re dud accounts.

  2. says

    Oh, and how rude of me. Great piece! Clearly a TON of work went into this. Already gave it one read through, but going to have to dive in again, as I am sure there is a lot I missed or glossed over.

  3. says

    Eric,
    Good exercise and good story about tweets indexing and their capacity to influence or not rankings. Not conclusive though, in my very personal opinion.

    Have you thought at any time that Google may be using different (qualitative) criteria in order to honor rankings from social platform participation? Along the lines of agent and author? Just a thought. If my suspicions (not data backed) are somewhat correct, then these aspects cannot be observed with the methodology above.

    I’m interested in knowing what conclusions IMEC Lab (http://moz.com/rand/imec-lab/) is drawing from their own experiments, and probably compare/complement with this one of yours.

    Thanks for taking the initiative to do this and sharing.

    • Eric Enge says

      Given the low indexation of the tweets, I doubt that those links are of any SEO value. In addition, Twitter marks them as NoFollow.

  4. says

    Nice study! I expect part of the low rate of indexing is due to low search volume (few people searching for specific tweets on Google) and low monetization (few of them click ads).

  5. says

    As I have pointed out for years, Google’s date-range queries do NOT provide complete results for things indexed during the given period. This is an admirable amount of work but you’re piling chompy numbers on top of chompy numbers.

    I think a smaller scale experiment would provide better insight into what google is doing. It would also enable you to check for secondary factors.

    In my own research I have found that Google is following external links to the Tweets. They can either follow links to the accounts (more links drive more crawling) or they can follow links to the status pages themselves. Poorly linked user accounts are less likely to have their Tweets indexed regardless of activity.

    • Eric Enge says

      Hi Michael – the actual data on the 19,000 tweets is measured through direct examination as to whether or not the tweets are indexed, so that’s hard core data. As I noted in the beginning, the site: queries numbers are not the crux of the work done. Cheers,

  6. Dan says

    A highly influential twitter account will have a lot of followers as proven above. However it is likely they will have a high number of links to their twitter profile too.

    Do you think google treats the number of links to a twitter profile as the influence factor rather than the number of followers it has?

    • Eric Enge says

      Hi Dan – Actually, in the study, we did examine whether large number of followers had more impact than links to the profile. Based on our analysis the number of links did not seem to correlate as well with indexation as the number of followers.

      However, for that part of the study, as I mention in the post, it was a pretty small data set.

  7. Julian Hoffmann says

    Hi Eric,
    before I ask you some questions – great article, really good work! I really enjoyed reading it. Your article has been the source for mine in our German blog: http://seo.at/google-indexierung-bei-twitter-nur-die-creme-de-la-creme/

    One of my readers commented on my article and showed me the indexation of his Twitter account. He linked his account with relevant social media (Google+, LinkedIn), always used the Google Link Shortener, has a tidy follower list and always tweets regularly (every day or every second day). By following these rules he has a lot of tweets indexed by Google, with only 97 followers. Are these personal rules of his maybe another signal for Google to index his tweets?
    I look forward to hearing from you!
    Regards
    Julian

  8. says

    Great post, thanks Eric. And overall, I’d say looks like Google’s getting it about right… indexing news from people that more people are following and ignoring (all-but) images, mentions and the rest of the world that are (largely) tweeting conversations about right-now.

    PS Full marks for such a patient response to queries & points; maybe we need another abbreviation – gb;r-r.

  9. Micah Greene says

    Conflating indexation with results displayed is probably a mistake. Google indexes TONS of content that they do not display in the SERPs. Google is probably keeping millions of times more data in their index than they will ever actually display in results. This study is about results displayed, not indexation.

    Just because a page (or tweet) isn’t displayed in results doesn’t mean it’s not impacting the algorithm elsewhere. A link on a page that you can’t find in the SERPs can still pass PageRank. We can actually prove this by using the <meta name="robots" content="noindex, follow"> tag. The links on the page with that tag will pass PageRank, but the page itself will not show in the SERPs.

    • Eric Enge says

      Hi Micah – actually, we tested indexation of the posts by taking the URL and doing an info: query to see if Google had it in its index. The info: query is a straight indexation test, not a display test. However, it’s certainly possible that Google crawled pages that are not in its index, and it’s certainly possible that they might use links in such tweets for purposes of discovery, even if they chose not to index the content.

  10. says

    Eric- looks like I’m a bit late to the party on commenting (explains the date ranges on your graphs).

    That said, thanks for a fascinating article! I’ve been at a juncture lately trying to process/justify the value of social to strictly SEO for search engines.

    I wonder if Twitter will start selling access to its firehose. At the time of my commenting, its stock price (albeit short history) is not in a pretty place right now.
