Jan. 17, 2019 By
In today’s post, I’ll share the results from the fourth of our “Links as a Ranking Factor” studies. We conducted the first of these studies in May 2016 and have been tracking the same query set over time to measure any material shifts in the role of links. In this year’s study, we also looked at different market sectors to see how the role of links varies from one sector to another.
We have also increased the number of queries we examine over time, to make sure that we had enough data for the market sector analyses to be meaningful. The month of each study and the number of queries it examined are as follows:
- May 2016 – 6K queries
- August 2016 – 16K queries
- May 2017 – 16K queries
- August 2018 – 27K queries
Each of the query data sets includes the original query sets from the earlier studies, so I’ll show an apples-to-apples comparison of those results, as well as the larger-scale results from this year’s study. As a bonus, I’ll also comment on the increase in scope and quality of Moz’s Link Explorer.
As with our prior studies, Moz graciously supported us by giving us access to Link Explorer to pull the data for our study. Link Explorer went into beta in March 2018 and represents an ambitious effort by Moz to expand the size of its index. In short, it looks like they succeeded:
Number of Links to a Page as a Ranking Factor
For the link study itself, the first set of charts we will look at is based on the total number of links pointing to the ranking page. For these, we calculated the Quadratic Mean Spearman Correlation score (jump down to the methodology section to see what that value actually means). Here is a look at that data for the 6K queries across all four instances of the study that we’ve run to date:
Note that the same 6,000 queries were used for this chart in all four data sets. While the chart looks like it shows some level of decline, this movement is within normal statistical variance. For all intents and purposes, it shows a strong correlation between the total number of links to a page and its ranking.
Beginning with the second study, we upped the query count to 16,000 queries. We carried that same set of 16K queries through to this year’s edition of the study. Here are the correlation scores for those three datasets of 16K queries:
Once again, all three sets show strong results, and the variation between them is within the normal range of statistical variance.
In this latest version of the study, we increased the query count to 27K queries. The correlation for this set comes in at a solid value as well:
New study shows number of links to a page remains highly correlated with ranking.
One of the more notable findings is that, for the first time in any of our studies, Moz Domain Authority (DA) and Page Authority (PA) are better predictors of ranking than the total link count! The data for this is as follows:
New study shows Moz DA and PA now even more highly correlated with ranking.
Links to a Page as a Ranking Factor by Query Type
As with prior studies, we compared the total link correlation for commercial and informational queries:
New study shows informational queries slightly more correlated with ranking than commercial queries.
Links to a Page as a Ranking Factor by Market Segment
Next up, in this year’s study, we evaluated how links might vary as a ranking factor across market segments. In this first view, let’s look at commercial queries, divided into Medical, Financial, Technology, and Other segments:
New study shows financial market segment benefits more from links than other segments for search ranking.
This data shows that links are a much bigger ranking factor for financial queries than for other types of queries. Before we draw a final conclusion, though, let’s also look at a sector analysis for informational queries:
Links as a Ranking Factor Aggregated by Normalized Link Counts
Starting with the first study, we also aggregated the normalized link counts (see the methodology section below for an explanation of what that is) by ranking position.
The reason this view is important is that relevance and quality are very large ranking factors, as they should be. In addition, there are many other factors, such as Google’s need to show diversity in the SERPs (see the section titled “How Do Google’s Major Ranking Factors Interact?” below for more detail on this). The aggregated link analysis gives us a summarized view of the impact of links across a large array of search results. Here is what we saw looking at the 6K query set across all four studies:
Here is the data for the 16K query set across the last three studies:
Here is the data for the 27K query set for this latest study:
In summary, our aggregated view shows a very powerful correlation between links and ranking position.
How Do Google’s Major Ranking Factors Interact?
There are many reasons for wanting to understand this, but as it relates to links as a ranking factor, understanding how these factors interact helps explain why the non-aggregated Spearman correlation scores aren’t higher than they are.
There are two major reasons why this is the case, as follows:
1. Relevance and Content Quality are Big Factors: In its simplest form, if a web page is not relevant to a query, it shouldn’t rank. That is, of course, obvious, but the discussion is much more nuanced than that. To illustrate, let’s say we’ve got 10 pieces of content that are relevant enough to be considered for ranking for a query. Let’s further say that they have “relevance scores” (RS) like this:
This looks like it could be a pretty good ranking algo on the surface: we’re ranking the most relevant content on top. You’ll notice, too, that I set my relevance scores in a really narrow range, which makes sense because all 10 pieces are already relevant enough to be considered. In short, if you’re not relevant, you shouldn’t rank, regardless of the number of links pointing to your page or site.
The problem is that it’s pretty easy to make a piece of content look highly relevant just by stuffing the right words into it, and the most relevant content may in fact be giving users very poor information. So let’s add a new score called quality score (QS) (not the AdWords version of this term, but an actual organic evaluation of a page’s quality), and see how that impacts our algorithm:
This appears to be an improvement, and it probably is. The problem here is that, as with measuring relevance, measuring quality is a difficult thing to do. So let’s add one more element to the mix, a Link Score (LS), and use it to let the “marketplace at large” give us an indication of what content is the best on this topic. Here is what that looks like:
You see how the rankings shifted around between the three scenarios? Pretty substantially. In this simplistic mock-up of the Google algorithm, it’s pretty clear that links are VERY important. Want to know the Spearman correlation for links as a ranking factor in the third scenario shown? It’s 0.28.
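To make the three scenarios concrete, here is a minimal Python sketch of the toy model. The article’s actual RS, QS, and LS values aren’t reproduced here, so the scores for pages “A” through “J” are invented, and simply adding the scores together is an assumption about how each scenario combines them. Because position 1 is the best ranking, the sketch correlates link score against the negated position, so a positive value means more links go with better rankings.

```python
# Hypothetical scores for ten pages that are all relevant enough to be
# considered. Invented for illustration; NOT the article's own numbers.
PAGES = {  # name: (relevance RS, quality QS, link LS)
    "A": (0.98, 0.50, 0.20),
    "B": (0.97, 0.80, 0.60),
    "C": (0.96, 0.60, 0.90),
    "D": (0.95, 0.90, 0.30),
    "E": (0.94, 0.40, 0.70),
    "F": (0.93, 0.70, 0.10),
    "G": (0.92, 0.86, 0.40),
    "H": (0.91, 0.30, 0.80),
    "I": (0.90, 0.75, 0.45),
    "J": (0.89, 0.55, 0.35),
}

# Assumption: each scenario just adds up the scores it uses.
SCENARIOS = {
    "RS only": lambda rs, qs, ls: rs,
    "RS + QS": lambda rs, qs, ls: rs + qs,
    "RS + QS + LS": lambda rs, qs, ls: rs + qs + ls,
}


def spearman_no_ties(x, y):
    """Spearman's rho as 1 - 6*sum(d^2) / (n*(n^2 - 1)); exact only without ties."""
    if len(set(x)) != len(x) or len(set(y)) != len(y):
        raise ValueError("shortcut formula requires tie-free data")
    n = len(x)
    rank_x = {v: r for r, v in enumerate(sorted(x), start=1)}
    rank_y = {v: r for r, v in enumerate(sorted(y), start=1)}
    d_squared = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def rank_pages(score):
    """Page names ordered best-first by the scenario's combined score."""
    return sorted(PAGES, key=lambda name: score(*PAGES[name]), reverse=True)


def link_correlation(order):
    """Rho between link score and ranking strength (negated position),
    so a positive value means more links go with better rankings."""
    link_scores = [PAGES[name][2] for name in order]
    strength = [-pos for pos in range(1, len(order) + 1)]
    return spearman_no_ties(link_scores, strength)


for label, score in SCENARIOS.items():
    order = rank_pages(score)
    print(f"{label:13s} {''.join(order)}  links rho = {link_correlation(order):+.2f}")
```

With these invented numbers, links only correlate positively with ranking once LS is part of the score; the exact values depend entirely on the made-up inputs and won’t match the 0.28 from the article’s example.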
This is a major simplification of the Google algorithm, but even based on this, you can see why very high Spearman scores are hard to achieve.
2. Other Algorithms Come Into Play: Examples of the type of algorithms I’m talking about include:
- Local search results
- Image results
- Video results
- Query Deserves Diversity results
These might affect 15 percent of our results. To illustrate, let’s take the example above and use Query Deserves Diversity to change only one ranking position: a single high-scoring result (result number 3) is replaced with a different item.
In our example, I’ve replaced the third result with something that came in from a different algorithm (such as Query Deserves Diversity). Know what this does to the Spearman score for links as a ranking factor for this result? It drops to 0.03. Ouch!
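The effect of swapping in one result from another algorithm can be sketched the same way. The link scores below are hypothetical and listed by ranking position, the replacement result is assumed to have a link score of 0.0, and `with_replacement` is an illustrative helper, not part of any real ranking system. The shortcut formula for Spearman’s rho is exact only when there are no ties, so the function checks for that first.

```python
def spearman_no_ties(x, y):
    """Spearman's rho as 1 - 6*sum(d^2) / (n*(n^2 - 1)); exact only without ties."""
    if len(set(x)) != len(x) or len(set(y)) != len(y):
        raise ValueError("shortcut formula requires tie-free data")
    n = len(x)
    rank_x = {v: r for r, v in enumerate(sorted(x), start=1)}
    rank_y = {v: r for r, v in enumerate(sorted(y), start=1)}
    d_squared = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def with_replacement(links, position, new_link_score):
    """Hypothetical helper: copy of the SERP with one position (1-based) filled
    by a result from a different algorithm, such as a QDD, local, or video result."""
    serp = list(links)
    serp[position - 1] = new_link_score
    return serp


# Hypothetical link scores listed by ranking position (index 0 = position 1).
links_by_position = [0.90, 0.60, 0.40, 0.30, 0.45, 0.70, 0.80, 0.35, 0.10, 0.20]
# Negated positions: a larger value means a better ranking.
strength = [-pos for pos in range(1, len(links_by_position) + 1)]

before = spearman_no_ties(links_by_position, strength)
after = spearman_no_ties(with_replacement(links_by_position, 3, 0.0), strength)
print(f"links rho before: {before:.2f}, after replacing result 3: {after:.2f}")
```

Changing a single position out of ten is enough to pull the correlation down noticeably in this invented example, even though links still drive the other nine positions.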
Hopefully, this will give you some intuition as to why a score in the 0.3 range is an indication that links are a very material factor in ranking.
Google’s Progress in Fighting Link Spam
Those who say that links are on the decline as a ranking factor often point to the efforts by spammers to use illegitimate practices to acquire links and earn rankings that their sites don’t deserve. They like to say that this is a battle Google doesn’t want to have to fight anymore. This certainly was a huge problem for Google from 2002 to 2013. However, the tide in this battle started to turn in 2012.
First, that year, Google began assessing a wave of manual penalties. By themselves, these sent shock waves through the SEO industry. The next major step was the release of the first version of Penguin on April 24, 2012. This was a huge step forward for Google.
As the next few years unfolded, Google invested heavily in new versions of Penguin and in manual penalties to refine their handling of people who use illegitimate means to obtain links. This culminated with the release of Penguin 4.0 on September 23, 2016.
With the release of Penguin 4.0, Google’s confidence in their approach to links had become so high that the Penguin algorithm was no longer punishing sites for obtaining bad links. As of Penguin 4.0, the algorithm simply identifies links it considers bad and ignores them (causes them to have no ranking value).
This shift from penalizing sites with bad links to simply discounting those links reflects Google’s confidence that Penguin is finding a very large percentage of the bad links that it’s designed to find.
Of course, they still use manual penalties to address the types of illegitimate link-building practices that Penguin isn’t designed to target.
How much progress has Google actually made? I still remember the Black Hat / White Hat panel I sat on in December of 2008 at SES Chicago. With me were Dave Naylor, Todd Friesen and Doug Heil. A couple of the panel members argued that buying links at the start of a website’s campaign was a requirement, and that it was irresponsible for an SEO pro not to do so.
How a decade changes things! It has been many years since any SEO in any venue has argued that buying links is a smart practice. In fact, you can’t find anyone making public recommendations about methods for obtaining links that violate Google’s Webmaster Guidelines. The entire industry built around these practices has been driven underground.
Driven underground is not the same as “gone,” but it does show that Google’s ability to find and detect problems has become quite effective.
One last point, and it’s an important one. Ask yourself: why does Google have the Penguin algorithm, and why do they assess manual link penalties? The answer is simple: because links ARE a major ranking factor, and schemes to obtain links that don’t fit their guidelines are things that they want to proactively address. Otherwise, they would not need to invest in fighting link spam.
Why Are Links a Valuable Signal?
Why is Google still using links? Why don’t they simply switch to user engagement signals and social media signals? I won’t go into every reason these signals are problematic here, but I will share brief points about each:
- Social Media Signals: Two major reasons: (1) Google can’t be dependent on signals from third-party platforms that are run by their competitors (Google and Facebook are not friends); and (2) major social media sites such as Facebook and LinkedIn have stopped sharing data on likes and shares. If the social media sites themselves don’t find these signals valuable, why should a search engine?
- User Engagement Signals: Google probably finds some way to use these signals in one scenario or another, but there are limitations to what they can do. Here is what the head of their machine learning team, Jeff Dean, said about them: “An example of a messier reinforcement learning problem is perhaps trying to use it in what search results should I show. There’s a much broader set of search results I can show in response to different queries, and the reward signal is a little noisy. If a user looks at a search result and likes it or doesn’t like it, that’s not that obvious.”
But now, let’s get to the core of the issue: Why are links such a great signal? It comes down to three major points:
- Implementing links requires you to make a material investment. You must own a website, and you must take the time to implement the link on a web page. This may not be a huge investment, but it’s significantly more effort than implementing a link in a social media post.
- When you implement a link, you are making a public endorsement that associates your brand with the web page you’re linking to. In addition, the link is static: it sits there in an enduring manner. In contrast, a link in a social media post is gone from people’s feeds quickly, sometimes within minutes.
- Now, here is the big one: When you implement a link on a page on your site, people might click on it and leave your site. In fact, you’re inviting them to do so.
Think about that last one for a few seconds more. A (non-advertisement) link on your site is an indication by you (as the publisher of the page with the links) that you think the link has enough value to your visitors, and will do enough to enhance your relationship with those visitors, that you’re willing to have people leave your site.
That’s what makes links an incredibly valuable signal.
Methodology
After consulting with a couple of experts on the best approach (Paul Berger and Per Enge), I calculated the Spearman correlation on the results for each query in our study, and then took the quadratic mean of those scores. The reason for doing this is that the quadratic mean is built on the squares of the correlation values (where the correlation value is R, the quadratic mean uses R squared).
It’s actually the R squared value that has meaning in statistics. For example, if R is 0.8, then R squared is 0.64, and you can say that about 64 percent of the variability in Y is explained by X. As Paul Berger explained it to me, there is no meaningful sentence involving the correlation value R, but R squared gives you something meaningful to say about the correlated relationship.
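Here is a minimal sketch of both aggregation approaches discussed in this section: the mean of the individual correlations, and the quadratic mean (square each query’s R, average the squares, then take the square root). The queries and link counts are made up, and correlating against the negated position is an assumption so that a positive R means more links go with better rankings. Tied values get their average rank, since real link counts often tie. One thing to keep in mind: a textbook quadratic mean is never negative, so a query with R = -0.5 contributes exactly as much as one with R = +0.5; the article doesn’t say whether negative correlations were treated differently.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / math.sqrt(vx * vy)


def per_query_correlations(serps):
    """Map each query to rho between its link counts and negated ranking position."""
    result = {}
    for query, links in serps.items():
        strength = [-pos for pos in range(1, len(links) + 1)]
        result[query] = spearman(links, strength)
    return result


def mean_of_correlations(rhos):
    return sum(rhos) / len(rhos)


def quadratic_mean(rhos):
    """Root mean square: square each R, average the squares, take the square root."""
    return math.sqrt(sum(r * r for r in rhos) / len(rhos))


# Hypothetical link counts by ranking position for three made-up queries.
serps = {
    "query 1": [120, 80, 95, 10, 4],
    "query 2": [3, 40, 7, 12, 1],
    "query 3": [500, 650, 20, 30, 2],
}

rhos = per_query_correlations(serps)
values = list(rhos.values())
print({query: round(r, 2) for query, r in rhos.items()})
print(f"mean of individual correlations: {mean_of_correlations(values):.3f}")
print(f"quadratic mean:                  {quadratic_mean(values):.3f}")
```

For any set of correlations, the quadratic mean is at least as large as the plain mean, so the two numbers aren’t directly interchangeable when comparing results across studies.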
Here is a visual of how this calculation works:
In addition to the different calculation approach, I also used a mix of different query types. We tested commercial head terms, commercial long-tail terms, and informational queries. In fact, two-thirds of our queries were informational in nature.
I think that both the Mean of the Individual Correlations and the Quadratic Mean approaches are valid, but one limitation of these approaches is that other factors can dominate the ranking algorithm and make it hard to see the strength of the links signal.
For that reason, I took some other approaches to the analysis as well. The first of these was to measure the links in a more aggregated manner. To do this, we normalized the quantity of links for each result: we took the link count at each ranking position for a given query and divided it by the largest link count for that query.
As a result, the largest link score for each query would have a weight of “1”. The reason for doing this is to prevent a few queries that have some results with huge numbers of links from having excessive influence on the resulting calculations.
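The normalization step can be sketched as follows. The two queries and their link counts are invented; the point is that after dividing by each query’s maximum, the most-linked result for every query weighs exactly 1.0, whether that maximum was 40 links or 65,000. Returning all zeros for a query with no links at all is an assumption added to avoid dividing by zero.

```python
def normalize_links(links):
    """Scale one query's link counts so the most-linked result is 1.0.

    `links` lists link counts by ranking position. If no result has any
    links, every weight is 0.0 rather than dividing by zero."""
    top = max(links)
    if top == 0:
        return [0.0] * len(links)
    return [count / top for count in links]


# Hypothetical link counts for two made-up queries on very different scales.
small_query = [3, 40, 7, 12, 1]
huge_query = [5000, 65000, 200, 300, 20]

print(normalize_links(small_query))
print(normalize_links(huge_query))
```

Without this step, the huge query would swamp the small one in any per-position sum; after it, each query contributes at most 1.0 to any position.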
Then we summed the normalized link counts for all the search results at each ranking position. The equations for this are as follows:
The value of this is that it smooths out the impact of the negative correlations in a different way. Think of it as smoothing out the impact of other ranking factors, as illustrated above (relevance score, quality score, and the impact of other algos). This is the calculation that is shown in the “Aggregate Link Correlation by Ranking Position” data above.
I also looked at this one more way. In this view, I continued to use the normalized totals of the links, but grouped them into blocks of 10 ranking positions. That is, I summed the normalized link totals for the top 10, did the same for ranking positions 11 to 20, 21 to 30, and so forth. I then calculated the correlations to see what it would take to rank in each 10-position block.
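Here is a sketch of the aggregation by ranking position, starting from link counts that have already been normalized per query. The normalized values are hypothetical. The article doesn’t state which correlation coefficient was applied to the aggregated totals, so using Pearson’s r here is an assumption; as before, the position is negated so that a positive value means higher totals at better positions.

```python
import math


def totals_by_position(normalized_serps):
    """Sum normalized link weights across queries for each ranking position.

    Each SERP lists weights by position (index 0 = position 1); SERPs of
    different lengths are allowed."""
    depth = max(len(serp) for serp in normalized_serps)
    totals = [0.0] * depth
    for serp in normalized_serps:
        for i, weight in enumerate(serp):
            totals[i] += weight
    return totals


def pearson(x, y):
    """Pearson's r (an assumption; the article doesn't name the coefficient)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def aggregate_link_correlation(normalized_serps):
    """Correlate per-position totals with negated position (larger = better rank)."""
    totals = totals_by_position(normalized_serps)
    strength = [-pos for pos in range(1, len(totals) + 1)]
    return pearson(totals, strength)


# Hypothetical link counts already normalized per query (max = 1.0 each).
normalized_serps = [
    [1.0, 0.5, 0.8, 0.1, 0.0],
    [0.2, 1.0, 0.3, 0.4, 0.1],
    [0.9, 1.0, 0.05, 0.05, 0.0],
]

print("totals by position:", [round(t, 2) for t in totals_by_position(normalized_serps)])
print(f"aggregate link correlation: {aggregate_link_correlation(normalized_serps):.3f}")
```

Summing first and correlating second means a single query whose top result happens to have few links barely moves the per-position totals, which is the smoothing effect described above.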
Those calculations looked more like this:
This gives us a bit more granular approach than simply aggregating all the ranking positions into the SERP positions, but still smooths out some of the limitations of the Mean of Individual Correlations method. That is what is shown in the “Aggregate Link Correlation in Blocks of 10” data above.
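The blocks-of-10 grouping can be sketched like this. The per-position totals are synthetic (simply 1/position for 30 positions) so that the example runs; they are not data from the study. The article doesn’t spell out exactly which correlation was then computed on the blocks, so this sketch stops at the grouping step.

```python
def block_totals(position_totals, block_size=10):
    """Sum per-position totals into blocks: positions 1-10, 11-20, and so on.

    `position_totals[0]` is position 1. A short final block is kept as-is."""
    return [
        sum(position_totals[start:start + block_size])
        for start in range(0, len(position_totals), block_size)
    ]


def block_labels(n_positions, block_size=10):
    """Human-readable position ranges, e.g. '1-10', '11-20'."""
    return [
        f"{start + 1}-{min(start + block_size, n_positions)}"
        for start in range(0, n_positions, block_size)
    ]


# Synthetic per-position totals for 30 positions (1/position), made up so
# the example runs; not data from the study.
totals = [1.0 / pos for pos in range(1, 31)]
for label, total in zip(block_labels(len(totals)), block_totals(totals)):
    print(f"positions {label:>5}: {total:.3f}")
```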
Cementing the Point with Case Studies
We do a lot of high-end content marketing campaigns with our clients, many of which are Fortune 500 companies. Here is a sampling of the results across many of our clients:
We have repeated results like the sample shown here hundreds of times. However, we don’t find that links can rescue poor-quality content or cause low-relevance content to rank. Also, all of our efforts focus on getting recognition from, or content published on, very high-authority sites.
Doing this well requires a focus on how you implement your marketing and PR to get in front of the audiences that matter to your business the most. This will naturally drive high value links back to your site, and help you earn rankings that you deserve.