Has our beloved (or more often, feared) Penguin come in from the cold?
On September 23, 2016, after a wait of nearly two years, Google finally released the much-anticipated Penguin 4.0 update. Google had promised that this update would be different from all previous updates, and it looks like that promise was kept. Not only that, but 4.0 is meant to be “the update to end all updates,” at least as far as public announcements go.
But before we dig into how much of a sea change this new Penguin might be, let’s review what the previous updates were like, if only so we can evaluate how much better 4.0 might be. (If you want to skip straight to my analysis of Penguin 4.0, jump to “A New Penguin in Town” below.)
Penguin in the Past
Penguin was first launched in April 2012, with the primary mission of combating link spam at scale. In the early days of the new web economy, links had become valuable commodities, and were bought, sold, and traded almost with impunity. Such link schemes threatened to undermine Google’s entire search business, because Google’s algorithm treated (and still treats) links as a strong ranking factor, on the assumption that a link is a genuine endorsement. Anything that undermines that assumption can reduce the value of Google’s search results for users.
Penguin was designed as an algorithm that would allow Google to detect and penalize sites that appeared to be engaging in manipulative link schemes at a much larger scale than ever before. As such, it needed to be updated and improved over time. Here is a list of the major known Penguin updates before 4.0:
- (1.0) April 24, 2012
- May 25, 2012
- October 5, 2012
- (2.0) May 22, 2013
- October 4, 2013
- (3.0) October to December 2014
What was old Penguin like?
All of the Penguin updates up until 4.0 took the same basic approach, and shared certain characteristics.
For one thing, old Penguin evaluated the link profile of sites as a whole, and if too many bad links were found, the entire site got a ranking penalty. For that reason, getting hit with Penguin almost always used to mean a huge traffic hit.
To make matters worse for site owners hit by Penguin, updates were very infrequent, and only at the time of an update could a site that had cleaned up its links experience any recovery. This made the final stretch of nearly two years from the Penguin 3.0 update almost unbearable for many site owners.
But Google told us that time was necessary, because they were cooking up a whole new Penguin. And indeed they were!
A New Penguin in Town
So what’s different about Penguin 4.0? Plenty! Google had promised that 4.0 would be “real time” and that once it was out, there would never again be an announced update. I’ll discuss the significance of both those statements below.
First, Penguin 4.0 adjusts the weighting of links to a site “on the fly.” That is, as Google’s bots crawl the web and discover links, the Penguin algorithm evaluates them and then stores away its judgment. That provides a reservoir of data from which Penguin can draw to make snap judgments on a web page.
Second, Penguin is now more “granular.” Google’s Gary Illyes said, “Penguin now devalues spam by adjusting ranking based on spam signals, rather than affecting ranking of the whole site.”
This means that when Penguin discounts links now, it only impacts the ranking of the particular page, or pages, on a site that stood to benefit from the spammy link(s), rather than the whole site.
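To make the difference concrete, here is a toy comparison in Python. It is only a sketch of the idea, not Google's algorithm: the `Link` fields, the 50% spam threshold, and the penalty multiplier are all assumptions I made up for illustration, since the real spam signals and weights are not public.

```python
from dataclasses import dataclass


@dataclass
class Link:
    target_page: str  # page on our site that the link points to
    weight: float     # toy "value" of the link (hypothetical unit)
    spammy: bool      # toy spam flag; the real spam signals are not public


def old_style_scores(links, spam_threshold=0.5, penalty=0.5):
    """Toy site-wide model: too many spammy links penalize every page."""
    scores = {}
    for link in links:
        scores[link.target_page] = scores.get(link.target_page, 0.0) + link.weight
    spam_share = sum(link.spammy for link in links) / len(links) if links else 0.0
    if spam_share > spam_threshold:
        # The whole site is hit, including pages with clean links.
        scores = {page: score * penalty for page, score in scores.items()}
    return scores


def granular_scores(links):
    """Toy granular model: spammy links count for nothing, others are untouched."""
    scores = {}
    for link in links:
        value = 0.0 if link.spammy else link.weight
        scores[link.target_page] = scores.get(link.target_page, 0.0) + value
    return scores


sample_links = [
    Link("/a", 1.0, False),
    Link("/a", 1.0, True),
    Link("/b", 2.0, True),
    Link("/b", 1.0, True),
    Link("/c", 3.0, False),
]
print(old_style_scores(sample_links))
print(granular_scores(sample_links))
```

In this made-up sample, the site-wide model cuts the score of "/c" even though its only link is clean, while the granular model leaves "/c" alone and simply ignores the spammy links pointing at "/a" and "/b".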
Third, Penguin is now “real time,” but what does that mean? It does not mean that ranking changes happen to pages instantaneously when new links connect to them. Other ranking changes based on links don’t work that way, and neither does Penguin 4.0. Here’s how it works:
Just as they always have, Google’s “spiders” crawl the web, discovering new links as they do. Google updates its database of link signals, but no ranking changes are applied yet.
When your page is recrawled by Google, link signals from any links to the page discovered since the last time Google crawled that page are applied, and new page ranking weights are calculated at that time. This is when Penguin has its effect (if any is needed). If any suspicious links were picked up from crawls of sites outside your own, Penguin has already devalued them, and at this point the consequences of that may affect the ranking of the page. Bear in mind that, with the new Penguin, this means those links will simply have less positive impact on the page.
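The two-step timing described above can be sketched as a small Python model. Everything here is hypothetical (the `ToyLinkIndex` class, its method names, and the idea of a single numeric weight per page); it only illustrates the "judge at discovery, apply at recrawl" ordering, not how Google actually stores or scores links.

```python
from collections import defaultdict


class ToyLinkIndex:
    """Hypothetical model: links are judged when found, applied on recrawl."""

    def __init__(self):
        # page -> values of links discovered since that page's last recrawl
        self.pending = defaultdict(list)
        # page -> weight currently used for ranking
        self.ranking_weight = defaultdict(float)

    def discover_link(self, target_page, value, looks_spammy):
        """Step 1: a crawl elsewhere finds a link; judge it now, apply it later."""
        judged_value = 0.0 if looks_spammy else value
        self.pending[target_page].append(judged_value)

    def recrawl(self, page):
        """Step 2: on recrawl, fold the pending link signals into the ranking weight."""
        self.ranking_weight[page] += sum(self.pending.pop(page, []))
        return self.ranking_weight[page]


index = ToyLinkIndex()
index.discover_link("/pricing", 2.0, looks_spammy=False)
index.discover_link("/pricing", 5.0, looks_spammy=True)
print(index.ranking_weight["/pricing"])  # unchanged until the page is recrawled
print(index.recrawl("/pricing"))         # only the non-spammy link contributes
```

The point of the sketch is the ordering: discovering links changes nothing visible, and the devalued link has already been reduced to zero by the time the recrawl applies it.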
How Is Penguin 4.0 Real Time?
So in what way is Penguin 4.0 real time if ranking changes are only applied at the time of a recrawl? In at least two ways:
- Penguin 4.0 is “more real time” than previous Penguins because sites don’t have to wait until a major update or refresh of the algorithm to see effects (positive or negative). This is good news for sites with Penguin-penalized pages, as any successful remediation only has to wait for the next crawl (not up to two years as before!).
- Penguin 4.0 is continually updated “on the fly.” Changes and updates to the algorithm are now made without the need for a full, announced update. These changes will be seamless and largely invisible to us.
From here on out, updates to the Penguin algorithm will:
- Address new link types
- Adjust ranking weights
- Improve the process of collecting link signals
So just as there is a slight delay between any new links to a page and the application of ranking signals from those links, there will be a slight lag between the rollout of any changes to the Penguin algorithm and their effect on a particular site.
Speculation: Disavow Files May Help Train Penguin
Allow me to indulge in a bit of speculation for a moment. It should be obvious that it took a lot of data analysis to bring Penguin to the point where it could be trusted as a “real time” part of the ranking algorithm. In my Virtual Keynote interview with Google Webmaster Trends Analyst Gary Illyes, Gary told me that the reason Penguin 4.0 was taking so long to release was a priority on “getting it right.” They had to be sure they had built the algorithm to be as accurate as possible. But where did the data come from to provide the training set for the update?
One possibility is that Google used, and continues to use, data from links disavowed by webmasters attempting to recover from a link-based penalty. Many believe that those links in aggregate provide a richly detailed portrait of what constitutes spammy links.
In my view, Google does NOT use disavow files in this manner. As a publisher, I have received so many idiotic link removal requests over the years. At one point, even Matt Cutts reported getting link removal requests for his own blog! Disavow files are probably even worse. As a result, I have no faith that disavow files would have usable data for assessing link quality. Garbage-in, garbage-out, as they say.
How to Respond to Penguin 4.0
So what should you as a site owner do about Penguin 4.0? Nothing.
OK, I’m being a bit facetious there, but some of the major differences with Penguin 4.0 really do change the game. For one thing, since 4.0 only discounts links to your site, there is in principle no huge downside. It will be as if the links never existed in the first place.
Also, there is no way to file for a reconsideration of a Penguin penalty. Here is what Gary Illyes had to say:
Absolutely no reconsideration request can help you with Penguin. Reconsideration requests are for manual actions, and you can only file one if you have an incident filed internally. Penguin doesn’t create an incident; it never did and I’m very certain it never will. Reconsideration requests do not and will not help with Penguin.
So what can you really do?
- Earn and/or build better links. Since Penguin now simply devalues bad links instead of putting your page or site in jail, getting better quality links to the page should improve its rankings, just as always.
- Prune bad links regularly.
  - Use tools such as Bing WMT, Open Site Explorer, Majestic, Ahrefs, and Google Search Console to get your link profile.
  - Build a list of all the backlinks to your site.
  - Categorize the link sources:
    - Multi-link pages
    - Rich anchor text
    - Comment links
    - Multi-link sites (e.g., directories)
  - Analyze and identify potential bad links.
  - Submit the bad links to Google’s Disavow Tool.
- Don’t forget about manual penalties. A large number of bad links or other practices that go against Google’s Webmaster Guidelines could result in a manual action against your site. If that happens, you’ll have a lot of hard work ahead of you to clean up the problems, and your site won’t have any hope of getting restored in rankings until your reconsideration request is approved. This is the best reason to continue to perform the regular link auditing recommended in the step above!
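For the final step of the link-pruning routine above, here is a minimal Python sketch that turns a list of already-identified bad links into lines for a disavow file. The file format itself (one URL per line, `domain:` entries for whole domains, `#` for comments) is Google's; the function name `build_disavow_lines`, the `domain_threshold` rule of thumb, and the sample URLs are my own assumptions for illustration. Deciding which links are actually bad is still a manual judgment call that this code does not attempt.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def build_disavow_lines(bad_urls, domain_threshold=3):
    """Group bad links by host; disavow the whole host if it has many bad URLs.

    domain_threshold is a hypothetical rule of thumb, not a Google recommendation.
    """
    by_host = defaultdict(list)
    for url in bad_urls:
        host = urlsplit(url).hostname
        if host:  # skip entries that are not parseable absolute URLs
            by_host[host].append(url)

    lines = ["# Disavow list generated from a manual link audit"]
    for host in sorted(by_host):
        urls = sorted(set(by_host[host]))
        if len(urls) >= domain_threshold:
            lines.append(f"domain:{host}")
        else:
            lines.extend(urls)
    return lines


bad_links = [
    "http://spam.example/a",
    "http://spam.example/b",
    "http://spam.example/c",
    "http://blog.example/post?x=1",
]
print("\n".join(build_disavow_lines(bad_links)))
```

You would save the resulting lines as a plain text file and upload it through the Disavow Tool. Review the output by hand first, because a `domain:` entry discounts every link from that host, including any good ones.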
One more thing: What about negative SEO?
Ever since Google first introduced Penguin 1.0, webmasters have worried about the possibility of “negative SEO.” Negative SEO is the practice of intentionally pointing lots of bad links at a competitor’s site in the hope of triggering a penalty for them. Does this actually work? Once again, Gary Illyes:
So the thing about negative SEO is that, to this date, I haven’t seen a single…well, not just me, but also the ranking team hasn’t seen a single case where it was really negative SEO. It was more about clients not revealing details to the SEO who was doing the cleanup, for example.
Whether or not negative SEO ever was really a “thing,” it seems to me that Penguin 4.0 largely (if not completely) removes the SEO-related incentive for it. Any bad links you build to your competitor would simply be discounted. In other words, they do nothing, and the competitor’s page is no worse off than it was before.
Penguin 4.0 really is a bird of a different feather. For most webmasters, it will fade into the misty seas of the overall ranking factors, and it won’t be worth any anxiety. On the other hand, wise site owners will follow the positive steps I outlined above, but then those should be part of any good SEO practice anyway.