Does Backlink Quality Matter?

Posted by Richard Stokes on April 20, 2010 to Link Building

Backlinks (incoming links to a website or page) are a topic of primary importance to search engine marketers. Incoming links not only bring visitors; they also help search engines measure the quality and authority of a particular page, which in turn plays an important role in determining which pages are served, and in what order, on the results pages for search queries.

The number of backlinks is considered by many search engine marketers to be of primary importance in this regard. However, many experts hold that backlinks vary in their ability to influence search engine rankings. In other words, a backlink from a high quality page is theoretically worth more than one from a low quality page.

There have been various efforts to model "link quality". One of the best known is Brin and Page's PageRank algorithm, about which they explicitly stated:

"PageRank extends this idea by not counting links from all pages equally, and by normalizing by the number of links on a page."
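The idea in that quote can be illustrated with a minimal power-iteration sketch of PageRank. It assumes the commonly used normalized form with a damping factor of 0.85 (the value Brin and Page report using); the function name, the dictionary-of-outlinks input, and the even redistribution of score from pages with no outlinks are illustrative assumptions, not a description of any search engine's production system.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank (illustrative sketch).

    links maps each page to the list of pages it links to. Every page
    passes on a damped share of its own score, divided evenly among its
    outlinks, so a link from a high-scoring page counts for more than a
    link from a low-scoring one.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Assumption: score held by pages with no outlinks is spread evenly
        # over all pages (one common convention; others exist).
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {page: base for page in pages}
        for page, targets in links.items():
            if not targets:
                continue
            # "Normalizing by the number of links on a page": each outlink
            # receives an equal share of the page's damped score.
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Tiny three-page web: a links to b and c, b links to c, c links back to a.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
for page, score in sorted(pagerank(web).items()):
    print(page, round(score, 3))
```

In this toy graph, page c ends up with the highest score because it collects links from both a and b, while b, which receives only half of a's outgoing share, ends up lowest.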

Other mathematical approaches include:

  • The PageRank Citation Ranking: Bringing Order to the Web
  • Authoritative Sources in a Hyperlinked Environment
  • Hilltop: A Search Engine based on Expert Documents
  • Combating Web Spam with TrustRank

While these techniques have certainly been steps in the right direction, there is no denying the evidence that link spammers are still able to rank well for desired terms. This has led to a widespread belief among some search practitioners that high-quality pages can be easily faked, which in turn has spawned a cottage industry that churns out splogs (fake blogs), comment spam, forum spam, and various other linking schemes designed to work around the well-known link scoring algorithms cited above.

It is our belief, based on years of experience, that the search engines, while far from perfect, have progressed further than is widely believed in their ability to identify and discount spam links.

If that is true, there should be some practical way to measure the effect. In April 2010 we set out to determine whether high-quality backlinks have more influence on search engine rankings than low-quality ones. The answer has profound implications for search marketers: for every worthwhile backlink there are thousands of low-quality ones, and those who blindly pursue every available link without regard to quality incur high and unnecessary costs in doing so (while also raising the risk profile of their sites).

The parameters of our study were as follows:

  • We recorded the top 1,000 organic rankings on Google US for each of 1,056 websites.
  • These rankings were combined with current keyword search traffic estimates as well as standardized clickthrough rates to generate a measure of each site's prominence in the organic search rankings. This measure is known as Domain Strength. Domain Strength is measured on a logarithmic scale from 0 to 100, where 0 corresponds to 100 visits per month while 100 corresponds to 100 million visits per month.
  • We then crawled a sample of 1,353,307 backlinks to these domains and assessed each on four scales: Trust, Spam, Social, and Geo. The Trust and Spam scales measure high- and low-quality links, respectively. Social signals measure links from popular social networking sites. Geo signals represent links originating from non-US domains (that is, domains other than .com and .us). All remaining links are grouped into an "Other" category.
  • Finally, we mathematically analyzed the correlation between these different signal types and the prominence of each site.
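The Domain Strength scale described above can be sketched as follows. The post gives only the two endpoints (0 = 100 visits per month, 100 = 100 million visits per month), so this sketch assumes the scale is linear in log10(visits) between them and clamps values outside that range. The traffic estimate simply multiplies each keyword's search volume by a clickthrough rate for its ranking position; the function names and the CTR values in the usage lines are hypothetical placeholders, since the post does not give its standardized clickthrough rates.

```python
import math

# Endpoints as stated in the post: Domain Strength 0 = 100 visits/month and
# 100 = 100 million visits/month. Assumption: linear in log10(visits).
MIN_VISITS = 1e2
MAX_VISITS = 1e8
DECADES = math.log10(MAX_VISITS) - math.log10(MIN_VISITS)  # 6 orders of magnitude


def estimated_visits(rankings, ctr_by_position):
    """Hypothetical traffic estimate: sum of search volume x clickthrough rate.

    rankings: iterable of (monthly_search_volume, rank_position) pairs.
    ctr_by_position: mapping of position -> clickthrough rate; positions
    missing from the mapping are treated as receiving no clicks.
    """
    return sum(volume * ctr_by_position.get(position, 0.0)
               for volume, position in rankings)


def domain_strength(monthly_visits):
    """Map monthly organic visits onto the 0-100 log scale, clamped at both ends."""
    if monthly_visits <= MIN_VISITS:
        return 0.0
    if monthly_visits >= MAX_VISITS:
        return 100.0
    return 100.0 * (math.log10(monthly_visits) - math.log10(MIN_VISITS)) / DECADES


def visits_from_strength(strength):
    """Inverse mapping: monthly visits implied by a Domain Strength score."""
    return 10 ** (math.log10(MIN_VISITS) + DECADES * strength / 100.0)


# Illustrative numbers only; the post does not give its CTR curve.
ctr = {1: 0.30, 2: 0.15, 3: 0.10}
visits = estimated_visits([(200_000, 1), (100_000, 3), (50_000, 12)], ctr)
print(round(visits), round(domain_strength(visits), 1))
```

Under this assumption each tenfold increase in traffic adds about 16.7 points, so a site with roughly 100,000 organic visits per month sits at the midpoint of 50.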

Backlink Authority Study Results

The results of our study show that trust and spam links differ dramatically in their ability to influence search engine rankings. Across sites of all sizes, authority levels, and verticals, a trust link carries at least 4.2 times more weight than a spam link. This estimate is a lower bound: in practice a trust link is worth far more, especially for sites that already have a well-established base of inbound backlinks.

We were surprised to learn that links from social networking sites have an even higher ability to influence search engine rankings. A social link carries 8.9 times more weight than a spam link and 2.1 times more weight than a trust link. However, this effect appears to diminish with higher traffic sites.

It is also important to note that links from social networking sites are transitory by nature. While they can boost search engine rankings, the effect is often short-lived, as the links age and disappear. In this respect trust links are far more useful, since they appear to convey a lasting, long-term benefit to their target sites.

We also noted that spam links carried about the same weight as links that did not rate highly on any of the other measures ("Other" links). The difference between these two types lies primarily in their risk profile. While both carry similar weight, spam links are very likely to be discounted by the search engines at some future date. When this happens, sites that rely heavily on them will likely experience a significant drop in rankings (and traffic).

Table of results

The following table shows how each signal type correlates with Domain Strength (a measure of a site's prominence in search engine rankings). The p-value is the probability of seeing a relationship at least this strong by chance if no real relationship existed. Roughly speaking, a p-value below .01 indicates high confidence that the effect is real.

Signal Type    Correlation to Domain Strength    P-Value
Trust          0.070332                          1.62e-7
Spam           0.016889                          8.3e-22
Social         0.150165                          3.58e-4
Other          0.018802                          9.54e-44
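The relative weights quoted in the results section line up with simple quotients of the coefficients in this table (for example, 0.070332 / 0.016889 is about 4.2). The post does not spell out how those ratios were derived, so treat the sketch below as a reconstruction that happens to reproduce the published figures. The helper names `relative_weight` and `significant` are hypothetical, and the .01 threshold is the one mentioned above.

```python
# Coefficients and p-values copied from the results table.
SIGNALS = {
    "Trust": (0.070332, 1.62e-7),
    "Spam": (0.016889, 8.3e-22),
    "Social": (0.150165, 3.58e-4),
    "Other": (0.018802, 9.54e-44),
}


def relative_weight(numerator, denominator, signals=SIGNALS):
    """How many `denominator` links one `numerator` link is worth, by coefficient ratio."""
    return signals[numerator][0] / signals[denominator][0]


def significant(signals=SIGNALS, alpha=0.01):
    """Signal types whose p-value falls below the given threshold."""
    return [name for name, (_, p_value) in signals.items() if p_value < alpha]


pairs = [("Trust", "Spam"), ("Social", "Spam"), ("Social", "Trust"), ("Other", "Spam")]
for num, den in pairs:
    print(f"{num}/{den}: {relative_weight(num, den):.1f}")
print("Significant at p < .01:", significant())
```

This prints ratios of 4.2, 8.9, 2.1, and 1.1 respectively. The last one matches the observation that spam and "Other" links carry about the same weight, and all four signal types clear the .01 threshold.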

Limitations

  • While the general pattern holds for all sites, the exact ratios and weights of the various link types vary enormously with the size of the site and the number of inbound links it already has. In particular, we noticed that spammy links tend to benefit small sites far more than large ones, and that the law of diminishing returns applies.
  • The techniques we use to score links in the various categories are proprietary (they comprise the fundamental algorithm behind our backlinks product). Specifically, our spam scoring model is influenced heavily by Quality Bot, a learning algorithm designed to detect pages that search engine users would deem "low quality". Nevertheless, our models are based in no small part on well-known information retrieval spam detection algorithms.
  • Due to the US-centric nature of our sample set, we were unable to measure the effect of "Geo" links.
  • Relevance of a backlink to its target page was not taken into consideration. Backlinks to topically similar pages almost certainly have higher weights, but we did not attempt to measure this.
  • The ratio of trust to spam weight is much higher in the real world, because our algorithms filter the majority of spam links out before crawling and scoring. Had these links been included in the study, the influence of any single spam link would have been considerably lower.


Comments (3)

Mike

December 22, 2010 6:11 PM

"We then crawled a sample of 1,353,307 backlinks to these domains and assessed each on four scales"

No you didn't.

You wrote an algorithm that attempted to assess each link. The degree to which that assessment would correspond to a human assessment is unknown.

Posted by Mike

Richard Stokes

February 16, 2011 7:11 PM

Actually, that's a somewhat subtle point... and it would be entirely valid except for one thing: we actually enlisted thousands of human volunteers around the world to subjectively rate pages for us to help with the construction of the various signals.

QBot is essentially a model that attempts to assess the quality of content on a page based on the learnings from this panel. We call this "semantic scoring". The degree to which the automated assessment corresponds to human assessment is very well known (to six decimal places, in fact), and so we are able to draw these conclusions with a high degree of confidence.

That said, the "proof is in the pudding", as they say. Link Insight v3 is about to be released, and one of the coolest features IMO is the ability to see some of the reasons why we discounted particular pages as spam. I've been playing around with this for a couple of months now and I'm consistently impressed with the pages it correctly identifies as spam that might otherwise escape casual inspection. The "content page" problem that Eric wrote about today, for instance, is far more widespread than one might think.

Thanks for your comment, always happy to hear the feedback, however blunt it may seem ;-)

Posted by Richard Stokes in reply to Mike's comment

ewan watt

May 2, 2012 9:06 AM

would be interested to see results of similar study post penguin update... great post

Posted by ewan watt
