In Search of Spam

Posted by Richard Stokes on March 17, 2010 to Link Building, Research

With the launch of Link Insight a few days behind us, we have now begun to accumulate significant insight into what types of pages make up spam versus truly high-quality, trusted pages.

What we've learned has been surprising, to say the least (and based on my conversation with him this morning, I think I speak for Eric as well).

But first, I should share a little background about how our models detect spam pages.

When you pull a Link Insight report, each of the URLs will have a spam score assigned to it. This score starts at 0 and can climb as high as 30 or so. This has led some of our beta users to believe that a page with a spam score of 10 is exactly five times as bad as a page with a spam score of two. But that's not really how it works.

Link Insight is made up of dozens of models, each of which excels at detecting a particular type of spam. When one of these models detects spam on a page, it adds a point to the spam score. You can think of these points almost as votes, e.g. a spam score of 10 means that 10 different models voted that the page is spam.

So the score doesn't necessarily indicate how "spammy" a page is, but rather how confident we are that it is spam. With a spam score of 1, it's possible that you have a false positive. But a spam score of 2 or higher means that it's exceedingly likely that the page is of low quality.
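To make the voting idea concrete, here is a minimal sketch in Python. It is not Link Insight's actual code: the three detectors, their thresholds, and the page fields are all made up for illustration. The only part taken from the description above is that each model casts one vote and the spam score is the number of votes.

```python
from typing import Callable, Dict, List

# A page is represented as a dict of numeric features (hypothetical fields).
Page = Dict[str, float]
# A model is any function that returns True when it thinks the page is spam.
Model = Callable[[Page], bool]


def too_many_outbound_links(page: Page) -> bool:
    # Hypothetical detector: the threshold is an assumption, not a real rule.
    return page.get("outbound_links", 0) > 200


def keyword_stuffed(page: Page) -> bool:
    # Hypothetical detector: flags unusually high keyword density.
    return page.get("keyword_density", 0.0) > 0.15


def thin_content(page: Page) -> bool:
    # Hypothetical detector: flags pages with very little text.
    return page.get("word_count", 0) < 50


MODELS: List[Model] = [too_many_outbound_links, keyword_stuffed, thin_content]


def spam_score(page: Page, models: List[Model]) -> int:
    """Each model that flags the page adds one point (one vote) to the score."""
    return sum(1 for model in models if model(page))


link_farm = {"outbound_links": 350, "keyword_density": 0.02, "word_count": 30}
clean_page = {"outbound_links": 10, "keyword_density": 0.01, "word_count": 800}

print(spam_score(link_farm, MODELS))   # two detectors vote "spam"
print(spam_score(clean_page, MODELS))  # no detector votes "spam"
```

Because the score is a count of independent votes, it reads as a measure of agreement among the models and not as a measure of how bad the page is.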

Spam Reloaded Revolutions

Well, a funny thing happened while we were building Link Insight. Back in October, Eric braindumped most of his knowledge about what determines the quality of a backlink. The models which resulted from this were quite good at telling trusted links from spam links. But still, quite a few spam links crept through. I would estimate that the initial model eliminated roughly 67% of all low-quality links.

Now mind you, this was a huge leap forward. After all, most SEOs are stuck back in the 2003 mindset of pursuing every link available (and you might be surprised at how many people asked us during the beta why we weren't displaying all 300,000 - or even 3,000,000 - links to their site. The answer is in our videos.) If you've ever engaged in link building, you know just how hard it can be. Having someone come and eliminate 67% of your workload - with no loss of effectiveness - should be a pretty big deal. But still, we weren't happy with the false positives coming through.

So over the past month, we put in place a second set of rules designed to eliminate many of them. And these rules worked out pretty well, increasing our accuracy to around 71%.

However, one thing became pretty clear - we were heading uphill and the climb was only getting steeper. In the past two weeks, we've become aware of some very sophisticated link spam techniques. I have no doubt that some of these pages are actually getting counted by the engines. But I also have no doubt that somewhere in Mountain View, there is an engineer working out a way to defeat them (and he or she undoubtedly will).

However, we have limited resources and, for all practical purposes, unlimited internet. Something needed to be done, and quickly.

Our solution? I'll tell you in the next blog post. Stay tuned!

Update: we've implemented better measuring techniques and a larger sample set, so I've updated the percentages above with our latest (more accurate) figures.
