Introducing QualityBot

Posted by Richard Stokes on March 25, 2010 to Features, Link Building

In our previous blog post, "In Search of Spam", I talked about how we were facing an uphill battle in detecting various types of spam pages that litter the internet.

This, incidentally, is a parallel battle that Google and the other search engines face. The engines have no interest in indexing these pages or in valuing links from them. And we have no interest in associating our companies and brand names with those pages either.

So how do we go about identifying "spammy" pages? Obviously, the best approach is to manually look at each page and decide whether or not it is a worthy place to promote ourselves. Humans are very good at this, but unfortunately, people don't scale well.

The next logical approach is to devise a set of automated rules that can easily identify certain types of spam. For instance, we know that a page containing the name of a certain male pharmaceutical as well as a popular casino game is pretty likely to be spam (sorry to be so cryptic, but the challenge when writing about spam is that you have to somehow tiptoe around the very thing you're talking about).
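
A single hard-coded rule of the kind just described might look like the Python sketch below. It is only an illustration: the term lists are hypothetical placeholders (we are still tiptoeing), and `pharma_casino_rule` is a name invented here, not part of any AdGooroo product.

```python
import re

# Hypothetical placeholder term lists; a real rule would need many more terms.
PHARMA_TERMS = {"examplepill"}            # stand-in for the pharmaceutical name
CASINO_TERMS = {"roulette", "blackjack"}  # stand-in casino games


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pharma_casino_rule(text: str) -> bool:
    """Flag a page as likely spam if it mentions both a pharma term and a casino game."""
    words = tokenize(text)
    return bool(words & PHARMA_TERMS) and bool(words & CASINO_TERMS)


if __name__ == "__main__":
    print(pharma_casino_rule("Cheap examplepill and free roulette spins!"))  # True
    print(pharma_casino_rule("A history of roulette in Monte Carlo."))       # False
```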

There are many rules of this kind. While this approach does scale in terms of scoring large quantities of backlinks, it suffers from two other shortcomings.

First, hard-coded rules are just too brittle. Over the past month, I've seen more spam than any one person should ever have to endure. One thing I've learned is that spammers are a very clever bunch. It is one thing to identify a link farm with dozens (or hundreds) of low-quality backlinks. It is quite another to teach a machine to identify what appears to be a high-quality page, when in fact the quality is just a cover for one or two very nefarious backlinks.

Second, the "rules approach" puts you in an arms race that you cannot possibly win. True, a few simple rules can identify a great deal of spam. A few dozen more, and you may get up to a majority. But further progress requires (quite literally) thousands of rules, and neither Eric nor I have that kind of time on our hands.

Enter the AdGooroo QualityBot.

You cannot score large numbers of pages by hand, but a computer can. And you cannot devise thousands of rules to catch spam... but you can teach a machine to do that as well.

QualityBot is an artificial intelligence algorithm that has been taught not only to score pages, but also to learn from its mistakes and devise better rules. It has now passed the 80th round of training (what we call a generation) and is showing a marked superiority over humans, both in scoring pages and in devising new rules for identifying spam.
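
This post doesn't describe QualityBot's internals, so what follows is a swapped-in, textbook illustration of a learner that "learns from its mistakes": a perceptron that adjusts its word weights only when it misclassifies a page. Treating one pass over the labeled pages as a "generation", the toy training data, and every name below are assumptions made for this sketch, not a description of how QualityBot works.

```python
import re

# Toy labeled pages (1 = spam, 0 = not spam); invented for this example.
TOY_PAGES = [
    ("cheap pills casino bonus", 1),
    ("free casino bonus pills now", 1),
    ("notes on monetary economics", 0),
    ("lecture notes on thermodynamics", 0),
]


def features(text: str) -> list[str]:
    """Bag-of-words features: lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class MistakeDrivenScorer:
    """A textbook perceptron: weights change only when a prediction is wrong."""

    def __init__(self) -> None:
        self.weights: dict[str, float] = {}
        self.bias = 0.0

    def score(self, text: str) -> float:
        """Positive scores lean spam, zero or negative scores lean legitimate."""
        return self.bias + sum(self.weights.get(f, 0.0) for f in features(text))

    def predict(self, text: str) -> int:
        """Return 1 for spam, 0 for not spam."""
        return 1 if self.score(text) > 0 else 0

    def train_generation(self, pages: list[tuple[str, int]]) -> int:
        """Make one pass over the labeled pages and return the number of mistakes."""
        mistakes = 0
        for text, label in pages:
            if self.predict(text) != label:
                mistakes += 1
                # Nudge the bias and this page's word weights toward the correct label.
                step = 1.0 if label == 1 else -1.0
                self.bias += step
                for f in features(text):
                    self.weights[f] = self.weights.get(f, 0.0) + step
        return mistakes


if __name__ == "__main__":
    scorer = MistakeDrivenScorer()
    for generation in range(1, 11):
        mistakes = scorer.train_generation(TOY_PAGES)
        print(f"generation {generation}: {mistakes} mistakes")
        if mistakes == 0:
            break
    print(scorer.predict("casino bonus for you"))  # 1 (spam)
    print(scorer.predict("economics lecture"))     # 0 (not spam)
```

On this toy data the scorer makes two mistakes in the first generation and none in the second. The learned weights behave like many small, soft rules that are revised automatically instead of being written by hand.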

Our best "expert-trained" scoring system to date (the one currently serving as the engine behind our quality backlinks product, Link Insight) identifies 67% of all spam pages (not bad).

Generation 46 of QualityBot was able to eliminate 75.8% of all low-quality pages. Generation 78 (the latest and greatest) is up to 81.8%.

We (probably) won't ever get to 100%, of course. But even if we don't, think about what this means for the efficiency of a typical outreach program. We can now eliminate 82% of the links that have no effect or, worse, can seriously damage your internet presence. And to do so, we no longer need to rely on the usual hocus pocus (PageRank, page length, keyword density, link ratios, and so forth).

What's more, QualityBot evolves over time. A splogger uploads his content to a throwaway WordPress blog and maybe slips under the radar. But as soon as he uses Ping.fm or an RSS feed to spread his content, even with slight changes, QualityBot begins to adapt. In short, it responds much the same way a search engine does (although probably a little faster, as we've trained it to adapt somewhat more aggressively than a search engine typically will).

What is QualityBot Looking For?

In short, the same things you look for when you visit a page. QualityBot is great at identifying various types of linkspam, such as link farms and splogs. Misspellings and capitalization errors are also dead giveaways of spam, as are the usual spammy blog comments and whatnot.

We know that it looks at common bigrams (two-word phrases, such as "w** hos***g") and trigrams (three-word phrases). We also know that it doesn't care much about the contents of title or H1-H3 tags, or about the domains that a page links to. None of these things seems to correlate strongly with real-world patterns of linkspam.
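
For readers new to the terms: a bigram is a pair of adjacent words and a trigram is a run of three. The sketch below shows only how such phrases can be pulled out of a page and counted; how QualityBot actually weighs them isn't described here, and the sample text and function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every run of n consecutive words (n=2: bigrams, n=3: trigrams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def phrase_counts(text: str) -> Counter:
    """Count all bigrams and trigrams that occur in the text."""
    tokens = tokenize(text)
    return Counter(ngrams(tokens, 2) + ngrams(tokens, 3))


if __name__ == "__main__":
    counts = phrase_counts("Free casino bonus, claim your free casino bonus today")
    print(counts[("free", "casino")])           # 2
    print(counts[("free", "casino", "bonus")])  # 2
    print(counts[("claim", "your")])            # 1
```

Because each bigram keeps two words in order, a phrase such as "body sculpting" becomes a feature of its own, distinct from either word alone.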

It seems to like pages about science and economics, but Craigslist puts it on the defensive (as do pages about gold, condoms, and lasers). But these are all just generalizations.

The truth is, we can't state with much confidence what rules it follows... because there are none. Instead, QualityBot is constantly breaking apart and reformulating thousands of rules in real time based on what it's seeing. It is perfectly capable of distinguishing between pages on sculpture and body sculpting, handicaps and handicapping, or finance and refinance. It is also very good at telling the difference between a page written by someone who is avidly interested in a particular topic and one written by a marketer who is just pretending to be interested. Anyone with a little practice or training could do this, by the way. The remarkable thing is that QualityBot can figure it out by itself and grade millions of pages a day without breaking a sweat or getting bored.
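
This post doesn't explain how QualityBot separates finance from refinance. One basic prerequisite, though, is comparing whole words rather than raw character strings: a model that works on words (like the bigrams and trigrams mentioned above) sees "finance" and "refinance" as different tokens, while a naive substring search does not. The hypothetical helpers below illustrate that contrast; they are not QualityBot code.

```python
import re


def substring_match(term: str, text: str) -> bool:
    """Naive check: True if the term appears anywhere, even inside a longer word."""
    return term in text.lower()


def whole_word_match(term: str, text: str) -> bool:
    """Word-level check: True only if the term appears as a complete word."""
    return term in re.findall(r"[a-z0-9]+", text.lower())


if __name__ == "__main__":
    page = "Refinance your mortgage today with no credit check"
    print(substring_match("finance", page))     # True ("finance" sits inside "refinance")
    print(whole_word_match("finance", page))    # False
    print(whole_word_match("refinance", page))  # True
```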

We will quietly be slipping QualityBot into production in the next week or two. We'd love to see if you notice a difference!


