Removing Referral Spam from Analytics


Over the last year, I’ve noticed a significant number of referrals showing up in client reports with a 100% bounce rate and no time logged on the site.

This is the work of referral spammers, who send bots to your site purely so that they show up as a referral. The bots often come from a wide range of IPs in all sorts of locations, and aren’t always easy to track down.

You can see your referral traffic in Analytics by going to Acquisition, then All Traffic, and clicking Referrals.

[Screenshot: referral traffic in Analytics]

In the screenshot above, semalt and buttons-for-website are using botnets to land on my site and then leave, and Google counts each visit as a referral. They are easy to spot from their session duration and bounce rate.

Who is Semalt? Why do they keep showing up in analytics?

Referral spam originally set out to create backlinks to the spammer’s site from the target site’s statistics. Crawlers and spiders, like the one used by Screaming Frog, don’t usually show up in Analytics. Semalt and buttons-for-website make sure they do, presumably on the assumption that you’ll think ‘why the heck is this site sending me so much traffic?’ and pay them a visit. Then you’ll see the awesome services they claim to provide and pay for access to them.

Semalt is a particularly interesting case, not least because they claim to be a legitimate SEO tool. The damage it may do to your site isn’t well proven, although there is plenty of discussion about it. The point is, it does completely skew your statistics.

Is the bounce rate damaging?

Bounces from the organic SERPs are often cited as a key thing to avoid when building an SEO-friendly site. Bounces from a referral count towards your total site bounce rate, which means Semalt could be responsible for a high overall bounce rate.

Matt Cutts said back in 2012 that bounce rates in Analytics don’t actually count towards ranking. I haven’t seen any information on whether referral bounces harm a site; as far as I know, bounces back to organic search results are the only ones that need worrying about.

But still, Semalt is annoying and frankly pretty intrusive. So let’s get them out of our data.

 

How to remove Semalt from Analytics

Despite claiming to be a legitimate SEO tool, Semalt does some pretty blackhat things, like totally ignoring robots.txt directives.

Chucking their crawler name in your robots.txt won’t get rid of it. They also have a removal tool on their site, which doesn’t actually work. There have also been stories of people’s Analytics ceasing to track entirely after using the tool. Anyone downloading their SEO tool might also be infecting themselves with a ‘soundfrost’ virus, possibly with the intention that vulnerable sites can be offered up to spammers for link and email spamming campaigns.

The first method is to use exclusion filters in Analytics. Click ‘Admin’ in the top nav of your Analytics reporting suite, then click ‘Filters’ in the View column and add a new filter. Check ‘Custom’, choose ‘Exclude’ and select ‘Referral’ in the Filter Field, with the spammer’s domain (here semalt.com) as the pattern to exclude.

[Screenshot: excluding referral traffic with a filter]

This filter only applies to semalt.com. You’ll probably need to repeat it for crawler.semalt.com and semalt.semalt.com. It only affects data collected from now on, so it won’t clean up any historical data.

Viewing Analytics without referral spam

Custom Segments are a good way of seeing how your site has been performing historically without counting the rubbish that referrer spam sent your way.

Go to ‘Reporting’ in the top nav and open the Audience Overview tab. From here, click ‘+ Add Segment’ and then the big red ‘+ New Segment’ button. Choose ‘Traffic Sources’, and set the filter to include only referrers that don’t have ‘semalt.com’ in the name.

[Screenshot: segment filtering out spam referrals]

Using htaccess to block Semalt and spambots

If you want to future-proof your site against Semalt’s endless supply of referral source names, you can drop code into your .htaccess file and use regex to block all of their domains.

[Note: Semalt has recently started using other addresses that don’t include their brand name. There’s not much point blocking their IPs because they have so many of them, so I’m afraid that for now you will have to block those addresses manually too.]

[Screenshot: .htaccess rule blocking Semalt]
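
As a rough sketch of what a rule like this can look like, assuming your host runs Apache with mod_rewrite enabled (the exact pattern below is an illustration, not necessarily the one in the screenshot, so test it on your own setup first), this refuses any request whose referrer is semalt.com or one of its subdomains:

```apache
# Sketch only: assumes Apache with mod_rewrite available.
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Match semalt.com and any subdomain, e.g. crawler.semalt.com
    # or semalt.semalt.com. [NC] makes the match case-insensitive.
    RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?semalt\.com [NC]
    # Serve nothing and answer with 403 Forbidden.
    RewriteRule ^ - [F]
</IfModule>
```

Bear in mind that a rule like this only affects requests that actually reach your server, and it does nothing about the spam already sitting in your reports.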

Or block a bunch of identified spammers like this:

[Screenshot: .htaccess rules blocking several spam referrers]
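
A minimal sketch of the multi-domain version, again assuming Apache with mod_rewrite. It lists only the two spammers named in this post, and the .com on buttons-for-website is an assumption; you would add one RewriteCond line for each further domain you identify:

```apache
# Sketch only: assumes Apache with mod_rewrite available.
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Each condition ends in [OR] except the last one, so a match
    # on any single referrer is enough to trigger the rule.
    RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?semalt\.com [NC,OR]
    RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?buttons-for-website\.com [NC]
    # Answer matching requests with 403 Forbidden.
    RewriteRule ^ - [F]
</IfModule>
```

Forgetting the [OR] flag is the easy mistake here: without it, Apache requires every condition to match at once, and nothing gets blocked.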

ALWAYS save a copy of your database, and of the existing .htaccess file itself, before editing your .htaccess file.

Written by Sarah Chalk

Sarah is an SEO Account Manager at 360i and has a keen interest in all things SEO. She has also written for a number of sites, including Vue cinema’s film blog and a number of tech websites.
