A Comprehensive Guide to Defending Against Google Referrer Spam

By: Dave Nicosia

February 7, 2017

So what is Google referrer spam, and how do you combat it?

Google referrer spam, or fake traffic, is when hackers and spammers inject fake data into the reports you get from Google Analytics. This makes your analytics results inaccurate and less useful for understanding your site’s performance.

Basically, this means that the numbers and information you’re getting from Google Analytics could be fake. You could be getting inflated results because someone is trying to get spam to you via your analytics account. This prevents you from seeing where problems may be on your site, and if you don’t see the problems, you cannot fix them.

There are two ways a spammer can fake traffic on your website.

One is through ‘bots.’ This is the best-known way. A hacker creates a program that behaves as though it were a human visiting your website. The program goes into the files and database, grabbing copy, images, products, and even sensitive information such as passwords or debit card numbers. It can also fill out forms and leave comments. Most legitimate web hosting companies (such as WP Engine, Cloudflare, Digital Ocean, etc.) protect against this.

However, another way your analytics can be affected is by bots that hit the Google Analytics servers directly. Google allows developers to make direct requests to the Google Analytics servers using the Measurement Protocol.

When this protocol is abused, the ‘bots’ completely bypass your site and send incorrect data directly to your Google Analytics account. This does not compromise your website security, but it does cause inaccurate reports on website traffic. This is known as ghost traffic.

Another risk of this kind of spam is that you might click a link that has some kind of computer virus hidden behind it. At this time, most of this spam puts a live link in place of information that Google Analytics normally provides to you, such as the language setting (usually something like “en-us,” which means English as used in the United States).
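To make the idea concrete, here is a rough Python sketch of what a Measurement Protocol hit looks like as data. Nothing is sent anywhere; the sketch only builds and reads back two payloads. The parameter names (`v`, `tid`, `cid`, `t`, `dp`, `ul`) follow the Universal Analytics Measurement Protocol as we understand it, and every value, including the placeholder property ID and the spam text, is made up for illustration.

```python
from urllib.parse import urlencode, parse_qs

# A hit is just a set of key/value pairs. All values here are hypothetical.
legit_hit = {
    "v": "1",             # protocol version
    "tid": "UA-XXXXX-Y",  # placeholder property ID
    "cid": "555",         # anonymous client ID
    "t": "pageview",      # hit type
    "dp": "/home",        # page path
    "ul": "en-us",        # user language
}

# A ghost hit has the same shape, but a free-text field carries the spam.
ghost_hit = dict(legit_hit, ul="spam.example Vote now!")

def language_of(payload: str) -> str:
    """Read the language field back out of an encoded payload."""
    return parse_qs(payload)["ul"][0]

legit_payload = urlencode(legit_hit)
ghost_payload = urlencode(ghost_hit)

print(language_of(legit_payload))
print(language_of(ghost_payload))
```

The point of the sketch is that the language field is ordinary text, so nothing in the format itself stops a spammer from putting a link there. That is the gap the filters in this guide try to close.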

Who is this going to affect most?

This method of spamming is going to be most problematic on small to medium-sized sites.

If you have a large site with high traffic, the percentage of spam traffic that you get will likely be fairly small, and have minimal impact on your analytics. However, following the tips in this article definitely will not hurt your data, no matter what size site you have.

Why would somebody send your Google Analytics account fake data?

The main reason this form of spam is so popular right now is that it’s an easy way to get spam links in front of you without automatic filtering. Google hasn’t addressed the problem in their system yet, so it’s up to the user to figure out how to address this.

While this kind of spamming has been used previously, it has become very popular within the past four months. A hacker from Russia popularized it in late October of 2016 so that he could spread his opinions on America’s Presidential Election in November 2016. For more information about the root of this kind of spam, read this article.

Some of the popular names in the spam include buttons-for-websites and semalt.

Now that we’ve addressed exactly what this kind of spam is and where it came from, we come to the important part: how to stop it from corrupting your data.

One very important note on how this process works: in most cases you will not be able to filter your old data. The filters will work on future reports, but not past ones. However, you can set up special segments to look at old data through your new filters. Click here for instructions on how to set up segments related to referral spam. This guide also has a great explanation for creating segments.

Here’s How to Combat Referral Spam in Your Google Analytics Account

If you have noticed that your data seems strangely inflated, is coming from dubious sources, or just looks inaccurate, follow the steps below to set up your own view and referral spam filters to clean up your Google Analytics data.

Step 1: Creating a New View

The very first thing you should do in order to start properly filtering spam out of your data is to make a new view in Google Analytics.

You should NEVER filter your raw data. If you make an error in a filter, the data it removes will never come back. A separate view also lets you compare filtered and unfiltered data to see how much of your traffic comes from spam.

If you already have a view that you use to filter your data, you can use that, but you should always test your filters on a test view before applying them to data you plan to use for reports.

To make a new view, choose your site from the main home page, then click “Admin” in the top tab menu.

Once you get to that page, go to the dropdown that is labeled “View” and click on it. At the bottom, it should say “Create new view.” Give your view a name, and click the “Create View” button.

Make sure your view has a descriptive name such as “Raw Data” for the unfiltered data and something like “Main Data” or “Filtered View” for your filtered view.

You now have a new view!

Step 2: Creating Your Filters

Now make sure you’ve switched into your new view using the “View” dropdown mentioned above.

Once you’re sure you’re in your view, go to the View section of the Admin menu, find “Filters,” and click on it. This will bring you to the list of filters that you have applied. In your new view, you probably don’t have any filters. That’s okay! We will be adding some.

Click the “+ Add Filter” button.

This brings you to the new filter screen, where you will create your first filter. First, you have to give your filter a name. To give it a good, descriptive name, decide what you are going to be filtering first.

Language Spam Filter Tutorial

In this instance, the first filter we are going to create is called a Language Spam Filter. Basically, language spam is when spammers send a live link instead of the language setting from a browser or computer (normally structured like “en-us,” which is English in the United States).

This filter will exclude hits where the language setting has 15 or more characters, or contains a period, comma, exclamation point, slash, or two or more whitespace characters. Legitimate language settings created by a browser usually have 5 or 6 characters, and even the longest legitimate values are only about 8 or 9 characters. So this filter should only remove those hits with spam in the language setting.

Again, you must give this filter a descriptive name. A good example would be Exclude Language Spam.

Once you’ve named your filter, click on the “Custom” tab.

Make sure “Exclude” is selected. Then go to the dropdown labeled “Filter field” and select “Language Settings” under “Audience.”

Once you’ve done that, you should copy the following text and paste it directly into the text field.

.{15,}|\s[^\s]*\s|\.|,|\!|\/

Click Save at the bottom.

And there you have it! Your first filter!
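If you would like to see what this pattern does before trusting it with your data, you can try it against a few sample values outside of Google Analytics. The Python sketch below is only an illustration: the sample values are made up, and it assumes the filter matches anywhere in the field and ignores case, which is our understanding of how view filters behave by default.

```python
import re

# Pattern copied from the filter above. re.search mimics an unanchored
# (partial) match; IGNORECASE is an assumption about the filter's default.
LANGUAGE_SPAM = re.compile(r".{15,}|\s[^\s]*\s|\.|,|\!|\/", re.IGNORECASE)

def is_language_spam(language: str) -> bool:
    """Return True if an Exclude filter with this pattern would drop the hit."""
    return LANGUAGE_SPAM.search(language) is not None

# Made-up sample values for illustration.
samples = ["en-us", "zh-hans-cn", "(not set)", "spam.example Vote now!"]
for value in samples:
    verdict = "excluded" if is_language_spam(value) else "kept"
    print(f"{value!r}: {verdict}")
```

The three ordinary values are kept, and the one carrying a link is excluded. Note that any value with a period in it is dropped, however short it is.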

There are many filters you can add to your new view to help sift through the spam. You may never be able to get all of the spam, but you can get enough of it that it shouldn’t cause too many problems in your reports.

We are going to go through several more filters that can help you keep your data as accurate as possible for as long as possible.

Browser Spam Filter Tutorial

The next filter removes browser spam. Many spammers are now sending links in the browser setting. This filter excludes any hit whose browser setting contains a domain-like string (ex: lifehackeR.com).

First, you need to get to the ‘New Filter’ page as described above. Then give your filter a name. A good one for this would be “Exclude Browser Spam.”

Next select “Custom” filter, and make sure that “Exclude” is selected.

Go to the “filter field” drop down menu and set it to “Browser.”

Now copy the Regex following this sentence and paste it into the text field.

[^\s]{3,}\.[^\s]{2,}

Click Save at the bottom.
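As a quick sanity check, the sketch below runs this domain-like pattern against a few browser names in Python. The sample values are invented for illustration, and the use of an unanchored, case-insensitive search is an assumption about how the filter is applied.

```python
import re

# Pattern copied from the filter above: at least three non-space characters,
# a literal dot, then at least two more non-space characters.
DOMAIN_LIKE = re.compile(r"[^\s]{3,}\.[^\s]{2,}", re.IGNORECASE)

def looks_like_link(value: str) -> bool:
    """Return True if the value contains a domain-like string."""
    return DOMAIN_LIKE.search(value) is not None

# Made-up sample values for illustration.
samples = [
    "Chrome",
    "Safari (in-app)",
    "Internet Explorer",
    "spam-site.example",
    "Visit spam-site.example today",
]
for value in samples:
    verdict = "excluded" if looks_like_link(value) else "kept"
    print(f"{value!r}: {verdict}")
```

Ordinary browser names contain no dot, so they pass through. A link is caught whether it stands alone or is buried in a longer phrase.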

Operating System Spam Filter Tutorial

Now we are going to create a filter to remove operating system spam. This filter does exactly the same thing as the browser spam filter, except it uses the “Operating System” filter field. Essentially, it removes entries that have links instead of valid operating system names.

First, you need to get to the ‘New Filter’ page as described above. Then you give your filter a name. A good one for this would be “Exclude Operating System Spam.”

Next select “Custom” filter, and make sure that “Exclude” is selected.

Go to the “filter field” drop down menu and set it to “Operating System.”

Now copy the Regex following this sentence and paste it into the text field.

[^\s]{3,}\.[^\s]{2,}

Click Save at the bottom.

Browser Version Spam Filter Tutorial

The next filter is for browser version spam. Browser versions have a very rigid structure, so an include-only filter can cover all of the main ones and essentially all of the more exotic ones, such as Amazon Silk (used on Kindles). It also keeps hits where the browser version is not set.

It does not allow for the Nintendo Browser, which is used on devices like the Wii or the DS handheld system. That does not mean those visitors cannot reach your site; it means that you will not see that traffic in this view.

Whether to use this filter is up to you. Check your traffic. If a decent portion of your traffic comes from the Nintendo browser, do not use this filter. However, even very high-traffic sites often have only one or two visits per month from this browser, so excluding it will not skew your results significantly.

If you decide to add this filter, get to the ‘New Filter’ page as described above. Then give your filter a name. A good one for this would be “Include Only Valid Browser Versions.”

Next select “Custom” filter, and make sure that “Include” is selected. This one is not selected by default so be careful here.

Go to the “filter field” drop down menu and set it to “Browser Version.”

Now copy the Regex following this sentence and paste it into the text field.

(^(([0-9]+\.)+[0-9]+|[-_a-z0-9\+\s]+)+|^\(not set\)|^)$

Click Save at the bottom.
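Because an Include filter throws away everything that does not match, it is worth checking this pattern against known-good values before relying on it. The Python sketch below does that with a handful of invented samples. It assumes an unanchored, case-insensitive match. The test strings are deliberately short, because the nested repetition in this pattern can backtrack heavily in Python’s regex engine on long non-matching input.

```python
import re

# Pattern copied from the filter above.
VALID_BROWSER_VERSION = re.compile(
    r"(^(([0-9]+\.)+[0-9]+|[-_a-z0-9\+\s]+)+|^\(not set\)|^)$",
    re.IGNORECASE,
)

def is_kept(browser_version: str) -> bool:
    """Return True if an Include filter with this pattern would keep the hit."""
    return VALID_BROWSER_VERSION.search(browser_version) is not None

# Made-up sample values; the empty string stands for a missing version.
samples = ["56.0.2924.87", "10.0", "(not set)", "", "spam.example"]
for value in samples:
    verdict = "kept" if is_kept(value) else "dropped"
    print(f"{value!r}: {verdict}")
```

Dotted version numbers, “(not set),” and empty values are all kept, while the domain-like string is dropped.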

Screen Resolution Spam Filter Tutorial

The next filter is one that will only include valid screen resolutions. Screen resolutions come in a very set format, so it’s a very safe bet to use an include only filter rather than an exclude. This filter will also allow for data where the screen resolution is not set.

First, get to the ‘New Filter’ page as described above. Then you give your filter a name. A good one for this would be “Include Only Valid Screen Resolutions.”

Next select “Custom” filter, and make sure that “Include” is selected. This one is not selected by default so be careful here.

Go to the “filter field” drop down menu and set it to “Screen Resolution.”

Now copy the Regex following this sentence and paste it into the text field.

(^[0-9]+x[0-9]+|^\(not set\)|^)$

Click Save at the bottom.
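The same kind of check works for this pattern. The short Python sketch below uses invented sample values and assumes the filter performs an unanchored search. It shows that a value only passes if the whole field is a width, an “x,” and a height, or is missing.

```python
import re

# Pattern copied from the filter above.
VALID_RESOLUTION = re.compile(r"(^[0-9]+x[0-9]+|^\(not set\)|^)$")

def is_valid_resolution(value: str) -> bool:
    """Return True if an Include filter with this pattern would keep the hit."""
    return VALID_RESOLUTION.search(value) is not None

# Made-up sample values; the empty string stands for a missing resolution.
samples = ["1920x1080", "360x640", "(not set)", "", "1920x1080 spam.example"]
for value in samples:
    verdict = "kept" if is_valid_resolution(value) else "dropped"
    print(f"{value!r}: {verdict}")
```

The last sample starts with a real resolution but is still dropped, because the pattern is anchored at both ends and allows nothing after the height.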

Viewport Size Spam Filter Tutorial

Lastly, we are going to create a similar filter for Viewport size, or Browser size. This is in the same format as Screen Resolution.

First, get to the ‘New Filter’ page as described above. Then you give your filter a name. A good one for this would be “Include Only Valid Browser Sizes.”

Next select “Custom” filter, and make sure that “Include” is selected. This one is not selected by default so be careful here.

Go to the “filter field” drop down menu and set it to “Browser Size.”

Now copy the Regex following this sentence and paste it into the text field.

(^[0-9]+x[0-9]+|^\(not set\)|^)$

Click Save at the bottom.
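Taken together, the six filters in this guide act as a single checkpoint: a hit is dropped if any Exclude pattern matches, and it is also dropped if any Include pattern fails to match. The Python sketch below models that logic only. The field names in the dictionaries are hypothetical labels chosen for this example, not Google Analytics identifiers, and the hits are invented.

```python
import re

DOMAIN_LIKE = r"[^\s]{3,}\.[^\s]{2,}"
DIMENSIONS = r"(^[0-9]+x[0-9]+|^\(not set\)|^)$"

# Exclude filters: a match in the field drops the hit.
EXCLUDE = {
    "language": re.compile(r".{15,}|\s[^\s]*\s|\.|,|\!|\/", re.I),
    "browser": re.compile(DOMAIN_LIKE, re.I),
    "operating_system": re.compile(DOMAIN_LIKE, re.I),
}

# Include filters: the field must match or the hit is dropped.
INCLUDE = {
    "browser_version": re.compile(
        r"(^(([0-9]+\.)+[0-9]+|[-_a-z0-9\+\s]+)+|^\(not set\)|^)$", re.I
    ),
    "screen_resolution": re.compile(DIMENSIONS, re.I),
    "browser_size": re.compile(DIMENSIONS, re.I),
}

def keep_hit(hit: dict) -> bool:
    """Return True if the hit survives every filter in the view."""
    for field, pattern in EXCLUDE.items():
        if pattern.search(hit.get(field, "")):
            return False
    for field, pattern in INCLUDE.items():
        if not pattern.search(hit.get(field, "")):
            return False
    return True

# Invented hits for illustration.
real_hit = {
    "language": "en-us",
    "browser": "Chrome",
    "operating_system": "Windows",
    "browser_version": "56.0.2924.87",
    "screen_resolution": "1920x1080",
    "browser_size": "1280x720",
}
ghost_hit = dict(real_hit, language="spam.example Vote now!")

print(keep_hit(real_hit))
print(keep_hit(ghost_hit))
```

One bad field is enough to drop a hit, which is why a mistake in an Include pattern can quietly remove real visitors. That is the reason for keeping an unfiltered view alongside the filtered one.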

For the foreseeable future, these filters should help protect your analytics against the most common types of referrer spam.

One major caveat about these filters is that spammers are clever, and are always coming up with new things. Spam prevention tries very hard to be proactive, but it tends to be reactive in general since no one truly knows what form of spam will come next.

Spam is prolific, and there is always more in the works. Keep an eye on your analytics, watch out for things that seem wrong or inaccurate, and try to keep informed about the most common and popular types of spam.

One last, very important warning: NEVER click on the spam links. Just because a link looks like it goes to a reputable site doesn’t mean it does. Because domain names can now contain non-ASCII characters, a spammer can register a domain that looks like Google.com or Lifehacker.com but has a letter swapped out for a special character that looks very similar.
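If you are curious whether a domain in your reports is one of these lookalikes, you can inspect it as text without ever visiting it. The Python sketch below is one possible approach, not an official tool. It lists any non-ASCII characters in a domain by their Unicode names and converts the domain to its ASCII form with Python’s built-in `idna` codec. The lookalike domain is a made-up example that uses two Cyrillic letters in place of the Latin “o.”

```python
import unicodedata

def suspicious_characters(domain: str) -> list:
    """List each non-ASCII character in the domain with its Unicode name."""
    return [
        (ch, unicodedata.name(ch, "UNKNOWN"))
        for ch in domain
        if ord(ch) > 127
    ]

def to_ascii(domain: str) -> str:
    """Convert a domain to its ASCII form using Python's idna codec."""
    return domain.encode("idna").decode("ascii")

# Hypothetical lookalike: both "o" characters are Cyrillic, not Latin.
lookalike = "g\u043e\u043egle.com"

for ch, name in suspicious_characters(lookalike):
    print(f"U+{ord(ch):04X} {name}")
print(to_ascii(lookalike))
```

A domain made only of ordinary letters comes back unchanged, while a lookalike is rewritten with an “xn--” prefix, which makes the substitution easy to spot.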

Following these tips, as well as including these filters in your analytics, should help create accurate reports to help you understand how visitors use your site and what you can do to make sure you are achieving the goals you set when you built your site.