Fake visits bite.
You’re sitting there in front of your computer, all set to analyze another day’s worth of Google Analytics traffic, and BOOM!, tough actin’ Tinactin! Or wait… Boom! there’s a huge surge of traffic! How exciting!
At least, it’s very exciting until you engage in your usual breakdown of traffic and realize this sudden windfall of good fortune – unlike any sort of traffic numbers you’re used to seeing – is not at all what it seems.
The visits are all direct and (not set). The average time on site, percent of new visitors, and bounce rate figures are appalling. In truth, this apparent great day for your website might be the single most confusing thing to ever happen to your analytics data.
How do you break down what happened and prevent it from continuing?
In general, there’s an assumption that all Google Analytics data is from legitimate human visitors around the globe. On occasion, though, spam bots can sneak through Google’s barriers.
It’s important when you see an anomalous surge like the above to ask yourself: Have I done anything on the site recently that justifies this surge?
It’s very possible that you have. Maybe it was an infographic you spent months on, or something as simple as an article retweeted by a celebrity or major corporation.
To validate the origin of these visits, my first recommendation is to view the ‘All Traffic’ section of your analytics (Traffic Sources –> Sources –> All Traffic).
From here, take a look at your top traffic sources and click on each to see where your surge is coming from. Again, what you want to do is identify the anomaly compared to your site’s previous data.
If it turns out that the spike in traffic is through a referral source like Twitter, or a major spike in organic traffic through a newly established keyword set, great – you’re done, and congrats!
If it’s an impossibly high spike in direct traffic, though? Looks like your Google Analytics got spammed. This is going to require some more digging.
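If you'd rather not eyeball every source, here's a rough Python sketch of the same "compare today against your usual numbers" check. It assumes you've exported daily visits per source from Analytics into plain dictionaries; the source labels, the sample figures, and the `ratio` and `min_visits` thresholds are all made up for illustration, so tune them to your own site.

```python
from statistics import median


def find_spiking_sources(history, today, ratio=3.0, min_visits=50):
    """Return (source, visits_today, baseline) for sources that look anomalous.

    history: source -> list of daily visit counts from previous days
    today:   source -> visit count on the day being checked
    A source is flagged when today's visits are at least `min_visits` and
    more than `ratio` times the historical median (hypothetical thresholds).
    """
    spikes = []
    for source, visits_today in today.items():
        past = history.get(source, [])
        baseline = median(past) if past else 0
        # max(baseline, 1) keeps brand-new sources from dividing the world by zero.
        if visits_today >= min_visits and visits_today > ratio * max(baseline, 1):
            spikes.append((source, visits_today, baseline))
    return sorted(spikes, key=lambda s: s[1], reverse=True)


# Illustrative numbers only, not real Analytics data.
history = {
    "(direct) / (none)": [40, 35, 50, 42, 38],
    "google / organic": [120, 130, 110, 125, 118],
    "twitter.com / referral": [5, 8, 4, 6, 7],
}
today = {
    "(direct) / (none)": 640,
    "google / organic": 128,
    "twitter.com / referral": 9,
}

spikes = find_spiking_sources(history, today)
for source, visits, baseline in spikes:
    print(f"{source}: {visits} visits today vs. a typical {baseline}")
```

With these sample numbers only the direct traffic gets flagged, which is exactly the situation that calls for more digging.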
At first, this sort of direct traffic surge will make no sense. You’re not alone in the confusion.
You mean to tell me that hundreds of new visitors suddenly decided to type my URL directly into their browser? Or visit my site from a saved bookmark or RSS feed? What the H?
It’s a weird one, and 99% of the time it’s because this direct traffic is total BS.
The first thing I would do is narrow the data down to the day the spike occurred. If the surge has been going on for some time, start with the first day you noticed it.
From there, go ahead and select ‘Landing Pages’ as your secondary dimension. This will let you see if this is a sitewide bot attack, or just a select few pages.
I’ve highlighted the most telling stats above. As you can see, these direct ‘visitors’ are hitting one page, immediately bouncing, and almost always registering as new visitors. Tell-tale bot behavior if I’ve ever seen it.
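If you want to turn that gut check into something repeatable, a small sketch like the one below can do it. To be clear, the cutoffs here are my own hypothetical picks, not anything Google publishes, and the function name is invented for this example. It simply asks whether a row of data bounces almost always, is almost entirely "new" visitors, and spends next to no time on the site.

```python
def looks_like_bot_traffic(bounce_rate, pct_new_visits, avg_seconds_on_site,
                           min_bounce=95.0, min_new=95.0, max_seconds=5.0):
    """Return True when all three metrics point at automated visits.

    bounce_rate and pct_new_visits are percentages (0-100);
    avg_seconds_on_site is the average visit duration in seconds.
    The default thresholds are illustrative assumptions, not official values.
    """
    return (
        bounce_rate >= min_bounce
        and pct_new_visits >= min_new
        and avg_seconds_on_site <= max_seconds
    )


# A suspicious landing-page row: nearly everyone bounces, is "new", and leaves instantly.
suspicious = looks_like_bot_traffic(99.42, 99.0, 1.0)

# A normal-looking row for comparison.
normal = looks_like_bot_traffic(55.0, 60.0, 95.0)

print(suspicious, normal)
```

Requiring all three signals at once is deliberate: plenty of legitimate pages have a high bounce rate on their own, so a single metric is not much of a smoking gun.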
At this point, you should be about 90% sure your analytics got bot’d. There are a few more things we can deduce about this spam within GA.
One of the easiest ways to re-confirm the spam bot suspicion is by analyzing the location of this surge in traffic.
The simplest way to do this is to change the secondary dimension on your one-day direct traffic spike data to ‘City.’ If the vast majority of your shady looking direct visits are coming from the same city, guess what? Bot attack!
Personally, I prefer to analyze this data a little more visually. You can do this in analytics by clicking ‘Audience –> Demographics –> Location.’
Start clicking through the locations that contain the most visits. In my case, clicking ‘United States’ and then ‘California’ (which had about 90% of the day’s traffic) led me to the following insight.
What a Valentine’s Day treat – 77% of all my traffic arriving from Palo Alto, California! And look at the rest of those metrics. I won’t pretend my site is the world’s most revered, but a 99.42% bounce rate is a little strong.
And why the sudden interest from little old Palo Alto? Either my guerrilla marketing campaign in a city I’ve never thought about paid off, or… BOTS!
One quick but important consideration here that will help you determine how to move forward: Does your site regularly get visits from the city in question? Expand your date range and see what kind of traffic you regularly get from this area. If you’re running a local website and the answer is “none, ever, and I don’t want any more,” it will make your solution significantly easier.
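Here's a quick sketch of that location check in Python, for anyone who prefers a script to clicking through maps. It assumes you've exported visits per city for the spike day and for a typical day; the counts below are invented to mirror the 77% Palo Alto situation described above, and the function name is just something I made up for the example.

```python
from collections import Counter


def dominant_city(visits_by_city):
    """Return (city, share_of_total) for the city with the most visits."""
    total = sum(visits_by_city.values())
    if total == 0:
        return None, 0.0
    city, visits = Counter(visits_by_city).most_common(1)[0]
    return city, visits / total


# Illustrative counts only: the spike day versus an ordinary day.
spike_day = {"Palo Alto": 770, "San Francisco": 60, "Chicago": 90, "New York": 80}
typical_day = {"Palo Alto": 1, "San Francisco": 20, "Chicago": 45, "New York": 40}

city, share = dominant_city(spike_day)

# How much of a normal day's traffic does that same city usually account for?
usual_share = typical_day.get(city, 0) / sum(typical_day.values())

print(f"{city}: {share:.0%} of spike-day visits, {usual_share:.1%} on a typical day")
```

A city that jumps from a rounding error to the majority of your traffic overnight is the pattern to look for. If the city is one of your regular sources, the comparison will look far less dramatic and you'll need to be more careful about what you exclude.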
Bonus Bot Analysis
At this point, we should really be talking about solutions for dealing with this fake BS surge in traffic. But I have one more bit of Google Analytics diving that can offer some information.
Right now, we know that the website is recording bogus visits and we know where they’re coming from. What we don’t know yet is the who.
The first test here is to check out what domain all these visits are coming from. There are a handful of ways to do this, but the simplest is from the location analysis we just did. Add ‘Domain’ as your secondary dimension to all those visits from the single location in question.
If you get something other than ‘unknown.com,’ Google that thing and figure out what the source is! For example, a while back I noticed a client getting their surge from aws.com. A little bit of digging, and I found that aws could stand for Amazon Web Services (or Analytics Web Spammer).
In most instances I’ve seen of nonsense visits, the culprit is some sort of hosting/web services software.
If you do get the dreaded ‘unknown.com,’ there’s one more section to check: Audience –> Technology –> Networks.
From the graph above you’ll likely be able to verify the network generating this surge. In my example, the connection is abundantly clear – my friends in Palo Alto happen to be Palo Alto Networks.
Ok, so now the important part: How do we kill it?
Google Analytics Option #1: Advanced Segments
The easiest way, purely from a data analysis perspective, is by creating an advanced segment in Google Analytics that excludes the spamming service provider we just identified. For example:
This is easy enough, and will allow you to compare data without fear of compromised numbers. Apply the advanced segment and you can compare this month to the same month the previous year without those completely phony visits. There are two catches, though:
- You might be getting totally legitimate visits from that service provider! Very important to take a look at the traffic from this provider over time to see if this is a one-time spam hit, or an occasionally valid source.
- It’s just a band-aid for data analysis and doesn’t prevent the same problem from occurring again tomorrow.
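To make the idea concrete, here's a minimal Python sketch of what a segment like this does to your numbers: drop the rows from one service provider and recompute the totals from what's left. This is only an imitation of the concept on exported data, not how Analytics implements segments. The row layout and the sample figures are assumptions for illustration.

```python
def exclude_provider(rows, provider):
    """Return only the rows whose service provider differs from `provider`.

    The comparison ignores case and surrounding whitespace.
    """
    blocked = provider.strip().lower()
    return [r for r in rows if r["service_provider"].strip().lower() != blocked]


# Illustrative one-day export; the counts are made up.
rows = [
    {"service_provider": "palo alto networks", "visits": 770, "bounces": 766},
    {"service_provider": "comcast cable", "visits": 150, "bounces": 80},
    {"service_provider": "verizon online", "visits": 80, "bounces": 45},
]

clean = exclude_provider(rows, "Palo Alto Networks")
visits = sum(r["visits"] for r in clean)
bounce_rate = 100 * sum(r["bounces"] for r in clean) / visits

print(f"{visits} visits, {bounce_rate:.2f}% bounce rate without the spammy provider")
```

Notice that the original rows are untouched. Just like a segment, this changes what you look at, not what was recorded, which is why it does nothing to stop tomorrow's spam.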
Google Analytics Option #2: Exclude Filters
The way to actually make sure these visits don’t keep spamming your data is through an exclude filter at the analytics profile level. There are a number of different filters that could work, with varying degrees of problems.
I’ll make a note here that when you’re working with profile filters, you want to be very careful to keep a clean source of website data. You should have one unfiltered, untouched GA profile at all times. Create a new profile, or use an existing ‘test profile’ for the changes that follow. This way, if something goes wrong, you still have all your data perfectly safe elsewhere.
- Predefined Filter: Remember when we looked for the Domain creating this traffic? If you found a unique domain, you can use that now in a filter.
Warning! – Much like the advanced segment above, it’s important to consider here what traffic from this domain has looked like over time. If it’s a generic hosting domain name and you get a few legitimate visits from it, this is probably not the best answer.
- Predefined Filter: You can also exclude ‘Traffic from the IP addresses’ pretty easily. Hopefully you’ve already done this for your home and work IP addresses (if not, there’s no time like the present!), and you’ll execute this the same way.
Warning! – There’s a more complicated way to verify that these IP addresses from DNS Stuff are without a shadow of a doubt the IPs you want to exclude. It involves server logs and a serious deep dive into your site. I’ll get into that later, for the sake of everyone’s sanity.
- Custom Filter: If you aren’t able to use either of the above two options, there are still some custom filters you can apply. Unfortunately you won’t be able to exclude easily by service provider name. You can, on the other hand, easily exclude an entire city.
Warning! – The caveat here is pretty obvious and has already popped up; excluding an entire city isn’t exactly the cleanest way to deal with this problem. If you’re a global or national business site, or just a regular blog, there’s no reason you wouldn’t want to see perfectly legitimate visits from any city (unless you just hate Green Bay or something because SCREW YOU PACKERS!).
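One practical gotcha with IP-based filters: if the filter pattern field is treated as a regular expression (an assumption here, so check the filter form you're actually using), an unescaped dot matches any character and an unanchored pattern can match longer addresses. The sketch below builds a tightly anchored pattern in Python so you can test it before pasting it anywhere. The addresses come from a range reserved for documentation and the helper name is hypothetical.

```python
import re


def ip_filter_pattern(ips):
    """Build one anchored regex that matches any of the given IPv4 addresses exactly."""
    escaped = [re.escape(ip) for ip in ips]  # turns "." into "\."
    return "^(" + "|".join(escaped) + ")$"


# Placeholder addresses from the 203.0.113.0/24 documentation range.
pattern = ip_filter_pattern(["203.0.113.7", "203.0.113.42"])
print(pattern)

# Exact addresses match...
exact_match = re.match(pattern, "203.0.113.7") is not None

# ...but a longer address that merely starts the same way does not.
longer_match = re.match(pattern, "203.0.113.70") is not None

print(exact_match, longer_match)
```

Testing the pattern against a few addresses you do and don't want excluded is a cheap way to avoid quietly filtering out real visitors.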
Those Bots Are Still Visiting Your Site, Even Now
After all that, there’s the disconcerting knowledge that those bots are still gaining access to your site! Just because you filtered them out of your analytics data doesn’t mean they aren’t still crawling all over your site, like a bunch of parasitic robot tapeworms.
Or at least, those filters don’t mean those bots couldn’t return to crawl your site at any time. In most examples of this I’ve seen, the spike of fake traffic occurs over the course of one (maybe two) day(s).
If your spike is continuous with no signs of slowing, or you want to be thorough and really make sure your website isn’t wasting valuable server time on these nonsense crawlers, you can block the spamming user-agent in your robots.txt file.
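As a rough illustration, here's what such a rule could look like, along with a quick check using Python's standard `urllib.robotparser` to confirm the rules read the way you intend. The user-agent name `ExampleSpamBot` is a placeholder, since you'd need your server logs to find the real one. Keep in mind this only shows how a rule-following crawler would interpret the file; a bot that ignores robots.txt won't care.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one user-agent, leave everyone else alone.
robots_txt = """\
User-agent: ExampleSpamBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Is each crawler allowed to fetch a page under these rules?
spam_bot_allowed = parser.can_fetch("ExampleSpamBot", "/any-page/")
googlebot_allowed = parser.can_fetch("Googlebot", "/any-page/")

print(f"ExampleSpamBot allowed: {spam_bot_allowed}")
print(f"Googlebot allowed: {googlebot_allowed}")
```

The second check matters as much as the first: the last thing you want is a typo in robots.txt that locks out the search engines along with the spammer.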
More Robots? What Am I, Will Smith?
I know, I know, this whole thing quickly spirals into the annoyingly tech-y realm of nerd that most website owners probably want to avoid. I mean, if you just want to write on your blog and maybe collect some AdSense revenue, this whole process is a huge pain.
Speaking of AdSense… if you’re looking for a compelling reason to go to all this trouble of discovering and blocking a spamming user-agent, AdSense might be a good one. You know what those bots might be doing in addition to screwing with your Google Analytics visits?
Boosting the ever-living daylights out of your ad impressions.
Without going into too much detail about AdSense, if Google thinks you are trying to manipulate your ad revenue through a surge of automated bots, I can’t imagine they’ll respond kindly.
That said, since all of the server log and robots.txt analysis goes well beyond the scope of cleaning up your Google Analytics data, I’ll have to provide a more detailed companion piece another time.
For now, I hope this helps answer a lot of your questions about cleaning up your Google Analytics after a spam bot spike.
If you have any questions, or suggestions, or even just plain old disgust with something stupid I’ve said, let me hear about it in the comments! (Or feel free to mail the disgust to my wife and let her revel in the power of the multitude.)