Fake visits bite.
You’re sitting there in front of your computer, all set to analyze another day’s worth of Google Analytics traffic, and BOOM! Tough actin’ Tinactin! Or wait… Boom! There’s a huge surge of traffic! How exciting!
At least, it’s very exciting until you engage in your usual breakdown of traffic and realize this sudden windfall of good fortune – unlike any sort of traffic numbers you’re used to seeing – is not at all what it seems.
The visits are all direct and (not set). The average time on site, percent of new visitors, and bounce rate figures are appalling. In truth, this apparent great day for your website might be the single most confusing thing to ever happen to your analytics data.
How do you break down what happened and prevent it from continuing?
Table of Contents
Identifying the Infiltrator
Direct Traffic Spam Analysis
Where Do The Bots Live?
Identifying the Infiltrator
In general, there’s an assumption that all Google Analytics data is from legitimate human visitors around the globe. On occasion, though, spam bots can sneak through Google’s barriers.
It’s important when you see an anomalous surge like this to ask yourself: Have I done anything on the site recently that justifies this surge?
It’s very possible that you have. Maybe it was an infographic you spent months on, or something as simple as an article retweeted by a celebrity or major corporation.
To validate the origin of these visits, my first recommendation is to view the ‘All Traffic’ section of your analytics (Traffic Sources –> Sources –> All Traffic).
From here, take a look at your top traffic sources and click on each to see where your surge is coming from. Again, what you want to do is identify the anomaly compared to your site’s previous data.
If it turns out that the spike in traffic is through a referral source like Twitter, or a major spike in organic traffic through a newly established keyword set, great – you’re done, and congrats!
If it’s an impossibly high spike in direct traffic, though? Looks like your Google Analytics got spammed. This is going to require some more digging.
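If you want something more repeatable than eyeballing the graph, the “compare it to your site’s previous data” step can be roughed out in a few lines. This is a minimal Python sketch, assuming you’ve exported daily direct-session counts into a plain list; the 14-day window and 3-standard-deviation threshold are my own illustrative assumptions, not anything Google Analytics does for you.

```python
from statistics import mean, stdev


def find_spike_days(daily_direct, window=14, threshold=3.0):
    """Return indexes of days whose direct sessions sit more than
    `threshold` standard deviations above the preceding `window` days."""
    spikes = []
    for i in range(window, len(daily_direct)):
        history = daily_direct[i - window:i]
        mu = mean(history)
        sigma = stdev(history) or 1.0  # flat history: avoid dividing by zero
        if (daily_direct[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes


# Hypothetical daily direct-session counts: two quiet weeks, then a surge.
daily = [40, 42, 38, 45, 41, 39, 44, 43, 40, 42, 37, 41, 46, 40, 612, 44]
print(find_spike_days(daily))  # [14]
```

The threshold is a judgment call; a bot surge like the ones described here usually clears it by a wide margin, while a good-but-normal day won’t.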
Direct Traffic Spam Analysis
At first, this sort of direct traffic surge will make no sense. You’re not alone in the confusion.
You mean to tell me that hundreds of new visitors suddenly decided to type my URL directly into their browser? Or visit my site from a saved bookmark or RSS feed? What the H?
It’s a weird one, and 99% of the time it’s because this direct traffic is total BS.
The first thing I would do is narrow the data down to the day the spike occurred. If the surge has been going on for some time, start with the first day you noticed it.
From there, go ahead and select ‘Landing Pages’ as your secondary dimension. This will let you see if this is a sitewide bot attack, or just a select few pages.
Look at the most telling stats: these direct ‘visitors’ are hitting one page, immediately bouncing, and almost always registering as new visitors. Tell-tale bot behavior if I’ve ever seen it.
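The same landing-page breakdown can be reproduced from exported session rows if you prefer to work outside the GA interface. This is a hypothetical sketch: the `Session` fields, the 50-session minimum, and the 95% cutoff are assumptions, and a bounce is approximated here as a single-pageview session.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    landing_page: str
    pageviews: int
    is_new: bool


def landing_page_report(sessions):
    """Map landing page -> (sessions, bounce rate %, new-visit rate %)."""
    totals = defaultdict(lambda: [0, 0, 0])  # sessions, bounces, new visits
    for s in sessions:
        row = totals[s.landing_page]
        row[0] += 1
        row[1] += int(s.pageviews == 1)  # single-pageview session ~ bounce
        row[2] += int(s.is_new)
    return {
        page: (n, round(100 * bounces / n, 2), round(100 * new / n, 2))
        for page, (n, bounces, new) in totals.items()
    }


def suspicious_pages(report, min_sessions=50, min_rate=95.0):
    """Pages with enough traffic where nearly every visit bounces and is new."""
    return [
        page for page, (n, bounce, new) in report.items()
        if n >= min_sessions and bounce >= min_rate and new >= min_rate
    ]


sessions = (
    [Session("/some-page", 1, True)] * 300   # hypothetical bot-like traffic
    + [Session("/", 3, False)] * 40          # hypothetical regular visitors
    + [Session("/", 1, True)] * 10
)
report = landing_page_report(sessions)
print(report["/some-page"])      # (300, 100.0, 100.0)
print(report["/"])               # (50, 20.0, 20.0)
print(suspicious_pages(report))  # ['/some-page']
```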
At this point, you should be about 90% sure your analytics got bot’d. There are a few more things we can deduce about this spam within GA.
Where Do The Bots Live?
One of the easiest ways to re-confirm the spam bot suspicion is by analyzing the location of this surge in traffic.
The simplest way to do this is to change the secondary dimension on your one-day direct traffic spike data to ‘City.’ If the vast majority of your shady looking direct visits are coming from the same city, guess what? Bot attack!
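Here’s a quick sketch of the “same city” check, assuming you’ve exported the City value for each direct session on the spike day. The counts below are made up to mirror a 77% share, and what counts as a “vast majority” is your call; the same helper works just as well on Network Domain or Service Provider values.

```python
from collections import Counter


def top_share(values):
    """Return (most common value, its share of all rows as a percentage)."""
    values = list(values)
    if not values:
        raise ValueError("no rows to summarize")
    value, count = Counter(values).most_common(1)[0]
    return value, round(100 * count / len(values), 1)


# Hypothetical City column for one day of direct sessions.
cities = ["Palo Alto"] * 770 + ["Chicago"] * 120 + ["Austin"] * 110
print(top_share(cities))  # ('Palo Alto', 77.0)
```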
Personally, I prefer to analyze this data a little more visually. You can do this in analytics by clicking ‘Audience –> Demographics –> Location.’
Start clicking through the locations that contain the most visits. In my case, clicking ‘United States’ and then ‘California’ (which had about 90% of the day’s traffic) led me to the following insight.
What a Valentine’s Day treat – 77% of all my traffic arriving from Palo Alto, California! And look at the rest of those metrics. I won’t pretend my site is the world’s most revered, but a 99.42% bounce rate is a little strong.
And why the sudden interest from little old Palo Alto? Either my guerrilla marketing campaign in a city I’ve never thought about paid off, or… BOTS!
One quick important consideration here that will help you determine how to move forward: Does your site regularly get visits from the city in question? Expand your date range and see what kind of traffic you regularly get from this area. If you’re running a local website and the answer is “none, ever, and I don’t want any more” it will make your solution significantly easier.
Bonus Bot Analysis
At this point, we should really be talking about solutions for dealing with this fake BS surge in traffic. But I have one more bit of Google Analytics diving that can offer some information.
Right now, we know that the website is recording bogus visits and we know where they’re coming from. What we don’t know yet is the who.
The first test here is to check out what domain all these visits are coming from. There are a handful of ways to do this, but the simplest is from the location analysis we just did. Add ‘Domain’ as your secondary dimension to all those visits from the single location in question.
If you get something other than ‘unknown.com,’ Google that thing and figure out what the source is! For example, a while back I noticed a client getting their surge from aws.com. A little bit of digging, and I found that aws could stand for Amazon Web Services (or Analytics Web Spammer).
In most instances I’ve seen of nonsense visits, the culprit is some sort of hosting/web services software.
If you do get the dreaded ‘unknown.com,’ there’s one more section to check: Audience –> Technology –> Networks.
From this report you’ll likely be able to verify the network generating this surge. In my example, the connection is abundantly clear – my friends in Palo Alto happen to be Palo Alto Networks.
Kill the Bogus Bot Visits!
OK, so now the important part: How do we kill it?
Google Analytics Option #1: Advanced Segments
The easiest way, purely from a data analysis perspective, is by creating an advanced segment in Google Analytics that excludes the spamming service provider we just identified. For example, a segment that excludes every visit whose service provider is Palo Alto Networks.
This is easy enough, and will allow you to compare data without fear of compromised numbers. Apply the advanced segment and you can compare this month to the same month the previous year without those completely phony visits.
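To make the “compare without the phony visits” idea concrete, here’s a hypothetical Python sketch of what a segment does: it computes metrics on a filtered view while the underlying data stays intact. The `Visit` fields, the case-insensitive “contains” match, and the numbers are all assumptions for illustration, not GA’s internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    service_provider: str
    bounced: bool


def bounce_rate(visits):
    """Percentage of visits that bounced (0.0 for an empty list)."""
    if not visits:
        return 0.0
    return round(100 * sum(v.bounced for v in visits) / len(visits), 2)


def exclude_provider(visits, provider):
    """Segment-style view: drop visits whose provider contains `provider`
    (case-insensitive). The input list itself is never modified."""
    needle = provider.lower()
    return [v for v in visits if needle not in v.service_provider.lower()]


raw = ([Visit("palo alto networks", True)] * 900
       + [Visit("residential isp", False)] * 70
       + [Visit("residential isp", True)] * 30)
segment = exclude_provider(raw, "Palo Alto Networks")
print(len(raw), bounce_rate(raw))          # 1000 93.0
print(len(segment), bounce_rate(segment))  # 100 30.0
```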
Problems here:
- You might be getting totally legitimate visits from that service provider! Very important to take a look at the traffic from this provider over time to see if this is a one-time spam hit, or an occasionally valid source.
- It’s just a band-aid for data analysis and doesn’t prevent the same problem from occurring again tomorrow.
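For the first problem, one rough test is how concentrated the provider’s visits are. The sketch below (hypothetical names and dates) counts how many days a provider showed up and what share of its visits landed on its busiest day. A provider that appears on many days with a modest busiest-day share looks like an occasionally valid source; one with nearly everything on a single day looks like a one-time spam hit.

```python
from collections import Counter
from datetime import date, timedelta


def provider_history(visit_dates):
    """Return (days with any visits, busiest day, % of visits on busiest day)."""
    if not visit_dates:
        raise ValueError("no visits to summarize")
    per_day = Counter(visit_dates)
    busiest_day, busiest_count = per_day.most_common(1)[0]
    share = round(100 * busiest_count / len(visit_dates), 1)
    return len(per_day), busiest_day, share


# Hypothetical history: one visit a day for ten days, then a 600-visit burst.
background = [date(2024, 1, 1) + timedelta(days=d) for d in range(10)]
spike = [date(2024, 2, 14)] * 600
days, day, share = provider_history(background + spike)
print(days, day.isoformat(), share)  # 11 2024-02-14 98.4
```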
Google Analytics Option #2: Exclude Filters
The way to actually make sure these visits don’t keep spamming your data is through an exclude filter at the analytics profile level. There are a number of different filters that could work, with varying degrees of problems.
I’ll make a note here that when you’re working with profile filters, you want to be very careful to keep a clean source of website data. You should have one unfiltered, untouched GA profile at all times. Create a new profile, or use an existing ‘test profile’ for the changes that follow. This way, if something goes wrong, you still have all your data perfectly safe elsewhere.
- Predefined Filter: Remember when we looked for the Domain creating this traffic? If you found a unique domain, you can use that now in a filter.
Warning! – Much like the advanced segment above, it’s important to consider here what traffic from this domain has looked like over time. If it’s a generic hosting domain name and you get a few legitimate visits from it, this is probably not the best answer.
- Predefined Filter: You can also exclude ‘Traffic from the IP addresses’ pretty easily. Hopefully you’ve already done this for your home and work IP addresses (if not, there’s no time like the present!), and you’ll execute this the same way.
Warning! – There’s a more complicated way to verify that these IP addresses from DNS Stuff are without a shadow of a doubt the IP’s you want to exclude. It involves server logs and a serious deep dive into your site. I’ll get into that later, for the sake of everyone’s sanity.
- Custom Filter: If you aren’t able to use either of the above two options, there are still some custom filters you can apply. Unfortunately you won’t be able to exclude easily by service provider name. You can, on the other hand, easily exclude an entire city.
Warning! – The caveat here is pretty obvious and has already come up: excluding an entire city isn’t exactly the cleanest way to deal with this problem. If you’re a global or national business site, or just a regular blog, there’s no reason you wouldn’t want to see perfectly legitimate visits from any city (unless you just hate Green Bay or something because SCREW YOU PACKERS!).
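The three exclude filters above boil down to matching each hit against a domain pattern, an IP range, or a city before it’s stored. This is a hypothetical Python model of that logic, not GA’s implementation: the `Hit` fields, the `spammy-host.example` domain, and the 203.0.113.0/24 range (a documentation-only block) are placeholders. GA’s custom filter patterns are treated as regular expressions, so the sketch escapes the dots in the domain. Note how the city rule also drops a legitimate visitor from Palo Alto, which is exactly the caveat in the last warning.

```python
import ipaddress
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    domain: str  # network domain reported for the visitor
    ip: str
    city: str


# Hypothetical exclusions; substitute whatever your own digging turned up.
EXCLUDE_DOMAIN = re.compile(r"(^|\.)spammy-host\.example$", re.IGNORECASE)
EXCLUDE_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
EXCLUDE_CITIES = {"palo alto"}


def excluded(hit: Hit) -> bool:
    """True if any of the three exclude rules matches this hit."""
    if EXCLUDE_DOMAIN.search(hit.domain):
        return True
    addr = ipaddress.ip_address(hit.ip)
    if any(addr in net for net in EXCLUDE_NETWORKS):
        return True
    return hit.city.lower() in EXCLUDE_CITIES


def apply_profile_filters(hits):
    """Keep only hits that survive every exclude rule. Unlike a segment,
    anything dropped here never reaches the filtered profile, which is
    why a separate unfiltered profile is worth keeping."""
    return [h for h in hits if not excluded(h)]


hits = [
    Hit("spammy-host.example", "198.51.100.7", "Austin"),
    Hit("unknown.unknown", "203.0.113.42", "Chicago"),
    Hit("residential-isp.example", "192.0.2.10", "Palo Alto"),
    Hit("residential-isp.example", "192.0.2.11", "Green Bay"),
]
print([h.city for h in apply_profile_filters(hits)])  # ['Green Bay']
```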
Those Bots Are Still Visiting Your Site, Even Now
After all that, there’s the disconcerting knowledge that those bots are still gaining access to your site! Just because you filtered them out of your analytics data doesn’t mean they aren’t still crawling all over your site, like a bunch of parasitic robot tapeworms.
Or at least, those filters don’t mean those bots couldn’t return to crawl your site at any time. In most examples of this I’ve seen, the spike of fake traffic occurs over the course of one (maybe two) day(s).
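If your spike is continuous with no signs of slowing, or you want to be thorough and really make sure your website isn’t wasting valuable server time on these nonsense crawlers, you can block the spamming user-agent in your robots.txt file. Keep in mind that robots.txt is a request, not a lock: well-behaved crawlers honor it, but malicious bots can simply ignore it, and those have to be blocked at the server level (for example, in an .htaccess file on Apache).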
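Python’s standard library ships a robots.txt parser, which makes it easy to see what a rule actually tells a crawler. This sketch uses a hypothetical `BadBot` user-agent and a placeholder URL. The catch is built into the demo: the answer only matters if the crawler bothers to ask, so a bot that never performs this check walks right past it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: turn away one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://www.example.com/blog/some-post"
for agent in ("BadBot", "BadBot/2.1", "Googlebot"):
    # A compliant crawler asks this question before every request.
    print(agent, parser.can_fetch(agent, url))
# BadBot False
# BadBot/2.1 False
# Googlebot True
```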
More Robots? What Am I, Will Smith?
I know, I know, this whole thing quickly spirals into the annoyingly tech-y realm of nerd that most website owners probably want to avoid. I mean, if you just want to write on your blog and maybe collect some AdSense revenue, this whole process is a huge pain.
Speaking of AdSense… if you’re looking for a compelling reason to go to all this trouble of discovering and blocking a spamming user-agent, AdSense might be a good one. You know what those bots might be doing in addition to screwing with your Google Analytics visits?
Boosting the ever-living daylights out of your ad impressions.
Without going into too much detail about AdSense, if Google thinks you are trying to manipulate your ad revenue through a surge of automated bots? I can’t imagine they’ll respond kindly.
That said, since all of the server log and robots.txt analysis goes well beyond the scope of cleaning up your Google Analytics data, I’ll have to provide a more detailed companion piece another time.
For now, I hope this helps answer a lot of your questions about cleaning up your Google Analytics after a spam bot spike.
If you have any questions, or suggestions, or even just plain old disgust with something stupid I’ve said, let me hear about it in the comments! (Or feel free to mail the disgust to my wife and let her revel in the power of the multitude.)
Great tips Dave. Thanks! Quick question – if a site is hosted on Shopify, could a bot attack of this sort cause any potential damage to the server or any other issues that we should be aware of? We’re also running AdWords for this client, but not AdSense. Any concerns there?
We’ve opted for option #1 and so far, it’s filtering out that fake data nicely. I’m just on high alert now, trying to make sure that there are no potential threats to my client’s site.
Thanks!
Glad this helped some, and thanks for the comment!
Generally speaking, analytics spam like this does not do any damage to the server. The intent from most of the spammers I’ve seen seems to be data collection, with the GA spike occurring as an unintended consequence.
BUT, not all bots are created equal, and obviously malicious spam can be a real concern. If you can deduce where the bots are coming from, and then potentially match an IP to your server logs (a lot easier said than done), that should help put your mind at ease. I’m far from an expert on site security, but for me, putting a name with the spam attack helps put my mind at ease (i.e. this is a reputable security firm in California, as opposed to a single individual injecting hidden links from Egypt… nothing against Egypt).
As far as the AdWords concern, it’s less likely to be a problem than AdSense. If you have any PPC specific landing pages (unique URL parameters, etc), you could check if these are included in the direct traffic surge. My guess would be they are not. The spammers I’ve seen just go page by page through your sitemap.
End of the day, this is probably just a fluky/annoying problem within GA. If you really want to play it safe, I found this post very helpful in terms of managing scams.
All the best,
Dave
Dave, thanks so much for this article. A website of mine just got 141 hits from “palo alto networks” just like you mentioned. Then I decided to do some research and found you.
Much appreciated! This definitely sets the record straight.
Thanks for the comment, Tony – really glad it helped! Definitely a confusing situation, but fortunately one that can be managed.
Hi Dave, just been looking at this very scenario, and found your article a useful support guide. Having made the mistake once before, I +1 your point about keeping a full, clean, unfiltered GA profile!
Hi Blaine, glad it helped! And yes, that unfiltered data is a life saver once those Mad Scientist filters get out of hand 🙂
Super interesting article! I’ve been getting a surge in fake hits lately and it really helped me to weed them out. Thanks again (:
Well, guess what’s in/near Palo Alto? Google HQ. But I don’t see a reason why Google would be looking at your website… yeah, it is probably bots.
First time visiting, great information. I sometimes do feel like Will Smith (lol). It seems like when one is corrected, he goes and gets his other friends and they just do a number on my site.
I have truly taken a lot of notes from your advice and will apply them TODAY. Thanks for the help!
Glad you found the post useful. Thanks for the comment!
Hi Dave,
Thanks for confirming that this is a problem!
I work for a large company with 1,000+ sites.
We are seeing an increasing amount of what we expect is bot traffic from various hosting providers – aws (amazon) being one of them.
We have the issue of applying many filters to multiple profiles per account – not very scalable. Imagine having to create this filter for 2 profiles for 1000+ sites.
Would love to hear if anyone else has this issue?
Hi Tony,
We are seeing this issue. I’d love to talk. We found, for January alone – 15 different IP addrs that are causing 1.5M page views from just a few different pages. The same page is often hit 1x/sec for hours.
At the same time, we can see that some of these are real users.
Mark
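A single IP hitting the same page once a second for hours is something you can spot in raw server logs. This is a minimal Python sketch, assuming Apache common/combined log format; the 600-request and 30-per-minute thresholds, the IPs (documentation ranges), and the paths are hypothetical. The output is a list of candidates to investigate, not proof: as Mark notes, some of these IPs may turn out to be real users.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Start of an Apache common/combined log line, e.g.
# 203.0.113.9 - - [14/Feb/2024:10:00:01 +0000] "GET /pricing HTTP/1.1" 200 512
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(?:GET|POST|HEAD) (\S+)')


def heavy_hitters(lines, min_requests=600, min_per_minute=30.0):
    """Map (ip, path) -> requests per minute for pairs that hammer one URL."""
    seen = defaultdict(list)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines in other formats
        ip, stamp, path = m.groups()
        seen[(ip, path)].append(datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z"))
    flagged = {}
    for key, times in seen.items():
        if len(times) < min_requests:
            continue
        span_minutes = max((max(times) - min(times)).total_seconds() / 60, 1.0)
        rate = len(times) / span_minutes
        if rate >= min_per_minute:
            flagged[key] = round(rate, 1)
    return flagged


# Hypothetical log: one IP requesting /pricing every second for an hour,
# plus a human-paced visitor.
start = datetime(2024, 2, 14, 10, 0, 0)


def fmt(ip, when, path):
    return f'{ip} - - [{when:%d/%b/%Y:%H:%M:%S} +0000] "GET {path} HTTP/1.1" 200 512'


log = [fmt("203.0.113.9", start + timedelta(seconds=s), "/pricing") for s in range(3600)]
log += [fmt("198.51.100.20", start + timedelta(minutes=m), "/") for m in range(5)]
print(heavy_hitters(log))  # {('203.0.113.9', '/pricing'): 60.0}
```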
Thanks for the helpful post!
I was wondering how we could permanently block the direct fake traffic to our websites. As an SEO expert, I’ve had a few discussions with other online marketing professionals, and they are of the view that Google now considers a website’s user experience as one of its ranking factors.
Thousands of visitors with 100% bounce rate and 0 sec avg. time on website leaves a bad impression on the website.
Any help / suggestion would be highly appreciated!
Also, in an effort to stop the surge coming to my client’s website, we analyzed Google Analytics and figured out that 630 visitors came from the network domain amazonaws.com with 0 sec on the website and a 100% bounce rate.
If we permanently block traffic from this domain (amazonaws.com), will it also stop genuine visitors from accessing the website?
In an effort to start controlling fake traffic we have added the following major spammers to .htaccess file.
##begin code
##start blocking potentially unwanted bots.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]
##end code. bai bots.
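Before deploying a list like this, it helps to check which user-agents it will actually catch. The patterns are regular expressions: `^` anchors to the start of the string, and without `[NC]` the match is case-sensitive. This Python sketch mimics that behavior for a handful of the conditions above, on the assumption that Python’s `re` agrees with Apache’s PCRE for patterns this simple; the user-agent strings are made up.

```python
import re

# (pattern, has [NC]) pairs copied from a few of the RewriteCond lines above.
CONDITIONS = [
    (r"^Wget", False),
    (r"^Teleport\ Pro", False),
    (r"HTTrack", True),
    (r"Indy\ Library", True),
]


def blocked_by(user_agent, conditions=CONDITIONS):
    """Return the first pattern that matches, mimicking a chain of
    RewriteCond ... [OR] lines: any single match triggers the rule."""
    for pattern, nocase in conditions:
        flags = re.IGNORECASE if nocase else 0
        if re.search(pattern, user_agent, flags):
            return pattern
    return None


for ua in (
    "Wget/1.21",                                    # hypothetical strings
    "Mozilla/5.0 Wget/1.21",
    "Mozilla/5.0 (compatible; httrack 3.0x)",
    "Mozilla/5.0 (Windows NT 10.0) Firefox/115.0",
):
    print(ua, "->", blocked_by(ua))
```

Note the second case: because `^Wget` is anchored, a user-agent that merely contains “Wget” slips through, while `[NC]` lets a lowercase “httrack” get caught.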
Hi Haris and David
We have the same situation here. So what can we do to avoid amazonaws.com or unknown.unknown to enter like this on our website?
Other question: Is it affecting our SEO, considering it’s increasing our bounce rate? tks
I’d defer to the rewrite engine example above on actually preventing these bots from accessing your site. It’s a very tricky beast.
The bounce rate question is an interesting one. Google Webspam is on record as saying GA data does not impact rankings: http://www.youtube.com/watch?v=CgBw9tbAQhU
My thinking would be that because these bots aren’t accessing your site organically, they don’t register as organic bounces, and should not harm your SEO visibility.
We have used the following code in .htaccess file to stop direct fake traffic from amazonaws.com along with the spam bots.
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://.*amazonaws\.com [OR]
# REMOTE_HOST is only a hostname when HostnameLookups is On; otherwise it holds the client IP.
RewriteCond %{REMOTE_HOST} ^.*\.compute-1\.amazonaws\.com$ [NC,OR]
RewriteCond %{REMOTE_HOST} ^http://.*amazonaws\.com [OR]
RewriteCond %{REMOTE_HOST} ^www\.amazonaws\.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AISearchBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} woriobot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} heritrix [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetSeer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Nutch [NC]
RewriteRule ^(.*)$ - [F]
# Apache 2.2-style access control (mod_access_compat on 2.4); host names below trigger a double-reverse DNS lookup.
Order Deny,Allow
Deny from amazonaws.com
Deny from hosting.com
Deny from 72.21.206.80
Deny from 72.21.210.29
Deny from 207.171.166.22
Deny from 67.59.185.59
Hello Dave,
Thanks for your article. It is good to share with people with the same problems. Those are very frustrating and the most frustrating is that, even though we can exclude this traffic, we will never know its real origin!
Yesterday the website got 1,077 direct visitors from US on one internal page of the website!
Network Domain: msn.com
Hi Dave,
Much like everyone in here, I got a sudden surge in traffic from Bellingham, WA, where around 15-20 users would constantly browse different pages of my site. After following your instructions, I was able to narrow down the domain name to one single domain, and guess what it was… msn.com :-/
Not sure if I should filter this domain out or let it be, since it’s msn.com and Bing might be crawling my site. Any suggestions?
Hey Mitul,
Well mostly it’s up to you. I’d clarify that filtering msn.com out of your google analytics results won’t prevent Bing from actually crawling your site. It just keeps those apparent “visits” from populating your GA data.
So if you don’t like seeing those fake visits, filter away! Your site will still get crawled just fine.
Yesterday we received 2500 visits with a 100% bounce rate and 0 seconds time on site all from MSN.com.
I was wondering the same thing as Mitul, blocking MSN doesn’t seem to be a viable option. What other solutions can be used?
Hey mega thanks for this post dave!
I was just searching around and found this post. What I want to ask is: if I get some fake bot visits every day, will this harm my AdSense? I mean, if this counts for every unique IP, will AdSense punish me for it? I think many people run into this like I do; if you use WordPress, when you publish a new post you will 100% get lots of fake bot comments, right?
I searched a lot but found nothing about this. Of course, you helped me a lot and cleared up my thinking; I hope you can answer this question, Dave!
Maybe adsense will do nothing just because this is so common?!?!
My thinking is that AdSense and Google are savvy enough to filter out most spamming bot impressions. I’m sure there are exceptions to this, but for the most part I don’t hear much about real problems with AdSense. If the spamming really gets out of hand, it seems like a possibility, but generally speaking you should be safe.
Hello and thanks for the article. We manage 2 sites here and both have seen a massive increase in visits over the last month from Bellingham WA, from msn.com. We don’t serve the USA at all so filtering out the city of Bellingham worked for our analytics. Any idea what is going on in Bellingham, who is running bots from there and why?
I believe that’s Microsoft (Bing).
Thanks Dave.
We’ve seen an increase of apparent bot traffic from network domain amazonaws.com in the past 6 months. We first identified the traffic because the country appears as “not set”. In June “not set” accounted for .23% of traffic, rising to 6.5% in October. It looks like 97% of that traffic is by Firefox on Linux OS.
Unfortunately the traffic is daily and it increased gradually, so was not discovered until now.
– 100% of the visits are new
– the bounce rate is 99%
– the pages/visit is only 1
– the average visit duration is less than a second.
Does this match the signature of bot traffic on your sites?
Thanks,
Stephen
That matches, yes. The Firefox/Linux analysis is interesting; nice find.
you can block the spamming user-agent in your robots.txt file.
How does one do this in Blogger? I’m getting inundated with Russian bots, and today “porn” URLs started popping up. I hate that I am so pathetic at knowing what to do.
Thanks
Hi Deb,
I’m far from an expert on Blogger, but if your site is self-hosted, you’ll be able to access your robots.txt file through your FTP/hosting platform. You may need to create a robots.txt file. The Google guide is here. I’d note from that guide that some malicious spammers may ignore a robots file (the jerks), in which case this is a security issue that will require a more technical understanding of site security (one I don’t have).
Hope that’s useful, and good luck with the Russians.
~Dave
Excuse my inexperience on these matters but I run a magazine-type news-based website on a specific industry sector. On average, about 5 items are added per day.
Three days ago, a relatively low-grade (in terms of newsworthiness) news item started attracting unusually large numbers of visitors almost immediately after it was posted.
Oddly, it appears to have been just this one item with no one city of origination.
Also, while paying careful attention to the Real Time activity in googlestats, I watched the particular item being viewed from a number of specific cities. Some of these cities now do not show in the stats for this particular news item.
Great post. This was super helpful. I found 3 networks that are the main cause of my huge spike in traffic and they appear to be bots. I’m trying to set up an advanced segment to filter out those networks, but I don’t see an option for filtering out Networks within Advanced Segments. Could you provide some insight into where I can find that? Thanks.
Thanks, Alyssa, glad this was useful.
If you go in to create a new segment, you can then click “Sequences.” From here, you’ll find Network Domain in the “Visitors” section. Good luck!
Hi Dave, thanks for the great post! I found the network ‘supernova s.r.l. – cloud services’ is spamming our website. Is there any way to filter out visitors from a network, using option #2? I can’t seem to find it amongst the filter options.
Thanks!
Jeroen
Thanks Dave,
This explains quite a lot of my issue.
In our case Google Analytics shows that that “amazonaws.com” (Amazon Web Services) is the service provider for a large number of direct visits.
It would appear that the direct traffic started when our web platform, Adobe Business Catalyst, moved its hosting to Amazon in 2013. It was about 900+ visitors per month; however, over the past 2 months it has dropped significantly to 30/month.
Looking at Google Webmaster Tools, there are no security issues.
I would be interested in determining why Amazon Web Service would be visiting our site and why the sudden drop – have asked Adobe BC for an explanation.
Many thanks!
Useful post, thanks.
I was left scratching my head a bit, as our traffic spike looked like a bot attack, but it is not coming from one source.
It is all registering as direct traffic and is all bouncing, but it is coming from a wide range of US cities and networks.
I wonder if this is coming from infected or hacked PCs.
Hey Dave,
Thanks for providing this detailed explanation of a sudden spike in traffic. I was also digging into this issue and found my Alexa rank has dropped from 9k to 30k following this spike.
Putting filters or segments in Analytics won’t stop bots from crawling my website. If I use robots.txt or change my .htaccess file, will this help me regain my Alexa ranking?
Looking forward to hear from you.
Thank You
Hi Dave,
This was very helpful. Thanks for sharing your findings!
We’ve had a slightly different experience with the amazonaws.com inspired direct traffic. There’s been a spike in direct traffic for three client sites beginning around May 29.
* Dramatic direct traffic increases coming from Ashburn, Virginia – or, Washington DC (Hagerstown MD) metro area. (The numbers to the site are virtually the same for each.)
* The browser in these cases for all three sites affected is Chrome 21.0.1180.83.
* The Network Service Provider behind all of these visits is amazon technologies inc for all three (Network Domain Name is amazonaws.com).
* The traffic is coming from desktop computers (vs mobile).
* Bounce rate is 0%, yet average session is 11-12 seconds for all three sites for the direct visits from Ashburn.
This is only happening to client sites that are using the same VPS hosting provider. We looked at other clients using different hosts and they are not experiencing the same issue… Next stop, contacting the hosting provider and looking at server logs.
Two of the clients are local-only bricks and mortar, so segmenting out the visits from Ashburn will help in Analytics, but I DO want to block our little friend from Ashburn on the server side!
Hello Tracy,
same bot just visited our website.
I will prepare a new segment in GA. How did you solve this issue?
Thanks and regards,
Vaclav
I too get a heck of a lot of spam traffic according to the web statistics plugin I have installed on my website. Fortunately, Google Analytics is showing me only real traffic, which is quite a bit lower, I must say.
Great article with exact steps on how to drill down to locate the bot activity and how to filter to show real world traffic.
A couple of places in your article sort of threw me, and I wanted to clarify. From the wording, it almost seemed like GA was the target and not the website.
” On occasion, though, spam bots can sneak through Google’s barriers. ”
What Google barriers? The bots were pointed at the website itself right? Google does not have an option to prevent bots from the site itself just the ability to filter the results after the fact. Is this correct or am I missing something within GA dashboard?
” Looks like your Google Analytics got spammed. ”
Again GA is just a reporting tool of the websites actual traffic. It was the website that got spammed not GA?
I don’t mean to be petty just want to make sure I’m not missing something fundamental. Like there is a set of functions within GA I’m unaware of. Or that the bots are actually targeting the Analytics instead of the actual website.
Thanks again for a very helpful article.
Cheers!
You can’t block anything with a robots.txt file. That is just to tell good spiders where not to go. Bad bots will take no notice of it.
You need to block the user agents through an .htaccess file (Apache only) or in your server config.
Thanks for taking the time to write this article. I know I’m late to the party, but I’m experiencing something similar.
I’ve been getting a huge surge in direct “not set” traffic for over 12 hours. The locations are spread evenly throughout the U.S. and the bounce rate is low. I’ve also had several people sign up for my newsletter.
I’ve only been able to narrow it down to the fact that most of the traffic is coming from the browser Safari (in-app).
Any idea what this means and what I should do? Thanks in advance!
Update from my last comment:
The people who opted in are real people. I found LinkedIn profiles that match the employer emails they used. (I’ve only checked a few)
Thanks for the info, but can you tell us what to click on in order to get to exclude filters? It is not easy for the do-it-yourself-er to figure any of these things out and google analytics doesn’t have very easy directions. Thanks
A big thank you Dave for this thread.
I too was shocked and then confused when I saw a 300% BOOST in traffic on a site I’m building for my brother’s local business… Yippee!!! I thought those last few tweaks had made a difference. LOL
Since my brother’s biz is local, covering 3 cities, I went into GA to find out where to target some more of my awesome traffic-boosting prowess…. yeah, I’m good at this stuff. I got a 300% boost in traffic from Moscow. At first I thought… dang, I’ve been here in FL for 10 years and didn’t know that Moscow, FL is in my backyard… must be close, too.
I was infuriated to find Moscow was in Russia. Why in the hell are they looking at a lawn service company? They don’t even have grass.
I may use your info here to filter them if it continues to grow. At this point I don’t know what damage it’s doing. I’m sure Google knows about this and will take it into consideration, since it may be skewing some numbers.
Thanks, Dave, great info, great blog!
Glad you found it helpful!
Hi Dave,
Super helpful post, really thorough analysis.
My question: aside from making GA data a bit misleading, are there any other problems these kinds of spam bots cause? You mention a waste of server resources; are there any other issues that could come from this (particularly from an SEO perspective)?
Again, thanks for a really informative article.
It’s a good question, and not one I have a definitive answer to. A lot of it depends on how search engine algorithms register these bots. If the bots are bouncing nearly 100% of the time, that could potentially look like users aren’t finding what they’re looking for. But a bot might not be bouncing back to Google and entering a new search query, or clicking another result in a SERP, so it may just look like a referral campaign isn’t keeping users around.
It’s a complex issue, but my biggest concern is ad fraud, i.e., bots clicking affiliate links and AdSense ads and throwing those systems out of whack. For SEO, the biggest problem I’ve seen is in the data.
Why does it happen? Why these spam visitors? I can’t understand the reason… what do they gain from “visiting” websites? I created my website, and after a while I started to receive visits from Darodar, Ilovevitaly and now Blackhatworth!!! I can’t understand their reason… I put the filter on Analytics…
This is the magic question. It’s frustrating because it outwardly seems to serve no purpose. I’m putting together some thoughts for a follow-up post!
Hi Dave!
Very useful post, thank you!
I discovered a huge spike in traffic to my site, starting this weekend and growing daily. It is all coming from one of our affiliates, and I was worried he was using bots or some other dodgy method to do this. Having analysed the visits now, though, I’m unsure what to think!
Our bounce rate has stayed low, and looking at these visits in isolation, their bounce rate, too, has been OK. In addition, the visitors aren’t coming from one or two locations; they’re scattered all over. Many are in the US, but with no obvious concentration in any state or city.
At the same time, our visibility in Google has gone up rapidly (after a dip following a restructure of our site).
Can I assume the traffic is actually genuine? It seems too good to be true, or maybe this guy (whom I can’t get hold of either) knows something I don’t. I’m puzzled over this.
If you have any thoughts or input, it would be greatly appreciated. I’m a bit of a newbie in SEO and all!
Hi Dave,
Thank you for the great article!
Now I understand why I have so much traffic from Moscow, although my website is in English and my second one is in Japanese.
Thanks again!
Does anyone know what Google is doing against this?
Hello,
Thanks for the article. We recently started getting a ton of spam referrals from ilovevitaly and similar websites. They all come from Samara, Russia.
We blocked their IP address (the one we could locate) in WordPress and also filtered some of them out in GA. However, we are still getting these spam referrals.
Does anyone know if there is something that can be done in WordPress to permanently block all referrals from a specific city or country? I’d rather block the whole country than have this issue continue.
Thanks for any help!!
Lana
Me too! Thanks for the great post, even I could understand it.
My GA is a mess with the bulk of traffic coming to my sites being reported as Brazilian. I applied a filter to block all traffic from Brazil, hopefully that will make the analytics more useful.
Is there a way to track what they do or are looking for on your site? I’m not as much troubled as curious…
Looking forward to more posts on this subject and related topics.
Alas! Thanks for this detailed article. I’ve been trying to figure out how to block those spammy traffic referrers. I wonder, though, would this work if you block Google? Just a thought 🙂
I’ve just had the same thing happen to me. I had deleted a secondary menu and moved its content to the primary menu. 100 pageviews in minutes, and I know my traffic is not that good. Could it have been the menu change???
I got tons of traffic to all my sites using GA. All from Russia, now I excluded all traffic from Russia. Thanks for the great post!
If you have an e-commerce site, watch out for FRAUD orders and stolen credit cards. Immediately!!!!! We got hit with stolen credit cards 5 days later.
A great overview when it comes to bots who hit your website. However, with Google Analytics and other similar software you are also prone to a much worse attack: one that doesn’t even hit your servers!
I’ve gone into some detail on the topic here: http://blog.analytics-toolkit.com/2015/google-analytics-data-integrity-attacks/
No known antidote against this, unfortunately…
Hi Dave,
In AWSTATS, under visit duration, would a bot or spider cause the visit duration to be 30 minutes, to over an hour?
If this repeats, say, an average of ten times every month for over a year, are these likely to be real people visiting and not just bots? There’s a great deal of reading material on my site, and I’m wondering if I can trust the visit duration feature, even if I can’t trust the actual ‘visitors’.
Thanks for your help.
My Dear Boy!
An EXCELLENT treatise on a nearly infuriating subject; Inspector Lestrade would be proud!
Now, I’m off to rid myself of all traffic from *.ru that confounds my writings beyond compare!
SH
Thanks Dave! This post just saved my weekend!
Very glad I could help!
I am going to work through this guide. I am getting traffic from the USA, 1 page view, 100 percent bounce, to my site:
http://www.youniqueestelle.com
It looks like I’m having this problem too. More than 500 direct hits and 439 visitors from the US, location (not set). I’m a little confused, though: what exactly do I need to filter out?
If you’re using a WordPress site and don’t want to play with the .htaccess file, try the plugin I just discovered. I’m not affiliated with this at all – just think it’s something that might help other WP bloggers.
https://wordpress.org/plugins/gm-block-bots/
Thanks, I’m trying it and hoping for the best.
Thanks Dave, Well written. My Analytics screen didn’t quite match up, but that’s probably because google “improved” it :). Great information though. I’m just now starting to track multiple websites in Analytics and getting the data aggregated into useful and actionable data is time consuming. I saw some spikes and started looking for an explanation, that’s what brought me to your site. Found it about 6 articles down on the first results page. Great Job!
Hi
I discovered that my site has been targeted by bots for a long time. I read through your article, but it isn’t feasible to block IPs and domains one at a time because there are so many of them, so I need a way to block all of them.
I found a neat trick that worked, though. If you create a segment and specify that the session must be longer than 1 second, it completely eliminates all the fake direct traffic. My bounce rate went from over 67% to about 6%, 2/3 of my traffic disappeared (depressing), and my average session time shot up from 2 minutes to over 6 minutes!
Obviously the bots are still crawling about, but do you think there is any problem with my simple method?
I hate to rain on the parade, but the catch here is that a visit that bounces will not register any time on site. GA can only calculate session time once a user triggers a second hit, such as viewing a second page. As a result, the bounce rate decline you mentioned makes a lot of sense, although it’s interesting that it’s still 6% 🙂
So you’ll likely weed out direct bots, but the downside is you’re probably also segmenting out real people.
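To make the reply above concrete, here is a minimal sketch assuming, as the reply describes, that session duration is measured between a session’s first and last hits, so a single-hit (bounced) session always records 0 seconds. The `Session` record, the helper functions, and the sample sessions are hypothetical illustrations, not the Google Analytics API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    # Hypothetical session record: timestamps (in seconds) of each hit.
    hit_times: List[float]


def session_duration(s: Session) -> float:
    # Assumption: duration is last hit time minus first hit time,
    # so a session with only one hit always measures 0 seconds.
    return s.hit_times[-1] - s.hit_times[0]


def is_bounce(s: Session) -> bool:
    # A single-hit session counts as a bounce.
    return len(s.hit_times) == 1


def bounce_rate(sessions: List[Session]) -> float:
    if not sessions:
        return 0.0
    return sum(is_bounce(s) for s in sessions) / len(sessions)


sessions = [
    Session([0.0]),               # bot: one hit, bounces
    Session([0.0]),               # bot: one hit, bounces
    Session([0.0]),               # real reader who read one page and left
    Session([0.0, 45.0, 210.0]),  # real reader who viewed three pages
]

# The "session longer than 1 second" segment from the comment above.
segment = [s for s in sessions if session_duration(s) > 1]

print(bounce_rate(sessions))  # 0.75
print(bounce_rate(segment))   # 0.0
```

Under this model the segment removes every bounce, bot or human alike, which is why the bounce rate collapses and why the one-page human reader disappears along with the bots.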
Thanks for the article, but I still haven’t found a proper solution for direct spam traffic. I tried everything from .htaccess blocking to Google Analytics filters, but the spam traffic still hasn’t stopped.
Thanks for the helpful post! We’re using .htaccess, and it’s been 100% successful.
Our Android app also sees loads of traffic from Boardman.
We have tried filtering it out but this filter broke the entire analytics.
So one possible solution is to try switching off Segment https://segment.com/docs/integrations/google-analytics/
Is everyone else who has Boardman users also using Segment?
Yes – I get constant traffic from Boardman. What the hell’s in Boardman?
I’m getting the same thing.
Have either of you paid for web traffic?
I started noticing this surge from Boardman within hours of hiring someone to send targeted traffic to my site.
Just seeing this thread. Has anyone figured this out? And to answer the Segment question: yes, we do use it. We are getting tons of traffic to a giveaway contest from Boardman, but it doesn’t look like there are any registrations. Definitely looks like a bot to me.
OMG Boardman!!! Ashburn!!! I’ve had almost 30000 visits from Boardman in the last 6 weeks. Boardman is a tiny town in Oregon, population 3220 people…. and they’re all coming to my site almost 8x each for less than a second!!!! HEEEEELLLLPPPPP!!!! I’m being overrun by Boardman men! Who are these people!!
I tracked it down to an IP, 35.161.29.125, coming from an Amazon AWS server. Almost certainly a spammer or scraper.
I’m having the same issues with the new site I’m working on (a spike in Direct traffic plus 0:00:00 session duration). However, my other research suggests that a Direct traffic spike doesn’t necessarily mean bots, and that 0:00:00 sessions could be people visiting and leaving after the first page. Is this true?
I followed your steps up to Technology -> Networks, and “(not set)” has the most sessions.
What is more likely in my case: a bot, or just normal traffic that was categorized wrongly (direct instead of organic)?
Hope you could help. Thanks!
That’s a NEWS site, btw.
Nice read and very interesting for me! We’ve released an anti-spam tool recently to tackle “Google Analytics Spam”. It’s a free tool, which you can find here: https://www.adwordsrobot.com/en/tools/ga-referrer-spam-killer. Maybe it could help you guys easily defend against this kind of spam.
I run 3 websites for small Australian groups and would never have expected a visitor from Brazil. My stats counters show that over 10% of visitors to these sites are from Brazil, from all states, and consistently, i.e., no spikes. There’s a link to the counters on all the index pages.
Are Brazilians all just surfing the internet? Any ideas?
Thanks for the sharing.
You can also use a WordPress plugin named Bot Block if you are using WordPress for your blog. Plugin link: https://wordpress.org/plugins/bot-block-stop-spam-google-analytics-referrals/
Hi,
I recently started getting traffic from Ashburn. It’s not increasing, but do you still think I should filter or block it?
I’m really frustrated with these spam bots now. They have not left any stone unturned. From every medium this spam traffic has been increasing day by day.
Anybody have any permanent solution to this??
Thanks,
Mark B
Great article, Dave, thanks. Initially I was happy that my website was being watched globally and that many people at once from “Palo Alto” were browsing my site. However, I was curious to know why so much traffic was coming from this particular city and found this life-saving article. Thanks a ton; I added a few filters, and now the traffic is normal, as it should be. Thanks again.
Hi!
Thanks for the post, it was very helpful, but I wanted to ask if it is possible for a bot to look like it is coming from multiple locations and service providers? My client’s website got hit with a surge of 3,800 direct visitors, all from China, however when I view the ‘cities’ metric, the traffic is divided up amongst about 20 different cities in China. When I view the service provider, the traffic is again divided up amongst a dozen different Chinese providers – they all average around 200 – 300 hits each, there is no single city or provider that stands significantly above the rest. The client hasn’t had any advertising or anything happen in China, and they are a humble bricklaying company so not likely that they’d be making the news for anything over there either. I just wanted to ask if anyone has ever heard of anything like this happening and if it is likely to still be bot activity?
Hey, nice article you’ve written there Dave!
On my blog, I have a spam comment problem. Every 20 to 30 minutes, a spam comment is made on one of my blog posts (mostly on just one post). It contains keywords, website links, flattery, fake “genuine” questions, etc. Akismet is smart and detects almost every one of these shitty spam comments, so I don’t have to worry about that. I get to read only real comments and approve them. 🙂
But the problem is, recently I started using Google Analytics, and it constantly shows my bounce rate as around 75% to 80%. I am pretty sure this is because of all those spam comment bots.
Even if I close the comments or introduce high-level captcha, the spam comment bots still visit the website, thereby ruining my Google Analytics data.
And the mind-blowing thing is that these comments are not from a single IP address or even similar IP addresses for that matter. No, sir. They are all from different IP addresses. How could that guy (the genius mind behind this auto-spammer or auto-backlink creator) even manage to do that?!
I would love it if anyone could give me some tips relating to this matter.
Thank you.
I really appreciate this write-up. I had an experience the other day where my new site had a surge in visitors. I was so excited; then, after reading this article, I realized it was spam bot activity. The funny thing is, this all happened after I made a post about malware and botnets. I think people are being asses. Now I am doing what I can to block them, but I know they have as many options as I do. 🙁
Hi, my question is: is every visit from Russia or Samara fake or a bot? What if a real user who lives in Russia visits a blog? How can I check whether a visit is real or fake?
A few keywords will do the job here… Don’t take yourself too seriously; the bigger search engines (Google, Bing, Yahoo) narrow the results by the user’s location.
If I search for a plumber, I will write something like this -> plumber services CITY_NAME… or something similar.
And everything will be just fine 🙂
Hi Dave
Thanks for this
These bots make work harder for me.
I have one particular problem with spam bots: they make a big mess of my bounce rates.
Neil Patel recommended you, and I am happy I came here.