2 Quick Ways To Reduce Traffic On Your System
By Strictly-Software
Slowing The 3 Major SERP BOTs Down To Reduce Traffic
If you run a site with a lot of pages or good rankings, or one that tweets out a lot (e.g. whenever a post comes online), then you will probably get most of your traffic from being crawled by the big 3 crawlers.
I know that whenever I check my access_log on my server to find out the top visiting IP addresses with a command like
grep "Jan/2015" access_log | sed 's/ - -.*//' | sort | uniq -c | sort -nr | less
I always find the top IPs belong to the 3 main search engines' own BOTS (SERP = Search Engine Results Page), so I call their BOTS SERP BOTS:
GoogleBot: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Bing: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
Yahoo: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
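For anyone who prefers a script to a shell pipeline, the access_log check above can be sketched in Python. This is only a minimal sketch: it assumes the common/combined log format where the client IP is the first field, and the function name top_ips and the sample IP addresses are made up for illustration.

```python
from collections import Counter


def top_ips(log_lines, month="Jan/2015", limit=10):
    """Count requests per client IP for lines that mention the given month."""
    counts = Counter()
    for line in log_lines:
        if month not in line:
            continue
        # Assumption: the client address is the first whitespace-separated field.
        fields = line.split(None, 1)
        if fields:
            counts[fields[0]] += 1
    return counts.most_common(limit)


# Hypothetical log lines, not taken from a real server.
sample = [
    '66.249.66.1 - - [12/Jan/2015:10:00:00 +0000] "GET /a HTTP/1.1" 200 512',
    '66.249.66.1 - - [12/Jan/2015:10:00:25 +0000] "GET /b HTTP/1.1" 200 512',
    '157.55.39.1 - - [12/Jan/2015:10:00:30 +0000] "GET /a HTTP/1.1" 200 512',
    '66.249.66.1 - - [02/Feb/2015:09:00:00 +0000] "GET /c HTTP/1.1" 200 512',
]

for ip, hits in top_ips(sample):
    print(hits, ip)
```

On a real server you would pass an open file handle for access_log instead of the sample list.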
Even if you never ask these BOTS to come NOW, by doing things like refreshing your sitemap and pinging these SERPS or tweeting out links, they will crawl your site at a time of their own choosing, and with nothing to tell them to slow down they will crawl at their own speed. This could be once every second, and if so it could cause your site performance issues.
However, think about it logically: if you post a news article or a job advert then in reality it only needs to be crawled once by each SERP BOT for it to be indexed.
You don't really want it to be crawled every day and on every visit by these BOTS as the content HASN'T changed so there is really no need for the visit.
Now I don't know a way of telling a BOT to crawl a page only if it's new or has changed in some way. Even if you had a sitemap system that only listed pages that were new or edited, the BOTS would still just visit your site and crawl it.
If you cannot add rel="nofollow" to internal links that point to duplicate content (which doesn't 100% guarantee the BOT won't crawl it anyway), then there are some things you can try if you find that your site is having performance problems or is under pressure from heavy loads.
Crawl-Delay
Now this used to be supported only by BingBOT and some smaller, newer search engines like Blekko.
However, in recent months, after some testing, I noticed that most major SERP BOTS apart from GoogleBOT now obey the directive. To get Google to reduce its crawl rate you can set it from the control panel in Webmaster Tools.
For instance on one of my big news sites I have a Crawl-Delay: 25 setting and when I check my access log for those user-agents there is a 25 second (roughly) delay between each request.
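That kind of check can be sketched in Python as below. It is a rough sketch only: it assumes the combined log format with a timestamp like [12/Jan/2015:10:00:25 +0000], matches the crawler by a substring of its user-agent, and the function name request_gaps and the sample lines are invented for the example.

```python
import re
from datetime import datetime

# First [...] group on the line, which holds the timestamp in common/combined logs.
TIMESTAMP = re.compile(r"\[([^\]]+)\]")


def request_gaps(log_lines, agent_token):
    """Return the seconds between consecutive requests whose line contains agent_token."""
    times = []
    for line in log_lines:
        if agent_token.lower() not in line.lower():
            continue
        match = TIMESTAMP.search(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z"))
    times.sort()
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


# Hypothetical log lines for illustration.
BING = '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"'
sample = [
    '157.55.39.1 - - [12/Jan/2015:10:00:00 +0000] "GET /a HTTP/1.1" 200 512 "-" ' + BING,
    '10.0.0.7 - - [12/Jan/2015:10:00:10 +0000] "GET /a HTTP/1.1" 200 512 "-" "SomeOtherAgent"',
    '157.55.39.1 - - [12/Jan/2015:10:00:25 +0000] "GET /b HTTP/1.1" 200 512 "-" ' + BING,
    '157.55.39.1 - - [12/Jan/2015:10:00:51 +0000] "GET /c HTTP/1.1" 200 512 "-" ' + BING,
]

print(request_gaps(sample, "bingbot"))
```

If the gaps hover around the Crawl-Delay value you set, the BOT is obeying it.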
Therefore increasing this value will reduce the traffic load from the major visitors to your site, and it is easily done by adding it to your robots.txt file e.g.
Crawl-delay: 25
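If you want to confirm how a well-behaved crawler would read the setting, Python's standard urllib.robotparser module can parse it. This is a small sketch under one assumption: the directive sits under a "User-agent: *" group, which is my guess at a typical layout and not necessarily how the file above is laid out.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: the "User-agent: *" line is an illustrative assumption.
robots_txt = """\
User-agent: *
Crawl-delay: 25
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Ask what delay applies to a crawler identifying itself as bingbot.
delay = parser.crawl_delay("bingbot")
print("Crawl-delay seen by bingbot:", delay)
```

Swapping in the real contents of your robots.txt shows whether the delay applies to the user-agent you care about.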
Banning IE 6
Now there is no logical reason in the world for any REAL person to be using this user-agent.
This browser was probably the worst in history, due to the quirks within it that made web developers' jobs so hard. Even just between IE 5.5 and IE 7 there are so many differences with IE 6, and that is the reason IE 8 and 9 had all the settings for compatibility modes and browser modes.
It is also the reason Microsoft is going to scrap support for IE 7-9: all the hokery pokery they introduced just to handle the massive differences between IE 6 and their new standards-compliant browsers.
Anyone with a Windows computer nowadays should be on at least IE 10. Only if you're still on XP and haven't run Windows Update for about 5 years would you be a real IE 6 user.
Yesterday at work I ran a report on the most used Browsers that day.
IE 6.0 came 4th!
It was below the 3 SERP BOTS I mentioned earlier and above the latest Chrome version.
Only on more detailed inspection with my custom logger/defence system, which analyses the behaviour of visitors rather than just assuming that an IE 6 user-agent means a human, could I see that these visitors were all BOTS.
I check for things like whether they can run JavaScript, by using JavaScript to log that they can, in the same way as I do for Flash. These users had no JavaScript or Flash support, and the rate at which they went through pages was way too fast for a human.
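To give a rough idea of that kind of behavioural check, here is a minimal Python sketch. It is not my actual logger: the VisitorStats fields, the looks_like_bot name and the threshold of one page per second are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class VisitorStats:
    """Hypothetical per-visitor summary built from your own logging."""
    user_agent: str
    ran_javascript: bool   # did a JavaScript beacon ever report back?
    page_views: int
    seconds_active: float


def looks_like_bot(stats, max_pages_per_second=1.0):
    """Flag visitors that never ran JavaScript and page faster than the threshold."""
    # Treat very short sessions as one second to avoid dividing by zero.
    rate = stats.page_views / max(stats.seconds_active, 1.0)
    return (not stats.ran_javascript) and rate > max_pages_per_second


fake_ie6 = VisitorStats("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)", False, 120, 30.0)
human = VisitorStats("Mozilla/5.0 (Windows NT 6.1) Chrome/40.0", True, 5, 300.0)

print(looks_like_bot(fake_ie6), looks_like_bot(human))
```

The point is to judge the visitor on what it does, not on what its user-agent claims to be.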
The only reason I can think of for people using this user-agent is that they are script kiddies who have downloaded an old crawling script whose default user-agent is IE 6, and they haven't changed it.
Either they don't have the skill or they are just lazy. However by banning all IE 6 visitors with a simple .htaccess rule like this you can reduce your traffic hugely.
RewriteCond %{HTTP_USER_AGENT} (MSIE\s6\.0|MSIE\s5\.0|MSIE\s5\.5) [NC]
RewriteRule .* http://127.0.0.1 [L,R=302]
This rewrite rule bans IE 5, 5.5 and IE 6.0 and sends the crawler back to localhost on the user's own machine with a 302 redirect.
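Before putting a ban like this live, it is worth checking which user-agents the pattern actually catches. The sketch below runs the same regular expression through Python's re module with case-insensitive matching to mirror the [NC] flag. The is_banned helper and the sample user-agent strings are just illustrations.

```python
import re

# Same pattern as the user-agent condition in the rule; IGNORECASE mirrors [NC].
OLD_IE = re.compile(r"MSIE\s6\.0|MSIE\s5\.0|MSIE\s5\.5", re.IGNORECASE)


def is_banned(user_agent):
    """Return True if the user-agent would match the ban pattern."""
    return OLD_IE.search(user_agent) is not None


agents = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)",
    "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
]

for agent in agents:
    print(is_banned(agent), agent)
```

Running your own top user-agents through a check like this shows what would be redirected before any real visitor is affected.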
No normal person would be using these agents. There may be some intranets still using VBScript as a client-side scripting language from the 90s, but no modern site is designed with IE 6 in the designer's mind. Most sites you find will not handle IE 6 very well, so, like Netscape Navigator, it is an old browser and you shouldn't worry about supporting it. By banning just IE 6 and below you will find your traffic going down a lot.
So two simple ideas to reduce your traffic load. Try them and see how much your site improves.