Sunday, 25 January 2015

Returning BAD BOTS to where they came from

Banning BAD BOTS to where they came from

By Strictly-Software

Recently in some articles I mentioned some .htaccess rules for returning "BAD BOTS" to where they came from, e.g. crawlers you don't like, or user-agents such as IE 6 that no-one should be using anymore.

Now the rule I was using was suggested by a commenter on a previous article, and it used the REMOTE_ADDR IP parameter to do this.

For example in a previous article (which I have now changed) about banning IE 5, 5.5 and IE 6, I originally suggested using this rule for banning all user-agents that were IE 5, 5.5 or IE 6.

RewriteCond %{HTTP_USER_AGENT} (MSIE\s6\.0|MSIE\s5\.0|MSIE\s5\.5) [NC]
RewriteRule .* http://%{REMOTE_ADDR} [L,R=301]

Now this rewrite rule uses the server variable %{REMOTE_ADDR}, which holds the originating IP address of the HTTP request, to send anyone with IE 6 or below back to that address.

It is the IP address you would normally see in your server's access logs when someone visits.

Problems with this rule

Now when I changed the rules on one of my own sites to this rule, and then started testing it at work for a work site using a user-agent switcher add-on for Chrome, I ran into a problem: every time I went to my own site I was sent back to my company's gateway router page.

I had turned the switcher off, but for some reason (either a bug in the plugin, a cookie or a session variable) my own site still believed I was on IE 6 and not the latest Chrome version. So every time I went to my site with this rule I was kicked back to my company's gateway router page.

Therefore, after a clean up, a think and a talk with my server techie guy, he told me I should be using localhost instead of the REMOTE_ADDR IP address. The reason was that a lot of traffic from hackers, HACKBOTS, spammers and so on would be sent to the gateway page of their ISP, where it could look like potential hacking.

These ISPs might get a bit pissed off with your website sending swathes of traffic that could potentially harm them to their gateway router pages.

Therefore, to avoid getting letters in the post complaining that you are sending swathes of hackers to a home or phone ISP's gateway - as a lot of phones or tablets use proxies for their browsers anyway - the answer is to send them back to their own localhost, 127.0.0.1.

Also, instead of using a 301 permanent redirect you should use a 302 temporary redirect, as that is the more appropriate code to use.

Use this rule instead

Therefore the rule I now recommend for anyone wanting to ban all IE 5, 5.5 and 6 traffic is below.

RewriteCond %{HTTP_USER_AGENT} (MSIE\s6\.0|MSIE\s5\.0|MSIE\s5\.5) [NC]
RewriteRule .* http://127.0.0.1 [L,R=302]

This rewrite rule bans IE 5, 5.5 and IE 6.0 and sends the crawler back to localhost on the user's own machine with a 302 redirect. You can obviously add other rules for BOTS and SQL/XSS injection hacks as well.

This is a more valid rule as it's not a permanent redirect of the kind you would use when a page has changed its name. Instead, the user is being redirected to the new destination because of an invalid parameter or value in the HTTP request.

If the user changed its user-agent or parameters then it would get through to the site and not be redirected with a 301 or a 302 status code, but would instead get a 200 OK status code.
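
As a rough sketch of how other BOTS could be added to the same ban, the user-agent conditions can be chained with the [OR] flag so that a match on any one of them triggers the redirect. This assumes Apache mod_rewrite is enabled, and the bot names used here (BadBot, EvilScraper) are made-up placeholders rather than real crawlers, so swap in the user-agents you actually see in your logs.

```apache
# Sketch only: assumes Apache with mod_rewrite enabled.
RewriteEngine On

# Old IE versions (5.0, 5.5 and 6.0)
RewriteCond %{HTTP_USER_AGENT} (MSIE\s6\.0|MSIE\s5\.0|MSIE\s5\.5) [NC,OR]
# Hypothetical bot names: replace with the user-agents you want to ban
RewriteCond %{HTTP_USER_AGENT} (BadBot|EvilScraper) [NC]
# A request matching either condition is sent to its own localhost
RewriteRule .* http://127.0.0.1 [L,R=302]
```

Without the [OR] flag the conditions are ANDed together, so a request would have to match both patterns before it was redirected.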

So remember: an idea might seem good at first, but until you fully test it and ensure it doesn't cause problems it might not be all that it seems.
