Strictly Software: March 2013

Wednesday, 6 March 2013

Internet Censorship and Privacy - How they track you

This was taken from the www.darkpolitricks.com site.

Internet Censorship and Surfing Anonymously

By Dark Politricks

Sometimes it feels as if the good old days of being anonymous on the Internet have passed, and you would be right.

With restrictive and snooping laws being passed all over the world, firewall filters wrapped around whole countries, Twitter users sued for re-tweeting libellous claims and big tech companies working hand in hand with the biggest security forces on the planet, there really is no way to escape.

However, there are ways to minimise your "footprint", and if you are not a serious criminal or terrorist then you shouldn't have anything to fear.

If you are, then you're probably being watched through your webcam right now whilst your iPhone's microphone is being channelled to GCHQ or Langley for analysis. Tough luck!

From a user's perspective the Internet contains a myriad of security and privacy issues which, if the user is not aware of them, could cause problems on all manner of levels.

For the privacy conscious person who wants to surf the net without worrying about someone (e.g. their employer, the government or the police) looking at the content they have visited, either in real time or at a later date, there are a number of issues to be concerned about.

With the recent resignation of David Petraeus, head of America's most powerful spy agency, and what we now know about the way he was caught, it is good to know the ways you can be tracked so you can choose whether you want to take that risk.

As with most things on the web, being 100% anonymous is pretty hard to do.

If you want to stay totally anonymous you should probably move underground somewhere as there is always a satellite up there somewhere and with Google Earth you cannot even escape commercial companies anymore. So moving to the woods to live in a hut without electricity or broadband is not even an option anymore!

However, there are various forms of tracking that you should be aware of so that you can limit the risks to yourself whilst using the Internet or your phone. You don't have to be hiding from the Government: you may just want to stop your personal details from being sold to advertisers or prevent horrible popup boxes from appearing when you close a window.

This is not a comprehensive list but it is a start and it is also one that is constantly changing as technology changes. When I wrote this originally tablets (iPad etc) were not that common. Now they are just another place to accumulate your browsing history and a tool to be used against you if you are ever in that unfortunate position.

Emails

This is how the head of the CIA was caught out. He wasn't sending the emails; instead he was saving them as drafts in an anonymous Google account and then letting his mistress log in and read them.

The idea was to leave no Internet trail, because when you send an email it is routed from server to server and the IP addresses of the mail servers it travels through are recorded in the headers of your mail.

Check it yourself. If you use a mail client, find the option to view all the mail headers (the wording varies between clients) and then view an email that has been sent to you. The Received lines at the top show the route the email took: the originating mail server, the receiving mail server and the IP addresses of any servers it travelled through in between, with the most recent hop listed first.
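If you would rather not dig through raw headers by hand, a few lines of Python can pull the relay addresses out for you. This is a minimal sketch using the standard library's email module; the message is made up and uses documentation-only IP addresses, and real Received headers vary a lot between providers (some webmail services include the sender's own IP, others strip it), so treat the bracketed-IP regex as an assumption rather than a parser for every format.

```python
import re
from email import message_from_string

# Matches IPv4 addresses in square brackets, the common (but not universal)
# way mail servers record a connecting host in a Received header.
IP_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def received_hops(raw_message: str) -> list:
    """Return the IPv4 addresses in the Received headers, earliest hop first."""
    msg = message_from_string(raw_message)
    headers = msg.get_all("Received") or []
    hops = []
    # Each server adds its Received header at the top, so read them bottom-up.
    for header in reversed(headers):
        for ip in IP_IN_BRACKETS.findall(header):
            if ip not in hops:
                hops.append(ip)
    return hops

# A made-up message: 198.51.100.7 is the sender's own connection,
# 192.0.2.10 is the sender's mail server.
RAW = """Received: from mail.sender.example (mail.sender.example [192.0.2.10])
\tby mx.receiver.example with ESMTPS id abc123;
\tWed, 6 Mar 2013 10:00:02 +0000
Received: from [198.51.100.7] (unknown [198.51.100.7])
\tby mail.sender.example with ESMTPSA id def456;
\tWed, 6 Mar 2013 10:00:00 +0000
From: alice@sender.example
To: bob@receiver.example
Subject: Test

Hello Bob
"""

print(received_hops(RAW))  # ['198.51.100.7', '192.0.2.10']
```

The first address in the list is the one the draft-folder trick was trying to keep out of the record.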

People who use this "save as a draft only" technique are trying to avoid that trail. However, they defeat themselves from the start by:

a) Using an online mail server such as Gmail or Hotmail. All the data, including drafts, is saved on THEIR servers, i.e. "in the cloud", so if they disable your account or are served with a warrant there is a good chance your draft emails will be accessed and read along with your sent, read and junk emails.

b) Not hiding your origin when signing up for a throwaway Gmail or Hotmail account. You have to fill in a form to get one in the first place, so they will still get your IP address unless you have gone through known secure proxy servers or used someone else's connection without their knowledge (e.g. an open WiFi router). Do a scan on your PC / phone now and see if there are any around you: open ones won't have a lock symbol next to them.

c) Forgetting that Windows computers can keep deleted emails, web history and other files even when you think they have been deleted. Here is a very old article from 2000 which shows that even back then Microsoft was hiding emails, web searches and other files from users. The scripts and batch files you might find if you search for "Microsofts Really Hidden Files" probably won't work anymore, but Microsoft probably has no need for such old methods now, especially when people are buying computers and setting them to back up everything to the cloud by default.

How to bypass

Don't use "cloud based" systems run by companies that are well known to have links with the US security services. As I have shown, Microsoft and Google both do, and both of these mega companies are actively helping the US spy agencies with their own huge databases. Read this article on why we are sleep walking into a surveillance society by consent. Then ask yourself whether using any of these big Internet companies' software is safe, especially as they seem to gobble up smaller companies by the minute. "Why create back doors when you can walk in through the front?" I asked myself when Microsoft bought Skype.

It may pain you, but using any Internet based email service means your data is recorded as it leaves the device you write your message on and stored on a computer system you have no control over, and by accepting the Terms and Conditions you have allowed them to "own" your data and use it for advertising and God knows what else.

Plus nothing ever really leaves the Internet: you can view archived copies of Google (or almost any site) going all the way back to the 1990s on this site: http://archive.org/web/web.php

If you don't want your own website to appear in this archive, which keeps everything forever, they are pretty good about obeying robots.txt directives, so you can put the following in your robots.txt file (read about it here) to stop it archiving your site.

# alexa archiver
User-agent: ia_archiver
Disallow: /
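To check that a well-behaved crawler will read those lines the way you expect, you can feed them to Python's standard urllib.robotparser module. This is only a sketch of how a compliant crawler interprets the rule; it cannot tell you whether the archive's own crawler still honours it, and www.example.com is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from above.
ROBOTS_TXT = """\
# alexa archiver
User-agent: ia_archiver
Disallow: /
"""

def allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt lets the given user-agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

page = "http://www.example.com/2013/03/some-post.html"
print(allowed("ia_archiver", page))  # False: the archiver is shut out
print(allowed("Googlebot", page))    # True: no rule applies to other agents
```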

To be really sure, you can block the IP address 207.241.224.41 in your .htaccess file or at your firewall if you want to stop them fetching even your robots.txt file.

Also use throwaway email addresses if you can, or even create your own with some basic scripting (not hard if you can be bothered), put it on a server in another, more freedom friendly country and use proper proxies to access the web page front end to send your emails (proxies are covered later on).

The need for these disposable email systems sprang from wanting a quick email address to sign up to a site or set up an anonymous blog, or anything else where you don't want all the spam that follows. I have also found that with some basic hacking you can actually use them to send AND receive email; it all depends on how good the programmer is.

Either do a search for "Disposable Email Addresses" to find the latest ones or check out guerrillamail.com or Mailinator. A word of warning though: there are lots of disposable email services out there, and who knows who really owns them? If you do use them, make sure their URL starts with https:// (this means data is encrypted between your PC and their server).

As for the traces the Internet leaves on your computer, you can use "cleaner" tools like CCleaner to remove cookies, old registry entries, old programs, start up applications and Internet history easily. Plus you should never use Internet Explorer anyway, as there are a myriad of more security conscious browsers, such as Firefox, which are much better on the privacy front as they are not tied into the operating system of your computer.

Using the Cloud to backup your "Secure" data

Storing anything on anyone else's computer means you don't have control over it. Therefore cloud based storage systems should be avoided for anything personal or secure. Even phones nowadays have settings to automatically back up your numbers, texts, photos and videos.

It might be good if you ever lose your computer or phone, but remember that if the cloud based backup server is in the USA then the US security services are probably sniffing everything uploaded to it anyway.

This is the same for Facebook, Google, Tumblr and any other social media site. As soon as you put anything on that site THEY own it and if they are served a subpoena or warrant to hand the data over there is nothing you can do.

How to bypass

It may pain you but just don't use Facebook, Google, Twitter or any other social media system if you are going to put anything dodgy on it.

The same goes for Dropbox and any other web based storage centre that may have to hand over your data to the authorities one day.

If you must keep files secret then keep a portable external hard drive at home and backup all your files to that device before hiding it. At least that way you have control and ownership over the backed up data and you are the only person who knows where it is located.

Using your computer hardware to spy on you

There have been many cases lately where computers and phones have been used against their owners to spy on them.

Last year there was a big outcry about the iPhone's "secret" database that logged all the GPS positions you had been to with your phone. Even without GPS, phone mast triangulation can work out roughly where your phone was when it contacted a mast.

We also had the case of police using tools to download this data when they pulled motorists over or arrested people, and then illegally accessing this database of locations to find out where people had been.

As for computers, the Lower Merion School District near Philadelphia was accused of spying on students in their bedrooms via the webcams built into school-issued laptops. Would you want a headmaster in his office alone at night watching your kids in their bedroom?

How to bypass

Take the battery out of your phone whenever you don’t want to be tracked.

As the earlier report shows, cellphone triangulation tracking takes less power than GPS tracking, and even when your phone is turned off a tiny amount of battery charge remains available to it, which can be enough to log your presence at a nearby mast and pin your location down to the nearest 100 metres or so.

Either that or use pay phones, or pay passers-by to let you use their phones when you need to make a call and your own phone is unavailable.

On your computer, turn off the microphone and webcam in your settings, e.g. in Control Panel on Windows; on Windows 7 it will be under Speech Recognition and Audio Devices. To be extra safe, put masking tape over the webcam when you don't want to use it, and cover the microphone too (Blu-Tack or anything else that will muffle the sound). Anything that can be useful to you can be useful to someone with control of your computer.

To test whether your microphone is working, either go into your computer's settings, e.g. Control Panel, or go to the old Google search engine (if it's still available at http://www.google.com/webhp?hl=all), hit the microphone symbol in the input box and talk.

If you see the blocks under the microphone move up and down and then a result similar to what you said appear in the box - the microphone is still on. If it's off it will say so.

Javascript urchins

These are little bits of script that are added to the source code of the HTML page you are visiting. They use JavaScript to record identifying features about the user and their browser such as the user-agent, system details and location by calling a script on another server that then logs these details to a central database. A good example is Google Analytics which most sites including this one use to tell the owner about the amount of traffic they receive and where it is coming from.
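You can get a rough idea of which outside servers a page reports to by listing the hosts its script tags load code from. The sketch below uses Python's built-in html.parser; the page and host names are made up, it only sees scripts written directly into the HTML (not ones injected later by other JavaScript), and it treats any different host name, even a subdomain of the same site, as third party.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class ScriptCollector(HTMLParser):
    """Collects the src attribute of every <script> tag in a page."""
    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

def third_party_script_hosts(html: str, page_url: str) -> set:
    """Return the hosts, other than the page's own, that serve scripts to it."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    page_host = urlparse(page_url).hostname
    hosts = set()
    for src in collector.sources:
        # Resolve relative paths such as /js/site.js against the page URL.
        host = urlparse(urljoin(page_url, src)).hostname
        if host and host != page_host:
            hosts.add(host)
    return hosts

PAGE = """<html><head>
<script src="/js/site.js"></script>
<script src="https://www.google-analytics.com/ga.js"></script>
</head><body>Hello</body></html>"""

print(third_party_script_hosts(PAGE, "https://blog.example.com/post"))
# {'www.google-analytics.com'}
```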

How to bypass

Turning off Javascript will prevent this logging from occurring. You can do this in most browsers through their Tools > Options settings but you can get toolbars and add-ons like the FireFox Web Developer Toolbar or the NoScript add-on that do this for you.

Webbugs

Similar to urchins, these are little images, usually so small they cannot be seen, that point back to a web server and run some code whenever the image is loaded by a client. They tend to be used by email marketing tools and are embedded within HTML emails so that they can record who has actually opened the email and track it if it is forwarded on.

They can also exist on web pages or within desktop applications and as the image is hosted remotely whenever it is loaded it records the location of the application or user who is loading it.
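A crude way to spot a web bug in an HTML email is to look for remote images that declare themselves as 0 or 1 pixel in size. The sketch below does that with Python's html.parser; it is a heuristic only (trackers can also use normal-sized images or CSS backgrounds), and the email and host names are invented for the example.

```python
from html.parser import HTMLParser

class WebBugFinder(HTMLParser):
    """Flags remote <img> tags whose width and height are both 0 or 1 pixel."""
    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        src = a.get("src") or ""
        remote = src.startswith(("http://", "https://"))
        tiny = a.get("width") in ("0", "1") and a.get("height") in ("0", "1")
        if remote and tiny:
            # Loading this URL tells its server that the email was opened.
            self.suspects.append(src)

EMAIL_HTML = """<p>Hi Bob, our spring sale starts today!</p>
<img src="https://shop.example/banner.png" width="600" height="200">
<img src="https://track.example/open.gif?id=8f3a" width="1" height="1" alt="">"""

finder = WebBugFinder()
finder.feed(EMAIL_HTML)
finder.close()
print(finder.suspects)  # ['https://track.example/open.gif?id=8f3a']
```

The id=8f3a part is typical: a unique value per recipient lets the sender tie the "open" back to one address.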

How to bypass

Many email clients, if they don't do it automatically, have the option to display emails as plain text, which prevents these webbugs from working. I use Thunderbird, which is free and can be set to ask you first whether to load any remote content at all, whether images, scripts or anything else not already embedded within the email.

In browsers you can disable images easily with the Web Developer toolbar, Google Chrome's privacy settings or by using a text browser like Lynx.

Server Side logging by the page

Most pages on the web nowadays are more than pure HTML/CSS and contain code that runs server side e.g .asp, .php, .jsp, .aspx etc.

When the page is requested, the web server parses the page and runs any code before returning the generated HTML to the client. This code has access to a lot of information about the client requesting the page, such as its IP address (which can be used for geo-location), user-agent details, accepted file types and other information contained within the headers. The site could choose to log this information to a database or file if it wanted to, even if the IIS or Apache web server had its own logging disabled.

For example, if you go to whatsmyip.org you will see all the information that is passed to each web page you request, including geo-location information, details about the type of computer you are using and much more. Whilst not totally accurate, this can pinpoint the approximate location of the computer that made the request, which could be your own PC or someone else's (if you use a proxy - see below).
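To show how little effort that takes, here is a sketch of a page handler written as a plain Python WSGI application. It assumes nothing beyond the standard library; the field names and the in-memory VISITOR_LOG list are illustrative stand-ins for whatever database or file a real site would use, and the request at the bottom is simulated rather than coming from a real browser.

```python
from wsgiref.util import setup_testing_defaults

VISITOR_LOG = []  # stand-in for a site's own database table or log file

def app(environ, start_response):
    """Serve a page and quietly record what the request reveals."""
    VISITOR_LOG.append({
        "ip": environ.get("REMOTE_ADDR"),              # from the TCP connection
        "user_agent": environ.get("HTTP_USER_AGENT"),  # browser and OS details
        "accept": environ.get("HTTP_ACCEPT"),          # accepted content types
        "language": environ.get("HTTP_ACCEPT_LANGUAGE"),
        "referer": environ.get("HTTP_REFERER"),        # the page that linked here
    })
    # This happens in application code, so it works even if the web
    # server's own access logging has been switched off.
    body = b"<html><body>Hello</body></html>"
    start_response("200 OK", [("Content-Type", "text/html"),
                              ("Content-Length", str(len(body)))])
    return [body]

# Simulate one request instead of starting a server. To serve it for real:
# wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()
environ = {}
setup_testing_defaults(environ)
environ.update({
    "REMOTE_ADDR": "203.0.113.5",
    "HTTP_USER_AGENT": "Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0",
    "HTTP_ACCEPT": "text/html,application/xhtml+xml",
})
app(environ, lambda status, headers: None)
print(VISITOR_LOG[-1]["ip"], VISITOR_LOG[-1]["user_agent"])
```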

How to bypass

Please read the guide under the following section about web server logging as it applies to both.

Logging by the Web Server

Every time you make an HTTP request, e.g. access a web page, a record is written to a log file on the web server that hosts that page. Each separate file contained within that web page is logged, so every image, CSS file and script is logged along with your IP address, the method (e.g. POST or GET), the URL, the bytes sent and received and much much more.

Although it's possible to turn off this logging, most companies running web servers need these logs for traffic analysis, e.g. with a tool such as Webtrends, as they cover traffic from all agents, including robots that do not support JavaScript. Also many countries now require ISPs to keep log files for up to a year or more in case the data is needed at a later date.
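To get a feel for what those log lines contain, here is a sketch that parses entries in the widely used Apache "combined" log format and counts requests per IP address (the same job as piping a log through cut, sort and uniq). The regex assumes the default combined format, so a server with a custom LogFormat will need a different pattern, and the sample lines are invented.

```python
import re
from collections import Counter

# Apache "combined" format:
# ip ident user [time] "method url protocol" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line: str):
    """Return the fields of one log entry as a dict, or None if it doesn't match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

def top_ips(lines, n=10):
    """Count requests per client IP, busiest first."""
    counts = Counter(entry["ip"] for entry in map(parse_line, lines) if entry)
    return counts.most_common(n)

LOG = [
    '203.0.113.5 - - [06/Mar/2013:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [06/Mar/2013:10:00:01 +0000] "GET /css/site.css HTTP/1.1" 200 812 "http://www.example.com/index.html" "Mozilla/5.0"',
    '198.51.100.7 - - [06/Mar/2013:10:00:02 +0000] "POST /wp-login.php HTTP/1.1" 200 1530 "-" "curl/7.29.0"',
]

print(parse_line(LOG[2])["url"])  # /wp-login.php
print(top_ips(LOG))               # [('203.0.113.5', 2), ('198.51.100.7', 1)]
```

Note that one page view (index.html plus its CSS file) already produces two entries for the same IP.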

How to bypass

As you must assume that the web servers hosting the sites you visit have logging enabled, the only way to avoid being tracked is to go through proxy servers, or to use tools like the Firefox add-ons Modify Headers or Tamper Data, which allow you to change the headers sent from your PC to the web server in question and act as a mini proxy on your own PC. They cannot, however, change REMOTE_ADDR, which holds the IP address of the machine making the request; the web server takes that from the network connection itself rather than from a header.

Another way is to turn your PC into a web server with free software like WAMP Server and then create a web based proxy for your surfing. Bear in mind, though, that a proxy running on your own PC still makes its requests over your own Internet connection, so the remote server's log files will record your public IP address. Only your own proxy's logs will show 127.0.0.1, the local loopback address. To hide your address, the proxy needs to run on a server somewhere else.

Remember a proxy is just an intermediate server that sits between you and the web server you want to access. If someone was tracking you they would only see your request to the proxy server and not the actual content that the proxy server requests on your behalf.

There are various forms of proxy: some are anonymous and others pass your IP address along in the HTTP_FORWARDED_FOR, HTTP_X_FORWARDED_FOR or any number of other headers. You can also use code or tools to fill these headers with random IP addresses to make it harder for a tracker to find you, as it will look like you have bounced around a lot of proxies when in fact you haven't.

There is also a form of proxy known as an "anonymizer", so called because it hides all of the user's identifying information, such as the headers that hold the IP address and user-agent. There are lots online for you to use.
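To see why those headers prove little, it helps to look at them from the server's side. The sketch below is hypothetical (forwarded_chain is not from any library) and uses made-up documentation addresses; it separates the one address the server actually observed from the X-Forwarded-For list, which the client or any proxy along the way can fill with whatever it likes.

```python
from typing import Optional

def forwarded_chain(remote_addr: str, x_forwarded_for: Optional[str]) -> dict:
    """Split a request's origin into what was observed and what was claimed."""
    # REMOTE_ADDR comes from the TCP connection itself: it is whoever connected
    # to us, i.e. the visitor or the last proxy in the chain.
    claimed = [ip.strip() for ip in (x_forwarded_for or "").split(",") if ip.strip()]
    # By convention each proxy appends the address it received the request
    # from, so entries further left were written earlier and are easier to fake.
    return {"connected_from": remote_addr, "claimed_hops": claimed}

# A transparent proxy passes the visitor's real address along.
print(forwarded_chain("192.0.2.44", "203.0.113.5"))
# An anonymous proxy sends no X-Forwarded-For header at all.
print(forwarded_chain("192.0.2.44", None))
# A visitor who pre-fills the header with junk before going through one proxy.
print(forwarded_chain("192.0.2.44", "10.1.2.3, 172.16.9.9, 203.0.113.5"))
```

In the last case only the rightmost entry, appended by the proxy, reflects a real connection; the two to its left are whatever the visitor typed in.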

Anonymizers are not entirely secure. If an anonymizer keeps logs of incoming and outgoing connections and the anonymizer is physically located in a country where it is subjected to warrant searches then there is a potential risk that government officials can reverse engineer and identify all users who used the anonymizer and how they used it.

Most anonymizers state they do not keep logs but there is currently no way to confirm that. However, if the user used another anonymizer to connect to the exposed anonymizer, that user is still anonymous. This is sometimes called daisy-chaining.

The safest way therefore is to use a chain of proxy servers to make your requests or use a specialist service like TOR which is designed to make it hard to track Internet usage.

P2P Torrents

People use torrents to download films, music and other software. Sometimes these are illegally obtained copies or pirated software.

The Pirate Bay was one of the most famous sites that people used to obtain torrents and the people behind it are currently involved in legal action as the US movie industry is trying to sue them for facilitating the illegal download of copyrighted material.

They are just a search engine along the same lines as Google or BING (and you can find torrents on those search engines as well!), so it is pretty unfair: the Pirate Bay doesn't upload the films itself, it just lists files of a certain type.

When you download torrents you use special software such as uTorrent or Deluge to download all the tiny pieces of the file you want. The idea is that because you are not downloading a whole file from one location but rather tiny bits of it from lots of locations you are not really breaking the law.

When you download you are a "leecher" and when you upload you are a "seeder". The software simplifies all this when you download a file as it connects all the tiny bits up for you so you don't have to worry about where they are coming from.

Also as you download you are also uploading the bits you have already downloaded so other people can obtain them. You can change your settings to prevent the uploading part of this if you want to by changing the ratio of upload versus download or the rate/speed that you upload (or even turn it off).

The Pirate Bay was the biggest site online, which is why it is being targeted, and if you try accessing www.thepiratebay.org in your browser now I bet it will be blocked by your ISP.

How to bypass

There are many proxies for the Pirate Bay which will allow you to access the site from a different URL. Just search for "Pirate bay proxies" and then pick one.

You might find an advert at the top of the page counting down - this is a way to access the site once the count is down to 0. Ignore the main part of the page and click on the "view" button that might appear after the countdown in the top right corner which should take you to the pirate bay proxy.

You might have to try a few out first but I use https://piratereverse.info. As soon as the ISP shuts one down another one will pop up (just like the thousands of people who pointed domains at WikiLeaks when it was blocked) so you will always be able to find a site to get them from whether it's the Pirate Bay or a User Group or discussion board.

Also beware that many torrent tools will be flagged as Trojan down-loaders (even when they are not) and also that ISP's and other government organisations insert their own trackers that log the IP addresses of people downloading the torrents so that they can contact/blacklist/reduce your bandwidth etc. Therefore be careful and pick a good one and read up about trackers before engaging in torrent downloading.

To make the chance of being caught a lot lower, you should change your torrent tool settings to go through a proxy server, preferably HTTPS (encrypted), or use any option that forces encryption when transferring files.

You should also change the port used by the tool in your settings from a random number to 80 or 8080, as these are common web server ports and make it harder for ISPs to tell what kind of traffic is being transferred. If possible, also use a "block list", which stops your client exchanging data with peers whose IP addresses are known to belong to monitoring organisations. More and more ISPs are monitoring this kind of traffic, so this is a wise precaution.

Read these articles to help you install a torrent down-loader and set-up measures to prevent yourself being blocked.

http://thecyndicate.com/Communication/showthread.php?703-%26%239432%3B-Download-Torrents-SAFELY-Without-Your-ISP-Tracking-You

http://www.ehow.com/how_8716782_torrents-work-isp-block.html

Cookies

Cookies are small text files that are stored on the client's computer and contain very small pieces of text. They are mainly used by websites to store flags that let the site know whether you have been there before. Advertisers also use them to track the type of sites you visit so that they can deliver targeted advertising. The biggest offender is Google, which uses its domination of the market to track the sites users visit so it can show them content-specific adverts.

Another type of cookie is the session cookie, which many sites use to store a unique ID that refers to a single visit to the site. The ID is generated by the web server and the session cookie only stores this ID, so that on each request the server knows that the visitor's requests belong to one visit.
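A minimal sketch of that mechanism in Python, using only the standard library: the server invents a random ID and keeps the visit's data on its side, while the browser only ever holds the ID. The cookie name SESSIONID and the in-memory SESSIONS dictionary are illustrative assumptions; real platforms use their own names (PHPSESSID, ASP.NET_SessionId and so on) and their own storage.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # server-side store: session ID -> data about this visit

def start_session() -> str:
    """Create a new visit on the server and return a Set-Cookie header value."""
    session_id = secrets.token_hex(16)  # 128 random bits, so it can't be guessed
    SESSIONS[session_id] = {"pages_viewed": 0}
    cookie = SimpleCookie()
    cookie["SESSIONID"] = session_id
    cookie["SESSIONID"]["path"] = "/"
    cookie["SESSIONID"]["httponly"] = True  # hide it from page JavaScript
    # No expires/max-age attribute, so the browser drops it when it closes.
    return cookie["SESSIONID"].OutputString()

def session_for(cookie_header: str):
    """Find the visit that a request's Cookie header belongs to, if any."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("SESSIONID")
    return SESSIONS.get(morsel.value) if morsel else None

set_cookie = start_session()
print(set_cookie)  # e.g. SESSIONID=3f9c...; HttpOnly; Path=/
# On its next request the browser sends back just the name=value part:
visit = session_for(set_cookie.split(";")[0])
visit["pages_viewed"] += 1
print(visit)  # {'pages_viewed': 1}
```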

How to bypass

If you are concerned about tracker cookies then you can easily disable site related cookies in your browser, but if you disable all cookies then session cookies won't work, and you will most likely find yourself getting logged out of member-only areas of websites or not being able to log in in the first place.

The best option is to disable 3rd party cookies (those set by advertisers) and to delete non-essential cookies after using the Internet (Chrome's Incognito mode does this for you by discarding its cookies when you close the Incognito window).
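Browsers decide whether a cookie is "3rd party" by comparing the site you are visiting with the site the cookie belongs to, where "site" means the registrable domain (bbc.co.uk, not news.bbc.co.uk). Real browsers work that out from the Public Suffix List; the sketch below uses a tiny hard-coded stand-in for it, so treat PUBLIC_SUFFIXES and both function names as illustrative assumptions.

```python
from urllib.parse import urlparse

# A tiny stand-in for the Public Suffix List that real browsers use.
PUBLIC_SUFFIXES = {"com", "net", "org", "uk", "co.uk", "org.uk"}

def site_of(host: str) -> str:
    """Return the registrable domain, e.g. news.bbc.co.uk -> bbc.co.uk."""
    labels = host.lower().rstrip(".").split(".")
    # Try the longest candidate suffix first, then keep one more label.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[max(i - 1, 0):])
    return ".".join(labels[-2:])  # unknown suffix: fall back to the last two labels

def is_third_party(page_url: str, cookie_url: str) -> bool:
    """True if a cookie from cookie_url would count as 3rd party on page_url."""
    page_site = site_of(urlparse(page_url).hostname)
    cookie_site = site_of(urlparse(cookie_url).hostname)
    return page_site != cookie_site

print(is_third_party("http://news.bbc.co.uk/", "http://static.bbc.co.uk/x.js"))   # False
print(is_third_party("http://www.example.com/", "https://ad.doubleclick.net/ad"))  # True
```

A naive "last two labels" rule would wrongly treat every .co.uk site as one site, which is why the suffix list matters.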

Flash, ActiveX, Java Applets

3rd party components such as Flash, ActiveX controls and Java applets come with their own security concerns. There have been numerous security vulnerabilities reported with these types of component because, due to their complexity and power, they have more access to the client's computer than a normal web page. They should be seen as mini applications rather than just a fancy banner, game or helpful utility to enable you to upload files to Facebook more quickly.

You shouldn't install these types of application unless you are totally sure they are safe as they could have a lot more control over your computer than you realise. There have even been hacks that have enabled remote users to video and record a user through their webcam without them knowing.

How to bypass

You can use Firefox extensions such as FlashBlock or AdBlocker to disable Flash on specific pages, or the Opera browser's Turbo mode, which speeds up page loads as well as allowing you to choose which Flash movies to play. If you decide to put privacy above everything else then you will end up having a pretty boring web experience, as more and more sites use JavaScript and Flash to deliver interactive content.

However if you are really security conscious you should use a text browser such as Lynx which won't load images, flash, JavaScript or any other form of plug-in. It will show you the textual content of the pages you visit and will ask if you want cookies to be stored for each request. Due to only loading text and links you will have fast load times so there is a benefit to having a reduced web interface.

You should also regularly check your PC for viruses and spyware. One of the first things modern Trojans do is download good anti-virus software so that they don't get overwritten by another spyware app!

They also try to disguise themselves as virus checkers to avoid detection. Even the best off-the-shelf virus checkers don't catch all forms of spyware, especially those that rely on regularly downloaded virus definition patterns, as new viruses don't get caught until they have been identified and a pattern has been created and downloaded by the client.

Virus payloads can also be modified randomly to avoid pattern detection so tools that don't use pattern matching such as hijackthis.exe which runs an analysis of all currently running processes looking for odd behaviour are good tools to use. This tool will generate a report which can then be analysed by members of the special Hijackthis.exe message board for signs of infection.

One of the best Trojan removers I have found is a tool called SDFix.exe, which managed to detect and remove a Trojan that four other tools, including an off-the-shelf app, didn't detect. There are also a number of good free products, such as Malwarebytes Anti-Malware and Ad-Aware anti-adware and spyware software, which can be run regularly to check your PC for spyware and viruses.

However, hardware keyloggers, such as cable extensions that your employer has inserted without you noticing, are undetectable unless you know what you are looking for, and will store every key pressed on your PC whilst enabled. Check the cables coming out of your computer to see if anything strange is connecting two parts of a wire together.

If you are caught out by such a tool, make sure your employer has followed the law by informing you, e.g. in your contract, of any anti-privacy measures he or she may have introduced, such as monitoring your PC and web usage. If they haven't, then you have a good legal case against them, as they are breaking the law by spying on you without your knowledge. The same goes for CCTV, recording devices and other means of logging your activity without your knowledge.

Article 8 of the European Convention on Human Rights, which the Human Rights Act 1998 brought into UK law, has been successfully used in previous cases by employees who were sacked because of covert spying by their employers, and it should be used by anyone taking their employer to court after being sacked due to such technological spying.

Tools to use to aid privacy on the web

Firefox Add-Ons

Web Developer toolbar. Disable Javascript, cookies, view cookie and header info, modify the DOM, view generated source code, show password fields.
Flashblock disables flash movies until you enable them. Allows creation of a white-list of allowed sites.
FoxyProxy manage your proxies with an easy to use tool.
Tamperdata or Modify Headers acts like a proxy and allows you to modify HTTP requests as they are made from your client.
HTTP Fox, Firebug and even the Chrome developer tools allow you to see all the data your PC sends to websites and the data sent back by the web server you are accessing. They also show any redirects or code loaded in that you might not be aware of.

Google Chrome

Use Incognito browsing to prevent browser and search history and cookies from being stored.
Firefox and IE9 also have privacy modes that can be used to remove cookies and reduce your Internet footprint, but I would not trust anything from Microsoft, as it's hooked into the computer's main system and parts of the browser are shared with other non Internet based software.

All browsers

De-activate Javascript, VBScript (IE only) until you know the site is safe.
De-activate 3rd party cookies used by trackers, advertisers and sites wanting to keep track of you as you move around the web, such as Google Analytics.
If you share a PC, clear your cache, autocomplete, download list and history regularly - use CCleaner, AdAware etc.

If you need more details about the various forms of Internet Censorship and how to bypass it then check out the following article that contains a lot of details about the various methods used and how to bypass them.

How to bypass Internet Censorship

If you are looking for an up to date list of available proxy servers then you can check out the following links:

http://nntime.com/proxy-list-01.htm

http://www.proxies.by/proxy/

http://www.workingproxies.org/

The following page has an index where you can find more proxy lists

http://www.dmoz.org/Computers/Internet/Proxying_and_Filtering/Hosted_Proxy_Services/Free/Proxy_Lists/

If you want to quickly access some web based proxies you can pick from the following list or you can read my guide on creating your own web proxy which comes with an example and some code you can use to get running quickly.

Read the original article at www.darkpolitricks.com.

Sunday, 3 March 2013

Stop BOTS and Scrapers from bringing your site down

Blocking Traffic using WebMin on LINUX at the Firewall

If you have read my survival guides on Wordpress you will see that you have to do a lot of work just to get a stable and fast site due to all the code that is included.

The Wordpress Survival Guide

For instance not only do you have to handle badly written plugins that could contain security holes and slow the performance of your site but the general WordPress codebase is in my opinion a very badly written piece of code.

However, they are slowly learning. I remember once (and only a few versions back) seeing over 200 queries being run on the home page, most of them returning single rows.

For example, if you used a plugin like Debug Queries you would see lots of SELECT statements on your homepage, each returning a single row for every post shown, plus more for the META data, categories and tags associated with each post.

So instead of one query that returned the whole data set for the page (post data, category, tag and meta data), the page would be filled with lots of single-row queries like this.

SELECT wp_posts.* FROM wp_posts WHERE ID IN (36800)

However, they have improved their code, and a recent check of one of my sites showed that although they still use separate queries for post, category/tag and meta data, they at least get all of the records in one go, e.g.

SELECT wp_posts.* FROM wp_posts WHERE ID IN (36800,36799,36798,36797,36796)

So the total number of queries has dropped which aids performance. However in my opinion they could write one query for the whole page that returned all the data they needed and hopefully in a future edition they will.
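The difference between those two styles is easy to demonstrate. The sketch below uses an in-memory SQLite database as a stand-in for WordPress's MySQL tables, with a cut-down, hypothetical wp_posts table, and counts the statements actually executed using sqlite3's set_trace_callback.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT)")
conn.executemany("INSERT INTO wp_posts (ID, post_title) VALUES (?, ?)",
                 [(i, f"Post {i}") for i in range(36796, 36801)])

post_ids = [36800, 36799, 36798, 36797, 36796]  # the posts shown on the home page
executed = []
conn.set_trace_callback(executed.append)  # record every statement SQLite runs

# Old style: one query, and one round trip to the database, per post.
one_by_one = [conn.execute("SELECT wp_posts.* FROM wp_posts WHERE ID IN (?)",
                           (pid,)).fetchone()
              for pid in post_ids]
single_row_queries = len(executed)

# Newer style: a single IN (...) query with one placeholder per post ID.
executed.clear()
placeholders = ",".join("?" * len(post_ids))
batched = conn.execute(f"SELECT wp_posts.* FROM wp_posts WHERE ID IN ({placeholders})",
                       post_ids).fetchall()
batched_queries = len(executed)

print(single_row_queries, batched_queries)  # 5 1
```

Both approaches return the same rows; on a real site the saving grows with every post on the page and with the network latency to the database server.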

However, one of the things that will kill a site like Wordpress is the number of BOTS that hit you all day long. These could be good BOTS like GoogleBOT and BingBOT, which crawl your site to find out where it should appear in their own search engines, or they could be social media BOTS that follow any link Twitter shows, or scrapers trying to steal your data.

One thing you can do to stop legitimate BOTS like Google and BING from hammering your site is to set up a Google Webmaster Tools account and then change the crawl rate to a much slower one.

You can also do the same with BING through their Webmaster Tools account. BING also apparently respects the robots.txt Crawl-delay directive, e.g.

Crawl-delay: 3

This supposedly tells BOTS that obey robots.txt to wait 3 seconds between requests. However, as far as I know only BING supports it at the moment, and it would be nice if more SERP BOTS did in future.
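If you want to check what a Crawl-delay line means to a parser, Python's standard library (3.6 and later) exposes it through urllib.robotparser.RobotFileParser.crawl_delay(). A minimal sketch, assuming a made-up robots.txt that slows Bingbot down and keeps everyone out of /wp-admin/:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow bingbot down, keep every bot out of /wp-admin/.
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 3
Disallow: /wp-admin/

User-agent: *
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse text directly instead of fetching a URL

print(parser.crawl_delay("bingbot"))    # 3
print(parser.crawl_delay("Googlebot"))  # None (the * group sets no delay)
print(parser.can_fetch("bingbot", "https://example.com/wp-admin/"))  # False
print(parser.can_fetch("bingbot", "https://example.com/2013/03/"))   # True
```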

If you want a basic C# robots.txt parser that tells you whether your agent can crawl a page on a site and extracts any Sitemap command, check out http://www.strictly-software.com/robotstxt. If you wanted to extend it to handle the Crawl-delay command it wouldn't be hard (line 175 in Robot.cs), so that you could extract and respect it when crawling yourself.
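The C# parser isn't reproduced here, so the following is not its code. It is a hypothetical Python sketch of the "respect it when crawling yourself" part: a small per-host throttle (the CrawlThrottle name and the injectable clock are my own assumptions) that enforces a minimum gap between requests to the same host once you know its Crawl-delay.

```python
import time
from urllib.parse import urlparse

class CrawlThrottle:
    """Hypothetical helper: keeps at least Crawl-delay seconds between
    requests to the same host."""

    def __init__(self, default_delay=0.0, clock=time.monotonic):
        self.default_delay = default_delay
        self.clock = clock        # injectable so the logic can be tested without sleeping
        self.delays = {}          # host -> Crawl-delay in seconds
        self.last_request = {}    # host -> clock value of the last request

    def set_delay(self, host, seconds):
        self.delays[host] = float(seconds)

    def wait_time(self, url):
        """Seconds to wait before it is polite to fetch this URL."""
        host = urlparse(url).netloc
        last = self.last_request.get(host)
        if last is None:
            return 0.0  # never contacted this host, no need to wait
        delay = self.delays.get(host, self.default_delay)
        return max(0.0, last + delay - self.clock())

    def record(self, url):
        """Call immediately after each request is made."""
        self.last_request[urlparse(url).netloc] = self.clock()

    def wait(self, url):
        pause = self.wait_time(url)
        if pause > 0:
            time.sleep(pause)
```

In a crawler you would call wait(url) before each fetch and record(url) straight after it; the delay itself could come from RobotFileParser.crawl_delay() in Python's standard library (using 0 when it returns None).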

Obviously you want all the SERP BOTS like Googlebot and Bingbot to index you, but there are so many social media BOTS and spammers out there nowadays that they can literally hammer your site into the ground, no matter how many caching plugins and .htaccess rules returning 403 codes you put in.

The best way to deal with traffic you don't want to hit your site is as high up the chain as possible.

Just leaving WordPress to deal with it means the overhead of PHP code running, include files being loaded, regular expressions testing for harmful parameters and so on.

Moving it up to the .htaccess level is better, but it still means your web server has to process all the rules in your .htaccess file to decide whether or not to let the traffic through.

Therefore, if you can move the worst offenders up to your firewall, it will save any code below that level from running, as their TCP traffic is stopped before any regular expressions have to be run elsewhere.

Therefore what I tend to do is follow this process:

Use the WordPress plugin "Limit Login Attempts" to log people trying to log in to my WordPress website without permission. This logs the IP addresses of everyone who has attempted and failed, as well as those who have been blocked, which makes a good starting list for your DENY HOSTS IP ban table.
Check those same IPs, and use the command tail -n 10000 access_log|cut -f 1 -d ' '|sort|uniq -c|sort -nr|more to see which IP addresses are visiting my site the most each day.
I then check the log files, either in WebMin or in an SSH tool like PUTTY, to see how many times they have been trying to visit my site. If I see lots of HEAD or POST/GET requests within a few seconds from the same IP, I investigate further with an nslookup and a whois, and check how many times the IP address has visited the site.
If they look suspicious, e.g. the same IP with multiple user-agents or lots of requests within a short time period, I will consider banning them. Anyone using IE 6 as a user-agent is a good suspect (who uses IE 6 anymore apart from scrapers and hackers!)
I will then add them to my .htaccess file and return [F] (a 403 status code) to all their requests.
If they keep hammering my site, I move them from the DENY list in my .htaccess file to my firewall and DENY HOSTS table.
The aim is to move the most troublesome IPs and BOTS up the chain so they cause the least damage to your site.
Using PHP to block access is not good as it consumes memory and CPU. The .htaccess file is better but still requires APACHE to evaluate every DENY or [F] rule on each request. Therefore the most troublesome users should be moved up to the firewall level so they put the least load on your server.
Regularly shut down your APACHE server and use MySQL's REPAIR and OPTIMIZE TABLE commands to de-frag your table indexes and ensure the tables are performing as well as possible. I have many articles on this site about other free tools which can help you increase your WordPress site's performance.
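The IP-counting command and the "suspicious" checks in the process above can also be sketched in Python. This assumes Apache's standard combined log format and a log file called access_log; the two heuristics (one IP using several user-agents, or claiming to be IE 6) are simplified versions of the checks described above, not a definitive bot detector.

```python
import re
from collections import Counter, defaultdict, deque

# Apache "combined" format: IP ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def tail(path, n=10000):
    """Last n lines of a file, like tail -n 10000."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return list(deque(f, maxlen=n))

def top_ips(lines, n=10):
    """Like cut -f 1 -d ' ' | sort | uniq -c | sort -nr: (ip, hits), busiest first."""
    return Counter(line.split(" ", 1)[0] for line in lines if line.strip()).most_common(n)

def suspects(lines):
    """IPs that used more than one user-agent, or claimed to be IE 6."""
    agents = defaultdict(set)
    flagged = set()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        agents[m["ip"]].add(m["agent"])
        if "MSIE 6.0" in m["agent"]:
            flagged.add(m["ip"])
    flagged |= {ip for ip, uas in agents.items() if len(uas) > 1}
    return flagged

def report(path="access_log", n=10):
    """Print the busiest IPs (uniq -c style) followed by the suspects."""
    lines = tail(path)
    for ip, hits in top_ips(lines, n):
        print(f"{hits:7d} {ip}")
    print("Suspects:", ", ".join(sorted(suspects(lines))) or "none")
```

Calling report("access_log") prints the busiest IPs in the same count-then-IP layout that uniq -c produces, followed by the IPs worth an nslookup and a whois.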

In More Detail

You should regularly check the access log files for the IP addresses hitting your site most often, check them with a reverse DNS tool to see where they come from, and if they are of no benefit to you (e.g. not a SERP or social media agent you want hitting your site) add them to your .htaccess file under the DENY commands, e.g.

order allow,deny
deny from 208.115.224.0/24
deny from 37.9.53.71
allow from all

Then, if I find they are still hammering my site after a week or a month of receiving 403 responses and ignoring them, I add them to the firewall in WebMin.
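The /24 in the first deny line above is CIDR notation: it covers all 256 addresses from 208.115.224.0 to 208.115.224.255, while a bare IP matches just that one host. A small Python sketch using the standard ipaddress module shows how to check whether a visitor falls inside your list and how to generate the deny from lines; the helper names are illustrative and DENY_LIST simply holds the two example entries.

```python
import ipaddress

# The entries from the .htaccess example: one /24 range and one single host.
DENY_LIST = ["208.115.224.0/24", "37.9.53.71"]

def is_denied(ip, deny_list=DENY_LIST):
    """True if ip falls inside any entry; a bare IP is treated as a one-host network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(entry, strict=False) for entry in deny_list)

def deny_lines(deny_list=DENY_LIST):
    """Generate Apache 2.2-style 'deny from' lines for an .htaccess file."""
    out = []
    for entry in deny_list:
        net = ipaddress.ip_network(entry, strict=False)
        # Single hosts are written without a /32 (or /128) suffix, ranges in CIDR form.
        target = net.network_address if net.num_addresses == 1 else net
        out.append(f"deny from {target}")
    return out

print(ipaddress.ip_network("208.115.224.0/24").num_addresses)  # 256
print(is_denied("208.115.224.17"), is_denied("208.115.225.1"))  # True False
print("\n".join(deny_lines()))
```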

Blocking Traffic at the Firewall level

If you use LINUX and have WebMin installed it is pretty easy to do.

Just go to the WebMin panel; under the "Networking" menu is an item called "Linux Firewall". Select it and a panel will open showing all the current rules that allow or deny IP addresses, ports and packets access to your server.

Choose the "Add Rule" command, or if you already have a Deny rule set up, it's quicker to just clone it and change the IP address. If you don't have any set up yet, you just need to do the following.

In the window that opens up just follow these steps to block an IP address from accessing your server.

In the Chain and Action Details Panel at the top:

Add a Rule Comment such as "Block 71.32.122.222 Some Horrible BOT"
In the Action to take option select "Drop"
In the Reject with ICMP Type select "Default"

In Condition Details Panel:

In source address of network select "Equals" and then add the IP address you want to ban e.g 71.32.122.222
In network protocol select "Equals" and then "TCP"

Hit "Save"

The rule should now be saved and your firewall should now ban all TCP traffic from that IP address by dropping any packets it receives as soon as it gets them.
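For reference, a rule like this can also be added from the command line with iptables. The Python sketch below only builds the command (it does not run it, and running it for real needs root). It is my own approximation of the rule, not necessarily what WebMin generates. Note that it uses -I INPUT 1, which inserts the rule at the top of the chain, rather than -A, which appends it to the bottom; the note on rule order below explains why that matters.

```python
import ipaddress
import shlex

def drop_rule_command(ip, comment=None, position=1):
    """Build (but do not run) an iptables command that drops TCP traffic from ip.

    "-I INPUT <position>" inserts the rule at that position in the INPUT chain
    (1 = the very top); "-A INPUT" would append it to the bottom instead.
    """
    addr = ipaddress.ip_address(ip)  # raises ValueError for a malformed address
    tool = "ip6tables" if addr.version == 6 else "iptables"
    cmd = [tool, "-I", "INPUT", str(position), "-s", str(addr), "-p", "tcp"]
    if comment:
        # the comment match stores a label with the rule, similar in purpose to the Rule Comment field
        cmd += ["-m", "comment", "--comment", comment]
    cmd += ["-j", "DROP"]
    return cmd

cmd = drop_rule_command("71.32.122.222", "Block 71.32.122.222 Some Horrible BOT")
print(shlex.join(cmd))  # shlex.join needs Python 3.8+
# iptables -I INPUT 1 -s 71.32.122.222 -p tcp -m comment --comment 'Block 71.32.122.222 Some Horrible BOT' -j DROP
```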

Watch as your performance improves and the number of 403 status codes in your access logs drops, until the next horrible social media BOT comes on the scene and tries scraping all your data.

IMPORTANT NOTE

WebMin isn't very clear on this and I found out the hard way by noticing that IP addresses I had supposedly blocked were still appearing in my access log.

You need to make sure all your DENY RULES are above the default ALLOW rules in the table WebMin will show you.

Therefore, your rules blocking bad bots and IP addresses that are hammering away at your server should be put above all your other rules. You can find the worst offenders in PUTTY with a command like this:
tail -n 10000 access_log|cut -f 1 -d ' '|sort|uniq -c|sort -nr|more

Your rule table should then look something like this:

Drop If protocol is TCP and source is 91.207.8.110
Drop If protocol is TCP and source is 95.122.101.52
Accept If input interface is not eth0
Accept If protocol is TCP and TCP flags ACK (of ACK) are set
Accept If state of connection is ESTABLISHED
Accept If state of connection is RELATED
Accept If protocol is ICMP and ICMP type is echo-reply
Accept If protocol is ICMP and ICMP type is destination-unreachable
Accept If protocol is ICMP and ICMP type is source-quench
Accept If protocol is ICMP and ICMP type is parameter-problem
Accept If protocol is ICMP and ICMP type is time-exceeded
Accept If protocol is TCP and destination port is auth
Accept If protocol is TCP and destination port is 443
Accept If protocol is TCP and destination ports are 25,587
Accept If protocol is ICMP and ICMP type is echo-request
Accept If protocol is TCP and destination port is 80
Accept If protocol is TCP and destination port is 22
Accept If protocol is TCP and destination ports are 143,220,993,21,20
Accept If protocol is TCP and destination port is 10000
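The reason order matters is that iptables checks the rules in a chain from the top down, and the first matching rule with a terminating target such as ACCEPT or DROP decides the packet's fate; later rules are never consulted. This simplified Python model (the Rule class and verdict function are hypothetical, and it ignores non-terminating targets such as LOG) shows a Drop rule that never fires because an Accept rule for port 80 sits above it:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    action: str                   # "ACCEPT" or "DROP"
    source: Optional[str] = None  # source IP to match; None means any
    dport: Optional[int] = None   # destination port to match; None means any

    def matches(self, src: str, dport: int) -> bool:
        return (self.source is None or self.source == src) and (
            self.dport is None or self.dport == dport
        )

def verdict(rules, src, dport, policy="ACCEPT"):
    """Walk the chain top-down; the first matching rule decides. If nothing
    matches, the chain policy applies (ACCEPT is iptables' built-in default)."""
    for rule in rules:
        if rule.matches(src, dport):
            return rule.action
    return policy

bad_bot = "91.207.8.110"
wrong_order = [Rule("ACCEPT", dport=80), Rule("DROP", source=bad_bot)]
right_order = [Rule("DROP", source=bad_bot), Rule("ACCEPT", dport=80)]

print(verdict(wrong_order, bad_bot, 80))    # ACCEPT: the Drop rule is never reached
print(verdict(right_order, bad_bot, 80))    # DROP
print(verdict(right_order, "8.8.8.8", 80))  # ACCEPT
```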

If you have already added lots of DENY rules at the bottom, you might need to copy the IPTables list out to a text editor, move all the DENY rules to the top, then re-save the whole list to your server and re-apply it.

Or you can use the arrows by the side of each rule to move the rule up or down in the table - which is a very laborious task if you have lots of rules.

So if you find yourself still being hammered by IP addresses you thought you had blocked, check the order of the rules in your firewall and make sure your DENY rules are at the top, NOT the bottom, of the list.

Wednesday, 6 March 2013

Internet Censorship and Privacy - How they track you

Internet Censorship and Surfing Anonymously

Sunday, 3 March 2013

Stop BOTS and Scrapers from bringing your site down
