
Wednesday, 5 October 2016

Disk Full - Linux - Hacked or Full of Log Files?

By Strictly-Software

This morning I woke up to find the symptoms of a hack attempt on my LINUX VPS server.

I had the same symptoms when I was ShockWave hacked a few years ago and some monkey overwrote a config file so that when I rebooted, hoping to fix the server, it would reload the hack from a script hidden on a US car site.

They probably had no idea that the script was on their site either, but it was basically a script to enable various hacking methods and the WGet command in the config file ensured that my standard config was constantly overwritten when the server was re-started.

Another symptom was that my whole 80GB of disk space had suddenly filled up.

Usage had been 30GB the night before, and now, with 30-odd HD movies hidden in a secret folder buried in my hard drive, I could not FTP anything up to the site, receive or send emails, or manually append content to my .htaccess file to give only my IP full control.

My attempts to clear space by clearing cached files were useless, and it was only by burrowing through the hard drive folder by folder all night, using the following command to show me the biggest files and folders (visible and hidden), that I found the offending folder and deleted it.


du -hs $(ls -A)


However good this command is for finding files and folders and showing their size in KB, MB or GB, it is a laborious task to manually go from your root directory running the command over and over again until you find the offending folder(s).

So today when I thought I had been hacked I used a different process to find out the issue.

The following BASH script can be run from anywhere on your system in a console window and you can either enter a path if you think you know where the problem lies or just enter / when prompted to scan the whole machine.

It will list first the 20 biggest directories in order of size and then the 20 largest files in order of size.

echo -n "Type Filesystem: ";
read FS;NUMRESULTS=20;
resize;clear;date;df -h $FS;
echo "Largest Directories:"; 
du -x $FS 2>/dev/null| sort -rnk1| head -n $NUMRESULTS| awk '{printf "%d MB %s\n", $1/1024,$2}';
echo "Largest Files:"; 
nice -n 20 find $FS -mount -type f -ls 2>/dev/null| sort -rnk7| head -n $NUMRESULTS|awk '{printf "%d MB\t%s\n", ($7/1024)/1024,$NF}'

After running it I found that the problem was not actually a security breach but rather a plugin folder within a website containing log files. Somehow, without me noticing, the number of archived log files had crept up so much that they had eaten 50GB of space.


As the folder contained both current and archived log files I didn't want to just truncate it or delete everything; instead I removed all the archived log files using a wildcard search for the word ARCHIVED within the filename.


rm *ARCHIVED*


If you wanted to run a recursive find and delete within a folder then you may want to use something a bit different such as:


find . -type f -name '*ARCHIVED*' -delete
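
If you want to see exactly what would be matched before deleting anything, you can run the same search with -print first as a precaution:

find . -type f -name '*ARCHIVED*' -print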


This removed a whole 50GB of files within 10 minutes and, just like lightning, my sites, email and server started running again as they should.

So the moral of the story is that a full disk should be treated first as a symptom of a hacked server, especially if you were not expecting it, and the same methods used to diagnose and fix the problem can be used whether you have been hacked or allowed your server to fill itself up with log files or other content.

Therefore keep an eye on your system so you are not caught out if this does happen to you. If your disk usage suddenly jumps from 35GB to 80GB and you stop receiving emails or being able to FTP content up (or files get copied up as 0 bytes), then you should immediately put some security measures into place.

My WordPress survival guide on security has some good options to use if you have been hacked, but as standard there are some things you can do to protect yourself, such as:


  • Replacing the default BASH shell with the more basic and stripped-down DASH. You can still run BASH once logged into your console, but by default it should not be available for hackers to use to run complex commands on your server.
  • Always using SFTP instead of FTP as it's more secure, and changing the default SSH port from 22 to another number in the config file so that standard port scanners don't spot that your server is open and vulnerable to attack (see the sketch after this list).
  • If you are running VirtualMin on your server you should also change the default port for accessing it from 10000 to another number as well. Otherwise attackers will just swap from SSH attacks by console to web attacks where the front end is less protected. Also NEVER store the password in your browser in case you forget to lock your PC one day or your browser's local SQLite database is hacked and the passwords compromised.
  • Ensuring your root password and every other user password is strong. Making passwords by joining up phrases or memorable sentences where you swap the capital and lower-case letters over is a good idea, and always add a number to the start or end (or both) as well as some special characters, e.g. 1967bESTsAIDfRED*_* would take a dictionary cracker a very long time to break.
  • Regularly changing your root and other user passwords in case a keylogger has been installed on your PC and discovered them.
  • Running DenyHosts and Fail2Ban on your server so that anyone who gets the SSH password wrong 3 times in a row is blocked and unable to access your console or SFTP files up to your server. If you forget yourself you can always use the VirtualMin website front end (if installed) to log in and remove yourself from the DenyHosts list.
  • If you are running WordPress there are a number of other security tools, such as the WordPress Firewall plugin, which will hide your wp-admin login page away behind another URL and redirect people trying to access it to another page. I like the https://www.fbi.gov/wanted/cyber URL myself. It can also ban people who fail to log in after a number of attempts for a set amount of time, as well as offering a number of other security features.
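
As an example of the SSH port change mentioned in the list above, here is a minimal sketch assuming a Debian-style layout; 2222 is only an illustration, so pick your own port and make sure your firewall allows it before restarting SSH:

# back up the config first
cp /etc/ssh/sshd_config /etc/ssh/sshd_config.bak
# change (or add) the Port directive - 2222 is just an example
sed -i 's/^#\?Port 22$/Port 2222/' /etc/ssh/sshd_config
# restart SSH to pick up the change
/etc/init.d/ssh restart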


Most importantly of all, regularly check the amount of free space you have on your server and turn off any logging that is not required.

Getting up at 5.30AM to send an email only to believe your site has been hacked due to a full disk is not a fun way to spend your day!


By Strictly-Software

 © 2016 Strictly-Software

Sunday, 14 June 2015

The Wordpress Survival Guide - Part 2 - Performance

Surviving WordPress - Performance and Site Optimization


UPDATED - 14th Jun 2015

I have updated this to include a way to handle MySQL errors, a BASH script to tune Apache and an improved function to check your server's load and handle Windows errors. 

Plus code to disable the new WordPress HeartBeat functionality which can be a CPU / Bandwidth killer and a way to add CRON jobs to automate plugin functions without console access.

This is the second part of my guide to surviving WordPress and as promised it looks at performance tweaks and tips which I have gathered on my way.

It has been quite a while since the first instalment and the main reason for this was that I was suffering my own performance killer, which I wanted to solve before writing this article. Luckily this has now been solved with the help of Robert from the Tiger Tech blog, who helped me get to the bottom of the issue, so here it is.

My own personal journey into WordPress performance tuning started off when I started to experience out of PHP memory errors when manually rebuilding my Google sitemap.

I started to play around with different plugins and then delve into the code, which is when I started to realise the damage that WordPress plugins can do to a site when the user doesn't realise what's going on behind the scenes.

You can check out a detailed examination here but in my case it was using a Google Sitemap plugin that was set to rebuild when a new post was saved. Combining that with WP-O-Matic which imports articles at scheduled intervals and a TwitterBot such as my own which can send Tweets to multiple accounts whenever new content is added all added up to a performance killer!

If you have a similar setup it's worth running TOP, MyTOP and checking your access logs to see how it affects your own system but what was happening on my own setup was:

  • WP-O-Matic starts to import a feed's worth of articles (max of 10) and, for each article that is saved:
  • Numerous procedures hooked into the SavePost or PublishPost action run. In my case it was:
  1. My Strictly AutoTags plugin runs, which analyses the article and adds relevant tags; depending on the admin settings, the number of tags and the length of the article this could be quick or slow.
  2. The Google Sitemap plugin then runs, which executes a lot of SQL queries and creates a new file as well as pinging multiple SERPs with HTTP requests.
  3. My Strictly Tweetbot Plugin also runs which posts a tweet to multiple accounts. This caused a Twitter Rush as 50+ BOTS all hammered my site at the same time due to the new link appearing on Twitter. 
  4. Any other plugin using the Save hooks runs such as caching tools which create static files.
  • As soon as the Tweets arrive on Twitter a multitude of Bots, 50 on my last test, will visit the site to index the link that has just been posted OR try and scrape, hack or insert spam comments into the post.
  • If the link was posted to multiple accounts you will find that the same bots will visit for each account you posted to. Some bots like Yahoo seem to be particularly bad and visit the article multiple times anyway. So if I posted to 5 twitter accounts that's 250 visits in the space of a few seconds from BOTS scanning for new tweet links to visit!
  • All these visits create new Apache processes and depending on the amount of memory that each Apache process uses you could find that your server starts swapping memory to disk to handle the increase and in my case my server load would quickly jump from 0.15 to 50+.

The more articles you import the more iterations of this chain of performance killing events occurs. I found that these events would sometimes pass off without any noticeable problems but other times the server load would get so high that I would have to reboot my machine.

The highest value I recorded was 174 on a 1GB RAM Linux server!

In fact on some days I would have to reboot 3-5 times which is not good at all.

Getting to the bottom of the problem

A common solution to any performance related problem is to throw more resources at it. Many message boards recommended increasing the maximum memory limit to get round the Out of Memory errors the Google Sitemap was throwing up but that just masks the issue and doesn't actually solve it.

As a by product of my system tuning I ended up creating my own Google Sitemap Plugin to overcome limitations of the others.

Not only could it be easily set to rebuild at scheduled intervals instead of only when new posts were added which helps reduce unnecessary rebuilds, but it used far less memory and made a tiny number of database queries in comparison to the other market leaders.

I also created a System Reporting plugin so that I could be kept informed when my site was playing up, and I found this invaluable in keeping my site running during this performance nightmare. If you are not on your site 24/7 and cannot afford professional monitoring services it is great to get an email telling you if your site is: down, taking ages to respond, has a very high server load or running too many SQL queries.

One of the first ideas to reduce the amount of times I was rebooting was to try and prevent any performance intensive tasks from running if the server load was already high.

I did this by adding in some checks to all my major plugins that made a call to the following function before running anything. If the load was above 1.0 I just exited immediately. You can read more about this method in this article: Testing Server Load.

function GetServerLoad(){

 $os = strtolower(PHP_OS); 
 
 // handle non windows machines
 if(substr($os, 0, 3) !== 'win'){
  if(file_exists("/proc/loadavg")) {    
   $load = file_get_contents("/proc/loadavg"); 
   $load = explode(' ', $load);     
   return $load[0]; 
  }elseif(function_exists("shell_exec")) {     
   $load = @shell_exec("uptime");
   $load = explode(' ', $load);        
   return $load[count($load)-3]; 
  }else { 
   return false; 
  } 
 // handle windows servers
 }else{ 
  if(class_exists("COM")) {     
   $wmi  = new COM("WinMgmts:\\\\."); 
   if(is_object($wmi)){
    $cpus  = $wmi->InstancesOf("Win32_Processor"); 
    $cpuload = 0; 
    $i   = 0;   
    // Old PHP
    if(version_compare('4.50.0', PHP_VERSION) == 1) { 
     // PHP 4      
     while ($cpu = $cpus->Next()) { 
      $cpuload += $cpu->LoadPercentage; 
      $i++; 
     } 
    } else { 
     // PHP 5      
     foreach($cpus as $cpu) { 
      $cpuload += $cpu->LoadPercentage; 
      $i++; 
     } 
    } 
    $cpuload = round($cpuload / $i, 2); 
    return "$cpuload%"; 
   }
  } 
  return false;     
 } 
}
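
As a sketch of how the plugins used it, based on the 1.0 threshold mentioned above, the calling code simply bails out when the load is too high:

// example: skip an expensive plugin task if the server is already busy
$load = GetServerLoad();
if ( $load !== false && floatval( $load ) > 1.0 ) {
 return; // load above 1.0 - do the heavy work on a later run
}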


Apache Configuration

I finally got to the bottom of the problem I was suffering with the help of Tiger Tech after examining the output of ps auxwwH --sort rss during a period of high load. This listed all the currently running processes ordered by the amount of memory they were consuming.

At the time of running this my average load was 50, which meant there was a big queue of processes waiting to run, including over 70 Apache processes each using between 8MB and 30MB; this alone was easily using up my 1GB of RAM.

This high number of Apache processes meant that my server was busily swapping from real memory to disk based virtual memory which was causing high I/O (clearly seen from the output of iostat) and slowing down the response times of each Apache process.

As each process got slower to respond new processes were spawned using up even more virtual memory adding to the problem. This spiral of death was only resolved if for some reason the traffic suddenly screeched to a halt (not likely during an article import that delivers hundreds of bots from Twitter on top of normal traffic) OR I killed Apache or the server.

The solution to this problem was to reduce the number of simultaneous Apache processes that could be run at one time by reducing the MaxClients setting in the Apache config file.

My existing setting of 256 was far too high for my 1GB RAM server. The way to calculate a more appropriate setting is to take the average size of an Apache process and then divide the total available memory by that number, leaving room for other processes such as MySQL. In my case I was advised to set MaxClients to a value of 20, which seems small in comparison to the original value but makes more sense when you do the maths (for example, roughly 500MB left over for Apache divided by an average 25MB process gives 20).

I have actually created a BASH script which you can run on your own server which will test the available space, average Apache process size, and then calculate the values for your MaxClients, MinSpareServers and MaxSpareServers which you can read here: BASH MaxClients Tuning Script.

Reducing my MaxClients setting to a much smaller value meant that the memory allocation for my system would never reach such unmanageable amounts again. If my server is swamped by traffic then instead of 256 Apache processes being spawned all trying to claim 20MB or more for themselves they will be queued up in an orderly fashion.

It might slow down some requests as they wait to be dealt with but that is far better than the whole server freezing up which was occurring regularly.

Two other settings I changed in the Apache conf file were the Timeout value, which I took down from 300 to 30, and HostnameLookups, which I turned off. You can read more about these settings at the Apache performance tuning site.

Another recent issue I have just had was eerily the opposite of the above. I would get periods of very low server load (0.00 - 0.02) and there would be no Apache or MySQL processes running. The websites couldn't be accessed and only a restart of Apache would fix it.

At first I was checking the Apache error logs and seeing lots of "MySQL Server has gone away" errors. I found that this was a common issue in WordPress and created a custom wp-db.php file which would re-connect to the server if a query ran and met that error. You can read more about that script here: Fixing the MySQL Server Has Gone Away Error.

However this just got rid of the error messages; it didn't really fix the underlying problem.

After a lot of reading and tuning I eventually found what "seems" to be a fix for this issue, which may be caused by Apache processes hanging around for too long, consuming memory but not doing anything. I have edited the Apache conf file and changed the KeepAliveTimeout value down from the previous setting of 30 to 2 seconds.

I am debating on whether to turn it off altogether and then increase the MaxRequestsPerChild option. This website has some information about KeepAlive and whether you should turn it on or off.
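
Pulled together, the relevant part of an Apache prefork configuration with the values discussed above would look something like the sketch below. Treat the numbers as illustrations; MaxClients in particular should come from your own memory calculations:

# apache2.conf / httpd.conf excerpt (prefork MPM) - illustrative values only
Timeout 30
KeepAlive On
KeepAliveTimeout 2
HostnameLookups Off
<IfModule mpm_prefork_module>
    MaxClients      20
    MinSpareServers  5
    MaxSpareServers 10
</IfModule>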

Common WordPress Performance Tuning Tips

There are a number of common tips for performance tuning WordPress which you can read about in detail at other sites but I will quickly cover them here:

1. Install APC or another PHP caching system such as XCache or eAccelerator as these Opcode systems improve performance by saving and re-using compiled PHP which speeds up the execution of server side code.

2. Install a WordPress caching plugin such as WP Super Cache or W3 Total Cache. There is a debate over which one is best, and whilst W3 Total Cache does offer more features, such as minification and browser cache options, the main issue you want to resolve with WordPress is reducing the huge number of database queries and the amount of code that runs on each page load. The aim is to do expensive tasks once and then re-use the results as many times as possible. Caching the results of database queries so that they don't have to be run every time the page loads is a great idea, especially if the results hardly change, and whilst W3 offers database query result caching as well as caching of the generated HTML output, Super Cache will only cache the generated output.

What is the difference? Well if you cached database query results then during the building of cached files the results of queries that are used to create category lists or tag clouds can be shared across builds rather than being recalculated for every page being cached that uses them. How much difference this makes when you take all MySQL's own internal query caching into consideration is debatable. However both plugins offer the major way to improve fast page loads which is disk based caching of the generated output incorporating GZIP compression.

If you do install W3 Total Cache and you have APC or another PHP accelerator installed, make sure that you enable the Disk Based Cache option for Page Caching and not Opcode, which will be selected by default if APC or XCache is installed.

3. If bandwidth is a problem then serving up minified and compressed HTML, CSS and JavaScript will help but you don't want to be repeatedly compressing files as they load. Some cache plugins will do this minification on the fly which hurts CPU whereas you really want it done once. There is nothing stopping you combining, compressing and minifying your files by hand. Then you will benefit from small files, fewer HTTP requests and less bandwidth whether or not you make use of a cache plugin.

4. Reduce 404 errors and ensure WordPress doesn't handle them as it will cane performance unnecessarily. Create a static 404 error page (for example with the ErrorDocument directive shown below) or ensure your cache system is set up to handle 404's. Also make sure that common files that cause 404's, such as iPhone icons, crossdomain.xml and favicons, exist even if they are empty files.
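
One cheap way to keep WordPress out of the loop entirely (assuming a static page called 404.html in your web root; the name is just an example) is to point Apache straight at it from your .htaccess file:

ErrorDocument 404 /404.html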

5. If you're not planning on using a caching system then you should tune your .htaccess file manually to ensure that browsers cache your files for specified periods of time rather than downloading them each time they visit your site. You can also set your server to serve up gzip-compressed files rather than letting a plugin do it for you.

You can do this by setting the future expire headers on your static content such as JS, CSS, images and so on like so:

<FilesMatch "(?i)^.*\.(ico|flv|ogg|swf|jpg|jpeg|png|gif|js|css)$">
ExpiresActive On
ExpiresDefault "access plus 1 weeks"
Header unset Last-Modified
Header set Cache-Control "public, no-transform"
SetOutputFilter DEFLATE
</FilesMatch>


6. Tune your MySQL database by ensuring that it is set to cache query results and has enough memory to do so. Disable options you don't use or require, and regularly maintain your tables and indexes by keeping fragmentation to a minimum.

There are a couple of well known tuning scripts which can be used to aid in the setting of your MySQL configuration settings and which use your current database load and settings as a guide to offer recommendations.

http://github.com/rackerhacker/MySQLTuner-perl
http://hackmysql.com/mysqlreport
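
Before (or after) running those scripts you can also check the query cache yourself from the MySQL console. This is a minimal sketch assuming a MySQL 5.x era server where the query cache still exists; the 64MB value is purely illustrative:

-- is the query cache enabled and how big is it?
SHOW VARIABLES LIKE 'query_cache%';
-- how well is it being used (hits, inserts, low-memory prunes)?
SHOW STATUS LIKE 'Qcache%';
-- resize it on the running server (make it permanent in my.cnf)
SET GLOBAL query_cache_size = 64 * 1024 * 1024;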

Uninstall Performance Hogging Plugins

There are lots of plugins available for WordPress and it can be like a kid let loose in a candy shop, as there seems to be at least 10 plugins for everything. However having too many plugins installed is definitely a bad thing in terms of performance, and unless you know what the code is doing you could be shooting yourself in the foot by installing the next greatest plugin onto your site without thoroughly checking the source code out for yourself first.

The problem is that literally anyone can write and then publish a plugin on WordPress and many of these authors are not programmers by trade, nor do they have performance at the forefront of their minds as they develop the code that you might use.

Even plugins that are marketed as performance-saving tools are not always beneficial, and I have seen plugins that are designed to reduce bandwidth by returning 304 Not Modified headers or 403 Forbidden status codes but have to make numerous database queries, DNS lookups and multiple regular expression checks to do so. If bandwidth is a problem then this might be worth the extra load, but if it isn't then you are just swapping a small gain in one area for extra work somewhere else.

If you are going to use a plugin then take a look over the source code to see if you can help improve performance by adding any missing indexes to any new tables the plugin might have added to your WordPress database. Many plugins do add tables, especially if they need to store lots of data, and many authors don't include the SQL statements to add appropriate indexes, which can end up slowing lookups down as the amount of data within the tables grows.

The following table lists extra indexes I have added to tables within the WordPress database, both for plugins I installed and for core WordPress tables that were missing indexes for certain queries. Remember WordPress is mainly a READ based system so the extra expense of maintaining indexes when data is inserted is usually worth it.


Plugin                     Table                    Index Name                     Columns                              Index Type
(core)                     wp_posts                 status_password_id             post_status, post_password, ID       Normal
(core)                     wp_posts                 post_date                      post_date, ID                         Unique
fuzzySEOBooster            wp_seoqueries_terms      term_value_stid                term_value, stid                      Unique
fuzzySEOBooster            wp_seoqueries_data       stid_pageid_pagetype_founded   stid, page_id, page_type, founded     Unique
WP-O-Matic                 wp_wpo_campaign_post     campaignid_feedid_hash         campaign_id, feed_id, hash            Normal
Yet Another Related Posts  wp_yarpp_related_cache   reference_id                   reference_id, ID                      Normal
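
As an example, the first of those core-table indexes from the table above could be added with a statement like this (adjust the wp_ prefix if your install uses a different one):

ALTER TABLE wp_posts ADD INDEX status_password_id (post_status, post_password, ID);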

Ensure that you regularly check the MySQL slow query log, especially if you have just installed a new plugin, as this will help you find queries that need optimising and potential bottlenecks caused by poorly thought out SQL.

On my own site I started off using a well known Related Posts plugin but I found out from the Slow log that the queries it ran to create the lists were killing performance due to their design.

They were taking 9-12 seconds to run and were scanning up to 25 million records at a time as well as carrying out unnecessary UNION statements which doubled the records it needed to look at. I ended up replacing it with a different plugin called LinkWithin which not only looked great due to the images it used but was perfect for performance because it was a JavaScript widget and all the work was carried out on their own server rather than mine.

This might not be the solution for you as obviously JavaScript is disabled by 10% of all visitors and bots won't be able to see the links.

If SEO is a concern, and it should be, then you need to make sure that SERP crawlers find all your content easily, and having a server-side created list of related articles is a good idea for this reason alone. Therefore you can always create your own Related Posts section very easily with a function placed at the bottom of your articles that uses the categories assigned to the post to find other posts with the same category.

The following example shows one way in which this can be done and it makes use of a nice ORDER BY RAND() trick to ensure different articles and categories appear each time the SQL is run. It also uses WordPress's inbuilt cache to store the results to prevent the query being executed too many times.

<?php
function get_my_related_posts($id, $limit){

// enable access to the WordPress DB object
global $wpdb;

// define SQL
$sql = "SELECT  CONCAT('http://www.mysite.com/',year(p.post_date),'/',RIGHT(concat('0' ,month(p.post_date)),2),'/',post_name,'/') as permalink,
p.post_title as title
FROM (
SELECT p.ID, p.post_name, p.post_title, p.post_date, terms.slug as category
FROM  wp_posts p,  wp_term_relationships tr,  wp_term_taxonomy tt,  wp_terms as terms
WHERE p.ID               != $id                 AND
p.post_type         = 'post'              AND
p.post_status       = 'publish'           AND
p.ID                = tr.object_id        AND
tr.term_taxonomy_id = tt.term_taxonomy_id AND
tt.taxonomy         in ( 'category')      AND
tt.term_id          = terms.term_id
GROUP BY  p.ID, p.post_title, p.post_name, p.post_date
ORDER BY terms.term_id
) as p,
(
SELECT distinct terms.slug
FROM wp_term_relationships tr, wp_term_taxonomy tt, wp_terms as terms
WHERE tr.object_id        = $id     AND
tr.term_taxonomy_id = tt.term_taxonomy_id AND
tt.taxonomy in ( 'category')    AND
tt.term_id          = terms.term_id
ORDER BY RAND() LIMIT 1
) as t
WHERE p.category = t.slug
ORDER BY  RAND()
LIMIT $limit";

// see if we have a cached recordset
$cache_name = "get_my_related_posts_" . $id;

$result = wp_cache_get( $cache_name );
if ( false == $result ) {

// get results and then cache for later use
$result = $wpdb->get_results( $sql );
wp_cache_set( $cache_name, $result );
}

// return result set as object
return $result;
}
?>
<div id="StrictlyRelatedPosts">
<h3>Related posts</h3>
<ul>
<?php
// fetch 5 related posts
$related_posts = get_my_related_posts($post->ID, 5);
// open loop
foreach ($related_posts as $related_post) {
$permalink = $related_post->permalink;
$title     = $related_post->title;
print "<li><a title=\"$title\" href=\"$permalink\">$title</a></li>\n";
} ?>
</ul>
</div>



Identifying Bottlenecks in Wordpress

One good plugin which I use for identifying potential problematic queries is the Debug Queries plugin which allows administrators to see all the queries that have run on each page. One extra tweak you should add is to put the following line in at the bottom of the get_fbDebugQueries function (around line 98)


$debugQueries .= ' ' . sprintf(__('» Memory Used %s'), $this->ConvertFromBytes($this->GetMemoryUsage(true))) . ' '. "\n";


Then add these two functions underneath that function (around line 106) which get the memory usage and format the value nicely.


// format size from bytes
function ConvertFromBytes($size){

 $unit=array('B','KB','MB','GB','TB','PB');

 return @round($size/pow(1024,($i=floor(log($size,1024)))),2).$unit[$i];
}

// get PHP memory usage
function GetMemoryUsage(){

 if(function_exists("memory_get_peak_usage")) {
  return memory_get_peak_usage(true);
 }elseif(function_exists("memory_get_usage")) {
  return  memory_get_usage(true);
 }
}


This will help you see just how many database queries a standard Wordpress page makes (88 on my homepage!) and if you haven't done any performance tuning then you may suddenly feel the urge before you suffer similar problems to those I experienced.

Remember a high performing site is one which attracts visitors and one which SERP bots are now paying more attention to when indexing. Therefore you should always aim to get the best performance out of your system as is feasibly possible, and as I have shown that doesn't mean spending a fortune on hardware.




Turning off WordPress features


If you ever look at your site's log file you might see a lot of requests to a page called wp-cron.php.

This is a page that handles internal scheduling by WordPress and many plugins hook into this to schedule tasks which is useful for people who don't have access to their webservers control panel as they can still set up "cron" jobs of a sort.

The only difference is that these cron jobs are fired when a page on the site is loaded, so if you have a very quiet site a job you want to run once every 5 minutes won't fire on time if you don't get traffic every minute of the day.


POST /wp-cron.php?doing_wp_cron=1331142791

Sometimes you will even see multiple requests spawned (by your own servers IP) within the same second e.g

123.123.XXX.XX - - [07/Mar/2012:18:03:57 +0000] "POST /wp-cron.php?doing_wp_cron=1331143104 HTTP/1.0" 200 - "-" "WordPress/3.3.1; http://www.mysite.com"
123.123.XXX.XX - - [07/Mar/2012:18:03:57 +0000] "POST /wp-cron.php?doing_wp_cron=1331143109 HTTP/1.0" 200 - "-" "WordPress/3.3.1; http://www.mysite.com"
123.123.XXX.XX - - [07/Mar/2012:18:03:57 +0000] "POST /wp-cron.php?doing_wp_cron=1331143128 HTTP/1.0" 200 - "-" "WordPress/3.3.1; http://www.mysite.com"

To me this seems like overkill.

Yes the wp-cron job is needed to run internal Wordpress tasks such as posting scheduled posts or firing jobs that have been set up to use the internal cron system, but having multiple requests fire at the same time seems unnecessary at best.

Why is this bad? Well, as this blog post about it from boltwebhosting.com says:

Wp-cron.php is called every time a page is loaded. That means if you are getting 50 visitors to your site every hour, and each of them reads 2-3 pages, then wp-cron.php is being called:
50 x 2.5 = 125 times per hour
125 x 24 = 3,000 times per day
3,000 x 30 = 90,000 times per month!
It does not just stop there, because unlike other features in WordPress, the wp-cron.php is spawned as an independent process which can sometimes take several minutes to complete its operations. So an active WordPress site with the traffic volume listed above is spawning 3,000 processes every day which do not really do anything.

Therefore on a very busy site you will be firing this page a lot of times and this may cause severe performance issues on its own.

The solution is to replace this CRON job with a proper CRON job if possible.

To do this you either need access to your server's control panel or console, but don't worry if you don't have access as you can still use a web based service like www.easycron.com.

As many hosts don't provide adequate cron functions for their users, this web based method is a great way of automating tasks without fiddling with your server.

If you do have the ability to setup a CRON task that fires the page once an hour or a time more appropriate to your needs then great. If you don't use the internal cron job for anything then the longer the gap the better but be careful as plugins may use it without your knowledge such as Database Backup plugins or Sitemap generator plugins. I set my CRON job to run the WP-CRON task every 10 minutes and this seems to be fine for my needs.

This is the format to use:

wget -U StrictlyCron -q -O /dev/null http://www.mysite.com/wp-cron.php?doing_wp_cron


You will notice that I am setting the -U parameter (user-agent) to StrictlyCron. This is because I block all blank useragent requests to my site with .htaccess rules (see the security article) and it also helps me identify my own requests in the log file.
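
Combined with the ten-minute schedule mentioned above, the complete crontab entry (added with crontab -e) would look something like this, with the domain being a placeholder:

*/10 * * * * wget -U StrictlyCron -q -O /dev/null http://www.mysite.com/wp-cron.php?doing_wp_cron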

Once you have done this you need to edit your site's wp-config.php file, which will be in the root of your site's setup, and add this line of code to the top of it.


/* disable WP-CRON from running all the time on every page load! */
define('DISABLE_WP_CRON', true);


As the comments state, this disables WordPress from firing its own internal CRON job, and as we have replaced it with a real CRON job that will run once an hour (or whatever interval you choose) rather than on every page load, it should reduce our traffic and server load considerably.



Turning Off WordPress HeartBeat

The WordPress HeartBeat functionality was introduced in WP 3.6 to allow interaction between the server and browser using AJAX. However, like AutoSave and WP_CRON, it can cause a lot of unnecessary HTTP requests as it defaults to one request every 15 seconds.

The WordPress Heartbeat API allows WordPress to communicate between the web-browser and the server. It also improves session management, revision tracking, and auto saving. The WordPress Heartbeat API uses /wp-admin/admin-ajax.php, which allows WordPress to keep track of what's going on in the dashboard.

Unfortunately, this can also cause excessive requests to admin-ajax.php, leading to high CPU / Bandwidth usage. Whenever a web-browser is left open on a page using the Heartbeat API, this could potentially be an issue.

I once accidentally left a post I was editing open in a Chrome browser (which always re-opens the pages you had open when you close it) for a week and my bandwidth costs jumped by a good $30.

I scanned my log files and saw /wp-admin/admin-ajax.php being called every 15 seconds for the post page (seen in the Referer section of the log file).

Therefore I shut down the page ASAP and added the following code to my theme's functions.php file to only allow the Heartbeat code on the post editing pages, where it's needed to delete custom fields, show tags and provide other features that make editing and adding posts easy.

To turn off the HeartBeat functionality elsewhere, go to your theme's functions.php file and put the following code at the top of it.

If you don't want to turn it off but just change the timing from 15 seconds to a minute or something else you can, but it relies on editing a compressed core WordPress JavaScript file. You can read about how to do this here.

// stop heartbeat code
add_action( 'init', 'stop_heartbeat', 1 );

function stop_heartbeat() {
 global $pagenow;

        if ( $pagenow != 'post.php' && $pagenow != 'post-new.php' )
 {
  wp_deregister_script('heartbeat');
 }
}


Using WordPress Performance Enhanced Plugins

Now this isn't a sales pitch for my own plugins but you should try and avoid performance hogging plugins and use those with performance features built in.

For instance if your caching plugin has a "purge/delete cache" option then make sure it has a decent wait time in between each iteration of the loop, otherwise it will consume all your CPU and memory when you try deleting the cached files. Ask the plugin author after reading their guide.

Also if you are using a Google Site Map on a busy site don't set it to build a whole new site map after every post. The server load may already be high and doing this will send it higher. Also it is far better to just let the SERP crawlers crawl and find new content anyway.

However if you do want to use a sitemap then use one that lets you build the sitemap at staged intervals through CRON or web cron jobs.

My old, now defunct, Strictly Google Sitemap plugin is one which I no longer support but do use on all my sites because of its unique features, which include:

  • A low number of SQL queries run compared to other sitemap plugins, 
  • Fewer WordPress functions run than other plugins, 
  • The ability for the memory applied to the build process to automatically increment as required. So that you don't get out of memory errors as I used to with other well known plugins.


Even though some features are defunct it is still a great plugin to use for big sites needing sitemaps generated quickly.

With my plugin you can create sitemaps at set times and you can do all the stuff normal sitemap plugins do. The only bits that have stopped working are the SEO parts due to how Twitter, Google, BING and the others all work.

My Strictly TweetBOT PRO plugin, which allows you to post Tweets to as many accounts as you want (or the same account multiple times with different content), also has delay functionality you might be interested in.

It has a delay option where you can set, in seconds, how long to wait after sending an HTTP GET request to your new post (to get it into the cache) before tweeting.

It also has an option to set a delay in seconds before each Tweet is sent out to the account. This allows for enough time for any previous Twitter Rushes to die down before creating a new one.

It also staggers the Tweets out so they don't all look like they are coming from the same place.

Buy Strictly TweetBOT PRO now


WordPress Performance Summary


  • Ensure Apache is configured correctly and don't leave the default values as they are. Make sure MaxClients is set correctly by dividing your RAM by the average Apache process size leaving room for MySQL and anything else you might be running.

  • Tune your MySQL database by configuring correctly and maintaining regularly. Use one of the many free tuning scripts to help set your configuration up correctly but ensure you read up about the various settings and what they do first.


  • Install a Caching plugin that creates hard copies of commonly requested files. Static HTML is fast to load. PHP is costly to compile. Use a PHP accelerator and ensure database query results are cached.


  • Reduce bandwidth by combining, compressing and minifying your CSS, JS and HTML. If your caching plugin doesn't do it once rather than on the fly do it by hand. Remember the key is to do expensive operations once and then re-use the results as many times as possible.


  • Set your .htaccess file up correctly. Ban bad bots to reduce traffic, set far future expiry headers on your static files and use static files to handle 404, 403, 503 errors etc.


  • Reduce the number of plugins and ensure any that you use are not hurting performance. Make sure any tables they use are covered by indexes and use the slow query log to identify problems.



  • Disable WordPress's internal CRON job and replace it with a real CRON job that runs once every hour or 30 minutes rather than on every page load.



  • Disable WordPress HeartBeat functionality or only allow it on post edits to prevent repeated HTTP calls if a page is left open in a browser. You can change the timings from 15 seconds to whatever you want but this means editing a compressed WordPress core JS file. 




Read Part 1 - An Overview
Read Part 3 - Security




Wednesday, 10 July 2013

Apache Performance Tuning BASH Script

BASH Script to tune Apache Configuration Settings

As you might know a lot of the time I think the LAMP / Wordpress combo is a big bag of shite.

There are so many configuration options, at so many different levels, that need tuning to get optimal performance that it is a nightmare to find the right information. There are also too many people offering various solutions for Wordpress / Linux / Apache / MySQL configuration.

Different people recommend different sizes for your config values and just trying to link up server load with page/URL/script requests to find out the cause of any performance issue is a nightmare in itself.

I would have thought there would have been a basic tool out there that could log server load, memory, disk swapping over time and then link that up with the MySQL slow query log, Apache error AND access logs so that you could easily tell when you had issues what processes were running, which URL's were being hit and how much activity was going on to identify culprits for tuning. I have even thought of learning PERL just to write one - not that I want to!

Even with all the MySQL tuning possible, caching plugins installed and memory limits on potentially intensive tasks it can be a nightmare to get the best out of a 1GB RAM, 40GB Virtual Server that is constantly hammered by BOTS, Crawlers and humans. I ban over 50% of my traffic and I still get performance issues at various times of the day - why? I have no FXXING idea!

Without throwing RAM at the problem you can try and set your APACHE values in the config file to appropriate values for your server and MPM fork type.

For older versions of Apache the non-threaded, pre-forking Multi-Processing Module (MPM) is well suited as long as the configuration is correct. However it can consume lots of memory if not configured correctly.

For newer versions (2+) the Worker MPM is better as each thread handles a connection at a time, and this is considered better for high traffic servers due to the smaller memory footprint. However getting PHP working with this MPM apparently needs a lot of configuration and you should read up about it before considering a change.

Read about Apache performance tuning here Apache Performance Tuning.

To find out your current apache version from the console run

apache2 -v OR httpd -v (depending on your server type, if you run top and see apache2 threads then use apache2 otherwise use httpd)

You will get something like this.

Server version: Apache/2.2.9 (Debian) Server built: Feb 5 2012 21:40:20

To find out your current module configuration from the console run

apache2 -V OR httpd -V

Server version: Apache/2.2.9 (Debian)
Server built: Feb 5 2012 21:40:20
Server's Module Magic Number: 20051115:15
Server loaded: APR 1.2.12, APR-Util 1.2.12
Compiled using: APR 1.2.12, APR-Util 1.2.12
Architecture: 64-bit Server
MPM: Prefork threaded: no forked: yes (variable process count)
etc etc etc...

There are lots of people giving "suitable" configuration settings for the various apache settings but one thing you need to do if you run TOP and notice high memory usage and especially high virtual memory usage is try and reduce disk swapping.

I have noticed that when Apache is consuming a lot of memory your virtual memory (disk based) will be high, and you will often experience either high server loads and long wait times for pages to load OR very small server loads e.g 0.01-0.05, an unresponsive website and lots of MySQL Server Gone Away messages in your error log file.

You need to optimise your settings so that disk swapping is minimal, which means trying to optimise your MySQL settings using the various MySQL tuning tools I have written about as well as working out the right size for your Apache configuration values.

One problem is that if you use up your memory by allowing MySQL to have enough room to cache everything it needs then you can find yourself with little left for Apache. Depending on how much memory each process consumes you can easily find that a sudden spike in concurrent hits uses up all available memory and starts disk swapping.

Therefore apart from MySQL using the disk to carry out OR caching large queries you need to find the right number of clients to allow at any one time. If you allow too many and don't have enough memory to contain them all then the server load will go up, people will wait and the amount of disk swapping will increase and increase until you enter a spiral of doom that only a restart fixes.

It is far better to allow fewer connections and serve them up quickly with a small queue and less waiting than open too many for your server to handle and create a massive queue with no hope of ending.

One of the things you should watch out for is Twitter Rushes caused by automatically tweeting your posts to twitter accounts, as this can cause 30-50 BOTS to hit your site at once. If they all consume your memory then it can cause a problem that I have written about before.

Working out your MaxClients value

To work out the correct number of clients to allow you need to do some maths and to help you I have created a little bash script to do this.

What it does is find out the average size of an Apache thread then restarts Apache so that the correct "free size" value can be obtained.

It then divides the remainder by the Apache process size. The value you get should be roughly the right value for your MaxClients.

It will also show you how much disk swapped or virtual memory you are using as well as the size of your MySQL process.

I noticed on my own server that when it was under-performing I was using twice as much disk space as RAM. However when I re-configured my options and gave the system enough RAM to accommodate all the SQL / APACHE processes then it worked fine with low swapping.

Therefore if your virtual memory is greater than the size of your total RAM e.g if you are using 1.5GB of hard disk space as virtual memory and only have 1GB of RAM then it will show an error message.

Also, as a number of Apache tuners claim that your MinSpareServers should be 10-25% of your MaxClients value and your MaxSpareServers value 25-50% of your MaxClients value, I have included the calculations for these settings as well.


#!/bin/bash
echo "Calculate MaxClients by dividing biggest Apache thread by free memory"
if [ -e /etc/debian_version ]; then
 APACHE="apache2"
elif [ -e /etc/redhat-release ]; then
 APACHE="httpd"
fi
APACHEMEM=$(ps -aylC $APACHE |grep "$APACHE" |awk '{print $8}' |sort -n |tail -n 1)
APACHEMEM=$(expr $APACHEMEM / 1024)
SQLMEM=$(ps -aylC mysqld |grep "mysqld" |awk '{print $8}' |sort -n |tail -n 1)
SQLMEM=$(expr $SQLMEM / 1024)
echo "Stopping $APACHE to calculate the amount of free memory"
/etc/init.d/$APACHE stop &> /dev/null
TOTALFREEMEM=$(free -m |head -n 2 |tail -n 1 |awk '{free=($4); print free}')
TOTALMEM=$(free -m |head -n 2 |tail -n 1 |awk '{total=($2); print total}')
SWAP=$(free -m |head -n 4 |tail -n 1 |awk '{swap=($3); print swap}')
MAXCLIENTS=$(expr $TOTALFREEMEM / $APACHEMEM)
MINSPARESERVERS=$(expr $MAXCLIENTS / 4)
MAXSPARESERVERS=$(expr $MAXCLIENTS / 2)
echo "Starting $APACHE again"
/etc/init.d/$APACHE start &> /dev/null
echo "Total memory $TOTALMEM"
echo "Free memory $TOTALFREEMEM"
echo "Amount of virtual memory being used $SWAP"
echo "Largest Apache Thread size $APACHEMEM"
echo "Amount of memory taking up by MySQL $SQLMEM"
if [[ $SWAP -gt $TOTALMEM ]]; then
      ERR="Virtual memory is too high"
else
      ERR="Virtual memory is ok"
fi
echo "$ERR"
echo "Total Free Memory $TOTALFREEMEM"
echo "MaxClients should be around $MAXCLIENTS"
echo "MinSpareServers should be around $MINSPARESERVERS"
echo "MaxSpareServers should be around $MAXSPARESERVERS"


If you get 0 for either of the last two values then consider increasing your memory or working out what is causing your memory issues. Either that or set your MinSpareServers to 2 and MaxSpareServers to 4.

There are many other settings which you can find appropriate values for but adding indexes to your database tables and ensuring your database table/query caches can fit in memory rather than swapped to disk is a good way to improve performance without having to resort to more caching at all the various levels Wordpress/Apache/Linux users love doing.

If you do use a caching plugin for Wordpress then I would recommend tuning it so that it doesn't cause you problems.

At first I thought WP SuperCache was a solution and pre-caching all my files would speed things up due to static HTML being served quicker than PHP.

However I found that the pre-cache stalled often, caused lots of background queries to rebuild the files which consumed memory and also took up lots of disk space.

If you are going to pre-cache everything then hold the files for as long as possible; if they don't change there seems little point in deleting and rebuilding them every hour or so and using up SQL/IO etc.

I have also turned off gzip compression in the plugin and enabled it at Apache level. It seems pointless doing it twice and PHP will use more resources than the server.

The only settings I have enabled in WP-Super-Cache at the moment are:


  • Don’t cache pages with GET parameters. (?x=y at the end of a url) 
  • Cache rebuild.
  • Serve a supercache file to anonymous users while a new file is being generated. 
  • Extra homepage checks. (Very occasionally stops homepage caching)
  • Only refresh current page when comments made. 
  • Cache Timeout is set to 100000 seconds (why rebuild constantly?)
  • Pre-Load - disabled.

Also in the Rejected User Agents box I have left it blank as I see no reason NOT to let BOTS like googlebot create cached pages for other people to use. As bots will most likely be your biggest visitor it seems odd to not let these BOTS create cached files.

So far this has given me some extra performance.

Hopefully the tuning I have done tonight will help the issue I am getting of very low server loads, MySQL gone away errors and high disk swapping. I will have to wait and see!

Saturday, 3 March 2012

The Wordpress Survival Guide Part 3 - Security

This is the 3rd part of the Wordpress Survival Guide which looks at security measures.

The other two guides which cover basics for people new to Linux, Apache and Wordpress and Performance can be found here:

The Wordpress Survival Guide Part 1 - Linux, Apache and Wordpress
The Wordpress Survival Guide Part 2 - Performance

If you have an under powered or busy server then security and performance go hand in hand as reducing the amount of traffic from bad bots, hackbots, spammers, login hackers, heavy hitters and so on will also help reduce the load on your server.

There are many plugins out there which claim to help with security on Wordpress, but you should be careful as, from my own investigation of the code, many of these plugins, whilst protecting you from potential threats, can reduce your site's performance because they carry out too many checks on submitted fields.

If a plugin is checking every form element submitted to the server for hundreds of known SQL injection or XSS hacks with regular expressions or string checks then this can slow down a page load incredibly.

Therefore the further up the chain you can push your security checks from PHP code running in Wordpress to the actual web server the better.

The aim is to move as much blocking code away from your site to your server so we want to make use of our firewall and our .htaccess file by adding a number of rules designed to identify and block potential hackers and spammers before they get to your site and any plugin code.

Blocking with our LINUX Firewall

Once you have found persistent offenders from the methods listed below, the aim is to stop any CPU and memory being wasted on them by WordPress and your .htaccess file and block them in your LINUX firewall instead.

You can install a tool on your server called Fail2Ban which will analyse your log files for you, looking for spammers, hackers and bandwidth wasters, and add them automatically to your IPTables (network firewall) rules.

However you should read up on it carefully and configure it correctly so that you don't end up blocking yourself sending emails into WordPress or other actions.

The higher up you can block the bad traffic the better. Therefore read this article on how you can block bad BOTS and users via the WebMin interface.
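
For a persistent offender the firewall block itself is a one-liner. A minimal sketch (the IP address is a documentation example; on most systems you will also want to save the rules so they survive a reboot):

# drop all traffic from a single offending IP (example address)
iptables -A INPUT -s 203.0.113.45 -j DROP
# list the INPUT chain to confirm the rule was added
iptables -L INPUT -n --line-numbers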

Blocking with the .htaccess file

The .htaccess file sits in your websites root folder and contains rules local to the site which can allow or deny users to your site by blocking certain requests either by IP address, user-agent, or the type of request the user is making. 

I used to return a 403 Forbidden status code to the people I wanted to block but I am now trying out a different format which seems to have increased performance. I suspect this might be down to the users of malicious bots seeing a 403 Forbidden code as a "challenge" to crack rather than a sign they should go away, therefore I have replaced the 403 with a 404 code.

As there doesn't seem to be a quick flag like [F] (which returns 403) for returning a 404, you should create your own 404 page which should contain very basic HTML and no references to any Wordpress include files or other code that could be loaded in.

At the top of your page you put some PHP to return the 404 status code. An example is below.


<?php
header("Status: 404 Not Found");
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-US">
<head profile="http://gmpg.org/xfn/11">
<title>My Website</title>
<link rel="canonical" href="http://www.mywebsite.com/" />
</head>
<body id="home">
 <div id="page-wrap">
  <div id="header">  
   <p id="slogan">This is my website</p>
  </div>
 </div>
 <div id="body">
  <h1>My Website</h1>

  <p>Sorry that page doesn't exist</p>
 </div>
</body>
</html>

The idea is to have a quick loading page that returns a basic response rather than a blank one so that crawlers think they have just made a mistake and that the URL they are targeting doesn't exist. A blank response or a forbidden status could signify to them that you have caught them out. The PHP at the top ensures a 404 status code is returned.

Once you have created the custom 404.php page and put some basic text in it upload it to the root of your website.

Now you can edit your .htaccess file and change some of the main checks we are going to do so that they redirect the bot to the custom 404.php page and not the Wordpress 404 page.

We don't want to get ourselves in a big loop of circular redirects which is why we check for the 404.php page on our 2nd block of rules.

The first set of rules block common SQL injection attacks, common XSS hacks which include passing JavaScript in the querystring, known file lookups as well as calls to certain applications which should never be accessible from the webserver but sometimes are.

The second block is aimed at known bad bot user-agents, common HTTP libraries such as CURL, WGet, Snoopy and other libraries which are usually downloaded by Script Kiddies and used without any modification.

A proper hacker or spammer will mask themselves a lot better than this but these rules will stop the wannabes and baby nobs that have no clue what they are doing but still overload your server.

I also then block blank and very short user-agents or jibberish user-agents, as I believe if the user cannot tell me who they are then I don't want them on my site. It is up to you whether you decide you want people masking themselves in this way to access your servers. You will notice that in this section I still use the [F] forbidden flag and return a 403 code.

The last block are known email harvesters and spammers which I redirect off to a honeypot to be logged and blocked by a proper tool designed to catch out email harvesting bots.

I have found that a good set of rules can reduce traffic to a server by over 50% which is obviously a major performance benefit and since I have changed my first two sets of rules from returning 403 to 404 codes the response time of my server and sites upon it has increased.

<IfModule mod_rewrite.c>
Options +FollowSymlinks
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{QUERY_STRING} (%3C|<)/?script(%3E|>)    [NC,OR]
RewriteCond %{QUERY_STRING} (eval\(|document\.|\.cookie|createElement)    [NC,OR]
RewriteCond %{QUERY_STRING} DECLARE[^a-z]+\@\w+[^a-z]+N?VARCHAR\((?:\d{1,4}|max)\)    [NC,OR]
RewriteCond %{QUERY_STRING} ^/+\?.*sys.?(?:objects|columns|tables|[xs]p_|exec)    [NC,OR]
RewriteCond %{REQUEST_URI} ^\/\/?(owssvr|strmver|Auth_data|redirect\.adp|MSOffice|DCShop|msadc|winnt|system32|script|autoexec|formmail\.pl|_mem_bin|NULL\.) [NC,OR]
RewriteCond %{REQUEST_URI} ^\/\/?(php\-?my\-?admin\-?\d?|P\/?M\/?A(\d+)?|(db|web)?(admin|db|sql)|(my)?sql\-?(admin|manager|web)?)/? [NC]
RewriteRule ^.*$ /404.php [R=301,L]


# ensure we are not already on our 404.php page
RewriteCond %{REQUEST_FILENAME} !/404\.php
# common HTTP libraries
RewriteCond %{HTTP_USER_AGENT} (?:ColdFusion|Jakarta|HTTPClient|Java|libwww\-perl|Nutch|PycURL|Python|Snoopy|urllib) [NC,OR]
# case sensitive HTTP libraries
RewriteCond %{HTTP_USER_AGENT} (?:LWP|PECL|POE|PycURL|WinHttp|curl|Wget) [OR]
# known rippers and scanners
RewriteCond %{HTTP_USER_AGENT} (?:ati2qs|cz32ts|EventMachine|indy|linkcheck|Morfeus|NV32ts|Pangolin|Paros|ripper|scanner|offline) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (?:AcoiRobot|alligator|auto|bandit|boardreader|BCD2000|blackwidow|capture|ChinaClaw|collector|copier|disco|devil|downloader|fetch|flickbot|grabber|gosospider|Gentoo|HTMLParser|hapax|hook|igetter|jetcar|JS-Kit|kame-rt|kmbot|KKman|leach|majestic|MetaURI|mole|miner|mirror|mxbot|rogerbot|race|reaper|sauger|speedy|Sogou|sucker|snake|spinn3r|Sosospider|stripper|UnwindFetchor|vampire|whacker|xenu|zeus|zip) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (?:AhrefsBot|fairshare|proxy|PageGetter|magpie|Zemanta|baidu|MiniRedir|SurveyBot|PMAFind|SolomonoBot|whitehat|blackhat|MSIE\s6\.0|ZmEu) [NC]
RewriteRule ^.*$ /404.php [R=301,L]

# Block blank or very short user-agents. If they cannot be bothered to tell me who they are or provide gibberish then they are not welcome!
RewriteCond %{HTTP_USER_AGENT} ^(?:-?|[a-z0-9\-\_]{1,10})$ [NC]
RewriteRule .* - [F,L]

# fake referrers and known email harvesters which I send off to a honeytrap full of fake emails
RewriteCond %{HTTP_USER_AGENT} (?:atomic|collect|e?mail|magnet|reaper|siphon|sweeper|harvest|(?:microsoft\surl\scontrol)|wolf) [NC,OR] # spambots, email harvesters
RewriteCond %{HTTP_REFERER} ^[^?]*(?:iaea|\.ideography|addresses)(?:\.co\.uk|\.org|\.com) [NC]
RewriteRule ^.*$ http://english-61925045732.spampoison.com [R,L] 

</IfModule>
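To sanity check the rules you can impersonate one of the blocked libraries with curl from another machine (www.mysite.com below is just a placeholder for your own domain); you should see a redirect to /404.php rather than your normal page.

curl -I -A "Wget/1.15" http://www.mysite.com/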


You should check your website's access and error logs regularly to see who has been getting banned or 404'd a lot by these rules, and then you can decide whether to block their IP address by adding it to your .htaccess file like so.



order allow,deny
allow from all
# example IP - not known to be bad
deny from 81.24.210.2

These rules deny all requests from a particular IP address.

Or add it to your firewall if it is making a large number of calls to your site, as shown in the example below.
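On a LINUX server the firewall rule can be as simple as the following (using the same example IP as above); it drops all packets from that address.

iptables -A INPUT -s 81.24.210.2 -j DROP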

One trick I do like, which I have to thank a commenter, RobinInTexas, for, is a rule that sends the BOT back to the IP address it came from in the first place. However, there are two changes I would make to the rule he sent, which used

http://%{REMOTE_ADDR} [L,R=301]

The first is to return them to the localhost address 127.0.0.1, NOT the IP they came from, as many people will be coming through gateways, such as people at work, phone or tablet users and people on WIFI systems in shops.

You don't want the ISP that this gateway belongs to thinking you are sending it lots of hackers and bad bots as you might get your site blocked for sending so much traffic to it.

The other change is to make it a 302 temporary redirect instead of a 301 as it is the correct status code to use. So instead of the rule above use this.

http://127.0.0.1 [L,R=302]

You could also decide to send them to a honeypot website that logs them as a bad BOT so other users know about them, or even to one of the sites designed to keep them crawling for days, wasting time following links that lead nowhere but to other links that lead nowhere.

The rule in action would look like this:


RewriteCond %{HTTP_USER_AGENT} (?:Spider|MJ12bot|seomax|atomic|collect|e?mail|magnet|reaper|tools\.ua\.random|siphon|sweeper|harvest|(?:microsoft\surl\scontrol)|wolf) [NC]
RewriteRule .* http://127.0.0.1 [L,R=302]




2. Blocking SSH Attacks with DenyHosts


Install DenyHosts if you haven't already; it will block SSH attacks on your server. It is amazing how many people I have blocked since installing this application, as people are always on the look out for new webservers on known cloud hosting IP ranges, like Rackspace's, to attack and hopefully compromise.

To install it, open an SSH connection (e.g. with Putty) and run the following command.

apt-get install denyhosts

To view the log from the console, go to the log directory and tail the DenyHosts log file.

cd /var/log
tail -f denyhosts

This will show you the tail end of the DenyHosts log file and any newly added IP addresses.

Make sure to add any IP addresses from which you access your server console over SSH to the allowed hosts, which you can do from the terminal in VI (see the example after the form below) or from Webmin by going to:

Webmin > Networking > TCP Wrappers > AllowedHosts > Add New Rule

Fill out the form like so:

Services: ALL
Remote hosts: Tick the second radio button and add the IP address to the text input
Shell Commands: none
Save the form.
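If you prefer the terminal, the rough equivalent is a line in /etc/hosts.allow, which TCP Wrappers and DenyHosts both respect (203.0.113.10 is just a placeholder for your own IP).

# /etc/hosts.allow - always allow my own IP so I can never be locked out
ALL: 203.0.113.10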


3. Stop Being Open To SSH / BASH Hack Attacks

Also, to stop your server being vulnerable to hacks like the Shellshock exploit, which appeared recently and exposed nearly every LINUX machine due to their use of BASH (often exposed through CGI scripts and SSH), you should do the following.

Test whether you are vulnerable by running this command. If it prints the word "vulnerable" before "this is a test" then your version of BASH is exploitable and needs patching.

env x='() { :;}; echo vulnerable' bash -c "echo this is a test"

If you are, then these are some things you can do.

Turn off BASH as the default shell and use DASH, a smaller, simpler shell that is not affected by Shellshock.

If you are using DASH and want to run BASH, just type bash at the prompt to get a BASH shell.

Also change the default shell for root and any other users so that it points at DASH rather than BASH (on Debian-based systems /bin/sh is usually just a symbolic link, so check where it points). Look up on the web how to do this safely before changing root's shell; a rough sketch follows below.
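As an illustration of the commands involved (someuser is a placeholder; test the new shell in a second session before logging out so you cannot lock yourself out):

# on Debian/Ubuntu, choose whether /bin/sh points at dash (interactive prompt)
dpkg-reconfigure dash

# change the login shell of an individual account to dash
chsh -s /bin/dash someuser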

Disable any cgi-bin directives in all Apache config files, as this is what the hack relies on, e.g.

#ScriptAlias /cgi-bin/ /home/searchmysite/cgi-bin/

#<Directory "/home/searchmysite/cgi-bin">
#allow from all
#</Directory>
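Once the lines are commented out, reload Apache so the change takes effect (the exact command varies by distro).

service apache2 restart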

Remove AWStats and Webalizer for all Virtualmin sites, as these rely on the CGI-BIN as well.

Regularly change all your user passwords and especially your root password.


A good technique for a strong password is to think of a common sentence or phrase you will remember, mix the characters up and add a number on the end that only you would remember (not your birthday!), e.g. a football team's last trophy win or the year of your last holiday.

Add some dashes or underscores in as well to make it even harder for password crackers to crack it with dictionary attacks. An example would be:

hOWnOWbROWNcOW__1995**

Regularly check your users, both in the database and on the server itself, for any that look out of place, e.g. accounts inserted by a hacker, as in the check shown below.
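As a rough server-side check, this lists every account in /etc/passwd that has a real login shell; anything you don't recognise deserves investigation.

awk -F: '$7 !~ /(nologin|false)$/ {print $1, $7}' /etc/passwd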

Also regularly check your home and temp folders for any files that shouldn't be there. One hack I saw replaced the default SSH config file with a temp file in /tmp/sh that used WGET to load a file hidden in a website, which then ran more WGET commands to pull in a library of hacks for DDOS, SSH and so on, before running whatever commands the hacker wanted.
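A simple sketch of the sort of check I mean: list files under /tmp and the home directories that were modified in the last couple of days.

find /tmp /home -type f -mtime -2 -ls 2>/dev/null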

With a compromised server and an SSH config file that had been overwritten, he could then use your server to run hack attacks on other machines.

If this is happening, quickly get the IP of the site he is loading the files from and block incoming and outgoing TCP requests to it in your firewall (see the sketch below). Then get a default SSH config file and replace the hacked version before changing all your passwords and ensuring BASH isn't available to be used in a hack.
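A minimal iptables sketch for cutting the server off from the hacker's host, assuming 198.51.100.7 stands in for their IP address:

iptables -A INPUT -s 198.51.100.7 -j DROP
iptables -A OUTPUT -d 198.51.100.7 -j DROP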

You can check whether anyone who shouldn't be logged into your machine is connected with the who or w commands, and look for suspicious processes with ps ax.



4. Using Wordpress Plugins to block dangerous traffic.

Two plugins I have found quite useful so far for reducing hack attacks are these:


The Limit Login Attempts plugin blocks brute force attacks on the wp-login page. If you don't want people signing up to your site you should use a plugin to obfuscate this page anyway; otherwise just limit the number of failed attempts so that dictionary attacks are prevented.

http://wordpress.org/extend/plugins/limit-login-attempts/

Use the IP addresses this plugin collects, take the worst offenders and put them in your DenyHosts list, as well as considering banning them with your LINUX firewall. Read this article for more information on banning bad BOTS and blocking hackers and scrapers.

Install the Wordpress Firewall plugin to block certain hack attempts and be notified by email when attacks occur. Make sure to add any IP address you access your website from to the whitelist so you don't get locked out.

This plugin will look for some of the same tricks our .htaccess file rules are aimed at blocking as well as some different types of attack that are used when form parameters are filled with dangerous values and submitted to the server.

http://wordpress.org/extend/plugins/wordpress-firewall-2/

There are other things you can do as well, but these tips are a good starting point. I will update this page as and when new measures prove themselves at increasing security without affecting site performance.


5. Using other tools on your server to add rules to DenyHosts and your Firewall

There is a tool you can use on LINUX machines called Fail2Ban, which RackSpace and other cloud hosts actually recommend using. It will constantly analyse your access and error logs and add IP addresses which it thinks are suspicious to your DenyHosts list and your Firewall IP table.
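As a rough illustration of the configuration involved (jail names and defaults vary between Fail2Ban versions and distros, so treat this as a sketch only), an SSH jail in /etc/fail2ban/jail.local might look like this:

[sshd]
enabled  = true
maxretry = 5
bantime  = 3600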

However, be warned: I used it myself and tried some of the email rules, and then found my own IP address being blocked and emails sent from my own computer to my server being blocked.

I then tried removing these rules from the configuration and still ran into problems (not immediately, so it may not have been Fail2Ban's fault): emails sent from my PC to my WordPress site, where a plugin called Postie turns them into articles, stopped working.

In the end I had to remove the Fail2Ban program from my server. However, if you are not doing anything like that, or can configure it properly (I may have made a mistake), then it could be the tool for you, as it will save time adding rules to DenyHosts and to the IP table your firewall uses.


Read Part 1 - An Overview
Read Part 2 - Performance