Wednesday, 12 November 2014

Twitter Rush caused by Tweet BOTs visiting your site

Twitter Traffic can cause a major hit on your server

If you are using Twitter to post a tweet whenever you blog or post an article, you should know that a large number of BOTs will hit your site as soon as they see the link.

This is what I call a Twitter Rush, because posting a link on Twitter causes a rush of traffic to your site.

I did a test some months back, and I like to regularly test how many hits I get whenever I post a link, so that I can separate the wheat from the chaff and set up rules to ban any BOTs I think are just wasting my money.

Most of these BOTS are also stupid.

If you post the same link to multiple Twitter accounts, e.g. by using my Wordpress plugin Strictly Tweetbot, then the same BOT will come to the same page multiple times.

Why? Who wrote such crap code? Why don't they check, before hitting a site, that they haven't just crawled that link? I cannot believe the developers at Yahoo cannot write a BOT that works out it has just crawled a page before doing it two more times.

Some BOTs are obviously needed, such as the major SERP search engines e.g. Google or Bing, but many are start-up "social media" search engines and other such content scrapers, spammers, hackbots and bandwidth wasters.

Because of this I now send a 403 to a lot of these BOTs, or even redirect them back to the IP address they came from with an ISAPI rewrite rule, as they don't provide me with any benefit and just steal bandwidth and cost me money.

RewriteCond %{HTTP_USER_AGENT} (?:Spider|MJ12bot|seomax|atomic|collect|e?mail|magnet|reaper|tools\.ua\.random|siphon|sweeper|harvest|(?:microsoft\surl\scontrol)|wolf) [NC]
RewriteRule .* http://%{REMOTE_ADDR} [L,R=301]

However, if you are using my Strictly TweetBOT plugin, which can post multiple tweets to the same or multiple accounts, then the new version allows you to pre-load a page so that it is hopefully cached by the caching plugin you should be using (WP Super Cache, W3 Total Cache etc) before the article is made public and before the BOTs watching Twitter for URLs to scrape can get to it.

The aim is to get the page cached BEFORE multiple BOTs hit it. If the page is already cached, the load on your server should be a lot less than if every BOT's near-simultaneous visit triggered its own attempt to build and cache the page.

However, if you are auto-blogging and using my TweetBOT you might be interested in Strictly TweetBOT PRO, as it has extra features for people who tweet to multiple accounts, or post multiple tweets in different formats to the same account. These new features are all designed to reduce the hit from a Twitter Rush.

The paid for version allows you to do the following:

  • Make an HTTP request to the new post before Tweeting anything. If you have a caching plugin on your site then this should put the new post into the cache so that when the Twitter Rush comes they all hit a cached page and not a dynamically created one.
  • Add a query-string to the URL of the new post when making an HTTP request to aid caching. Some plugins like WP Super Cache allow you to force an uncached page to be loaded with a querystring. So this will enable the new page to be loaded and re-cached.
  • Delay tweeting for N seconds after making the HTTP request to cache your post. This will help you ensure that the post is in the cache before the Twitter Rush.
  • Add a delay between each Tweet that is sent out. If you are tweeting to multiple accounts you will cause multiple Twitter Rushes. Therefore staggering the hits aids performance.
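The pre-caching idea in that first bullet can be sketched in a few lines of WordPress PHP. This is purely an illustration of the technique, not the plugin's actual code: the function name and the "warm" query-string key are made up.

```php
// Sketch only: request a new post once so a caching plugin
// (WP Super Cache, W3 Total Cache etc) can build a static copy
function warm_post_cache( $permalink, $delay = 5 ) {
	// a cache-busting query-string forces some cache plugins to
	// rebuild the page rather than serve a stale copy
	wp_remote_get( add_query_arg( 'warm', time(), $permalink ), array( 'timeout' => 10 ) );

	// wait N seconds so the cache file is written to disk
	// before any tweets go out and the Twitter Rush begins
	sleep( $delay );
}
```

Calling this just before the first tweet is sent means the rush of BOTs should hit a static cached file instead of all triggering dynamic page builds at once.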


Buy Now


I did a test this morning to see how much traffic was generated by a test post. I got almost 50 responses within 2 seconds!

50.18.132.28 - - [30/Nov/2011:07:04:41 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 200 471 "-" "bitlybot"
50.57.137.74 - - [30/Nov/2011:07:04:43 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "EventMachine HttpClient"
50.57.137.74 - - [30/Nov/2011:07:04:43 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "EventMachine HttpClient"
184.72.47.46 - - [30/Nov/2011:07:04:43 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "UnwindFetchor/1.0 (+http://www.gnip.com/)"
204.236.150.14 - - [30/Nov/2011:07:04:44 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 403 471 "-" "JS-Kit URL Resolver, http://js-kit.com/"
50.18.121.55 - - [30/Nov/2011:07:04:45 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "UnwindFetchor/1.0 (+http://www.gnip.com/)"
184.72.47.71 - - [30/Nov/2011:07:04:47 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "UnwindFetchor/1.0 (+http://www.gnip.com/)"
50.18.121.55 - - [30/Nov/2011:07:04:48 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "UnwindFetchor/1.0 (+http://www.gnip.com/)"
184.72.47.71 - - [30/Nov/2011:07:05:11 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "UnwindFetchor/1.0 (+http://www.gnip.com/)"
199.59.149.31 - - [30/Nov/2011:07:04:43 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 200 - "-" "Twitterbot/0.1"
107.20.160.159 - - [30/Nov/2011:07:04:43 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 200 - "-" "http://unshort.me/about.html"
46.20.47.43 - - [30/Nov/2011:07:05:11 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 403 369 "-" "Mozilla/5.0 (compatible"
199.59.149.165 - - [30/Nov/2011:07:04:43 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 200 28862 "-" "Twitterbot/1.0"
173.192.79.101 - - [30/Nov/2011:07:05:11 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 403 471 "-" "-"
50.18.121.55 - - [30/Nov/2011:07:05:11 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "UnwindFetchor/1.0 (+http://www.gnip.com/)"
46.20.47.43 - - [30/Nov/2011:07:05:11 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 403 369 "-" "Mozilla/5.0 (compatible"
199.59.149.31 - - [30/Nov/2011:07:05:11 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 200 - "-" "Twitterbot/0.1"
65.52.0.229 - - [30/Nov/2011:07:05:11 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 200 28863 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
66.228.54.132 - - [30/Nov/2011:07:04:43 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 200 106183 "-" "InAGist URL Resolver (http://inagist.com)"
199.59.149.165 - - [30/Nov/2011:07:05:11 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 200 28863 "-" "Twitterbot/1.0"
107.20.42.241 - - [30/Nov/2011:07:05:11 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 200 - "-" "PostRank/2.0 (postrank.com)"
65.52.0.229 - - [30/Nov/2011:07:05:11 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 200 28862 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
65.52.0.229 - - [30/Nov/2011:07:05:11 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 200 28862 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
110.169.128.180 - - [30/Nov/2011:07:05:11 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 200 28862 "http://twitter.com/" "User-Agent:Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-us) AppleWebKit/533.18.1"
74.112.131.128 - - [30/Nov/2011:07:05:15 +0000] "GET /some-test-url-I-posted/ HTTP/1.0" 200 106203 "-" "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly/) Gecko/2009032608 Firefox/3.0.8"
74.112.131.131 - - [30/Nov/2011:07:05:16 +0000] "GET /some-test-url-I-posted/ HTTP/1.0" 200 106203 "-" "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly/) Gecko/2009032608 Firefox/3.0.8"
74.112.131.127 - - [30/Nov/2011:07:05:16 +0000] "GET /some-test-url-I-posted/ HTTP/1.0" 200 106203 "-" "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly/) Gecko/2009032608 Firefox/3.0.8"
74.112.131.128 - - [30/Nov/2011:07:05:16 +0000] "GET /some-test-url-I-posted/ HTTP/1.0" 200 106203 "-" "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly/) Gecko/2009032608 Firefox/3.0.8"
74.112.131.131 - - [30/Nov/2011:07:05:17 +0000] "GET /some-test-url-I-posted/ HTTP/1.0" 200 106203 "-" "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly/) Gecko/2009032608 Firefox/3.0.8"
74.112.131.128 - - [30/Nov/2011:07:05:17 +0000] "GET /some-test-url-I-posted/ HTTP/1.0" 200 106203 "-" "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly/) Gecko/2009032608 Firefox/3.0.8"
107.20.78.114 - - [30/Nov/2011:07:05:18 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "MetaURI API/2.0 +metauri.com"
65.52.54.253 - - [30/Nov/2011:07:05:19 +0000] "GET /2011/11/hypocrisy-rush-drug-test-welfare-benefit-recipients HTTP/1.1" 403 470 "-" "-"
107.20.78.114 - - [30/Nov/2011:07:05:28 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "MetaURI API/2.0 +metauri.com"
65.52.62.87 - - [30/Nov/2011:07:05:30 +0000] "GET /2011/11/hypocrisy-rush-drug-test-welfare-benefit-recipients HTTP/1.1" 403 470 "-" "-"
107.20.78.114 - - [30/Nov/2011:07:05:43 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "MetaURI API/2.0 +metauri.com"
107.20.78.114 - - [30/Nov/2011:07:06:15 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "MetaURI API/2.0 +metauri.com"
199.59.149.165 - - [30/Nov/2011:07:04:45 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 200 28860 "-" "Twitterbot/1.0"
107.20.160.159 - - [30/Nov/2011:07:04:59 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 200 - "-" "http://unshort.me/about.html"
107.20.78.114 - - [30/Nov/2011:07:06:15 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "MetaURI API/2.0 +metauri.com"
199.59.149.31 - - [30/Nov/2011:07:04:46 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 200 - "-" "Twitterbot/0.1"
107.20.42.241 - - [30/Nov/2011:07:05:01 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 200 - "-" "PostRank/2.0 (postrank.com)"
107.20.42.241 - - [30/Nov/2011:07:05:07 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 200 - "-" "PostRank/2.0 (postrank.com)"
107.20.78.114 - - [30/Nov/2011:07:06:15 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "MetaURI API/2.0 +metauri.com"
107.20.78.114 - - [30/Nov/2011:07:06:17 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "MetaURI API/2.0 +metauri.com"
107.20.78.114 - - [30/Nov/2011:07:06:17 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "MetaURI API/2.0 +metauri.com"
50.16.51.20 - - [30/Nov/2011:07:06:21 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 200 - "-" "Summify (Summify/1.0.1; +http://summify.com)"
74.97.60.113 - - [30/Nov/2011:07:07:11 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 200 28889 "-" "Mozilla/5.0 (Windows NT 5.1; rv:8.0) Gecko/20100101 Firefox/8.0"




The Wordpress Survival Guide - Part 2 - Performance

Surviving Wordpress - Performance and Site Optimization

UPDATED - 12th Nov 2014

I have updated this to include a way to handle MySQL errors, a BASH script to tune Apache and an improved function to check your server's load that also handles Windows servers. Plus code to disable the new WordPress Heartbeat functionality, which can be a CPU / bandwidth killer.
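For reference, a common way to switch Heartbeat off is to deregister its script; the sketch below (the function name is mine, not WordPress's) goes in a theme's functions.php. Note this also disables admin features that rely on Heartbeat, such as autosave and post-lock warnings.

```php
// stop WordPress loading the Heartbeat script on admin pages
add_action( 'admin_enqueue_scripts', 'my_disable_heartbeat', 1 );
function my_disable_heartbeat() {
	wp_deregister_script( 'heartbeat' );
}
```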

This is the second part of my guide to surviving WordPress and as promised it looks at performance tweaks and tips which I have gathered on my way.

It has been quite a while since the first installment, and the main reason for this was that I was suffering my own performance killer which I wanted to solve before writing this article. Luckily it has now been solved with the help of Robert from the Tiger Tech blog, who helped me get to the bottom of the issue, so here it is.

My own personal journey into WordPress performance tuning started when I began to experience PHP out-of-memory errors when manually rebuilding my Google sitemap.

I started to play around with different plugins and then delve into the code, which is when I realised the damage that WordPress plugins can do to a site when the user doesn't realise what's going on behind the scenes.

You can check out a detailed examination here, but in my case it was a Google Sitemap plugin that was set to rebuild whenever a new post was saved. Combine that with WP-O-Matic, which imports articles at scheduled intervals, and a TwitterBot such as my own, which can send Tweets to multiple accounts whenever new content is added, and it all added up to a performance killer!

If you have a similar setup it's worth running TOP, MyTOP and checking your access logs to see how it affects your own system but what was happening on my own setup was:

  • WP-O-Matic starts to import a feed's worth of articles (a max of 10).
  • For each article that is saved, numerous procedures hooked into the SavePost or PublishPost actions run. In my case it was:
  1. My Strictly AutoTags plugin runs, which analyses the article and adds relevant tags. Depending on the admin settings, the number of tags and the length of the article this could be quick or slow.
  2. The Google Sitemap plugin then runs, which executes a lot of SQL queries and creates a new file, as well as pinging multiple SERPs with HTTP requests.
  3. My Strictly Tweetbot plugin also runs, which posts a tweet to multiple accounts. This causes a Twitter Rush as 50+ BOTs all hammer my site at the same time due to the new link appearing on Twitter.
  4. Any other plugin using the save hooks runs, such as caching tools which create static files.
  • As soon as the Tweets arrive on Twitter a multitude of BOTs, 50 on my last test, will visit the site to index the link that has just been posted OR try to scrape, hack or insert spam comments into the post.
  • If the link was posted to multiple accounts you will find that the same BOTs visit for each account you posted to, and some, like Yahoo's, seem to be particularly bad and visit the article multiple times anyway. So if I posted to 5 Twitter accounts that's 250 visits in the space of a few seconds from BOTs scanning for newly tweeted links!
  • All these visits create new Apache processes, and depending on the amount of memory each Apache process uses, you could find that your server starts swapping memory to disk to handle the increase. In my case my server load would quickly jump from 0.15 to 50+.

The more articles you import, the more iterations of this chain of performance-killing events occur. I found that these events would sometimes pass off without any noticeable problems, but other times the server load would get so high that I would have to reboot my machine.

The highest value I recorded was 174 on a 1GB RAM Linux server!

In fact on some days I would have to reboot 3-5 times which is not good at all.

Getting to the bottom of the problem

A common solution to any performance-related problem is to throw more resources at it. Many message boards recommended increasing the maximum PHP memory limit to get round the Out of Memory errors the Google Sitemap plugin was throwing up, but that just masks the issue and doesn't actually solve it.

As a by-product of my system tuning I ended up creating my own Google Sitemap plugin to overcome the limitations of the others.

Not only could it be easily set to rebuild at scheduled intervals instead of only when new posts were added which helps reduce unnecessary rebuilds, but it used far less memory and made a tiny number of database queries in comparison to the other market leaders.

I also created a System Reporting plugin so that I could be kept informed when my site was playing up, and I found this invaluable in keeping my site running during this performance nightmare. If you are not on your site 24/7 and cannot afford professional monitoring services, it is great to get an email telling you if your site is down, taking ages to respond, under a very high server load or running too many SQL queries.

One of the first ideas to reduce the amount of times I was rebooting was to try and prevent any performance intensive tasks from running if the server load was already high.

I did this by adding in some checks to all my major plugins that made a call to the following function before running anything. If the load was above 1.0 I just exited immediately. You can read more about this method in this article: Testing Server Load.

function GetServerLoad(){

	// handle non windows machines
	if(substr(PHP_OS, 0, 3) !== 'WIN'){
		if(file_exists("/proc/loadavg")){
			$load = file_get_contents("/proc/loadavg");
			$load = explode(' ', $load);
			// first value is the 1 minute load average
			return $load[0];
		}elseif(function_exists("shell_exec")){
			$load = @shell_exec("uptime");
			$load = explode(' ', $load);
			// 3rd value from the end is the 1 minute average (trim trailing comma)
			return rtrim($load[count($load)-3], ',');
		}else{
			return false;
		}
	// handle windows servers
	}else{
		if(class_exists("COM")){
			$wmi = new COM("WinMgmts:\\\\.");
			if(is_object($wmi)){
				$cpus    = $wmi->InstancesOf("Win32_Processor");
				$cpuload = 0;
				$i       = 0;
				if(version_compare(PHP_VERSION, '5.0.0', '<')){
					// PHP 4 COM collections must be walked with Next()
					while($cpu = $cpus->Next()){
						$cpuload += $cpu->LoadPercentage;
						$i++;
					}
				}else{
					// PHP 5+ can iterate the collection directly
					foreach($cpus as $cpu){
						$cpuload += $cpu->LoadPercentage;
						$i++;
					}
				}
				// NB: this is a CPU percentage (0-100), not a Unix load average
				return round($cpuload / $i, 2);
			}
		}
		return false;
	}
}
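Using that function, the guard at the top of each plugin becomes a couple of lines; this sketch uses the 1.0 threshold mentioned above.

```php
// bail out of any expensive plugin work when the server is busy
$load = GetServerLoad();
if ( $load !== false && (float)$load > 1.0 ) {
	return; // too busy - skip this run and try again later
}
// ...carry on with the heavy lifting here...
```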


Apache Configuration

I finally got to the bottom of the problem I was suffering with the help of Tiger Tech after examining the output of ps auxwwH --sort rss during a period of high load. This listed all the currently running processes ordered by the amount of memory they were consuming.

At the time of running this my average load was 50, which meant there was a big queue of processes waiting to run, including over 70 Apache processes each using between 8MB and 30MB; this alone was easily using up my 1GB of RAM.

This high number of Apache processes meant that my server was busily swapping from real memory to disk based virtual memory which was causing high I/O (clearly seen from the output of iostat) and slowing down the response times of each Apache process.

As each process got slower to respond, new processes were spawned, using up even more virtual memory and adding to the problem. This spiral of death would only resolve itself if for some reason the traffic suddenly screeched to a halt (not likely during an article import that delivers hundreds of bots from Twitter on top of normal traffic) OR I killed Apache or rebooted the server.

The solution to this problem was to reduce the number of simultaneous Apache processes that could be run at one time by reducing the MaxClients setting in the Apache config file.

My existing setting of 256 was far too high for my 1GB RAM server. The way to calculate a more appropriate setting is to take the average size of an Apache process and then divide the total available memory by that number leaving room for other processes such as MySQL. In my case I was advised to set MaxClients to a value of 20 which seems small in comparison to the original value but makes more sense when you do the maths.

I have actually created a BASH script which you can run on your own server which will test the available space, average Apache process size, and then calculate the values for your MaxClients, MinSpareServers and MaxSpareServers which you can read here: BASH MaxClients Tuning Script.

Reducing my MaxClients setting to a much smaller value meant that the memory allocation for my system would never reach such unmanageable amounts again. If my server is swamped by traffic then instead of 256 Apache processes being spawned all trying to claim 20MB or more for themselves they will be queued up in an orderly fashion.

It might slow down some requests as they wait to be dealt with but that is far better than the whole server freezing up which was occurring regularly.

Two other changes I made in the Apache conf file were reducing the Timeout value from 300 to 30 and turning HostnameLookups off. You can read more about these settings at the Apache performance tuning site.

Another recent issue I have just had was eerily the opposite of the above. I would get periods of very low server load (0.00 - 0.02) and there would be no Apache or MySQL processes running. The websites couldn't be accessed and only a restart of Apache would fix it.

At first I was checking the Apache error logs and seeing lots of "MySQL Server has gone away" errors. I found that this was a common issue in WordPress and created a custom wp-db.php file which would re-connect to the server if a query ran and met that error. You can read more about that script here: Fixing the MySQL Server Has Gone Away Error.

However, this just got rid of the error messages; it didn't really fix any problems.

After a lot of reading and tuning I eventually found what "seems" to be a fix for this issue, which may be caused by Apache processes hanging around for too long, consuming memory but not doing anything. I edited the Apache conf file and reduced the KeepAliveTimeout value from 30 seconds to 2.

I am debating whether to turn it off altogether and then increase the MaxRequestsPerChild option. This website has some information about KeepAlive and whether you should turn it on or off.
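Pulling the Apache changes together, the relevant httpd.conf fragment looked something like this. The MaxClients figure comes from roughly the RAM left for Apache divided by the average process size (e.g. ~500MB after MySQL at ~25MB per process gives about 20); the exact numbers will differ on your server.

```apache
# prefork settings for a 1GB server - values from the text above
MaxClients        20    # was 256; ~RAM for Apache / avg process size
Timeout           30    # was 300
HostnameLookups   Off
KeepAlive         On
KeepAliveTimeout  2     # was 30
```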

Common Wordpress Performance Tuning Tips

There are a number of common tips for performance tuning Wordpress which you can read about in detail at other sites but I will quickly cover them here:

1. Install APC or another PHP caching system such as XCache or eAccelerator as these Opcode systems improve performance by saving and re-using compiled PHP which speeds up the execution of server side code.

2. Install a Wordpress caching plugin such as WP Super Cache or W3 Total Cache. There is a debate over which one is best, and whilst W3 Total Cache offers more features, such as minification and browser-cache options, the main issue you want to resolve with Wordpress is reducing the huge number of database queries and the amount of code that runs on each page load. The aim is to do expensive tasks once and then re-use the results as many times as possible. Caching the results of database queries so that they don't have to be run every time the page loads is a great idea, especially if the results hardly change. W3 offers database query result caching as well as caching of the generated HTML output, whereas Super Cache will only cache the generated output.

What is the difference? Well, if you cache database query results, then during the building of cached files the results of queries used to create category lists or tag clouds can be shared across builds rather than being recalculated for every page being cached that uses them. How much difference this makes once you take MySQL's own internal query caching into consideration is debatable. However, both plugins offer the major way to ensure fast page loads: disk-based caching of the generated output with GZIP compression.

If you do install W3 Total Cache and you have APC or another PHP accelerator installed, make sure that you enable the Disk Based Cache option for Page Caching and not Opcode, which will be selected by default if APC or XCache is installed.

3. If bandwidth is a problem then serving up minified and compressed HTML, CSS and Javascript will help but you don't want to be repeatedly compressing files as they load. Some cache plugins will do this minification on the fly which hurts CPU whereas you really want it done once. There is nothing stopping you combining, compressing and minifying your files by hand. Then you will benefit from small files, fewer HTTP requests and less bandwidth whether or not you make use of a cache plugin.

4. Reduce 404 errors and ensure Wordpress doesn't handle them, as doing so canes performance unnecessarily. Create a static 404 error page or ensure your cache system is set up to handle 404s. Also make sure that common files that cause 404s, such as iPhone icons, crossdomain.xml and favicons, exist even if they are empty files.
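A static 404 page can be wired up in one line of Apache config; this sketch assumes a static /404.html file exists in the web root.

```apache
# serve a plain static file for missing files instead of booting PHP
ErrorDocument 404 /404.html
```

Note that Wordpress's rewrite rules will still route missing "pretty" URLs into index.php, so this mainly covers genuinely missing files like icons and images; make sure your cache plugin handles the rest.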

5. If you're not planning on using a caching system then you should tune your .htaccess file manually to ensure that browsers cache your files for specified periods of time rather than downloading them each time they visit your site. You should also set your server to serve up compressed gzip files rather than letting a plugin do it for you.

You can do this by setting the future expire headers on your static content such as JS, CSS, images and so on like so:

<FilesMatch "(?i)^.*\.(ico|flv|ogg|swf|jpg|jpeg|png|gif|js|css)$">
ExpiresActive On
ExpiresDefault "access plus 1 week"
Header unset Last-Modified
Header set Cache-Control "public, no-transform"
SetOutputFilter DEFLATE
</FilesMatch>


6. Tune your MySQL database by ensuring that it is set to cache query results and has enough memory to do so. Ensure options you don't use or require are disabled, and regularly maintain your tables and indexes by keeping fragmentation to a minimum.

There are a couple of well known tuning scripts which can be used to aid in the setting of your MySQL configuration settings and which use your current database load and settings as a guide to offer recommendations.
http://github.com/rackerhacker/MySQLTuner-perl
http://hackmysql.com/mysqlreport
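The sort of settings those scripts report on live in the [mysqld] section of my.cnf. The values below are purely illustrative, not recommendations; run the tuning scripts against your own load to get sensible numbers.

```ini
# [mysqld] fragment - illustrative values only
query_cache_type  = 1     # cache SELECT results
query_cache_size  = 32M   # memory set aside for cached result sets
query_cache_limit = 1M    # skip caching very large result sets
skip-name-resolve         # avoid a DNS lookup on every connection
```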




Uninstall Performance Hogging Plugins

There are lots of plugins available for Wordpress and it can be like a kid let loose in a candy shop, as there seem to be at least 10 plugins for everything. However, having too many plugins installed is definitely a bad thing in terms of performance, and unless you know what the code is doing you could be shooting yourself in the foot by installing the next greatest plugin onto your site without thoroughly checking the source code for yourself first.

The problem is that literally anyone can write and then publish a Wordpress plugin, and many of these authors are not programmers by trade, nor do they have performance at the forefront of their minds as they develop the code that you might use.

Even plugins that are marketed as performance-saving tools are not always beneficial. I have seen plugins designed to reduce bandwidth by returning 304 Not Modified headers or 403 Forbidden status codes that have to make numerous database queries, DNS lookups and multiple regular expression checks to do so. If bandwidth is a problem then this might be worth the extra load, but if it isn't then you are just swapping a small gain in one area for extra work somewhere else.

If you are going to use a plugin then take a look over the source code to see if you can improve performance by adding any missing indexes to any new tables the plugin might have added to your Wordpress database. Many plugins add tables, especially if they need to store lots of data, and many authors don't include the SQL statements to add appropriate indexes, which could end up slowing lookups down the road as the amount of data in the tables grows.

The following are extra indexes I have added to tables within the Wordpress database, both for plugins I installed and for core Wordpress tables that were missing indexes for certain queries. Remember Wordpress is mainly a READ based system, so the extra expense of maintaining indexes when data is inserted is usually worth it.


Plugin | Table | IndexName | Columns | IndexType
- | wp_posts | status_password_id | post_status, post_password, ID | Normal
- | wp_posts | post_date | post_date, ID | Unique
fuzzySEOBooster | wp_seoqueries_terms | term_value_stid | term_value, stid | Unique
fuzzySEOBooster | wp_seoqueries_data | stid_pageid_pagetype_founded | stid, page_id, page_type, founded | Unique
WP-O-Matic | wp_wpo_campaign_post | campaignid_feedid_hash | campaign_id, feed_id, hash | Normal
Yet Another Related Posts | wp_yarpp_related_cache | reference_id | reference_ID, ID | Normal
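Adding a missing index is a one-off ALTER TABLE statement; for example, the first wp_posts index in the table above would be created like this:

```sql
-- create the status_password_id index on the core posts table
ALTER TABLE wp_posts
  ADD INDEX status_password_id (post_status, post_password, ID);
```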

Ensure that you regularly check the MySQL slow query log, especially if you have just installed a new plugin, as this will help you find queries that need optimising and potential bottlenecks caused by poorly thought out SQL. On my own site I started off using a well known Related Posts plugin, but I found out from the slow log that the queries it ran to create the lists were killing performance due to their design. They were taking 9-12 seconds to run, scanning up to 25 million records at a time, and carrying out unnecessary UNION statements which doubled the number of records they needed to look at. I ended up replacing it with a different plugin called LinkWithin, which not only looked great due to the images it used but was perfect for performance, because it is a Javascript widget and all the work is carried out on their server rather than mine.

This might not be the solution for you, as Javascript is disabled by around 10% of all visitors and bots won't be able to see the links. If SEO is a concern, and it should be, then you need to make sure that SERP crawlers find all your content easily, and having a server-side created list of related articles is a good idea for this reason alone. You can always create your own Related Posts section very easily with a function placed at the bottom of your articles that uses the categories assigned to the post to find other posts with the same categories.

The following example shows one way this can be done. It makes use of a nice ORDER BY RAND() trick to ensure different articles and categories appear each time the SQL is run, and it uses WordPress's built-in cache to store the results to prevent the query being executed too many times.

<?php
function get_my_related_posts($id, $limit){

// sanitise the values used directly in the SQL below
$id    = (int)$id;
$limit = (int)$limit;

// enable access to the WordPress DB object
global $wpdb;

// define SQL
$sql = "SELECT  CONCAT('http://www.mysite.com/',year(p.post_date),'/',RIGHT(concat('0' ,month(p.post_date)),2),'/',post_name,'/') as permalink,
p.post_title as title
FROM (
SELECT p.ID, p.post_name, p.post_title, p.post_date, terms.slug as category
FROM  wp_posts p,  wp_term_relationships tr,  wp_term_taxonomy tt,  wp_terms as terms
WHERE p.ID               != $id                 AND
p.post_type         = 'post'              AND
p.post_status       = 'publish'           AND
p.ID                = tr.object_id        AND
tr.term_taxonomy_id = tt.term_taxonomy_id AND
tt.taxonomy         in ( 'category')      AND
tt.term_id          = terms.term_id
GROUP BY  p.ID, p.post_title, p.post_name, p.post_date
ORDER BY terms.term_id
) as p,
(
SELECT distinct terms.slug
FROM wp_term_relationships tr, wp_term_taxonomy tt, wp_terms as terms
WHERE tr.object_id        = $id     AND
tr.term_taxonomy_id = tt.term_taxonomy_id AND
tt.taxonomy in ( 'category')    AND
tt.term_id          = terms.term_id
ORDER BY RAND() LIMIT 1
) as t
WHERE p.category = t.slug
ORDER BY  RAND()
LIMIT $limit";

// see if we have a cached recordset
$cache_name = "get_my_related_posts_" . $id;

$result = wp_cache_get( $cache_name );
if ( false == $result ) {

// get results and then cache for later use
$result = $wpdb->get_results( $sql );
wp_cache_set( $cache_name, $result );
}

// return result set as object
return $result;
}
?>
<div id="StrictlyRelatedPosts">
<h3>Related posts</h3>
<ul>
<?php
// fetch 5 related posts
$related_posts = get_my_related_posts($post->ID, 5);
// open loop
foreach ($related_posts as $related_post) {
$permalink = $related_post->permalink;
$title     = $related_post->title;
print "<li><a title=\"$title\" href=\"$permalink\">$title</a></li>\n";
} ?>
</ul>
</div>



Identifying Bottlenecks in Wordpress

One good plugin which I use for identifying potential problematic queries is the Debug Queries plugin which allows administrators to see all the queries that have run on each page. One extra tweak you should add is to put the following line in at the bottom of the get_fbDebugQueries function (around line 98)

$debugQueries .= '<li class="none_list">' . sprintf(__('» Memory Used %s'),$this->ConvertFromBytes($this->GetMemoryUsage(true))) . '</li>' . "\n";


and then add these two functions underneath that function (around line 106) which get the memory usage and format the value nicely.

// format a size in bytes into a human readable string
function ConvertFromBytes($size){

	$unit = array('B','KB','MB','GB','TB','PB');

	// guard against log(0) when no memory figure is available
	if($size <= 0) return '0B';

	$i = floor(log($size, 1024));

	return round($size / pow(1024, $i), 2) . $unit[$i];
}

// get PHP memory usage (peak if available)
function GetMemoryUsage($real_usage = true){

	if(function_exists("memory_get_peak_usage")) {
		return memory_get_peak_usage($real_usage);
	}elseif(function_exists("memory_get_usage")) {
		return memory_get_usage($real_usage);
	}

	// neither function available
	return 0;
}


This will help you see just how many database queries a standard Wordpress page makes (88 on my homepage!) and if you haven't done any performance tuning you may suddenly feel the urge to start before you suffer problems similar to those I experienced.

Remember, a high performing site is one which attracts visitors and one which SERP bots now pay more attention to when indexing. Therefore you should always aim to get the best performance out of your system that is feasibly possible, and as I have shown, that doesn't mean spending a fortune on hardware.




Turning off WordPress features


If you ever look at your site's log file you might see a lot of requests to a page called wp-cron.php.

This is a page that handles internal scheduling for WordPress, and many plugins hook into it to schedule tasks. This is useful for people who don't have access to their web server's control panel, as they can still set up "cron" jobs of a sort. The difference is that these pseudo-cron jobs only fire when a page on the site is loaded, so on a very quiet site a job you want to run once every 5 minutes simply won't run on schedule, because it depends on traffic arriving.


POST /wp-cron.php?doing_wp_cron=1331142791

Sometimes you will even see multiple requests spawned (by your own server's IP) within the same second e.g.

123.123.XXX.XX - - [07/Mar/2012:18:03:57 +0000] "POST /wp-cron.php?doing_wp_cron=1331143104 HTTP/1.0" 200 - "-" "WordPress/3.3.1; http://www.mysite.com"
123.123.XXX.XX - - [07/Mar/2012:18:03:57 +0000] "POST /wp-cron.php?doing_wp_cron=1331143109 HTTP/1.0" 200 - "-" "WordPress/3.3.1; http://www.mysite.com"
123.123.XXX.XX - - [07/Mar/2012:18:03:57 +0000] "POST /wp-cron.php?doing_wp_cron=1331143128 HTTP/1.0" 200 - "-" "WordPress/3.3.1; http://www.mysite.com"

To me this seems like overkill. Yes, the wp-cron job is needed to run internal Wordpress tasks such as publishing scheduled posts or firing jobs that have been set up to use the internal cron system, but having multiple requests fire at the same time seems unnecessary at best.

Why is this bad? Well, as this blog post from boltwebhosting.com says:

Wp-cron.php is called every time a page is loaded. That means if you are getting 50 visitors to your site every hour, and each of them reads 2-3 pages, then wp-cron.php is being called:
50 x 2.5 = 125 times per hour
125 x 24 = 3,000 times per day
3,000 x 30 = 90,000 times per month!
It does not just stop there, because unlike other features in WordPress, the wp-cron.php is spawned as an independent process which can sometimes take several minutes to complete its operations. So an active WordPress site with the traffic volume listed above is spawning 3,000 processes every day which do not really do anything.

Therefore on a very busy site you will be firing this page a lot, and that can cause severe performance issues on its own.
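Before changing anything, you can gauge how often wp-cron is actually firing by counting the POST requests in your access log. A quick sketch (the log excerpt and file path here are made up for illustration; point the grep at your real access log):

```shell
# create a hypothetical access log excerpt to demonstrate the count
cat > /tmp/access_sample.log <<'EOF'
123.123.1.1 - - [07/Mar/2012:18:03:57 +0000] "POST /wp-cron.php?doing_wp_cron=1331143104 HTTP/1.0" 200 -
123.123.1.1 - - [07/Mar/2012:18:03:57 +0000] "POST /wp-cron.php?doing_wp_cron=1331143109 HTTP/1.0" 200 -
123.123.1.1 - - [07/Mar/2012:18:04:12 +0000] "GET /some-page/ HTTP/1.0" 200 -
EOF

# count how many times wp-cron.php was fired
grep -c "POST /wp-cron.php" /tmp/access_sample.log
```

On a real log, run the count for a single day's file and you will see how the numbers in the quote above add up.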

The solution is to replace this pseudo-cron with a proper CRON job if possible. To do this you need access to your server's control panel or console.

If you do have access, then set up a real CRON task on your server that fires the page once an hour, or at whatever interval suits your needs. If you don't use the internal cron for anything then the longer the gap the better, but be careful: plugins such as database backup or sitemap generator plugins may be using it without your knowledge.

This is the format to use:

wget -U StrictlyCron -q -O /dev/null http://www.mysite.com/wp-cron.php?doing_wp_cron


You will notice that I am setting the -U parameter (user agent) to StrictlyCron. This is because I block all requests with blank user agents to my site with .htaccess rules (see the security article), and it also helps me identify my own requests in the log file.
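For completeness, the matching crontab entry for an hourly run would look something like this (the schedule and URL are just examples; adjust both for your own site):

```shell
# m h dom mon dow  command
0 * * * * wget -U StrictlyCron -q -O /dev/null "http://www.mysite.com/wp-cron.php?doing_wp_cron"
```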

Once you have done this you need to edit your site's wp-config.php file, which will be in the root of your WordPress installation, and add this line of code to the top of it.


/* disable WP-CRON from running all the time on every page load! */
define('DISABLE_WP_CRON', true);


As the comment states, this stops WordPress from firing its own internal CRON job on every page load. As we have replaced it with a real CRON job that runs once an hour instead, it should reduce our traffic and server load considerably.



Turning Off WordPress HeartBeat

The WordPress HeartBeat functionality was introduced in WP 3.6 to allow interaction between the server and browser using AJAX. However, like AutoSave and WP_CRON, it can cause a lot of unnecessary HTTP requests, as it defaults to one request every 15 seconds.

The WordPress Heartbeat API allows WordPress to communicate between the web-browser and the server. It also improves session management, revision tracking, and auto saving. The WordPress Heartbeat API uses /wp-admin/admin-ajax.php, which allows WordPress to keep track of what's going on in the dashboard.

Unfortunately, this can also cause excessive requests to admin-ajax.php, leading to high CPU / Bandwidth usage. Whenever a web-browser is left open on a page using the Heartbeat API, this could potentially be an issue.

I once accidentally left a post I was editing open in a Chrome browser (which always re-opens the pages you had open when you closed it) for a week, and my bandwidth costs jumped by a good $30.

I scanned my log files and saw /wp-admin/admin-ajax.php being called every 15 seconds for the post page (seen in the Referer section of the log file).

Therefore I shut the page down ASAP and added some code to my theme's functions.php file so that the Heartbeat only runs on the post edit pages, where it is needed to delete custom fields, show tags and other features that make editing and adding posts easy.

To turn off the HeartBeat functionality everywhere else, go to your theme's functions.php file and put the following code at the top of it.

If you don't want to turn it off but just change the timing from 15 seconds to a minute or something else, you can, but it relies on you editing a core compressed JavaScript WordPress file. You can read about how to do this here.

// stop heartbeat code except on the post edit pages
add_action( 'init', 'stop_heartbeat', 1 );

function stop_heartbeat() {
	global $pagenow;

	if ( $pagenow != 'post.php' && $pagenow != 'post-new.php' ) {
		wp_deregister_script('heartbeat');
	}
}
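If you would rather keep the Heartbeat but just slow it down without touching core WordPress files, later WordPress versions also expose a heartbeat_settings filter you can hook from the same functions.php. A sketch (the 60 second interval is just an example value; WordPress clamps it to its allowed range):

```php
// slow the Heartbeat down rather than removing it
add_filter( 'heartbeat_settings', 'slow_heartbeat' );

function slow_heartbeat( $settings ) {
	// poll once a minute instead of every 15 seconds
	$settings['interval'] = 60;
	return $settings;
}
```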



Wordpress Performance Summary

  • Ensure Apache is configured correctly and don't leave the default values as they are. Make sure MaxClients is set correctly by dividing your available RAM by the average Apache process size, leaving room for MySQL and anything else you might be running.

  • Tune your MySQL database by configuring it correctly and maintaining it regularly. Use one of the many free tuning scripts to help set your configuration up, but make sure you read up on the various settings and what they do first.


  • Install a Caching plugin that creates hard copies of commonly requested files. Static HTML is fast to load. PHP is costly to compile. Use a PHP accelerator and ensure database query results are cached.


  • Reduce bandwidth by combining, compressing and minifying your CSS, JS and HTML. If your caching plugin doesn't do this, do it by hand once rather than on the fly. Remember the key is to do expensive operations once and then re-use the results as many times as possible.


  • Set your htaccess file up correctly. Ban bad bots to reduce traffic, set far future expiry headers on your static files and use static files to handle 404, 403, 503 errors etc.


  • Reduce the number of plugins and ensure any that you use are not hurting performance. Make sure any tables they use are covered by indexes and use the slow query log to identify problems.



  • Disable Wordpress's internal CRON job and replace it with a real CRON job that runs once every hour or 30 minutes rather than on every page load.

  • Disable WordPress HeartBeat functionality, or only allow it on post edits, to prevent repeated HTTP calls when a page is left open in a browser. You can change the timing from 15 seconds to whatever you want, but that means editing a compressed WordPress core JS file.
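As a worked example of the first bullet, the MaxClients arithmetic can be sketched like this (the RAM, reserved memory and per-process figures are invented for illustration; measure your own with top or ps):

```shell
# hypothetical server: 2048MB RAM, 512MB reserved for MySQL and the OS,
# average Apache process around 40MB
RAM_MB=2048
RESERVED_MB=512
APACHE_PROC_MB=40

# suggested MaxClients value
echo $(( (RAM_MB - RESERVED_MB) / APACHE_PROC_MB ))
```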





Read Part 1 - An Overview
Read Part 3 - Security




Tuesday, 11 November 2014

Turn off WordPress HeartBeat to reduce bandwidth and CPU


By Strictly-Software

I recently noticed a spike in bandwidth and costs on my Rackspace server. The cost had jumped up a good $30 from a normal month.

Now I am still in the process of finding out why this has happened, but one thing I did come across was a lot of calls to a script called /wp-admin/admin-ajax.php, happening every 15 seconds.

Now this is the sign of WordPress's HeartBeat functionality, which allows the server and browser to communicate, and I quote from the inmotionhosting.com website:

HeartBeat allows WordPress to communicate between the web-browser and the server. It allows for improved user session management, revision tracking, and auto saving.
The WordPress Heartbeat API uses /wp-admin/admin-ajax.php to run AJAX calls from the web-browser. Which in theory sounds awesome, as WordPress can keep track of what's going on in the dashboard.
However this can also start sending excessive requests to admin-ajax.php which can lead to high CPU usage. Anytime a web-browser is left open on a page using the Heartbeat API, this could potentially be an issue.

Therefore I scanned my log files and found that my own server's IP was making calls to the page every 15 seconds e.g.

62.21.14.247 - - [11/Nov/2014:15:00:20 +0000] "POST /wp-admin/admin-ajax.php HTTP/1.1" 200 98 "http://www.mysite.com/wp-admin/post.php?post=28968&action=edit&message=1" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36" 0/799585
62.21.14.247 - - [11/Nov/2014:15:00:35 +0000] "POST /wp-admin/admin-ajax.php HTTP/1.1" 200 98 "http://www.mysite.com/wp-admin/post.php?post=28968&action=edit&message=1" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36" 25/25540888

I checked my browser (Chrome), which I do leave lots of windows open in for ages, multi-tasking :) , and found that I had left a WordPress post edit window open. This was causing the HeartBeat to call the script every 15 seconds.

Now I don't know that this is the ONLY reason for my increase in bandwidth, and obviously CPU due to all the HTTP requests, but I am guessing it made up a big part of it.
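To put a rough number on it, a single open editor tab polling every 15 seconds generates:

```shell
# requests generated by one open tab polling every 15 seconds
echo $(( 3600 / 15 ))       # per hour
echo $(( 3600 / 15 * 24 ))  # per day
```

That's 240 requests an hour, every hour, for as long as the tab stays open.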

Therefore I decided to turn off the HeartBeat functionality.

I have already disabled auto saving and revisions as I don't need that functionality so I am going to see what happens - hopefully my costs will go down!

Turning Off WordPress HeartBeat

To turn off the HeartBeat functionality go to your theme's functions.php file and put the following code at the top of it.


// stop heartbeat code
add_action( 'init', 'stop_heartbeat', 1 );

function stop_heartbeat() {
	wp_deregister_script('heartbeat');
}


So I will see what happens with this turned off. So far not a lot.

However, if you do notice a spike in your bandwidth or CPU and you use WordPress, check that you haven't left a page open in your browser that would be causing the HeartBeat function to call /wp-admin/admin-ajax.php every 15 seconds!