Sunday 5 February 2012

Low Server Usage, Long Page Loads with APACHE Caching

Disabling Apache Caching improves LAMP Server Performance

One of my virtual servers, a LAMP box with only 1GB of RAM that hosts a couple of quite high traffic sites, has been suffering from an intermittent problem with the following symptoms.

  1. Long page load wait time - Spinning browser window with nothing happening
  2. Very low server load average (< 10) despite a large number of requests to the website
  3. Large amount of disk swapping - where the hard drive is used to temporarily store data

This also coincided with these errors in the log files:

[Thu Feb 02 16:30:57 2012] [error] (103)Software caused connection abort: cache: error returned while trying to return mem cached data
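
To see how often this error turns up, something like the following would do. It is only a sketch: the function name count_cache_errors is made up for illustration, and /var/log/apache2/error.log is an assumed log location (the usual Debian/Ubuntu default) that may differ on other setups.

```shell
# Count Apache cache errors in a given error log.
# The pattern matches both the "mem cached data" and "disk cached data" variants.
count_cache_errors() {
    grep -c 'cache: error returned while trying to return' "$1"
}

# Example call (the path is an assumption, adjust to your own log location):
#   count_cache_errors /var/log/apache2/error.log
```

A count that climbs in step with the hanging page loads would point at the caching layer rather than at PHP or MySQL.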

Running the top command would return something like this:

top - 16:42:06 up 3 days, 21:29,  2 users,  load average: 0.12, 0.39, 0.67, 1.39
Tasks: 219 total,   1 running, 218 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.1%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1060972k total,   931768k used,   129204k free,     9668k buffers
Swap:  2097144k total,   328740k used,  1768404k free,   362284k cached
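
The Swap line is the one to watch here. As a rough sketch, the same "swap used" figure can be pulled straight from /proc/meminfo on Linux. The helper name swap_used_kb is hypothetical, and the optional argument exists only so the function can be pointed at a saved copy of the file.

```shell
# Print swap in use (kB) as SwapTotal minus SwapFree.
# Reads /proc/meminfo by default; pass another file to analyse a saved copy.
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END { print total - free }' "${1:-/proc/meminfo}"
}

echo "Swap in use: $(swap_used_kb) kB"
```

Running this every few minutes while the site is busy shows whether swap use keeps growing between Apache restarts.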

An Apache restart always fixed this issue temporarily. However, on running

/etc/init.d/apache2 restart

I noticed it was carrying out an htcacheclean command, and it was always on this clean command's completion that the page load would suddenly spin into action before terminating as the web server stopped.

Both websites that ran on the server were using WordPress with caching plugins, namely WP-SuperCache and WP-Widget-Cache, as well as .htaccess file rules that set future expiry headers on certain file types, e.g. JavaScript, CSS, images etc.

I also ban over 55% of all traffic through rules that block known bad BOTS, spammers, hackers and known bad IP addresses. If you are not writing articles in Chinese or Russian and have no wish to pay for the bandwidth used by BOTS like Yandex and Baiduspider visiting constantly, then 403 them! In my experience neither pays any attention to the robots.txt file, so in my book they can be labelled as "Bad BOTS" anyway.
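
To get a rough idea of what share of requests is being turned away, the access log can be summarised. This sketch assumes Apache's combined log format, where the status code is the ninth whitespace-separated field. The name pct_forbidden and the example path are illustrative only.

```shell
# Print the percentage of requests in an access log that were answered with 403.
# Assumes the combined log format (status code in field 9).
pct_forbidden() {
    awk '$9 == 403 { blocked++ }
         END { if (NR) printf "%d\n", blocked * 100 / NR }' "$1"
}

# Example call (the path is an assumption):
#   pct_forbidden /var/log/apache2/access.log
```

The figure is rounded down to a whole percent and will be slightly off if request lines contain unusual spacing, so treat it as a ballpark.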

As the main symptoms were very low server loads AND hanging pages that were cured by running htcacheclean, it made sense to try something suggested by a contact of mine at Tiger Technologies: disabling the Apache caching.

He recommended this because I was already utilising other caching technologies at the WordPress plugin level, and the errors in the log file suggested some sort of problem with Apache's caching system. Apparently it is not enabled by default, and I have no recollection of enabling it, but as I didn't set the server up myself it had obviously been turned on at some point.
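
A quick way to confirm which caching modules are actually loaded is to filter the module list that apache2ctl -M prints. This is a sketch only: cache_modules_only is a hypothetical helper name, and the pattern assumes the Apache 2.2 module names mentioned in this post.

```shell
# Keep only the cache-related entries from a module listing read on stdin.
cache_modules_only() {
    grep -E '^[[:space:]]*(cache|disk_cache|file_cache|mem_cache)_module'
}

# apache2ctl -M lists every loaded module, one per line.
apache2ctl -M 2>/dev/null | cache_modules_only || echo "no cache modules loaded"
```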

After checking all the config files to ensure I wasn't making specific use of these Apache modules, I disabled the following Apache modules from my console: disk_cache, file_cache and mem_cache.

These all had to be disabled first before I could disable the main cache module that they all depended on.
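
On a Debian-style install like this one, modules are normally disabled with a2dismod. The sketch below is an assumption about what that would look like, not a record of what was actually run. The function name and the DRY_RUN switch are made up for illustration, and by default it only prints the commands.

```shell
# Disable the three cache providers first, then the parent "cache" module
# that they all depend on.
# DRY_RUN=1 (the default here) only prints the commands;
# set DRY_RUN=0 and run as root to actually apply them.
DRY_RUN="${DRY_RUN:-1}"

disable_cache_modules() {
    for mod in disk_cache file_cache mem_cache cache; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "a2dismod $mod"
        else
            a2dismod "$mod" || return 1
        fi
    done
}

disable_cache_modules
```

The order matters: the parent cache module comes last, for the dependency reason given above.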

After disabling all these caching modules I ran the following command to check that the syntax of all my Apache config files was correct and still in working order.

apache2ctl configtest

This returns "Syntax OK" if everything is okay.
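
The check and the restart can be chained so that Apache is only restarted when the config test passes. This is a minimal sketch: safe_restart, APACHECTL and RESTART_CMD are names invented here, with defaults taken from the commands used in this post.

```shell
# Commands can be overridden from the environment; defaults match this post.
APACHECTL="${APACHECTL:-apache2ctl}"
RESTART_CMD="${RESTART_CMD:-/etc/init.d/apache2 restart}"

# Restart Apache only if the configuration test succeeds.
safe_restart() {
    if "$APACHECTL" configtest; then
        $RESTART_CMD
    else
        echo "configtest failed; Apache has not been restarted" >&2
        return 1
    fi
}

# Usage (as root):
#   safe_restart
```

That way a typo in a config file leaves the running server alone instead of taking the sites down.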

A restart of Apache now didn't run the htcacheclean command and afterwards the problem seemed to dissipate.

One thing I have noticed, coming from a Windows environment, is the sheer number of places where you can enable caching: at the web-server level, in MySQL, in WordPress, in plugins that make use of static files, in PHP accelerators such as APC, and in memcache for those with RAM to spare.

There are an awful lot of configuration possibilities with a LAMP setup, and it seems that Apache's own caching system was causing the problem, probably because of the amount of memory it required on a busy site, which in turn caused the large amount of disk swapping that was going on.

Maybe if I had a large amount of RAM to spare this wouldn't have been a problem. However, as I have shown in previous articles on Twitter Rushes (large numbers of concurrent BOT visits after Tweeting links), a 1GB RAM server can run out of memory very quickly indeed if you haven't lowered your MaxClients setting from the default of 256 to a more appropriate 12-25.

The Twitter Rush / MaxClients problem is easily detected by high server loads, a look at the visitor log file after Tweeting, plus some maths: take your RAM, subtract what constantly running applications such as MySQL need, and divide the rest by the average amount of memory used by a page load. This configuration issue was a bit harder to work out because I was experiencing very low server loads and thought I had done the right thing by utilising various caching techniques.
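
The MaxClients maths can be sketched in a few lines of shell. All figures here are made-up examples rather than measurements from my server: 1024MB of RAM, 300MB set aside for MySQL and the OS, and 40MB per Apache child. The helper names are hypothetical, and apache2 is the Debian process name (httpd elsewhere).

```shell
# Rough MaxClients estimate: (total RAM - RAM reserved for everything else)
# divided by the average memory used by one Apache child. All values in MB.
max_clients() {
    total_mb=$1
    reserved_mb=$2
    per_child_mb=$3
    echo $(( (total_mb - reserved_mb) / per_child_mb ))
}

# Average resident size (MB) of running apache2 processes.
# RSS counts shared memory in every process, so this overestimates a little.
avg_apache_rss_mb() {
    ps -C apache2 -o rss= | awk '{ sum += $1; n++ }
        END { if (n) printf "%d\n", sum / n / 1024 }'
}

max_clients 1024 300 40   # prints 18
```

With those example numbers the result lands inside the 12-25 range mentioned above.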

The Apache caching was obviously causing my server an issue and, as always, methodically diagnosing a problem by stepping through the various possibilities and ruling them out one by one is a technique that always works.

1 comment:

Ranger Big Brother said...

I noticed a similar problem. Since we're caching much of our site with CloudFront I turned off apache caching and it looks like that may have fixed the problem... In our case, though htcacheclean was running it didn't appear to be cleaning cache... and eventually we'd get an error in the log, right before apache stopped serving: "(103)Software caused connection abort: cache: error returned while trying to return disk cached data"

Apache restart brought the site right back up, temporarily at least.