Friday 18 June 2010

The Wordpress Survival Guide

Surviving Wordpress - An Introduction

As well as my technical blog here on blogger I have a number of Wordpress sites which I host myself on a virtual server.

I have now been using Wordpress and PHP for about 4 years and in that time I have learnt a hell of a lot regarding the pros and cons and do's and don'ts that are involved in running your own Wordpress blog.

As a developer who has spent most of his working career working with Microsoft products moving from a Windows environment to Linux has been a great learning curve and as well as all the tips I have gathered regarding Wordpress I thought I would write a series for other developers in a similar situation or for Webmasters who may have the sites but not the technical know how.

Covering the bases

In the following articles I will look in detail at how to get the most out of your system in terms of performance. If you are like me you are not made of money and able to afford lots of dedicated servers to host your sites on therefore you need to make the most of what you have. Performance tuning your Wordpress site is the most important thing you can do and luckily due to the nature of WordPresses plugins a lot of performance tuning can be done with a couple of clicks.

I will also be looking at performance tuning MySQL which is the database that Wordpress runs on. Moving from MS SQL with all its features and DMV's to MySQL was quite a culture shock for me so there are a few tips I have learnt which might be useful.

First things first - Tools of the trade

First off you will need to know how to get things done. My Wordpress sites are running on a Linux box and one of the first things I did was install VirutalMin which is a graphical UI you access in your browser which lets you manage your server. You could do everything from the command line but coming from a windows environment I found it very useful to be able to see a familiar environment.

After installing VirtualMin you should also install WebMin which is another graphical interface that gives you ultimate flexibility over your server without you ever needing to go near a command line prompt.

As well as setting up FTP to access your files through SFTP (secure FTP) I also installed PUTTY which enables me to connect to my server and get used to the command line way of doing things. I would definitely recommend doing this even if you were like me a Windows person as you should never be afraid to try something new and it's always good to have as many technical skills under your belt as possible. I always try to use the command line first but I know I can fall back on VMin if I need to.

Useful Commands

A good list of Linux applications and commands can be found here: Linux Commands but here are some of the key commands I find myself using over and again.

dateShow the current date and time on the server
cdchange drive e.g cd /var (go to the var directory)
cd ../go back up one directory
cd ../../go back up two directories
lslist out the contents of a directory
whoamisee who you are logged in as
su - [username]Assume the permissions of the specified user
sudo [command]Run a command as root but stay as the user you are logged in as
topShow the current running processes and server load
top -d .2Show the current running processes with .2 second refresh
tail -f access_logView the most current entries in the sites access log
grep "" access_log | tailView the most current entries in the sites access log for a certain IP address
netstat -taShow all current connections to the server
grep "27/Feb/2012:" access_log | sed 's/ - -.*//' | sort | uniq -c | sort -nr | lessView the IP's that appear in your access log the most for a certain date ordered by the most frequent first.
/etc/init.d/apache2 restartRestart Apache
apache2ctl configtestTest the Apache configuration for configuration errors
/etc/init.d/mysql restartRestart MySQL
wget [URL]Remotely access, load and save a file to the current directory
chmod 777 [filepath]Grant full read/write/delete permission to everyone to a file or folder
chmod +x [filepath]Grant execute permission to a script
rebootReboot the server

Handling Emergencies

You need to be prepared to handle emergencies and that involves a quick diagnosis and quick action. What sorts of emergencies can you expect to run into? Well the most common form will be very poor server performance that results in your site being unavailable to visitors. This can happen for a number of reasons including:

1. High visitor traffic due to a popular article appearing on a major site or another form of traffic driver.
2. High bot traffic from undesirable crawlers such as content scrapers, spammers, hackbots or even a denial of service attack. I recently experienced a DOS attack which came from an out of control bot that was making 10+ requests to my homepage a second.
3. A poorly written plugin that is eating up resources.
4. A corrupt database table causing database errors or poorly performing SQL causing long wait times.
5. Moderately high visitor traffic mixed with an unoptimised system set-up that exacerbates the problem.

Identifying the cause of your problem

If you are having a major slow down, site freeze or just don't know what is going on then the first thing is to open up a command prompt and run top to see the current processes.

The first thing to look at is the load average as this tells you the amount of resources and pressure your server is currently under. A value of 1.00 means your server is maxed out
and anything over that means you are in trouble. I have had a value of 124 before which wasn't good at all. My site was inaccessible and only a cold reboot could get me back to a controlable state.

If your load average is high then take a look at the types of processes that are consuming the most resources. You should be looking at the CPU% and Memory used by each process (the RES) column which shows the amount of physical memory in KB consumed by the process.

Each request to your site will use its own process so if your report is full of Apache rows then you are having a traffic spike. Each page request on your site should be a very quick affair so the processes will come and go very speedily and having a short delay interval is important to being able to spot problems.

Another process to look for is the MySQL process which again should come and go unless it's currently running a long performance intensive query in which case the database could be your problem. Another tool I like to use is mytop which gives you a top like display but of your MySQL processes only. You could open up your MySQL console and run SHOW PROCESSLIST constantly but using MyTop is alot easier and it will help identify problematic queries as well as queries that are being run by your site a lot.

If you don't have monitoring tools available to keep you up to date with your sites status then you may find that your system experiences severe problems periodically without your knowledge. Being informed by a customer that your site is down is never the best way of finding out therefore you might be interested in a plugin I developed called Strictly System Check.

This Wordpress plugin is a reporting tool that you run at scheduled intervals from a CRON job. The plugin will check that your site is available and reporting a 200 status code as well as scanning for known text. It will also connect to your database, check the database and server load and report on a number of important status variables such as the number of connections, aborted connections, slow queries and much more.

The great thing about this plugin is that if it finds any issues with the database it will CHECK and then REPAIR any corrupt tables as well as running the OPTIMIZE command to keep the tables defragged and fast. If any problems are found an email can be send to let you know. I wrote this plugin because there wasn't anything like it available and I have found it 100% useful in not only keeping me informed of site issues but also in maintaining my system automatically.

Scanning Access logs for heavy hitters

Something else you should take a look straight away is your access and error logs.
If you open up your access log and watch it for a while you should soon see if you are experiencing either high traffic in general or from a particular IP/Useragent such as a malicious bot. Using a command like tail or less to view the logfile with the -f flag ensures that as new data is added to the file it will be outputted to the screen which is what you want when examining current site usage.

mydomain:~# cd /home/mywebsite/logs
mydomain:~# tail -f access_log

Banning Bad Users

If the problem is down to one particular IP or Useragent who is hammering your site then one solution is to ban the robot by returning it a 403 Forbidden status code which you can do with your .htacess file by adding in the following lines:

order allow,deny
deny from
deny from 67.207.201.
deny from
allow from all

This will return 403 forbidden codes to all requests from the two IP addresses and the one IP subnet: 67.207.201.

If you don't want to ban by IP but by user-agent then you can use the Mod Rewrite rules to identify bad agents in the following manner:

RewriteCond %{HTTP_USER_AGENT} (?:ColdFusion|curl|HTTPClient|Java|libwww|LWP|Nutch|PECL|POE|Python|Snoopy|urllib|WinHttp) [NC,OR] # HTTP libraries
RewriteCond %{HTTP_USER_AGENT} (?:ati2qs|cz32ts|indy|linkcheck|Morfeus|NV32ts|Pangolin|Paros|ripper|scanner) [NC,OR] # hackbots or sql injection detector tools being misused!
RewriteCond %{HTTP_USER_AGENT} (?:AcoiRobot|alligator|auto|bandit|capture|collector|copier|disco|devil|downloader|fetch\s|flickbot|hapax|hook|igetter|jetcar|kmbot|leach|mole|miner|mirror|mxbot|race|reaper|sauger|sucker|snake|stripper|vampire|weasel|whacker|xenu|zeus|zip) [NC] # offline downloaders and image grabbers
RewriteRule .* - [F,L]

Here I am banning a multitude of known bad user-agents as well as a number of the popular HTTP libraries that script kiddies and hackers use off the shelf without knowing how to configure to hide the default values.

You should read up on banning bad robots using the htaccess file and Mod Rewrite as a considerable proportion of your traffic will be from non human bots and not the good kind e.g Googlebot or Yahoo. By banning bad bots, content scrapers, spammers, hackers and bandwidth leeches you will not only reduce the load on your server but save yourself money on bandwidth charges.

The other log file you should check ASAP in a potential emergency situation is the Apache error log as this will tell you if the problem is related to a PHP bug, a Wordpress plugin or MySQL error.

Unless you have disabled all your warnings and info messages the error log is likely to be full of non fatal errors however anything else should be checked out. If your error log is full of database errors such as "table X is corrupt" or "database has gone away" then you know where to look for a solution.

Tables get corrupted for many reasons but a common reason I have found is when I have had to carry out a cold reboot to regain control of my server. Sometimes after a reboot everything will be working okay but on accessing your website all the content will have disappeared. Do not worry yet as this could be due to the corrupt tables and carrying out a REPAIR command should remedy this.

Another potential flash point are new or recently upgraded plugins. Plugins can be written by anybody and there is no guarantee whatsoever that the code contained within the plugin is of any quality whatsoever even if the features it offers seem to be great. I have personally found some of the most popular plugins to be performance hogs due to either poor code or complex queries with missing indexes but more of that in a later article.

Unless you are prepared to tweak other peoples code you don't have many options apart from optimising the queries the plugin runs by adding missing indexes or disabling the plugin and finding an alternative. One good tip I have found is to create an empty plugin folder in the same directory as the current plugin folder and then in emergency situations you can rename your existing plugin folder to something like plugins_old and then your site will be running without plugins. Once you have remedied any problems you can add your plugins back one by one to ensure they don't cause any problems.

Regular Maintenance

You should reguarly check your access and error logs even when the site is running smoothly to ensure that problems don't build up without you realising. You should also check your slow query log for poor queries especially after installing new plugins as it's very easy to gain extra performance from adding missing indexes especially when your site has tens of thousands of articles.

You should also carry out regular backups of your database and Wordpress site and ensure that you run the OPTIMIZE command to defrag fragmented table indexes especially if you have deleted data such as posts, tags or comments. A fragmented table is slower to scan and it's very easy to optimize at the click of a button. Take a look at the Strictly System Check Wordpress Plugin which can be setup to report on and analyse your system at scheduled intervals as one of the features is the ability to run the OPTIMIZE command.

So this is the end of the first part of the Wordpress Survival Guide series and next time I will be looking at performance tuning and site optimisation techniques.


Tom Bird said...

Great article. Just what I wanted to give me an overview of getting started with Wordpress and LAMP.


Anonymous said...

Excellent post. I'm going through many of these issues as well..

Feel free to visit my site back pain remedies

Unknown said...

so good blog

Cheap SEO Packages

SEO Services said...

This article is completely awesome for me and often the one activity is done of searching content for collecting wonderful knowledge.