Tuesday, 30 July 2013

Handling Blocking in SQL 2005 with LOCK_TIMEOUT and TRY CATCH statements

Handling Blocking and Deadlocks in SQL Stored Procedures

We recently had an issue at night during the period in which daily banner hit/view data is transferred from the daily table to the historical table.

During this time our large website was being hammered by BOTs and users, and we were getting lots of timeout errors reported because the tables we wanted to insert our hit records into were being DELETED from and UPDATED, which caused locks.

The default LOCK_TIMEOUT is -1 (wait indefinitely), but we had set it to match our default command timeout of 30 seconds.
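For reference, LOCK_TIMEOUT is set per session in milliseconds and can be checked with @@LOCK_TIMEOUT; a minimal sketch:

-- wait up to 30 seconds for locks before error 1222 is raised
SET LOCK_TIMEOUT 30000

-- -1 means wait indefinitely (the default)
SELECT @@LOCK_TIMEOUT AS CurrentLockTimeoutMs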

However, if the DELETE or UPDATE in the data transfer job took over 30 seconds, the competing INSERT (to record a banner hit or view) would time out and report a database timeout error, because the blocking process wouldn't release its locks in time for our INSERT to do its job.

We tried a number of things including:

  • Ensuring all tables were covered by indexes to speed up any record retrieval
  • Reducing the DELETE into small batches of 1,000 or 100 rows at a time in a WHILE loop to reduce the length of time the lock was held each time (a sketch of this pattern is shown after this list).
  • Ensuring any unimportant SELECT statements from these tables were using WITH (NOLOCK) to get round any locking issues.
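As an illustration of the batching idea, here is a minimal sketch of the WHILE loop pattern (the daily table name and cut-off date are made up for the example; substitute your own schema):

DECLARE @BatchSize int
SET @BatchSize = 1000

-- keep deleting in small chunks so each statement only holds its locks briefly
WHILE 1 = 1
BEGIN
 -- hypothetical daily table; delete everything from before today
 DELETE TOP (@BatchSize)
 FROM tbl_BANNER_DATA_DAILY
 WHERE Stamp < DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0)

 IF @@ROWCOUNT = 0
  BREAK
END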

However, none of these actually solved the problem, so in the end we rewrote our stored procedure (SQL 2005 - 2008) so that it handled the lock timeout error itself and didn't return an error to the caller.

In SQL 2005 you can make use of TRY...CATCH blocks, which meant we could attempt the insert a set number of times and, if it still failed, return quickly without raising an error. We also wrapped the insert in a TRANSACTION so we could ROLLBACK or COMMIT as appropriate.

We also set the LOCK_TIMEOUT to 500 milliseconds (so 3 attempts x 500ms = 1.5 seconds at most), as if the insert couldn't be done in that time frame then there was no point logging it. We could have inserted the data into another table to be added to our statistics later on, but that is another matter.

The code is below and shows you how to trap BLOCKING errors including DEADLOCKS and handle them.

Obviously this doesn't fix anything; it just "masks" the problem from the end user and reduces the number of database timeout errors caused by long-waiting blocked processes.

CREATE PROCEDURE [dbo].[usp_net_update_banner_hit]
 @BannerIds varchar(200), -- CSV of banner IDs e.g 100,101,102
 @HitType char(1) = 'V'   -- V = banner viewed, H = banner hit
AS

SET NOCOUNT ON

SET LOCK_TIMEOUT 500 -- set to half a second

DECLARE @Tries tinyint

-- start at 1
SELECT @Tries = 1

-- loop for 3 attempts
WHILE @Tries <= 3
 BEGIN

  BEGIN TRANSACTION

  BEGIN TRY

   -- insert our banner hits; we are only going to wait half a second for locks
   INSERT INTO tbl_BANNER_DATA
   (BannerFK, HitType, Stamp)
   SELECT [Value], @HitType, GETDATE()
   FROM dbo.udf_SPLIT(@BannerIds, ',') -- UDF that splits a CSV into a table variable
   WHERE [Value] > 0

   -- if we are here it has been successful i.e. no deadlock or blocking going on
   COMMIT

   -- therefore we can leave our loop
   BREAK

  END TRY

  -- otherwise we have caught an error!
  BEGIN CATCH

   -- always rollback
   ROLLBACK

   -- now check for blocking errors (1222) or deadlocks (1205); if it's a deadlock, wait a while to see if that helps
   IF ERROR_NUMBER() = 1205 OR ERROR_NUMBER() = 1222
    BEGIN

     -- if it's a deadlock wait 2 seconds then try again
     IF ERROR_NUMBER() = 1205
      BEGIN
       -- wait 2 seconds to see if that helps the deadlock
       WAITFOR DELAY '00:00:02'
      END

     -- no need to wait after a blocking error as our LOCK_TIMEOUT has already waited half a second,
     -- and if it hasn't finished by then (500ms x 3 attempts = 1.5 seconds) there is no point waiting any longer

    END

   -- increment and try again for 3 goes
   SELECT @Tries = @Tries + 1

   -- we carry on until we reach our limit i.e. 3 attempts
   CONTINUE

  END CATCH

 END
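Calling it from the website is then just a case of passing in the CSV of banner IDs, for example:

-- record a view for banners 100, 101 and 102
EXEC dbo.usp_net_update_banner_hit @BannerIds = '100,101,102', @HitType = 'V'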

Tuesday, 16 July 2013

MySQL Server won't restart

MySQL Server won't restart

Today I went to restart MySQL from my SSH console with the following command:

/etc/init.d/mysql restart

However, even though the database server stopped, it wouldn't start again.

I tried opening another console and running the status command.

/etc/init.d/mysql status

But this just told me it was stopped and a start command kept failing.

Even from VirtualMin, which manages my virtual server, the service wouldn't restart.

I dug into the services and databases and tried accessing a database from VMIN and saw a message saying the system couldn't retrieve a list of databases. Further digging gave me this error:

can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock'

Now I knew I had changed some settings in the my.cnf configuration file the other day for performance tuning. So I searched the web and found that I had added a directive that wasn't supported by my older version, i.e. MySQL 5.0.51.

The directive in question was:

skip-external-locking

Because Java had crashed I couldn't access the file to edit it easily, so I quickly ran a:

chmod 777 /etc/mysql/my.cnf

command to allow me to edit the file over FTP, and then I swapped the new directive for the older one which my version of MySQL did support:

skip-locking
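In the my.cnf itself the change was just a one-line swap in the [mysqld] section, something along these lines:

[mysqld]
# skip-external-locking   <- not recognised by this old 5.0 build
skip-locking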

I copied the file back and then, hey presto, a start command got the server back up and running:

/etc/init.d/mysql start

I then made sure to chmod the file back to its original permissions so it couldn't be written to from the website.

However on checking my website I only got to see the theme but NO articles.

It was a Wordpress site and the problem was probably WP-Super-Cache caching a page without data. I needed to run a REPAIR command on my wp_posts table to ensure all posts were visible again.

This is something I have seen many times before with hard reboots. The system comes back up but no articles appear. I always REPAIR and OPTIMIZE my wp_posts and wp_postmeta tables to rectify this.
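If you need to do the same, the repair can be run from the MySQL console or phpMyAdmin; against the standard WordPress table names it is just:

REPAIR TABLE wp_posts;
OPTIMIZE TABLE wp_posts;
REPAIR TABLE wp_postmeta;
OPTIMIZE TABLE wp_postmeta;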

This obviously locked the database up, as well as consuming 99% CPU whilst it ran - something which really annoys me, but afterwards the site was working.

So if you have been performance tuning your own MySQL database, make sure you are not adding directives that your server version doesn't support. A failed restart after a config change is a sure sign of an unsupported directive.
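If you are not sure which directive is to blame, the MySQL error log usually names it; the log path varies by distro, but a quick check looks something like this:

# show the last lines of the MySQL error log (path may differ on your server)
tail -n 50 /var/log/mysql/error.log

# or get mysqld to parse the config files without starting the service
mysqld --help --verbose > /dev/null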

Wednesday, 10 July 2013

Apache Performance Tuning BASH Script

BASH Script to tune Apache Configuration Settings

As you might know a lot of the time I think the LAMP / Wordpress combo is a big bag of shite.

There are so many configuration options, at so many different levels, that need tuning to get optimal performance that it is a nightmare to find the right information. There are also too many people offering various solutions for Wordpress / Linux / Apache / MySQL configuration.

Different people recommend different sizes for your config values and just trying to link up server load with page/URL/script requests to find out the cause of any performance issue is a nightmare in itself.

I would have thought there would be a basic tool out there that could log server load, memory and disk swapping over time and then link that up with the MySQL slow query log and the Apache error AND access logs, so that when you had issues you could easily tell what processes were running, which URLs were being hit and how much activity was going on, to identify culprits for tuning. I have even thought of learning PERL just to write one - not that I want to!

Even with all the MySQL tuning possible, caching plugins installed and memory limits on potentially intensive tasks it can be a nightmare to get the best out of a 1GB RAM, 40GB Virtual Server that is constantly hammered by BOTS, Crawlers and humans. I ban over 50% of my traffic and I still get performance issues at various times of the day - why? I have no FXXING idea!

Without throwing RAM at the problem you can try and set your APACHE values in the config file to appropriate values for your server and MPM fork type.

For older versions of Apache the Prefork MPM (a non-threaded, pre-forking webserver) is well suited as long as the configuration is correct. However, it can consume lots of memory if not configured correctly.

For newer versions (2.0+) the Worker MPM is better, as each thread handles one connection at a time, and this is considered better for high traffic servers due to the smaller memory footprint. However, getting PHP working under Worker apparently needs a lot of configuration (mod_php is not thread-safe), so you should read up about this before considering a change.

Read about Apache performance tuning here Apache Performance Tuning.

To find out your current apache version from the console run

apache2 -v OR httpd -v (depending on your server type; if you run top and see apache2 processes then use apache2, otherwise use httpd)

You will get something like this.

Server version: Apache/2.2.9 (Debian) Server built: Feb 5 2012 21:40:20

To find out your current module configuration from the console run

apache2 -V OR httpd -V

Server version: Apache/2.2.9 (Debian)
Server built: Feb 5 2012 21:40:20
Server's Module Magic Number: 20051115:15
Server loaded: APR 1.2.12, APR-Util 1.2.12
Compiled using: APR 1.2.12, APR-Util 1.2.12
Architecture: 64-bit
Server MPM: Prefork
 threaded: no
 forked: yes (variable process count)
etc etc etc...

There are lots of people giving "suitable" values for the various Apache settings, but one thing you need to do, if you run top and notice high memory usage and especially high virtual memory usage, is try to reduce disk swapping.

I have noticed that when Apache is consuming a lot of memory your virtual memory (disk-based) usage will be high, and you will often experience either high server loads and long wait times for pages to load, OR very low server loads (e.g. 0.01-0.05), an unresponsive website and lots of "MySQL server has gone away" messages in your error log file.

You need to optimise your settings so that disk swapping is minimal, which means trying to optimise your MySQL settings using the various MySQL tuning tools I have written about, as well as working out the right size for your Apache configuration values.

One problem is that if you use up your memory by allowing MySQL to have enough room to cache everything it needs then you can find yourself with little left for Apache. Depending on how much memory each process consumes you can easily find that a sudden spike in concurrent hits uses up all available memory and starts disk swapping.

Therefore, apart from MySQL using the disk to carry out or cache large queries, you need to find the right number of clients to allow at any one time. If you allow too many and don't have enough memory to hold them all, the server load will go up, people will wait, and the amount of disk swapping will increase and increase until you enter a spiral of doom that only a restart fixes.

It is far better to allow fewer connections and serve them up quickly with a small queue and less waiting than open too many for your server to handle and create a massive queue with no hope of ending.

One of the things you should watch out for is Twitter rushes caused by automatically tweeting your posts to Twitter accounts, as this can cause 30-50 BOTS to hit your site at once. If they all use up your memory it can cause a problem that I have written about before.

Working out your MaxClients value

To work out the correct number of clients to allow you need to do some maths, and to help you I have created a little BASH script to do it.

What it does is find the size of the largest Apache process, then briefly stop Apache so that an accurate "free memory" value can be obtained while it is not running, and start it again afterwards.

It then divides that free memory by the Apache process size. The value you get should be roughly the right value for your MaxClients.

It will also show you how much swapped (virtual) memory you are using, as well as the size of your MySQL process.

I noticed on my own server that when it was under-performing I was using twice as much virtual memory (swap) as I had RAM. However, when I re-configured my options and gave the system enough RAM to accommodate all the MySQL / Apache processes it worked fine, with little swapping.

Therefore, if the virtual memory in use is greater than your total RAM, e.g. you are using 1.5GB of hard disk space as virtual memory but only have 1GB of RAM, the script will show an error message.

Also, as a number of Apache tuners claim that your MinSpareServers should be 10-25% of your MaxClients value and your MaxSpareServers 25-50% of MaxClients, I have included the calculations for these settings as well.


#!/bin/bash
echo "Calculate MaxClients by dividing free memory by the largest Apache process"
if [ -e /etc/debian_version ]; then
 APACHE="apache2"
elif [ -e /etc/redhat-release ]; then
 APACHE="httpd"
fi
# largest resident Apache process size, converted from KB to MB
APACHEMEM=$(ps -aylC $APACHE |grep "$APACHE" |awk '{print $8}' |sort -n |tail -n 1)
APACHEMEM=$(expr $APACHEMEM / 1024)
# largest resident MySQL process size, converted from KB to MB
SQLMEM=$(ps -aylC mysqld |grep "mysqld" |awk '{print $8}' |sort -n |tail -n 1)
SQLMEM=$(expr $SQLMEM / 1024)
echo "Stopping $APACHE to calculate the amount of free memory"
/etc/init.d/$APACHE stop &> /dev/null
TOTALFREEMEM=$(free -m |head -n 2 |tail -n 1 |awk '{free=($4); print free}')
TOTALMEM=$(free -m |head -n 2 |tail -n 1 |awk '{total=($2); print total}')
SWAP=$(free -m |head -n 4 |tail -n 1 |awk '{swap=($3); print swap}')
MAXCLIENTS=$(expr $TOTALFREEMEM / $APACHEMEM)
MINSPARESERVERS=$(expr $MAXCLIENTS / 4)
MAXSPARESERVERS=$(expr $MAXCLIENTS / 2)
echo "Starting $APACHE again"
/etc/init.d/$APACHE start &> /dev/null
echo "Total memory $TOTALMEM"
echo "Free memory $TOTALFREEMEM"
echo "Amount of virtual memory being used $SWAP"
echo "Largest Apache process size $APACHEMEM"
echo "Amount of memory taken up by MySQL $SQLMEM"
# compare the numbers, not the variable names
if [ "$SWAP" -gt "$TOTALMEM" ]; then
      ERR="Virtual memory is too high"
else
      ERR="Virtual memory is ok"
fi
echo "$ERR"
echo "Total Free Memory $TOTALFREEMEM"
echo "MaxClients should be around $MAXCLIENTS"
echo "MinSpareServers should be around $MINSPARESERVERS"
echo "MaxSpareServers should be around $MAXSPARESERVERS"


If you get 0 for either of the last two values then consider increasing your memory or working out what is causing your memory issues. Either that or set your MinSpareServers to 2 and MaxSpareServers to 4.
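Once you have the numbers they slot straight into the prefork section of your Apache config. As a purely hypothetical example, on a 1GB server where the largest Apache process is about 25MB and roughly 700MB is free once MySQL has taken its share, the script would suggest something along these lines:

# hypothetical figures: 700MB free / 25MB per process = 28
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       7    # roughly 25% of MaxClients
    MaxSpareServers      14    # roughly 50% of MaxClients
    MaxClients           28    # free memory / largest Apache process
    MaxRequestsPerChild 1000
</IfModule>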

There are many other settings you can find appropriate values for, but adding indexes to your database tables and ensuring your database table/query caches fit in memory rather than being swapped to disk is a good way to improve performance without having to resort to yet more caching at all the various levels Wordpress/Apache/Linux users love adding.

If you do use a caching plugin for Wordpress then I would recommend tuning it so that it doesn't cause you problems.

At first I thought WP SuperCache was a solution and pre-caching all my files would speed things up due to static HTML being served quicker than PHP.

However I found that the pre-cache stalled often, caused lots of background queries to rebuild the files which consumed memory and also took up lots of disk space.

If you are going to pre-cache everything then hold onto the files for as long as possible; if they don't change, there seems little point in deleting and rebuilding them every hour or so and using up SQL and I/O in the process.

I have also turned off gzip compression in the plugin and enabled it at Apache level instead. It seems pointless doing it twice, and compressing in PHP will use more resources than letting the web server do it.
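Enabling compression at the Apache level is a one-off config change using mod_deflate (the content types below are just the common ones; adjust to suit):

# compress text output once, at the web server, rather than in PHP
<IfModule mod_deflate.c>
    AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css application/javascript
</IfModule>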

The only settings I have enabled in WP-Super-Cache at the moment are:


  • Don’t cache pages with GET parameters. (?x=y at the end of a url) 
  • Cache rebuild.
  • Serve a supercache file to anonymous users while a new file is being generated. 
  • Extra homepage checks. (Very occasionally stops homepage caching)
  • Only refresh current page when comments made. 
  • Cache Timeout is set to 100000 seconds (why rebuild constantly?)
  • Pre-Load - disabled.

Also, I have left the Rejected User Agents box blank, as I see no reason NOT to let bots like Googlebot create cached pages for other people to use. As bots will most likely be your biggest visitors, it seems odd not to let them create cached files.

So far this has given me some extra performance.

Hopefully the tuning I have done tonight will help the issue I am getting of very low server loads, MySQL gone away errors and high disk swapping. I will have to wait and see!