Tuesday 9 September 2008

SQL Performance Top Tips

Another SQL Performance Top Tips?

I know there are lots of Top Tips for improving SQL performance out there on the web but you can never have too much of a good thing so I have created my own list of issues to identify and resolve when trying to improve the performance of an SQL database.

1. Investigate and resolve dead locks or blocked processes.

All it takes is one blocked process on a commonly hit table for your whole site to hang and start reporting timeout issues.

On my jobboard sites I had locking issues with my main JOBS table that had to be joined to a number of other tables to get the data needed on results pages that were hit constantly by crawlers. These tables were also being updated constantly by clients and automated feeds from multi-posters which meant that locks were likely.

As I use one database for multiple sites it also meant that a lock caused by one site would cause timeout issues for the other sites.

To resolve this I created some de-normalised flat tables per site that contain all the fields needed for reporting and searching which meant:

  • The JOB table is only updated at regular intervals and not instantly. SQL 2005 introduced the synonym feature which I demonstrate here. This is used to swap between tables being built and tables in use by the site so that a new table can be built behind the scenes with up to date data.
  • Clustered indexes can be site specific depending on the columns the site search on the most.
  • No joins needed when accessing the table and results are returned very fast.
  • Any problems accessing this table do not affect other sites.
Another trick to handle instances where you are selecting from tables that maybe updated constantly such as job or banner hit tables is to look at the following:
  1. If the table you are selecting from is never updated or deleted from during the times you are selecting from then you can use the WITH (NOLOCK) statement on your SQL SELECTS. You won't have to worry about dirty reads as you are not actually updating the data and you bypass all the overhead SQL has to go through to maintain the LOCKING.
  2. Use the LOCK_TIMEOUT statement to reduce the time LOCKS are held for.
  3. Use the TRY / CATCH statement to catch deadlocks and other blocking errors in combination with a retry loop. After X retries return from the proc. View my example of using LOCK_TIMEOUT to handle DEADLOCKS and BLOCKING.

2. Look into large batch processes that have long run times, take up I/O, have long wait times or cause blocking.

A good example is where you have to transfer lots of records from one table to another e.g from a daily table that only receives INSERTS (e.g job / banner hits) into a historical table for reporting.

If you are deleting large numbers of records from a table that is also being used then you don't want to cause blocks that will freeze your site. In conjunction with point 1 you should look into how you handle your DELETE / UPDATE statements so that they are done in small BATCHES using the SQL 2005+ TOP command.

Read this article for an example of updating or deleting in batches to prevent blocking.

To find out problematic queries after the fact you can utilise the SQL 2005+ Data Management Views (DMV's) which hold key information about the system.

Read this article on my top queries for performance tuning to help identify those queries that need tuning.

3. Re-Index and De-Frag tables

Make sure you regularly re-index and de-fragment your tables as over time fragmentation will build up and performance will be affected. I tend to set up a weekly MS Agent job that runs a Defrag / Re-organize or  on tables with fragmentation over a set percentage as well as rebuild index statistics.

I then also try to schedule a full re-index of all my main tables once a quarter at scheduled down times as during a full rebuild the tables can be taken off line depending on your SQL setup.

4. Identify Slow Queries

Investigate your slowest running queries to make sure they are covered adequately by indexes but also try not to over use indexes. Try to cover as many queries with as few indexes as possible.

Try running this SQL performance report which will identify a dozen or so areas which could be improved on your system including:

  • Causes of the server waits
  • Databases using the most IO
  • Count of missing indexes, by database
  • Most important missing indexes
  • Unused Indexes
  • Most costly indexes (high maintenance)
  • Most used indexes
  • Most fragmented indexes
  • Most costly queries, by average IO
  • Most costly queries, by average CPU
  • Most costly CLR queries, by average CLR time
  • Most executed queries
  • Queries suffering most from blocking
  • Queries with the lowest plan reuse

This is an invaluable tool for any SQL DBA or SQL performance tuning developer.

5. Split tables

If your tables are used for inserting/updating as well as selection then for each index you have on a table that's an extra update required when a record is saved. On some of my big tables that are used heavily for reporting I split the data into daily and historical data. The daily table will allow updates but the historical table will not.

At night a job will transfer the daily data into the historical table, dropping all current indexes, populating the new data and then rebuilding the indexes with a 100% fill factor. You need to balance out whether speed on data retrieval or on update is more important if the table is used for both. You should also look into points 1 and 2 about managing blocking and batch updates/deletes to handle instances where a table is being deleted whilst accessed at the same time.

6. Re-write complex queries and make WHERE clauses SARGABLE

Can you rewrite complex queries to be more efficient. Can you remove left joins to large tables
by populating temporary tables first. Are you putting functions in the WHERE, HAVING clause on the column and negating any index usage. If so can you rewrite the clause and make it SARGABLE so that you make use of the index. For example a WHERE clause like so

WHERE DateDiff(day,CreateDate,GetDate()) = 0

Which is filtering by todays date should be rewritten to:

WHERE CreateDate > '2008-sep-09 00:00:00' --where this is the previous midnight

Put the value from GetDate() into a variable before the SELECT and then use that in the filter
so that no function is used and any index on CreateDate will be available.

Read this article of mine which proves the benefits of SARGABLE clauses in SQL.

7. Query Plan Caching

Check that you are benefiting from query plan caching. If you are using stored procedures that contain branching through use of IF statements that run different SQL depending on the values passed in by parameters then the cached query plan for this procedure may not be the best for all parameter variations.

You can either rewrite the procedure so that each IF branch calls another stored procedure that contains the query required as it will be this plan that gets used.

Or you could rewrite the query to use dynamic SQL so that you build up a string containing the appropriate syntax and parameters and then execute it using sp_executesql which will take advantage of plan re-use. Do not use EXEC as this will not take advantage of plan re-use and its not as safe in terms of sql injection.

You should also look into whether your query plans are not getting used due to your SQL Server settings as the default mode is to use SIMPLE PARAMETERIZATION and not FORCED PARAMETERIZATION.

This features takes AdHoc queries containing literal values and removes values replacing them with parameters. This means that the query plan which gets cached can be re-used for similar queries that have different values which can aid performance as it will reduce compilation time.

When the system is set to SIMPLE mode only AdHoc query plans that contain the same literal values get cached and re-used which in many cases is not good enough as the values for most queries will change all the time for example in SIMPLE mode only the plan for the exact query below will be cached.

WHERE ProductID = 10

Which means only products with the ProductID of 10 will benefit from a cached plan however with FORCED PARAMETERIZATION enabled you would have plan with parameters so that any ProductID can benefit from it e.g

WHERE ProductID = @1 -- name of SQL parameter

For a more detailed look at the benefits of this method read my article on forced paramaterization.

8. Use the appropriate table for the situation

SQL has a variety of tables from fixed permanent tables to global and local temporary tables to table variables that are both stored in tempdb.

I have found a number of times now that the use of table variables start off being used in stored procedures as very useful memory efficient storage mechanisms but once the datasets stored within them rises above some threshold (I have not found the exact threshold amount yet) the performance drops incredibly.

Whilst useful for array like behaviour within procs and quick for small datasets they should not be used with record sets of many thousands or millions of records. The reason being that no indexes can be added to them and any joins or lookups will result in table scans which are fine with 10-100 records but with 20 million will cause huge problems.

Swapping table variables for either fixed tables or temporary tables with indexes should be considered when the size of data is too great for a table variable.

If your procedures are being called by website logins with execute permission you will need to impersonate a login with higher privileges to allow for the DDL statements CREATE and DROP as a basic website login should have nothing more than read privileges to prevent SQL injection attacks.

If you don't want to risk the rise in privileges then consider using a fixed table and the use of a stamp and unique key so that multiple calls don't clash with each other. Appropriate indexes should be used to ensure speedy data retrieval from these tables.

A real world example I have found is explained in this article on speeding up batch processes that use table variables.

9. Tuning for MySQL

MS SQL has a vast array of useful tools to help you performance tune it from it's Data Management Views and Activity Monitor to it's detailed Query Execution Plans. However if you are using MySQL and have to suffer using a tool like NAVICAT then you are not in such a good position to tune your queries.

The EXPLAIN option is nowhere near as useful as the MS SQL Query Execution Plan but it can be used to ensure the appropriate indexes are added if missing and the Slow Query Log is a useful place to identify problematic queries.

If you are using LINUX or Wordpress and running your system on MySQL then read this article of mine on performance tuning for MySQL.

These are just some tips to look at when trying to improve back end database performance and I will add to this list when I get some more time. Feel free to post your own tips for identifying problematic SQL and then fixing it.

No comments: