
Tuesday, 9 June 2009

Optimizing a query with Forced Parameterization

SQL 2005 Forced Parameterization Overview

SQL 2005 has a feature that takes AdHoc queries containing literal values and replaces those values with parameters. This means the cached query plan can be re-used by similar queries that have different values, which can aid performance by reducing compilation time. This is called FORCED PARAMETERIZATION; the default mode, SIMPLE, will only cache and re-use AdHoc plans containing the same literal values. The feature is useful for systems where stored procs and prepared statements aren't being used, as those forms of query already benefit from cached plan re-use. You should read BOL about the suitability of this feature, as certain constructs and features will prevent it from being used, and it may also result in unsuitable query plans being re-used in certain cases rather than a new plan being generated for a query.


Using Forced Parameterization

I came across this feature on the wonderful SQLServerCentral.com site in a discussion about dynamic SQL. I read up on it and thought it might come in useful on a site that has intermittent problems on a results page when hit by concurrent crawlers, which results in high CPU on the SQL Server. The page in question is a results page that allows filtering and ordering by multiple columns in both directions and also uses paging. However, the query is an AdHoc query built up on the client by appending literal values to a string, rather than using parameters and something like ADO.NET's SqlCommand object to pass the parameter values to the database.

I looked into rewriting this query to make use of parameters, but the WHERE clause was being cached and re-used by multiple queries on the same page, so it wasn't a simple change. It therefore seemed like a good candidate for FORCED PARAMETERIZATION.


Testing the theory

Before enabling this feature for the live site I wanted to prove on our development box that:
a) SIMPLE mode just wasn't good enough and was causing multiple plans to be compiled and cached for this query.
b) FORCED mode would enable one plan to be cached and re-used for multiple parameter variations of this query.


Enabling Forced Parameterization

I had some trouble with this because a number of other sites/blogs show an incorrect way to set it, but the correct way is:
-- enable forced mode
ALTER DATABASE YOUR_DATABASE SET PARAMETERIZATION FORCED

-- enable simple mode
ALTER DATABASE YOUR_DATABASE SET PARAMETERIZATION SIMPLE

You can also use Management Studio and the properties dialog of the relevant database to switch the setting over manually. When switching from one mode to another I made sure to clear the current cache by running the following:
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
I then used the following query, which makes use of SQL 2005's DMVs (dynamic management views), to check which queries were currently cached and how they were being cached.

SELECT *
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
INNER JOIN sys.dm_exec_cached_plans AS cp ON qs.plan_handle = cp.plan_handle
WHERE cp.ObjType = 'Adhoc'
OR cp.ObjType = 'Prepared'

Notice that I am filtering on AdHoc and Prepared and ignoring stored procs, triggers and views. I initially tried filtering by the dbid (database id) column to narrow the results down to my particular database; however, for AdHoc and Prepared queries this column contains NULL values.

The columns I am particularly interested in are plan_handle, which is the ID of the query plan related to the SQL statement, and execution_count, which tells me how many times the SQL related to the plan has been executed.

A very simple version of the query in question which conveys the point I am trying to make is below.

SELECT * FROM JOBS WHERE Row between 1 AND 20

With SIMPLE mode enabled, a test page running my query with different criteria values for the BETWEEN statement caused multiple records to be returned from sys.dm_exec_cached_plans, each with a different plan_handle value and an ObjType of Adhoc. This tells me that plans are not being re-used by similar queries when they could be. Running the same query with the same values does increment the execution_count against the relevant plan record, which shows that a very simple form of re-use is occurring, but as I want the best performance possible this isn't good enough.

With FORCED mode enabled and the cache cleared, the test pages with different criteria for the BETWEEN statement result in a single record being returned with one plan_handle. The value for ObjType has now changed from Adhoc to Prepared, and looking at the start of the text column, which contains the SQL, I can see the following before the start of my SQL:
(@0 int,@1 int,@2 int,@3 int,@4 varchar(8000),@5 int,@6 int)

This is the database creating the parameters required to replace the literal values from my query. The values from my BETWEEN clause have been changed from literals to parameters, e.g:
--Simple Mode Enabled
WHERE Row BETWEEN 1 AND 20
--Forced Mode Enabled
WHERE Row BETWEEN @5 and @6
Re-running the test page with different values for these parameters causes the execution_count to increment, which tells me that the query plan is being cached and re-used across similar queries, which is exactly what I wanted to prove.


Enabling Forced Parameterization for a single query

There may be situations where you want to enable this feature for one or more queries rather than turning it on for the whole database. You can do this by setting up a plan guide for the query in question. An easy way to do this is to run the following SQL, making sure to replace the first parameter value with the SQL you want to parameterize.


DECLARE @stmt nvarchar(max);
DECLARE @params nvarchar(max);
EXEC sp_get_query_template
N'SELECT * FROM JOBS WHERE Row between 1 AND 20;',
@stmt OUTPUT,
@params OUTPUT;
EXEC sp_create_plan_guide
N'MyPlanGuide_JobResults',
@stmt,
N'TEMPLATE',
NULL,
@params,
N'OPTION(PARAMETERIZATION FORCED)';


You can then set your database back to SIMPLE mode but still benefit from parameterized plans for queries that match the template in the plan guide you have just created. Some useful SQL for managing plan guides is below:


--View all plan guides on your system
SELECT * FROM sys.plan_guides

--Disable the plan guide.
EXEC sp_control_plan_guide N'DISABLE', N'MyPlanGuide_JobResults';
GO

--Enable the plan guide.
EXEC sp_control_plan_guide N'ENABLE', N'MyPlanGuide_JobResults';
GO

--Drop the plan guide.
EXEC sp_control_plan_guide N'DROP', N'MyPlanGuide_JobResults';

Tuesday, 9 September 2008

SQL Performance Top Tips

Another list of SQL Performance Top Tips?

I know there are lots of top tips for improving SQL performance out there on the web, but you can never have too much of a good thing, so I have created my own list of issues to identify and resolve when trying to improve the performance of a SQL database.


1. Investigate and resolve deadlocks or blocked processes.


All it takes is one blocked process on a commonly hit table for your whole site to hang and start reporting timeout issues.

On my job board sites I had locking issues with my main JOBS table, which had to be joined to a number of other tables to get the data needed for results pages that were hit constantly by crawlers. These tables were also being updated constantly by clients and by automated feeds from multi-posters, which meant that locks were likely.

As I use one database for multiple sites it also meant that a lock caused by one site would cause timeout issues for the other sites.

To resolve this I created some de-normalised flat tables per site that contain all the fields needed for reporting and searching which meant:

  • The JOB table is only updated at regular intervals and not instantly. SQL 2005 introduced the synonym feature, which I demonstrate here; it is used to swap between the table being built and the table in use by the site, so a new table can be built behind the scenes with up-to-date data (see the sketch after this list).
  • Clustered indexes can be site specific, depending on the columns each site searches on the most.
  • No joins needed when accessing the table and results are returned very fast.
  • Any problems accessing this table do not affect other sites.
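Below is a minimal sketch of the synonym swap mentioned in the first point. All the table and column names are made up for illustration: the site always selects from the synonym dbo.JOBS_FLAT, while the job rebuilds the offline copy and then repoints the synonym at it.

-- 1. Rebuild the offline copy with fresh data (dbo.JOBS_FLAT_B is hypothetical)
TRUNCATE TABLE dbo.JOBS_FLAT_B

INSERT INTO dbo.JOBS_FLAT_B (JobID, Title, Location, Salary)
SELECT j.JobID, j.Title, l.Location, j.Salary
FROM dbo.JOBS j
INNER JOIN dbo.LOCATIONS l ON l.LocationID = j.LocationID

-- 2. Repoint the synonym the site queries at the freshly built table
IF OBJECT_ID('dbo.JOBS_FLAT', 'SN') IS NOT NULL
    DROP SYNONYM dbo.JOBS_FLAT

CREATE SYNONYM dbo.JOBS_FLAT FOR dbo.JOBS_FLAT_B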
Another trick for handling instances where you are selecting from tables that may be updated constantly, such as job or banner hit tables, is to look at the following (a sketch of points 2 and 3 follows this list):
  1. If the table you are selecting from is never updated or deleted from during the times you are selecting from it, then you can use the WITH (NOLOCK) hint on your SELECT statements. You won't have to worry about dirty reads as you are not actually updating the data, and you bypass the overhead SQL Server incurs maintaining the locks.
  2. Use the SET LOCK_TIMEOUT statement to limit how long your statements wait for locks to be released.
  3. Use a TRY / CATCH block to catch deadlocks and other blocking errors in combination with a retry loop; after X retries, return from the proc. View my example of using LOCK_TIMEOUT to handle DEADLOCKS and BLOCKING.
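A minimal sketch of points 2 and 3 combined is below. The table and column names are hypothetical; the idea is that the statement stops waiting for locks after a short timeout, and the CATCH block retries on deadlock (1205) or lock timeout (1222) errors a few times before giving up.

SET LOCK_TIMEOUT 2000 -- stop waiting for locks after 2 seconds (raises error 1222)

DECLARE @Retry int
SET @Retry = 0

WHILE @Retry < 3
BEGIN
    BEGIN TRY
        SELECT JobID, Title
        FROM dbo.JOBS
        WHERE CategoryID = 5

        BREAK -- success, leave the retry loop
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() IN (1205, 1222) -- deadlock victim or lock timeout
        BEGIN
            SET @Retry = @Retry + 1
            WAITFOR DELAY '00:00:01' -- back off before trying again
        END
        ELSE
        BEGIN
            DECLARE @Msg nvarchar(2048)
            SET @Msg = ERROR_MESSAGE()
            RAISERROR(@Msg, 16, 1) -- re-raise anything we do not want to retry
            BREAK
        END
    END CATCH
END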



2. Look into large batch processes that have long run times, take up I/O, have long wait times or cause blocking.

A good example is where you have to transfer lots of records from one table to another e.g from a daily table that only receives INSERTS (e.g job / banner hits) into a historical table for reporting.

If you are deleting large numbers of records from a table that is also being used, then you don't want to cause blocks that will freeze your site. In conjunction with point 1, you should look at handling your DELETE / UPDATE statements in small batches using the SQL 2005+ TOP command.

Read this article for an example of updating or deleting in batches to prevent blocking.
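As a minimal sketch (with a hypothetical table name and a batch size you would tune yourself), the delete can be wrapped in a loop so each pass only touches a small number of rows and locks are released quickly:

DECLARE @BatchSize int
SET @BatchSize = 5000

WHILE 1 = 1
BEGIN
    -- delete one small batch at a time so locks are short lived
    DELETE TOP (@BatchSize)
    FROM dbo.JOB_HITS_DAILY
    WHERE HitDate < DATEADD(day, -1, GETDATE())

    IF @@ROWCOUNT < @BatchSize
        BREAK -- nothing, or only a partial batch, left to delete
END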

To find problematic queries after the fact you can utilise the SQL 2005+ Dynamic Management Views (DMVs), which hold key information about the system.

Read this article on my top queries for performance tuning to help identify those queries that need tuning.
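As a taster, a query along these lines (a sketch only, using the DMVs mentioned above) will surface the most expensive cached statements by CPU; ordering by total_logical_reads instead highlights the heaviest I/O consumers.

SELECT TOP 20
    qs.execution_count,
    qs.total_worker_time / 1000 AS total_cpu_ms,
    qs.total_logical_reads,
    SUBSTRING(qt.text, (qs.statement_start_offset / 2) + 1,
        ((CASE WHEN qs.statement_end_offset = -1
               THEN DATALENGTH(qt.text)
               ELSE qs.statement_end_offset END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
ORDER BY qs.total_worker_time DESC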


3. Re-Index and De-Frag tables

Make sure you regularly re-index and de-fragment your tables, as over time fragmentation will build up and performance will be affected. I tend to set up a weekly SQL Agent job that runs a defrag / re-organise on tables with fragmentation over a set percentage, as well as rebuilding index statistics.

I then also try to schedule a full re-index of all my main tables once a quarter during scheduled down time, as during a full rebuild tables can be taken offline depending on your SQL setup.
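A minimal sketch of that weekly job is below; the index and table names are hypothetical and the 10% / 30% thresholds are just commonly quoted starting points, not a rule.

-- find fragmented indexes in the current database
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ips
INNER JOIN sys.indexes i ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 10
AND ips.index_id > 0 -- ignore heaps

-- then for each index returned, something like:
ALTER INDEX IX_JOBS_CreateDate ON dbo.JOBS REORGANIZE -- fragmentation between 10% and 30%
ALTER INDEX IX_JOBS_CreateDate ON dbo.JOBS REBUILD    -- fragmentation over 30%
UPDATE STATISTICS dbo.JOBS WITH FULLSCAN              -- refresh the statistics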


4. Identify Slow Queries

Investigate your slowest running queries to make sure they are covered adequately by indexes, but also try not to over-use indexes. Try to cover as many queries with as few indexes as possible.
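For example, a covering index sketched along these lines (column names are hypothetical) lets several similar result-page queries be answered from the index alone, without key lookups back to the table:

CREATE NONCLUSTERED INDEX IX_JOBS_Category_CreateDate
ON dbo.JOBS (CategoryID, CreateDate DESC)
INCLUDE (Title, Location, Salary)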

Try running this SQL performance report, which will identify a dozen or so areas that could be improved on your system, including:


  • Causes of the server waits
  • Databases using the most IO
  • Count of missing indexes, by database
  • Most important missing indexes
  • Unused Indexes
  • Most costly indexes (high maintenance)
  • Most used indexes
  • Most fragmented indexes
  • Most costly queries, by average IO
  • Most costly queries, by average CPU
  • Most costly CLR queries, by average CLR time
  • Most executed queries
  • Queries suffering most from blocking
  • Queries with the lowest plan reuse

This is an invaluable tool for any SQL DBA or SQL performance tuning developer.



5. Split tables

If your tables are used for inserting and updating as well as selection, then every index you have on a table is an extra update required when a record is saved. On some of my big tables that are used heavily for reporting I split the data into daily and historical data. The daily table allows updates but the historical table does not.

At night a job transfers the daily data into the historical table, dropping all current indexes, populating the new data and then rebuilding the indexes with a 100% fill factor. You need to balance out whether speed of data retrieval or speed of update is more important if the table is used for both. You should also look at points 1 and 2 about managing blocking and batch updates/deletes to handle instances where a table is being deleted from whilst being accessed at the same time.
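A minimal sketch of that nightly transfer is below; all of the table and index names are hypothetical.

-- drop the historical table's index so the bulk insert is not slowed down
DROP INDEX IX_HITS_HISTORY_HitDate ON dbo.HITS_HISTORY

-- move the daily rows across
INSERT INTO dbo.HITS_HISTORY (HitDate, JobID, SiteID)
SELECT HitDate, JobID, SiteID
FROM dbo.HITS_DAILY

TRUNCATE TABLE dbo.HITS_DAILY

-- rebuild with a 100% fill factor as the historical table is read-only
CREATE INDEX IX_HITS_HISTORY_HitDate
ON dbo.HITS_HISTORY (HitDate, SiteID)
WITH (FILLFACTOR = 100)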


6. Re-write complex queries and make WHERE clauses SARGABLE

Can you rewrite complex queries to be more efficient? Can you remove left joins to large tables by populating temporary tables first? Are you putting functions around the column in the WHERE or HAVING clause and negating any index usage? If so, can you rewrite the clause to make it SARGABLE so that the index is used? For example, a WHERE clause like

WHERE DateDiff(day,CreateDate,GetDate()) = 0

which filters by today's date, should be rewritten to:


WHERE CreateDate > '2008-sep-09 00:00:00' --where this is the previous midnight

Put the value from GetDate() into a variable before the SELECT and then use that in the filter, so that no function is applied to the column and any index on CreateDate is available.
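Putting that together, a sketch of the SARGable version looks like this (the JOBS table and its columns are hypothetical):

DECLARE @TodayStart datetime
DECLARE @TomorrowStart datetime
SET @TodayStart = DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0) -- previous midnight
SET @TomorrowStart = DATEADD(day, 1, @TodayStart)

SELECT JobID, Title
FROM dbo.JOBS
WHERE CreateDate >= @TodayStart
AND CreateDate < @TomorrowStart -- no function on the column, so an index on CreateDate can be used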

Read this article of mine which proves the benefits of SARGABLE clauses in SQL.


7. Query Plan Caching

Check that you are benefiting from query plan caching. If you are using stored procedures that branch through IF statements and run different SQL depending on the parameter values passed in, then the cached query plan for the procedure may not be the best for all parameter variations.

You can either rewrite the procedure so that each IF branch calls another stored procedure containing the query required, as it will be that procedure's plan that gets used.

Or you could rewrite the query to use dynamic SQL, building up a string containing the appropriate syntax and parameters and then executing it with sp_executesql, which will take advantage of plan re-use. Do not use EXEC, as it will not take advantage of plan re-use and is not as safe in terms of SQL injection.
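A minimal sketch of the sp_executesql approach is below (the table and parameter names are made up): the SQL string can be built up conditionally, but the literal values are still passed as real parameters so one plan is cached and re-used.

DECLARE @sql nvarchar(max)
DECLARE @CategoryID int
SET @CategoryID = 5

SET @sql = N'SELECT JobID, Title FROM dbo.JOBS WHERE CategoryID = @CategoryID'
-- branching logic could append extra predicates to @sql here

EXEC sp_executesql
    @sql,
    N'@CategoryID int',
    @CategoryID = @CategoryID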

You should also check whether your query plans are not being re-used because of your SQL Server settings, as the default mode is SIMPLE PARAMETERIZATION and not FORCED PARAMETERIZATION.

This feature takes AdHoc queries containing literal values and replaces those values with parameters. This means the cached query plan can be re-used by similar queries that have different values, which can aid performance by reducing compilation time.

When the system is set to SIMPLE mode, only AdHoc query plans that contain the same literal values get cached and re-used, which in many cases is not good enough as the values for most queries change all the time. For example, in SIMPLE mode only the plan for the exact query below will be cached.


SELECT *
FROM PRODUCTS
WHERE ProductID = 10

This means only products with a ProductID of 10 will benefit from the cached plan; however, with FORCED PARAMETERIZATION enabled you would have a plan with parameters so that any ProductID can benefit from it, e.g.


SELECT *
FROM PRODUCTS
WHERE ProductID = @1 -- name of SQL parameter


For a more detailed look at the benefits of this method read my article on forced parameterization.


8. Use the appropriate table for the situation

SQL Server has a variety of table types, from fixed permanent tables to global and local temporary tables and table variables, the latter two of which are stored in tempdb.

I have found a number of times now that table variables start off being used in stored procedures as very useful, memory-efficient storage mechanisms, but once the dataset stored within them rises above some threshold (I have not found the exact threshold yet) performance drops dramatically.

Whilst useful for array-like behaviour within procs and quick for small datasets, they should not be used with record sets of many thousands or millions of records. The reason is that indexes cannot be added to them (beyond any key declared with the variable), so joins or lookups will result in table scans, which are fine with 10-100 records but with 20 million will cause huge problems.

Swapping table variables for either fixed tables or temporary tables with indexes should be considered when the size of data is too great for a table variable.
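As a minimal sketch (hypothetical names again), the swap is usually as simple as declaring a temporary table instead and giving it the index the joins need:

CREATE TABLE #JobBatch
(
    JobID  int NOT NULL,
    SiteID int NOT NULL,
    Title  nvarchar(200) NULL
)

CREATE CLUSTERED INDEX IX_JobBatch_JobID ON #JobBatch (JobID)

INSERT INTO #JobBatch (JobID, SiteID, Title)
SELECT JobID, SiteID, Title
FROM dbo.JOBS
WHERE SiteID = 3

-- later joins against #JobBatch can now seek on JobID instead of scanning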

If your procedures are being called by website logins with execute permission only, you will need to impersonate a login with higher privileges to allow the DDL statements CREATE and DROP, as a basic website login should have nothing more than read privileges in order to prevent SQL injection attacks.

If you don't want to risk the rise in privileges then consider using a fixed table with a stamp and unique key so that multiple calls don't clash with each other. Appropriate indexes should be used to ensure speedy data retrieval from these tables.
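A sketch of that pattern is below, assuming a pre-created work table (dbo.JOB_WORK here, with an index on the Stamp column; both are hypothetical): each call tags its rows with a unique stamp, works only with its own rows and removes them when finished.

DECLARE @Stamp uniqueidentifier
SET @Stamp = NEWID()

INSERT INTO dbo.JOB_WORK (Stamp, JobID, Title)
SELECT @Stamp, JobID, Title
FROM dbo.JOBS
WHERE SiteID = 3

-- work with only this call's rows
SELECT JobID, Title
FROM dbo.JOB_WORK
WHERE Stamp = @Stamp

-- clean up when done
DELETE FROM dbo.JOB_WORK
WHERE Stamp = @Stamp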

A real world example I have found is explained in this article on speeding up batch processes that use table variables.


9. Tuning for MySQL

MS SQL has a vast array of useful tools to help you performance tune it, from its Dynamic Management Views and Activity Monitor to its detailed query execution plans. However, if you are using MySQL and have to suffer with a tool like NAVICAT, then you are not in such a good position to tune your queries.

The EXPLAIN option is nowhere near as useful as the MS SQL query execution plan, but it can be used to ensure the appropriate indexes are added if any are missing, and the Slow Query Log is a useful place to identify problematic queries.

If you are using LINUX or Wordpress and running your system on MySQL then read this article of mine on performance tuning for MySQL.


These are just some of the things to look at when trying to improve back-end database performance, and I will add to this list when I get more time. Feel free to post your own tips for identifying problematic SQL and fixing it.