Showing posts with label Cache. Show all posts
Showing posts with label Cache. Show all posts

Thursday, 27 October 2011

Proving the benefits of SARGABLE Clauses and converting DATEDIFF to BETWEEN in SQL

Increasing performance with date range searches in SQL

I often find myself wanting to retrieve data from a table for the current day only.

There are many ways to do this but one of the easiest and the one I often find people using is to use
a non SARGABLE WHERE clause that makes use of the DATEDIFF function to only return records for the current date e.g


SELECT *
FROM BLAH
WHERE DATEDIFF(DAY,Stamp,GETDATE())=0

The problem with this is related to performance and whilst you might see negligible effects on small datasets
when you are dealing with datasets containing millions of rows you want the best performance possible.

If you don't know what a SARABLE clause is here is the definition:

In relational databases, a condition (or predicate) in a query is said to be sargable if the DBMS engine can take advantage of an index to speed up the execution of the query (using index seeks, not covering indexes). The term is derived from a contraction of Search ARGument Able.

Therefore because we have wrapped the Stamp (Date) column in the DATEDIFF function the clause becomes NON SARGABLE and an INDEX SCAN will be carried out instead of the performance benefits
that an INDEX SEEK would provide.

Obviously if you don't have an index on the data column in question it won't make the slightest bit of difference but lets test this theory.

First we want some code that will give us the start and end date of the current day.

I am using code for SQL 2005 and below here as in SQL 2008 there are specific DATE datatypes
that don't contain time parts but running a simple:

SELECT GETDATE()

-- returns
2011-10-27 15:01:39.000

As we want to replace the DATEDIFF with a BETWEEN @STARTDATE AND @ENDDATE we need to create the minimum date for the day and the maxiumum date for the day.

We can do this using some basic CAST conversions that first create the date for today at midnight e.g 2011-10-27 00:00:00:000 and then once we have that we can add the appropriate number of milliseconds to the date to give us the maximum end date e.g 2011-10-27 23:59:59.997.

To calculate the number of milliseconds we need to add is simple we multiple 60 seconds * 60 minutes * 24 hours * 1000 milliseconds and then deduct 3 (as 997 is the biggest value allowed in TSQL) e.g:


SELECT (60 * 60 * 24 * 1000)-3

The code to get the start and end date for the current day in TSQL is below.


DECLARE @StartDate DATETIME,
 @EndDate DATETIME
  
-- quickly convert our current datetime into the current date at midnight e.g 2011-10-27 00:00:00.000
-- by casting the date as a varchar(12) cutting it off at the space to skip the time part then casting it back to a datetime
SELECT @StartDate = CAST(CAST(GETDATE() as varchar(11)) as datetime)

-- add 86399997 milliseconds to get a nice 23:59:59:997 datetime
SELECT @EndDate = DATEADD(MS,86399997,@StartDate)    

-- run to test
SELECT  @StartDate, @EndDate 

And you will get 2011-10-27 00:00:00.000 and 2011-10-27 23:59:59.997.

Now to prove that doing a search this way is better for performance we pick a large table, ensure there is an index on the date column we are using and write a test harness.

Remember to turn on the "Include Actual Execution Plan" option on, and make sure that we clean out all the cache buffers before running our test.

Using some simple datediff calculations to time each SELECT we can run the following code.


SET DATEFORMAT YMD
SET NOCOUNT ON

DECLARE @START DATETIME, 
 @END DATETIME,   
 @DUR INT
  
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE

SELECT @START = GETDATE()

SELECT Racedatetime
FROM RACES
WHERE DATEDIFF(DAY,Racedatetime,GETDATE())=0 -- only get data for current day

SELECT @END = GETDATE(),
 @DUR = DATEDIFF(MS,@START,@END)

SELECT  'Query 1 with non SARGABLE DATEDIFF function took ' + CAST(@DUR as varchar) + ' milliseconds to run'


SELECT @START = GETDATE()

-- hardcoding values for an example but put our code from above to get the start and end of the current day here
SELECT Racedatetime
FROM RACES
WHERE Racedatetime BETWEEN '2011-10-27 00:00:00:000' AND '2011-10-27 23:59:59:997'

SELECT @END = GETDATE(),
 @DUR = DATEDIFF(MS,@START,@END)

SELECT  'Query 2 with SARGABLE WHERE Clause took ' + CAST(@DUR as varchar) + ' milliseconds to run'

Viewing the execution plan for the query you can see the difference in execution plans between the two queries.

The Non SARGABLE Query that used the DATEDIFF made use of an INDEX SCAN to obtain the result as it had to loop through the dataset running the DATEDIFF function on each Racedatetime column to see if the result was 0 (today).

The SARGABLE Query didn't have to do this and made use of the correct INDEX and no function needed to be applied to each column.

You can see the benefit in the query cost between the two SELECT statements in the following screenshot of the execution plan.

Excution Plan Cost of Query

Statement one that used a NON SARGABLE DATEDIFF clause took 673 milliseconds to run (on a table with 567,031 records) and had a cost of 96% and the statement that used a SARGABLE BETWEEN clause took 30 milliseconds to run and had a cost of 4%

Results of Query

Hopefully you can see from this example why it pays to make your queries as optimal as possible.

For more SQL Tips you can read an old Top SQL Performance Tips article I wrote some time back which still has lots of useful tips on performance tuning your queries.


And for analysing bottle necks and problematic queries you should view my SQL performance tuning script which will identify 14+ different areas which could be improved from query plan re-use, fragmented indexes, high CPU, and missing indexes plus lots more.

Tuesday, 9 June 2009

Optimizing a query with Forced Parameterization

SQL 2005 Forced Parameterization Overview

SQL 2005 has a feature which will take AdHoc queries containing literal values and remove those values replacing them with parameters. This means that the query plan which gets cached can be re-used for similar queries that have different values which can aid performance as it will reduce compilation time. This is called FORCED PARAMETERIZATION however the default mode is SIMPLE which will only cache and re-use AdHoc plans containing the same literal values. This cool feature is useful for systems where stored procs and prepared statements aren't being used as these forms of query will already benefit from cached plan re-use. You should read BOL about the suitability of this feature as certain constructs and features will prevent it from being utilized. It may also result in unsuitable query plans being re-used in certain cases rather than a new plan being generated for a query.


Using Forced Parameterization

I came across this feature on the wonderful SQLServerCentral.com site on a discussion about dynamic SQL. I read up about it and thought it might come in useful on a site that has intermittent problems on a results page when hit by concurrent crawlers which results in high CPU on the SQL Server. The page in question is a results page that allows filtering and ordering by multiple columns, both directions and also uses paging. However the query is an AdHoc query that is built up on the client using literal values that are appended to a string rather than using parameters and something like ADO's sqlCommand object to pass those parameter values to the database.

I looked into rewriting this query to make use of parameters but the WHERE clause was being cached and re-used by multiple queries on the same page so it wasn't a simple change therefore it seemed like a good candidate for FORCED PARAMETERIZATION.


Testing the theory

Before enabling this feature for the live site I wanted to prove on our development box that
a) Simple mode just wasn't good enough and was causing multiple plans to be compiled and cached for this query.
b) Forced mode would enable one plan to be cached and re-used for multiple parameter variations of this query.


Enabling Forced Parameterization

I had some trouble with this due to a number of other sites/blogs showing the incorrect way to set this but the correct way is:
-- enable forced mode
ALTER DATABASE YOUR_DATABASE SET PARAMETERIZATION FORCED

-- enable simple mode
ALTER DATABASE YOUR_DATABASE SET PARAMETERIZATION SIMPLE

You can also use the Management Console and the properties option under the relevant database to switch the setting over manually. When switching from one mode to another I made sure to clear the current cache by running the following:
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
I then used the following query which makes use of SQL 2005's DMV's (data management views) to check the queries that were currently cached and the type of method being engaged.

SELECT *
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
INNER JOIN sys.dm_exec_cached_plans as cp on qs.plan_handle=cp.plan_handle
WHERE cp.plan_handle=qs.plan_handle
AND (ObjType = 'Adhoc' OR ObjType='Prepared')

Notice that I am filtering on AdHoc and Prepared and ignoring Stored Procs, Triggers and Views. I initially tried filtering by the dbid (database id) column to narrow down the results to my particular database however for AdHoc and Prepared queries this column contains NULL values.

The columns I am particular interested in are the plan_handle which is the ID of the query plan related to the SQL statement and execution_count which tells me how many times the SQL related to the plan has been executed.

A very simple version of the query in question which conveys the point I am trying to make is below.

SELECT * FROM JOBS WHERE Row between 1 AND 20

With SIMPLE mode enabled a test page running my query with different criteria values for my BETWEEN statement caused multiple records to be returned from sys.dm_exec_cached_plans each with a different plan_handle value and the value for ObjType set to AdHoc. This tells me that the plans are not being re-used by similar queries when they could be. Running the same query with the same values does increment the execution_count against the relevant plan record which shows that a very simple form of re-use is being utilized. However as I want the best performance as possible this isn't good enough.

With FORCED mode enabled and the cache cleared the test pages with different criteria for my BETWEEN statement results in a single record to be returned with one plan_handle. The value for ObjType has now changed from AdHoc to Prepared and looking at the start of the text column which contains the SQL I can see the following before the start of my SQL:
(@0 int,@1 int,@2 int,@3 int,@4 varchar(8000),@5 int,@6 int)

Which is the database creating the parameters required to replace the literal values from my query. The values from my BETWEEN clause have been changed from literals to parameters e.g:
--Simple Mode Enabled
WHERE Row BETWEEN 1 AND 20
--Forced Mode Enabled
WHERE Row BETWEEN @5 and @6
Re-running the test page with different values for these parameters causes the execution_count to increment which informs me that the query plan is being cached and re-used across similar queries which is what I wanted to prove.


Enabling Forced Parameterization for a single query

There maybe situations where you want to enable this feature for one or more queries rather than set it on for the whole database. You can do this by setting up a plan guide for the query in question. An easy way to do this is to run the following SQL making sure to replace the first parameter value with the SQL you want to parametrize.


DECLARE @stmt nvarchar(max);
DECLARE @params nvarchar(max);
EXEC sp_get_query_template
N'SELECT * FROM JOBS WHERE Row between 1 AND 20;',
@stmt OUTPUT,
@params OUTPUT;
EXEC sp_create_plan_guide
N'MyPlanGuide_JobResults',
@stmt,
N'TEMPLATE',
NULL,
@params,
N'OPTION(PARAMETERIZATION FORCED)';


You can then set you database to SIMPLE mode but still benefit from having parametrized queries that match the template you have just created in the previous plan. Some useful SQL related to managing query plans is below:


--View all plans in your system
SELECT * FROM sys.plan_guides

--Disable the plan guide.
EXEC sp_control_plan_guide N'DISABLE', N'MyPlanGuide_JobResults';
GO

--Enable the plan guide.
EXEC sp_control_plan_guide N'ENABLE', N'MyPlanGuide_JobResults';
GO

--Drop the plan guide.
EXEC sp_control_plan_guide N'DROP', N'MyPlanGuide_JobResults';