Thursday, 27 October 2011

Proving the benefits of SARGABLE Clauses and converting DATEDIFF to BETWEEN in SQL

Increasing performance with date range searches in SQL

I often find myself wanting to retrieve data from a table for the current day only.

There are many ways to do this, but one of the easiest, and the one I often find people using, is
a non-SARGABLE WHERE clause that uses the DATEDIFF function to return only records for the current date, e.g.:


SELECT *
FROM BLAH
WHERE DATEDIFF(DAY,Stamp,GETDATE())=0

The problem with this is performance. Whilst you might see negligible effects on small datasets,
when you are dealing with datasets containing millions of rows you want the best performance possible.

If you don't know what a SARGABLE clause is, here is the definition:

In relational databases, a condition (or predicate) in a query is said to be sargable if the DBMS engine can take advantage of an index to speed up the execution of the query (using index seeks, not covering indexes). The term is derived from a contraction of Search ARGument Able.

Therefore, because we have wrapped the Stamp (date) column in the DATEDIFF function, the clause becomes non-SARGABLE. The function has to be evaluated for every row, so an INDEX SCAN is carried out
and we lose the performance benefit that an INDEX SEEK would provide.

Obviously if you don't have an index on the date column in question it won't make the slightest bit of difference, but let's test this theory.

First we want some code that will give us the start and end date of the current day.

I am using code that works on SQL 2005 and below here. SQL 2008 has a specific DATE datatype
that doesn't contain a time part, but on earlier versions the current date always comes with the time attached, as you can see by running a simple:

SELECT GETDATE()

-- returns
2011-10-27 15:01:39.000

As we want to replace the DATEDIFF with a BETWEEN @STARTDATE AND @ENDDATE we need to create the minimum date for the day and the maximum date for the day.

We can do this using some basic CAST conversions that first create the date for today at midnight e.g. 2011-10-27 00:00:00.000. Once we have that we can add the appropriate number of milliseconds to it to give us the maximum end date e.g. 2011-10-27 23:59:59.997.

Calculating the number of milliseconds we need to add is simple: we multiply 60 seconds * 60 minutes * 24 hours * 1000 milliseconds and then deduct 3. DATETIME values are only accurate to roughly 3 milliseconds, so .997 is the biggest millisecond value allowed in TSQL; anything higher rounds up to midnight of the next day. E.g.:


SELECT (60 * 60 * 24 * 1000)-3

The code to get the start and end date for the current day in TSQL is below.


DECLARE @StartDate DATETIME,
 @EndDate DATETIME
  
-- convert the current datetime into today's date at midnight e.g. 2011-10-27 00:00:00.000
-- by casting it to a varchar(11), which keeps only the date part (e.g. 'Oct 27 2011') and drops the time,
-- then casting it back to a datetime
SELECT @StartDate = CAST(CAST(GETDATE() as varchar(11)) as datetime)

-- add 86399997 milliseconds to get 23:59:59.997 on the same day
SELECT @EndDate = DATEADD(MS,86399997,@StartDate)

-- run to test
SELECT  @StartDate, @EndDate 

And you will get 2011-10-27 00:00:00.000 and 2011-10-27 23:59:59.997.
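
As an alternative to working out the 23:59:59.997 end date, a common approach is a half-open range: greater than or equal to midnight today and less than midnight tomorrow. The sketch below is not the method used in the rest of this article. It assumes the same BLAH table and Stamp column from the first example, and @NextDay is a variable name introduced here purely for illustration. It only uses functions that are available in SQL 2005.

```sql
DECLARE @StartDate DATETIME,
        @NextDay   DATETIME   -- hypothetical name, not used elsewhere in this article

-- midnight today: count the whole days since day 0 (1900-01-01) and add them back on
SELECT @StartDate = DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0)

-- midnight tomorrow, used as an exclusive upper bound
SELECT @NextDay = DATEADD(DAY, 1, @StartDate)

-- the column is left bare on both sides of the comparison so the clause stays SARGABLE
SELECT *
FROM BLAH
WHERE Stamp >= @StartDate
  AND Stamp <  @NextDay
```

Because the upper bound is exclusive, this version does not depend on the 3 millisecond accuracy of DATETIME and no milliseconds need to be deducted.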

Now to prove that doing a search this way is better for performance we pick a large table, ensure there is an index on the date column we are using and write a test harness.

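Remember to turn on the "Include Actual Execution Plan" option, and make sure that we clean out all the cache buffers before running our test. Only do this on a development server, as DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE flush the data and plan caches for the whole instance.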

Using some simple datediff calculations to time each SELECT we can run the following code.


SET DATEFORMAT YMD
SET NOCOUNT ON

DECLARE @START DATETIME, 
 @END DATETIME,   
 @DUR INT
  
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE

SELECT @START = GETDATE()

SELECT Racedatetime
FROM RACES
WHERE DATEDIFF(DAY,Racedatetime,GETDATE())=0 -- only get data for current day

SELECT @END = GETDATE(),
 @DUR = DATEDIFF(MS,@START,@END)

SELECT  'Query 1 with non SARGABLE DATEDIFF function took ' + CAST(@DUR as varchar) + ' milliseconds to run'


SELECT @START = GETDATE()

-- hardcoding values for an example but put our code from above to get the start and end of the current day here
SELECT Racedatetime
FROM RACES
WHERE Racedatetime BETWEEN '2011-10-27 00:00:00:000' AND '2011-10-27 23:59:59:997'

SELECT @END = GETDATE(),
 @DUR = DATEDIFF(MS,@START,@END)

SELECT  'Query 2 with SARGABLE WHERE Clause took ' + CAST(@DUR as varchar) + ' milliseconds to run'
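
If you would rather not time the queries by hand, a rough alternative is to let SQL Server report the figures itself. This is only a sketch and not the harness used for the results in this article. It assumes the same RACES table and Racedatetime column, and the output appears on the Messages tab in Management Studio rather than in the results grid.

```sql
-- report logical reads and CPU / elapsed time for every statement that follows
SET STATISTICS IO ON
SET STATISTICS TIME ON

-- non SARGABLE version
SELECT Racedatetime
FROM RACES
WHERE DATEDIFF(DAY,Racedatetime,GETDATE())=0

-- SARGABLE version (dates hardcoded as in the example above)
SELECT Racedatetime
FROM RACES
WHERE Racedatetime BETWEEN '2011-10-27 00:00:00:000' AND '2011-10-27 23:59:59:997'

-- switch the reporting off again
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
```

Comparing the logical reads for each statement gives a measure of the work done that does not depend on how busy the server was at the time.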

Viewing the execution plan you can see the difference between the two queries.

The non-SARGABLE query that used DATEDIFF made use of an INDEX SCAN to obtain the result, as it had to loop through the dataset running the DATEDIFF function on the Racedatetime value of every row to see if the result was 0 (today).

The SARGABLE query didn't have to do this. It could seek straight to the matching range in the index on Racedatetime, and no function needed to be applied to each row.

You can see the benefit in the query cost between the two SELECT statements in the following screenshot of the execution plan.

Execution Plan Cost of Query

Statement one, which used a non-SARGABLE DATEDIFF clause, took 673 milliseconds to run (on a table with 567,031 records) and accounted for 96% of the batch cost, while the statement that used a SARGABLE BETWEEN clause took 30 milliseconds to run and accounted for 4%.

Results of Query

Hopefully you can see from this example why it pays to make your queries as optimal as possible.

For more SQL Tips you can read an old Top SQL Performance Tips article I wrote some time back which still has lots of useful tips on performance tuning your queries.


And for analysing bottlenecks and problematic queries you should view my SQL performance tuning script, which will identify 14+ different areas that could be improved, from query plan re-use, fragmented indexes and high CPU to missing indexes, plus lots more.

Monday, 17 October 2011

Finding Deprecated SQL functions in SQL 2008

Deprecated SQL 2008 features

I have just come across this useful piece of SQL that shows you how many times you are using deprecated features in your SQL 2008 database.

Run the SQL on your SQL 2008 database in a new query window to see how many times you are using the deprecated features.



SELECT *
FROM sys.dm_os_performance_counters
-- LIKE also matches named instances, where the prefix is MSSQL$InstanceName rather than SQLServer
WHERE object_name LIKE '%:Deprecated Features%'
 AND cntr_value > 0;


Whilst these deprecated SQL features might still work in this version of SQL, they might not in future releases, so fixing them soon is a good idea.


Features that have been deprecated include the often used SET ROWCOUNT N.

For example instead of using SET ROWCOUNT 100 to limit an UPDATE or DELETE statement you would use the new UPDATE TOP(100) or DELETE TOP(100) statements instead.
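
To make the swap concrete, here is a minimal before and after sketch. The dbo.AuditLog table and its Stamp and Archived columns are hypothetical and only there for illustration, so substitute your own table and WHERE clause.

```sql
-- old style: SET ROWCOUNT limits the DELETE to 100 rows (deprecated for INSERT, UPDATE and DELETE)
-- dbo.AuditLog, Stamp and Archived are hypothetical names
SET ROWCOUNT 100
DELETE FROM dbo.AuditLog
WHERE Stamp < '2011-01-01'
SET ROWCOUNT 0   -- reset, otherwise later statements stay limited to 100 rows

-- replacement: the limit is part of the statement itself
DELETE TOP (100) FROM dbo.AuditLog
WHERE Stamp < '2011-01-01'

-- the same idea for an UPDATE
UPDATE TOP (100) dbo.AuditLog
SET Archived = 1
WHERE Archived = 0
```

Note that with TOP in an UPDATE or DELETE you cannot specify an ORDER BY, so which 100 rows are affected is not guaranteed.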


Whilst the SQL statement for listing the deprecated features is useful, it doesn't actually show you where each of the deprecated features is being used in your code.

I don't know if this DMV only looks at stored procs or at all queries passed through the system, but a find function like the following piece of SQL might be useful for hunting down deprecated code within the SQL database.


DECLARE @FindStr AS VARCHAR(500)

-- lower case our find string as we lower case the comments
SELECT @FindStr = LOWER('USER_ID')                                                                                                                      

SET NOCOUNT ON

SELECT DISTINCT NAME AS [NAME], 
  CASE  WHEN TYPE ='P' THEN 'PROCEDURE'
        WHEN TYPE IN('FN', 'IF','TF') THEN 'FUNCTION'
    END AS OBJECTTYPE
FROM SYSCOMMENTS as comm
JOIN SYSOBJECTS as obj 
   ON  comm.id = obj.id and obj.type IN ('P', 'FN', 'IF', 'TF')
WHERE LOWER(TEXT) LIKE '%' + LTRIM(RTRIM(@FindStr)) + '%'
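
One thing to be aware of with the query above is that SYSCOMMENTS stores long procedures in chunks of 4000 characters, so a search term that straddles two chunks can be missed. A possible alternative, sketched below under the assumption that you are on SQL 2005 or later, is to search the sys.sql_modules catalog view, which holds the whole definition in a single column. The escaping of the underscore is my own addition, as _ is a single character wildcard in a LIKE pattern.

```sql
DECLARE @FindStr AS NVARCHAR(500)

SELECT @FindStr = N'USER_ID'

SET NOCOUNT ON

SELECT obj.name AS [NAME],
       obj.type_desc AS OBJECTTYPE
FROM sys.sql_modules AS m
JOIN sys.objects AS obj
   ON obj.object_id = m.object_id
WHERE obj.type IN ('P', 'FN', 'IF', 'TF')
  -- wrap any underscore in [] so it is matched literally instead of as a wildcard
  AND LOWER(m.definition) LIKE '%' + REPLACE(LOWER(LTRIM(RTRIM(@FindStr))), '_', '[_]') + '%'
```

As each object has only one row in sys.sql_modules there is no need for the DISTINCT used in the SYSCOMMENTS version.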



You can find a list of all the deprecated features in SQL 2008 and ways to fix them on the Technet Microsoft SQL Website.

If anyone knows of any other ways to hunt down the deprecated features within the SQL 2008 database please let me know.

Thursday, 6 October 2011

Twitter Hash Tag Scanner SEO Application

Introducing the first version of my Twitter Hash Tag Scanner Application

The Strictly HashTag Hunter is a Windows form application that allows you to find the most relevant HashTags and Twitter accounts for a variety of specified search terms and keywords.

This application is ideal for people who have just created a Twitter account and want to analyse their own site specific keywords to find #HashTags they should be following or using themselves.

For instance you might have a blog or site that uses an AutoBlogging tool like my Strictly TweetBot Wordpress Plugin and you might want to set up various AutoTweets with relevant HashTags that are related to certain keywords and content snippets.

This tool also helps you find the most important Twitter accounts that you should be following as it analyses those people that are using the keywords or sentences that you enter on Twitter at that point in time to find the most popular HashTags related to those words as well as the accounts that are using them the most.

Obviously the time of day you run your scan will affect the results, as different people Tweet at different times of the day, but you will see from the results which Twitter accounts have the most followers and are therefore worth following from your own account.

The primary aim of this tool is to help you save time trying to work out which #HashTags to use for your own Tweets as well as working out which @accounts to follow for your own Twitter account.

The Strictly Twitter Hash Tag Hunter is built as a Windows application that runs on your own desktop and hooks into Twitter's API to obtain the results. It is perfect for SEO and Social Media analysts as well as people with a new Twitter account who don't know which hash tags and accounts they should be following to make an impact on the social scene.



Screen 1 shows how you enter one or more search terms that you want to find information for. These terms can be anything but if you are looking to utilise this tool with my Strictly TweetBot Wordpress Plugin then you should be looking to find the #HashTags and @Accounts to follow on Twitter related to the key terms your website is based on.

For example if you were running a site about Horse Racing and wanted to find out which Twitter @Accounts to follow and which #HashTags to use in your Tweets you would enter a number of search terms like so:

Horse Racing
Kempton Park
fromthestables.com
Free Racing Tips 
Twitter HashTag Hunter Start up screen
Enter each keyword or search term on its own line.

Once you have entered each term on its own line you click the "Search" button and the Scanner gets to work analysing your keywords and finding related Twitter information.

For each search term and keyword it will scan the Twitter API for those words looking for the most popular #hashtags that are related to those keywords.

It will also find the Twitter accounts that make the most use of these terms before ordering the accounts by the number of followers each account has and the hash tags by the number of times they are referenced by those accounts.

On completing the Scan

Screen 2 shows the most popular hash tags found for the search terms that were entered.
Twitter HashTag Completion Screen
The most popular hash tags found for the entered search terms and keywords.

Screen 3 shows the most followed Twitter accounts that used the terms you searched for.
Twitter HashTag Completion Screen
The most followed Twitter accounts for the entered search terms and keywords.

Following Accounts or Hash Tags

Once the Twitter Scan has completed and you have looked at the results you can simply click on the Account or Twitter Link column value to open up the desired URL in your default browser.

Screen 4 shows you selecting the desired Account you want to examine on Twitter.

Selecting a Twitter Account
Selecting an account to examine


Screen 5 shows the http://twitter.com page opening in your browser where you can decide whether or not the account or hash tag is worth following.

Following the selected Twitter Account
Viewing the account in Twitter and following them


If you are already logged into Twitter at the time then it's just a simple matter of clicking the "Follow" button in the top right of the screen and your own Twitter account will now be following the account you opened.


About the Twitter Hash Tag Scanner Application

The application is a multi-threaded standalone executable Windows application and it has been built with users and Twitter in mind so that the Twitter API is not overloaded and abused and that you can continue to get all the information you need from their service.

A progress bar keeps you updated with the number of scans it has carried out as well as the number of accounts and hashtags it has already found matching your search terms.

If for whatever reason Twitter blocks your requests (for example if you were hammering their API with dozens of search terms in one scan) then the application will reduce the number of requests it makes and increase the delays between them. It also has some built in methods for bypassing certain blocking methods as well as the ability to access the information from other sources.

I am hoping to expand this tool over the years and I have had great feedback from both novice users who have found it very useful in deciding who to follow when they first start to use Twitter as well as SEO experts who utilise social media and Twitter all the time for marketing purposes.

As an introductory offer I am offering this application for the same price as a small donation of only £10.00 and you can buy this application from my application order page.

Tuesday, 4 October 2011

Find position of a string within a paragraph using a Regular Expression

How to find a string within another string using a Regular Expression instead of strpos or stripos

I was writing a piece of code in PHP the other day where I had to find a snippet of text within another longer piece of text (e.g an article) that contained a word. I then wanted to take X number of characters from that first point and return a snippet that didn't cut off the last word in the sentence.

At first I was using the PHP functions strpos and stripos but these don't allow you to use Regular Expressions as the search term (needle in the haystack as PHP.net calls the parameters) and therefore it meant that I was returning mismatches due to the search term being contained within other words.

E.g. if I was looking for the word wool it would also match woollen.

Therefore the answer was to use a custom function that made use of preg_match and a non greedy capture group at the beginning of a pattern that could be passed to the function (without delimiters).

The function is below



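/**
 * Function to find the first occurrence of a regular expression pattern within a string
 *
 * @param string $regex pattern without delimiters; a literal @ in it must be escaped as \@
 * @param string $str
 * @param bool $ignorecase
 * @return int|false byte offset of the first match, or false if there is no match
 */
function preg_pos( $regex, $str, $ignorecase = false ) 
{ 
 // build up the RegEx wrapping it in @ delimiters
 // (?: ) keeps any alternation in $regex together and the s flag lets . match newlines
 // so that a match after the first line of the input is still found
 $pattern = "@^(.*?)(?:" . $regex . ")@s" . ($ignorecase===true ? "i" : "");

 if( preg_match( $pattern, $str, $matches ) ) {
  return strlen( $matches[ 1 ] ); 
 }

 return false; 
} 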


As you can see the pattern needs to be passed in without delimiters e.g instead of /\bwool\b/ or @\bwool\b@ just pass in \bwool\b.

I then add a capture group to the beginning that is non greedy so that it finds the first match from the start of the input string ^(.*?) and then if the pattern is found I can do a strlen on the matching group to get the starting position of the pattern.

If you want the pattern to be case-insensitive then just pass in TRUE as the third parameter and the ignore case flag (i) will be added to the end of the pattern. Pass in FALSE to keep the match case-sensitive.

An example of this code being used is below. The code is looping through an array of words looking for the first match within a longer string (some HTML) and then taking 250 characters of text from the starting point, ensuring the last word is a whole word match.


// find first occurrence of any of the terms I am looking for and then take 250 characters from the first word
// ensuring I get a whole word at the end

$a = explode(" ",$terms);
foreach($a as $w){

 // skip empty or small terms

 if(!empty($w) && strlen($w) > 2){
   
  // get the position of the word ensuring it's not within another word - using \b word boundary - notice no RegEx delimiters @regex@ or /regex/
  // also ensure any special characters within the word (including the @ delimiter preg_pos uses) are escaped to prevent a mismatch
  $pos = preg_pos( '\b' . preg_quote($w, '@') . '\b', $html, true ) ;

  // pos is false when the word was not found; 0 is a valid position so compare with !==

  if($pos !== false){

   // found the word take 250 chars from the first occurrence

   $text = substr($html, $pos, 250);
   
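   // roll back to last space before our last word to ensure we don't get partial words
   // (if there is no space at all keep the text as it is)

   $lastSpace = strrpos($text," ");
   if($lastSpace !== false){
    $text = substr($text, 0, $lastSpace);
   }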
   
   // now we have found a term exit
   break;
  }
 }
}


Also remember to wrap your word in preg_quote so that any characters that have a special meaning to the Regular Expression engine e.g. ? . + * [ ] ( ) { } etc. are escaped properly.

I found this function quite useful.