Monday, 14 March 2011

SQL Varchar Comparison Ignores Right Hand White Space

MS SQL Server Text Comparisons

I was doing some work in SQL 2008 today that involved some data cleaning from a scraper that was extracting specific data from a webpage and trying to match it in a DB. The data was very dirty and not consistent in the slightest so I was running it through a number of custom parser functions to try and find matches.

Whilst doing this I came across something that I certainly thought was odd and hadn't seen before. I don't know if this is a well know "quirk" in MS SQL or not but a couple of colleagues had never seen it either.

It involved string comparisons where one string had white space to the right hand side of it and the other had no space either side and a string comparison matched.

This only occurs when the white space is to the right hand side of the text and not to the left hand side as the following test shows.

I had never come across this quirk before so I thought I would make a note of it incase others hadn't either. I ran this test on 2008 and 2005 with the same results.

DECLARE @Test1 varchar(100),
@Test2 varchar(100),
@Test3 varchar(100)

SELECT @Test1 = 'Stratford ',
@Test2 = ' Stratford',
@Test3 = ' Stratford '

IF @Test1 = 'Stratford'
PRINT 'Is Stratford'
ELSE
PRINT 'Is not Stratford'

IF @Test2 = 'Stratford'
PRINT 'Is Stratford'
ELSE
PRINT 'Is not Stratford'

IF @Test3 = 'Stratford'
PRINT 'Is Stratford'
ELSE
PRINT 'Is not Stratford'

SELECT '"' + @Test1 + '"', '"' + @Test2 + '"', '"' + @Test3 + '"'




The output I get is the following


Is Stratford
Is not Stratford
Is not Stratford

and then a recordset.

"Stratford    " "   Stratford" "   Stratford   "
Maybe this is a well known behaviour but I wasn't expecting it to behave like this so it threw me off a bit during my cleanup process.


1 comment:

Pseudothink said...

Also watch out if regular white spaces (ASCII code 32) are mixed in with non-breaking spaces (ASCII code 160). SQL will recognize trailing non-breaking spaces, unlike white spaces.