Sunday, 25 April 2021

Solving an issue where you cannot tell the difference between the two conflicting words

Weird Issue - Two Names That Look The Same But Aren't

By Strictly-software 

I have a table of Jockeys that returns me all jockeys from a RACE_RUNNERS table but I had a problem which was I kept getting one jockey returned twice in the output. 

I couldn't work out why this would be, and only for one jockey, and jumped straight to collations. The column collation is set to DB default and when I do a compare using the same collation on each value they still don't match. e.g

-- Column in dbo.RACE_RUNNERS table is 

SELECT CONVERT (varchar, SERVERPROPERTY('collation')) AS 'Server Collation'; 

DB Collation = SQL_Latin1_General_CP1_CI_AS

SELECT	Jockey COLLATE SQL_Latin1_General_CP1_CI_AS,'JOCKEY',COUNT(RaceFK)
FROM	dbo.RACE_RUNNERS as run with (nolock)
JOIN	dbo.RACES as r with (nolock)
    ON	r.RacePK = run.RaceFK
WHERE	Jockey IN('Mr P W Mullins','Mr P W Mullins')
	AND Racedatetime > '2021-JAN-01'		
GROUP BY Jockey 
ORDER BY Jockey COLLATE SQL_Latin1_General_CP1_CI_AS

/*
Returns two records for the jockey that look alike
Mr W P Mullins
Mr W P Mullins
*/

-- do some basic comparison tests with each value

IF LTRIM(RTRIM('Mr P W Mullins')) = LTRIM(RTRIM('Mr P W Mullins'))
  PRINT 'MATCH'
ELSE
  PRINT 'NO MATCH'

--NO MATCH

IF LTRIM(RTRIM('Mr P W Mullins')) COLLATE SQL_Latin1_General_CP1_CI_AS = LTRIM(RTRIM('Mr P W Mullins')) COLLATE SQL_Latin1_General_CP1_CI_AS
  PRINT 'MATCH'
ELSE
  PRINT 'NO MATCH' 

--NO MATCH


So I was slightly confused and thinking about deleting the 2nd record for MR P W MULLINS from the table or changing his name first to the MR P W NULLINS with most results. 

Then it dawned on me. Importing data often leads to characters that you cannot see the difference in but in SQL will appear different. So I did a simple test by removing all spaces in the IF statement comparison e.g
IF 'MrPWMullins' = 'MrPWMullins'
  PRINT 'MATCH'
ELSE
  PRINT 'NO MATCH'

-- And the result was
MATCH
So I just checked each space character in the two words with ASCII() function and found that the last space between W and Mullins was different from each other.
-- match the space between W and Mullins as it seems to be different
SELECT ASCII(' ') -- CHAR 160 another type of SPACE
SELECT ASCII(' ') -- CHAR 32 ASCII SPACE

160
32

Outputting both character numbers with CHAR in a select statement just gave me two columns of spaces. No way to tell the difference by sight, and it only occurred on one jockey for some reason. 

Therefore I did an update on the main RACE_RUNNERS table to replace any CHAR(160) with CHAR(32) for this jockey and might have to put it in the main stored proc that either saves jockeys into the RACE_RUNNER table or just the stat proc.

UPDATE	RACE_RUNNERS 
SET	JOCKEY = REPLACE(Jockey,CHAR(160),CHAR(32))
WHERE	JOCKEY = 'Mr P W Mullins'
Hey presto one instance of the name in the table. A tricky issue that I thought was down to collation but it wasn't. 

The collations of table columns and the DB were the same and doing matches with a specified collation didn't solve the issue. 

Therefore you have to think out of the box and go back to basics and just think what could be different. In this case it was the spaces. Removing them from both words created a MATCH result therefore I knew one space was a different character to the other. 

Here is a link to an ASCII chart if you should ever want to see what character CHAR(N) returns > https://ascii.cl/.

Problem solved. 

By Strictly-Software

© 2021 By Strictly-Software