Weird Issue - Two Names That Look The Same But Aren't
By Strictly-softwareI have a table of Jockeys that returns me all jockeys from a RACE_RUNNERS table but I had a problem which was I kept getting one jockey returned twice in the output.
I couldn't work out why this would be, and only for one jockey, and jumped straight to collations.
The column collation is set to DB default and when I do a compare using the same collation on each value they still don't match.
e.g
So I was slightly confused and thinking about deleting the 2nd record for MR P W MULLINS from the table or changing his name first to the MR P W NULLINS with most results.
-- Column in dbo.RACE_RUNNERS table is
SELECT CONVERT (varchar, SERVERPROPERTY('collation')) AS 'Server Collation';
DB Collation = SQL_Latin1_General_CP1_CI_AS
SELECT Jockey COLLATE SQL_Latin1_General_CP1_CI_AS,'JOCKEY',COUNT(RaceFK)
FROM dbo.RACE_RUNNERS as run with (nolock)
JOIN dbo.RACES as r with (nolock)
ON r.RacePK = run.RaceFK
WHERE Jockey IN('Mr P W Mullins','Mr P W Mullins')
AND Racedatetime > '2021-JAN-01'
GROUP BY Jockey
ORDER BY Jockey COLLATE SQL_Latin1_General_CP1_CI_AS
/*
Returns two records for the jockey that look alike
Mr W P Mullins
Mr W P Mullins
*/
-- do some basic comparison tests with each value
IF LTRIM(RTRIM('Mr P W Mullins')) = LTRIM(RTRIM('Mr P W Mullins'))
PRINT 'MATCH'
ELSE
PRINT 'NO MATCH'
--NO MATCH
IF LTRIM(RTRIM('Mr P W Mullins')) COLLATE SQL_Latin1_General_CP1_CI_AS = LTRIM(RTRIM('Mr P W Mullins')) COLLATE SQL_Latin1_General_CP1_CI_AS
PRINT 'MATCH'
ELSE
PRINT 'NO MATCH'
--NO MATCH
Then it dawned on me. Importing data often leads to characters that you cannot see the difference in but in SQL will appear different.
So I did a simple test by removing all spaces in the IF statement comparison e.g
So I just checked each space character in the two words with ASCII() function and found that the last space between W and Mullins was different from each other.
Outputting both character numbers with CHAR in a select statement just gave me two columns of spaces. No way to tell the difference by sight, and it only occurred on one jockey for some reason.
IF 'MrPWMullins' = 'MrPWMullins'
PRINT 'MATCH'
ELSE
PRINT 'NO MATCH'
-- And the result was
MATCH
-- match the space between W and Mullins as it seems to be different
SELECT ASCII(' ') -- CHAR 160 another type of SPACE
SELECT ASCII(' ') -- CHAR 32 ASCII SPACE
160
32
Therefore I did an update on the main RACE_RUNNERS table to replace any CHAR(160) with CHAR(32) for this jockey and might have to put it in the main stored proc that either saves jockeys into the RACE_RUNNER table or just the stat proc.
Hey presto one instance of the name in the table.
A tricky issue that I thought was down to collation but it wasn't.
UPDATE RACE_RUNNERS
SET JOCKEY = REPLACE(Jockey,CHAR(160),CHAR(32))
WHERE JOCKEY = 'Mr P W Mullins'
The collations of table columns and the DB were the same and doing matches with a specified collation didn't solve the issue.
Therefore you have to think out of the box and go back to basics and just think what could be different. In this case it was the spaces. Removing them from both words created a MATCH result therefore I knew one space was a different character to the other.
Here is a link to an ASCII chart if you should ever want to see what character CHAR(N) returns > https://ascii.cl/.
Problem solved.
By Strictly-Software
© 2021 By Strictly-Software
No comments:
Post a Comment