Showing posts with label Collation. Show all posts
Showing posts with label Collation. Show all posts

Sunday, 25 April 2021

Solving an issue where you cannot tell the difference between the two conflicting words

Weird Issue - Two Names That Look The Same But Aren't

By Strictly-software 

I have a table of Jockeys that returns me all jockeys from a RACE_RUNNERS table but I had a problem which was I kept getting one jockey returned twice in the output. 

I couldn't work out why this would be, and only for one jockey, and jumped straight to collations. The column collation is set to DB default and when I do a compare using the same collation on each value they still don't match. e.g

-- Column in dbo.RACE_RUNNERS table is 

SELECT CONVERT (varchar, SERVERPROPERTY('collation')) AS 'Server Collation'; 

DB Collation = SQL_Latin1_General_CP1_CI_AS

SELECT	Jockey COLLATE SQL_Latin1_General_CP1_CI_AS,'JOCKEY',COUNT(RaceFK)
FROM	dbo.RACE_RUNNERS as run with (nolock)
JOIN	dbo.RACES as r with (nolock)
    ON	r.RacePK = run.RaceFK
WHERE	Jockey IN('Mr P W Mullins','Mr P W Mullins')
	AND Racedatetime > '2021-JAN-01'		
GROUP BY Jockey 
ORDER BY Jockey COLLATE SQL_Latin1_General_CP1_CI_AS

/*
Returns two records for the jockey that look alike
Mr W P Mullins
Mr W P Mullins
*/

-- do some basic comparison tests with each value

IF LTRIM(RTRIM('Mr P W Mullins')) = LTRIM(RTRIM('Mr P W Mullins'))
  PRINT 'MATCH'
ELSE
  PRINT 'NO MATCH'

--NO MATCH

IF LTRIM(RTRIM('Mr P W Mullins')) COLLATE SQL_Latin1_General_CP1_CI_AS = LTRIM(RTRIM('Mr P W Mullins')) COLLATE SQL_Latin1_General_CP1_CI_AS
  PRINT 'MATCH'
ELSE
  PRINT 'NO MATCH' 

--NO MATCH


So I was slightly confused and thinking about deleting the 2nd record for MR P W MULLINS from the table or changing his name first to the MR P W NULLINS with most results. 

Then it dawned on me. Importing data often leads to characters that you cannot see the difference in but in SQL will appear different. So I did a simple test by removing all spaces in the IF statement comparison e.g
IF 'MrPWMullins' = 'MrPWMullins'
  PRINT 'MATCH'
ELSE
  PRINT 'NO MATCH'

-- And the result was
MATCH
So I just checked each space character in the two words with ASCII() function and found that the last space between W and Mullins was different from each other.
-- match the space between W and Mullins as it seems to be different
SELECT ASCII(' ') -- CHAR 160 another type of SPACE
SELECT ASCII(' ') -- CHAR 32 ASCII SPACE

160
32

Outputting both character numbers with CHAR in a select statement just gave me two columns of spaces. No way to tell the difference by sight, and it only occurred on one jockey for some reason. 

Therefore I did an update on the main RACE_RUNNERS table to replace any CHAR(160) with CHAR(32) for this jockey and might have to put it in the main stored proc that either saves jockeys into the RACE_RUNNER table or just the stat proc.

UPDATE	RACE_RUNNERS 
SET	JOCKEY = REPLACE(Jockey,CHAR(160),CHAR(32))
WHERE	JOCKEY = 'Mr P W Mullins'
Hey presto one instance of the name in the table. A tricky issue that I thought was down to collation but it wasn't. 

The collations of table columns and the DB were the same and doing matches with a specified collation didn't solve the issue. 

Therefore you have to think out of the box and go back to basics and just think what could be different. In this case it was the spaces. Removing them from both words created a MATCH result therefore I knew one space was a different character to the other. 

Here is a link to an ASCII chart if you should ever want to see what character CHAR(N) returns > https://ascii.cl/.

Problem solved. 

By Strictly-Software

© 2021 By Strictly-Software

Wednesday, 18 March 2020

MS SQL Collation Issues

Fixing MS SQL Collation Differences

By Strictly-Software

With the quiet due to the COVID19 virus scare, and all major sporting events being cancelled for the foreseen future, I decided now would be a good time to get my old Automatic Betting BOT back up and working.

It used to run on a dedicated Windows 2003 server and MS SQL 2012 database before being outsourced to a French hosting company, OVH, which doesn't allow HTTP connections to online betting sites, basically making it useless. A decision that wasn't mine in anyway, but basically stopped my BOT from working.

It used to run as a Windows Service using Betfair's Exchange API to automatically create betting systems and then place bets based on systems with a rolling ROI > 5% over the last 30 and 100 days.

I am trying to get the BOT working on a local laptop and therefore it had already been backed up, the files FTP'd down to my laptop and restored in a local MS SQL Express database. 

However it seems that I had some issues when I previously attempted this some time ago as the database now had a different collation, a number of tables were now empty and had been scripted across without indexes and there was a collation difference between many columns and the new DB collation.

Just changing the Databases collation over isn't a simple act on it's own if numerous table column collations are different as well, especially those used in Indexes or Primary Keys as those references need to be removed before the collation can be changed.

A simple fix when it's only a small issue with one column might be to just edit the SQL where the issue arises and use a COLLATE statement so that any WHERE clause or JOIN uses the same collation e.g


SELECT  MemberID, MemberName
FROM MEMBERS 
WHERE MemberName COLLATE SQL_Latin1_General_CP1_CI_AI = @Name

However when you have a large database and the issue is that half your tables are one collation, the other another, and you need to decide which to move to, and then it becomes more difficult.

My new Database was using the newer SQL_Latin1_General_CP1_CI_AI collation whilst a subset of the tables were still using the older collation from the servers copy of the database Latin1_General_CI_AS.

Therefore finding out which columns and tables were using this collation was the first task and can easily be done with a system view with some SQL like so:

-- find out table name and columns using the collation Latin1_General_CI_AS
SELECT table_name, column_name, collation_name
FROM information_schema.columns
WHERE collation_name = 'Latin1_General_CI_AS'
ORDER BY table_name, column_name

From this I could see that the majority of my tables were using the older collation Latin1_General_CI_AS and that it would be easier to change the database collation to this, then the table collation.

However as there were very few indexes or keys I decided to do the reverse and use the newer collation SQL_Latin1_General_CP1_CI_AI, and change all columns using the LATIN to this new version.

However as I want to show you what to do if you need to change your MS SQL database and columns over to a new collation just imagine I am doing the opposite e.g I am changing my whole system from SQL_Latin1_General_CP1_CI_AI to Latin1_General_CI_AS.

If you did want to change the database collation after it had been created it is not as simple as just opening the databases property window and selecting a new collation OR just running the following query to ALTER the database. Just read the error message I was getting when attempting this.

USE master;
GO

ALTER DATABASE MyDatabase
COLLATE Latin1_General_CI_AS ;
GO

Msg 5030, Level 16, State 5, Line 18
The database could not be exclusively locked to perform the operation.

Msg 5072, Level 16, State 1, Line 18
ALTER DATABASE failed. The default collation of database 'MyDatabase' cannot be set to Latin1_General_CI_AS.

Apparently this locking issue is due to the fact that SSMS opens a second connection for Intellisense.

Therefore the correct way to do this, even if you are the only person using the database, as I was, is to put it into single user mode, carry out the ALTER statement, then put it back in multi user mode. #

This query does that.

ALTER DATABASE MYDATABASE SET SINGLE_USER WITH ROLLBACK IMMEDIATE

ALTER DATABASE MYDATABASE COLLATE Latin1_General_CI_AS 

ALTER DATABASE MYDATABASE  SET MULTI_USER

Once the database has been changed to the right collation you will need to change all the tables with columns using the differing collation. However any columns being used in Indexes or Keys will need to have their references removed first. Luckily for me I didn't have many indexes at this point.

However I still needed to script out any Indexes and Key Constraints I had and save them into an SQL file so that they could easily be re-created afterwards.

I then dropped all the Indexes and Keys that needed to be removed and then ran the following piece of SQL which outputs a load of ALTER TABLE statements.

You may find like I did, that you actually need to add a COLLATE statement into the WHERE clause to get it to work without erroring as the system tables themselves may have a different collation than the one you are wanting to search for.

SELECT 'ALTER TABLE ' + quotename(s.name) + '.' + quotename(o.name) + 
       ' ALTER COLUMN ' + quotename(c.name) + ' ' + type_name(c.system_type_id) 
    +CASE  
   WHEN c.max_length = -1 THEN '(max)'
   WHEN type_name(c.system_type_id)  = 'nvarchar' THEN '('+CONVERT(varchar(5),c.max_length/2)+')'
   ELSE '('+CONVERT(varchar(5),c.max_length)+')'   
  END + CASE WHEN c.[precision] = 0 THEN ' COLLATE '+c.collation_name   + ' ' ELSE ' ' END + LTRIM(IIF(c.is_nullable = 1, '', 'NOT ') + 'NULL ' )
FROM  sys.objects o
JOIN  sys.columns c ON o.object_id = c.object_id
JOIN  sys.schemas s ON o.schema_id = s.schema_id
WHERE o.type = 'U' 
  AND c.collation_name <> 'Latin1_General_CP1_CI_AI' -- the collation you want to change from
ORDER BY o.[name]

You should get a load of SQL statements that can be run in a new query window to change all your columns.

It is best to run the previous SQL outputting to a textual window so that the output can easily be copied and pasted.

ALTER TABLE [dbo].[TRAINER_PERFORMANCE] ALTER COLUMN [RatingType] varchar(20) COLLATE Latin1_General_CI_AS
ALTER TABLE [dbo].[CANCEL_KILLED] ALTER COLUMN [RecordDetails] nvarchar(max) COLLATE Latin1_General_CI_AS

Once you have run all these ALTER statements your whole database should be the collation that you require.

You can then take your copy of the Indexes and Keys that you created earlier, and run them to recreate any missing Indexes and Constraints. You should then be able to remove any quick fix WHERE clauses COLLATE statements that you may have used as a quick fix.

Collation differences are a right pain to resolve but if you do everything in order, and keep saved records of ALTER and CREATE statements that you need along the way it can be something fixed without too much work.

There are some large scripts to automate the process of dropping and re-creating objects if you want to use them however I cannot test to the reliability of these as I chose to do everything bit by bit, checking and researching along the way.

By Strictly-Software