Wednesday 18 March 2020

MS SQL Collation Issues

Fixing MS SQL Collation Differences

By Strictly-Software

With the quiet due to the COVID19 virus scare, and all major sporting events being cancelled for the foreseen future, I decided now would be a good time to get my old Automatic Betting BOT back up and working.

It used to run on a dedicated Windows 2003 server and MS SQL 2012 database before being outsourced to a French hosting company, OVH, which doesn't allow HTTP connections to online betting sites, basically making it useless. A decision that wasn't mine in anyway, but basically stopped my BOT from working.

It used to run as a Windows Service using Betfair's Exchange API to automatically create betting systems and then place bets based on systems with a rolling ROI > 5% over the last 30 and 100 days.

I am trying to get the BOT working on a local laptop and therefore it had already been backed up, the files FTP'd down to my laptop and restored in a local MS SQL Express database. 

However it seems that I had some issues when I previously attempted this some time ago as the database now had a different collation, a number of tables were now empty and had been scripted across without indexes and there was a collation difference between many columns and the new DB collation.

Just changing the Databases collation over isn't a simple act on it's own if numerous table column collations are different as well, especially those used in Indexes or Primary Keys as those references need to be removed before the collation can be changed.

A simple fix when it's only a small issue with one column might be to just edit the SQL where the issue arises and use a COLLATE statement so that any WHERE clause or JOIN uses the same collation e.g


SELECT  MemberID, MemberName
FROM MEMBERS 
WHERE MemberName COLLATE SQL_Latin1_General_CP1_CI_AI = @Name

However when you have a large database and the issue is that half your tables are one collation, the other another, and you need to decide which to move to, and then it becomes more difficult.

My new Database was using the newer SQL_Latin1_General_CP1_CI_AI collation whilst a subset of the tables were still using the older collation from the servers copy of the database Latin1_General_CI_AS.

Therefore finding out which columns and tables were using this collation was the first task and can easily be done with a system view with some SQL like so:

-- find out table name and columns using the collation Latin1_General_CI_AS
SELECT table_name, column_name, collation_name
FROM information_schema.columns
WHERE collation_name = 'Latin1_General_CI_AS'
ORDER BY table_name, column_name

From this I could see that the majority of my tables were using the older collation Latin1_General_CI_AS and that it would be easier to change the database collation to this, then the table collation.

However as there were very few indexes or keys I decided to do the reverse and use the newer collation SQL_Latin1_General_CP1_CI_AI, and change all columns using the LATIN to this new version.

However as I want to show you what to do if you need to change your MS SQL database and columns over to a new collation just imagine I am doing the opposite e.g I am changing my whole system from SQL_Latin1_General_CP1_CI_AI to Latin1_General_CI_AS.

If you did want to change the database collation after it had been created it is not as simple as just opening the databases property window and selecting a new collation OR just running the following query to ALTER the database. Just read the error message I was getting when attempting this.

USE master;
GO

ALTER DATABASE MyDatabase
COLLATE Latin1_General_CI_AS ;
GO

Msg 5030, Level 16, State 5, Line 18
The database could not be exclusively locked to perform the operation.

Msg 5072, Level 16, State 1, Line 18
ALTER DATABASE failed. The default collation of database 'MyDatabase' cannot be set to Latin1_General_CI_AS.

Apparently this locking issue is due to the fact that SSMS opens a second connection for Intellisense.

Therefore the correct way to do this, even if you are the only person using the database, as I was, is to put it into single user mode, carry out the ALTER statement, then put it back in multi user mode. #

This query does that.

ALTER DATABASE MYDATABASE SET SINGLE_USER WITH ROLLBACK IMMEDIATE

ALTER DATABASE MYDATABASE COLLATE Latin1_General_CI_AS 

ALTER DATABASE MYDATABASE  SET MULTI_USER

Once the database has been changed to the right collation you will need to change all the tables with columns using the differing collation. However any columns being used in Indexes or Keys will need to have their references removed first. Luckily for me I didn't have many indexes at this point.

However I still needed to script out any Indexes and Key Constraints I had and save them into an SQL file so that they could easily be re-created afterwards.

I then dropped all the Indexes and Keys that needed to be removed and then ran the following piece of SQL which outputs a load of ALTER TABLE statements.

You may find like I did, that you actually need to add a COLLATE statement into the WHERE clause to get it to work without erroring as the system tables themselves may have a different collation than the one you are wanting to search for.

SELECT 'ALTER TABLE ' + quotename(s.name) + '.' + quotename(o.name) + 
       ' ALTER COLUMN ' + quotename(c.name) + ' ' + type_name(c.system_type_id) 
    +CASE  
   WHEN c.max_length = -1 THEN '(max)'
   WHEN type_name(c.system_type_id)  = 'nvarchar' THEN '('+CONVERT(varchar(5),c.max_length/2)+')'
   ELSE '('+CONVERT(varchar(5),c.max_length)+')'   
  END + CASE WHEN c.[precision] = 0 THEN ' COLLATE '+c.collation_name   + ' ' ELSE ' ' END + LTRIM(IIF(c.is_nullable = 1, '', 'NOT ') + 'NULL ' )
FROM  sys.objects o
JOIN  sys.columns c ON o.object_id = c.object_id
JOIN  sys.schemas s ON o.schema_id = s.schema_id
WHERE o.type = 'U' 
  AND c.collation_name <> 'Latin1_General_CP1_CI_AI' -- the collation you want to change from
ORDER BY o.[name]

You should get a load of SQL statements that can be run in a new query window to change all your columns.

It is best to run the previous SQL outputting to a textual window so that the output can easily be copied and pasted.

ALTER TABLE [dbo].[TRAINER_PERFORMANCE] ALTER COLUMN [RatingType] varchar(20) COLLATE Latin1_General_CI_AS
ALTER TABLE [dbo].[CANCEL_KILLED] ALTER COLUMN [RecordDetails] nvarchar(max) COLLATE Latin1_General_CI_AS

Once you have run all these ALTER statements your whole database should be the collation that you require.

You can then take your copy of the Indexes and Keys that you created earlier, and run them to recreate any missing Indexes and Constraints. You should then be able to remove any quick fix WHERE clauses COLLATE statements that you may have used as a quick fix.

Collation differences are a right pain to resolve but if you do everything in order, and keep saved records of ALTER and CREATE statements that you need along the way it can be something fixed without too much work.

There are some large scripts to automate the process of dropping and re-creating objects if you want to use them however I cannot test to the reliability of these as I chose to do everything bit by bit, checking and researching along the way.

By Strictly-Software

No comments: