Showing posts with label indexes. Show all posts
Showing posts with label indexes. Show all posts

Wednesday, 18 March 2020

MS SQL Collation Issues

Fixing MS SQL Collation Differences

By Strictly-Software

With the quiet due to the COVID19 virus scare, and all major sporting events being cancelled for the foreseen future, I decided now would be a good time to get my old Automatic Betting BOT back up and working.

It used to run on a dedicated Windows 2003 server and MS SQL 2012 database before being outsourced to a French hosting company, OVH, which doesn't allow HTTP connections to online betting sites, basically making it useless. A decision that wasn't mine in anyway, but basically stopped my BOT from working.

It used to run as a Windows Service using Betfair's Exchange API to automatically create betting systems and then place bets based on systems with a rolling ROI > 5% over the last 30 and 100 days.

I am trying to get the BOT working on a local laptop and therefore it had already been backed up, the files FTP'd down to my laptop and restored in a local MS SQL Express database. 

However it seems that I had some issues when I previously attempted this some time ago as the database now had a different collation, a number of tables were now empty and had been scripted across without indexes and there was a collation difference between many columns and the new DB collation.

Just changing the Databases collation over isn't a simple act on it's own if numerous table column collations are different as well, especially those used in Indexes or Primary Keys as those references need to be removed before the collation can be changed.

A simple fix when it's only a small issue with one column might be to just edit the SQL where the issue arises and use a COLLATE statement so that any WHERE clause or JOIN uses the same collation e.g


SELECT  MemberID, MemberName
FROM MEMBERS 
WHERE MemberName COLLATE SQL_Latin1_General_CP1_CI_AI = @Name

However when you have a large database and the issue is that half your tables are one collation, the other another, and you need to decide which to move to, and then it becomes more difficult.

My new Database was using the newer SQL_Latin1_General_CP1_CI_AI collation whilst a subset of the tables were still using the older collation from the servers copy of the database Latin1_General_CI_AS.

Therefore finding out which columns and tables were using this collation was the first task and can easily be done with a system view with some SQL like so:

-- find out table name and columns using the collation Latin1_General_CI_AS
SELECT table_name, column_name, collation_name
FROM information_schema.columns
WHERE collation_name = 'Latin1_General_CI_AS'
ORDER BY table_name, column_name

From this I could see that the majority of my tables were using the older collation Latin1_General_CI_AS and that it would be easier to change the database collation to this, then the table collation.

However as there were very few indexes or keys I decided to do the reverse and use the newer collation SQL_Latin1_General_CP1_CI_AI, and change all columns using the LATIN to this new version.

However as I want to show you what to do if you need to change your MS SQL database and columns over to a new collation just imagine I am doing the opposite e.g I am changing my whole system from SQL_Latin1_General_CP1_CI_AI to Latin1_General_CI_AS.

If you did want to change the database collation after it had been created it is not as simple as just opening the databases property window and selecting a new collation OR just running the following query to ALTER the database. Just read the error message I was getting when attempting this.

USE master;
GO

ALTER DATABASE MyDatabase
COLLATE Latin1_General_CI_AS ;
GO

Msg 5030, Level 16, State 5, Line 18
The database could not be exclusively locked to perform the operation.

Msg 5072, Level 16, State 1, Line 18
ALTER DATABASE failed. The default collation of database 'MyDatabase' cannot be set to Latin1_General_CI_AS.

Apparently this locking issue is due to the fact that SSMS opens a second connection for Intellisense.

Therefore the correct way to do this, even if you are the only person using the database, as I was, is to put it into single user mode, carry out the ALTER statement, then put it back in multi user mode. #

This query does that.

ALTER DATABASE MYDATABASE SET SINGLE_USER WITH ROLLBACK IMMEDIATE

ALTER DATABASE MYDATABASE COLLATE Latin1_General_CI_AS 

ALTER DATABASE MYDATABASE  SET MULTI_USER

Once the database has been changed to the right collation you will need to change all the tables with columns using the differing collation. However any columns being used in Indexes or Keys will need to have their references removed first. Luckily for me I didn't have many indexes at this point.

However I still needed to script out any Indexes and Key Constraints I had and save them into an SQL file so that they could easily be re-created afterwards.

I then dropped all the Indexes and Keys that needed to be removed and then ran the following piece of SQL which outputs a load of ALTER TABLE statements.

You may find like I did, that you actually need to add a COLLATE statement into the WHERE clause to get it to work without erroring as the system tables themselves may have a different collation than the one you are wanting to search for.

SELECT 'ALTER TABLE ' + quotename(s.name) + '.' + quotename(o.name) + 
       ' ALTER COLUMN ' + quotename(c.name) + ' ' + type_name(c.system_type_id) 
    +CASE  
   WHEN c.max_length = -1 THEN '(max)'
   WHEN type_name(c.system_type_id)  = 'nvarchar' THEN '('+CONVERT(varchar(5),c.max_length/2)+')'
   ELSE '('+CONVERT(varchar(5),c.max_length)+')'   
  END + CASE WHEN c.[precision] = 0 THEN ' COLLATE '+c.collation_name   + ' ' ELSE ' ' END + LTRIM(IIF(c.is_nullable = 1, '', 'NOT ') + 'NULL ' )
FROM  sys.objects o
JOIN  sys.columns c ON o.object_id = c.object_id
JOIN  sys.schemas s ON o.schema_id = s.schema_id
WHERE o.type = 'U' 
  AND c.collation_name <> 'Latin1_General_CP1_CI_AI' -- the collation you want to change from
ORDER BY o.[name]

You should get a load of SQL statements that can be run in a new query window to change all your columns.

It is best to run the previous SQL outputting to a textual window so that the output can easily be copied and pasted.

ALTER TABLE [dbo].[TRAINER_PERFORMANCE] ALTER COLUMN [RatingType] varchar(20) COLLATE Latin1_General_CI_AS
ALTER TABLE [dbo].[CANCEL_KILLED] ALTER COLUMN [RecordDetails] nvarchar(max) COLLATE Latin1_General_CI_AS

Once you have run all these ALTER statements your whole database should be the collation that you require.

You can then take your copy of the Indexes and Keys that you created earlier, and run them to recreate any missing Indexes and Constraints. You should then be able to remove any quick fix WHERE clauses COLLATE statements that you may have used as a quick fix.

Collation differences are a right pain to resolve but if you do everything in order, and keep saved records of ALTER and CREATE statements that you need along the way it can be something fixed without too much work.

There are some large scripts to automate the process of dropping and re-creating objects if you want to use them however I cannot test to the reliability of these as I chose to do everything bit by bit, checking and researching along the way.

By Strictly-Software

Friday, 19 February 2016

Finding Text In Stored Procedures, User Defined Functions, Tables and Indexes

Finding Text In Stored Procedures, User Defined Functions, Tables and Indexes

By Strictly-Software

This is an update to an older stored procedure I had created that just looked inside the system view syscomments for a certain word.

The problem with this stored proc was:
  1. It used the old system views and we are now way past SQL 2000/2005.
  2. It would only look in Stored Procedures and User Defined Functions.
  3. It would provide mis-hits when the word was combined inside another word e.g if you were looking for the word Password and had the word PasswordHash inside the stored proc it would return that result.
  4. It ignored indexes which when you are trying to find columns to remove are obviously important.
  5. It carried out the conversion of the search word to a LTRIM(RTRIM(LOWER(@Word))) on every lookup when it could have been done once at the start.
So I have updated the code to take this in to fact.

It is still not the most efficient code due to the use of numerous LIKE statements but to ensure that you don't bring back invalid results the combination of clauses is required. 

You could use a CLR and write a C# regular expression to search for you but this is outside the scope of the article.

However to keep things simple I am just going to use the standard LIKE clause.

Also note that I have split the SELECT statements into two, one to look for occurrences of the word that is found inside stored procedures, UDF's, table columns and then another for indexes.

The code also uses the newer system views sys.objects, sys.syscomments, sys.all_columns, sys.indexes and sys.index_columns.



SET NOCOUNT ON

DECLARE @Word VARCHAR(100)

-- I am looking for the word Email, not the other columns I know exist such as UserEmail, Emailed, EmailSent etc
SELECT @Word = 'Email'

SELECT @Word = LTRIM(RTRIM(LOWER(@WORD)))

-- get columns in tables, and words inside stored procs and UDFs
SELECT DISTINCT COALESCE(c2.Name,o.NAME) AS [NAME], O.Name as [Object_Name],
  CASE [Type]
   WHEN 'P' THEN 'STORED PROC'
   WHEN 'FN' THEN 'UDF SCALAR'
   WHEN 'TF' THEN 'UDF TABLE'
   WHEN 'U' THEN 'TABLE'   
  END as Object_Type, Modify_Date
FROM SYS.OBJECTS as O
LEFT JOIN 
  SYS.SYSCOMMENTS as C
 ON C.ID = O.OBJECT_ID 
LEFT JOIN 
  SYS.ALL_COLUMNS as C2
 ON C2.OBJECT_ID = O.OBJECT_ID
WHERE 1=1
 AND O.[Type] IN('P','FN','TF','U')
 AND LOWER(COALESCE(c.Text,c2.Name)) LIKE '%' + @Word + '%'
 AND LOWER(COALESCE(c.Text,c2.Name)) NOT LIKE '%[A-Z0-9]' + @Word + '%'
 AND LOWER(COALESCE(c.Text,c2.Name)) NOT LIKE '%[A-Z0-9]' + @Word + '[A-Z0-9]%'
 AND LOWER(COALESCE(c.Text,c2.Name)) NOT LIKE '%' + @Word + '[A-Z0-9]%'
ORDER BY [Object_Name]

-- now return index columns
SELECT i.name AS Index_Name
  ,COL_NAME(ic.object_id,ic.column_id) AS Column_Name  
  ,CASE ic.is_included_column WHEN 0 THEN 'KEY COL' WHEN 1 THEN 'INCLUDED COL' END as Column_Type
  ,Modify_Date
FROM SYS.INDEXES AS i
JOIN SYS.INDEX_COLUMNS AS ic 
    ON i.object_id = ic.object_id AND i.index_id = ic.index_id
JOIN SYS.OBJECTS as O
 ON i.object_id = O.OBJECT_ID
WHERE LOWER(COL_NAME(ic.object_id,ic.column_id)) LIKE '%' + @Word + '%'
 AND LOWER(COL_NAME(ic.object_id,ic.column_id)) NOT LIKE '%[A-Z0-9]' + @Word + '%'
 AND LOWER(COL_NAME(ic.object_id,ic.column_id)) NOT LIKE '%[A-Z0-9]' + @Word + '[A-Z0-9]%'
 AND LOWER(COL_NAME(ic.object_id,ic.column_id)) NOT LIKE '%' + @Word + '[A-Z0-9]%'
ORDER BY Index_Name


Note the combination of the WHERE clauses to cover all the bases with the LIKE statements.

This is to ensure that:


  1. The word is inside the text (sys.syscomments) or the column name in the first place.
  2. The word is not at the end of another word e.g for email you don't want ClientEmail.
  3. The word is not in the middle of another word e.g CandEmailReset.
  4. The word is not at the end of another word e.g EmailsSent
If you had a CLR regular expression function then you could combine all these searches into one but I am keeping it simple with the LIKE statements for this article.



AND LOWER(COALESCE(c.Text,c2.Name)) LIKE '%' + @Word + '%'
AND LOWER(COALESCE(c.Text,c2.Name)) NOT LIKE '%[A-Z0-9]' + @Word + '%'
AND LOWER(COALESCE(c.Text,c2.Name)) NOT LIKE '%[A-Z0-9]' + @Word + '[A-Z0-9]%'
AND LOWER(COALESCE(c.Text,c2.Name)) NOT LIKE '%' + @Word + '[A-Z0-9]%'


This code will return results like the following.

The Stored Procedure / UDF / Table Results

NameObjectNameObject_TypeModify_Date
EmailClientsTABLE2015-02-12 12:13:09.100
Emailudf_validate_emailUDF SCALAR2016-02-12 12:13:09.100
Emailusp_net_get_user_detailsSTORED PROC2011-09-27 17:09:18.530


The Index Results

Index_NameColumn_NameColumn_TypeModify_Date
nclidx_USERS_EmailemailKEY_COL2016-02-12 11:18:19.130
nclidx_USERS_EmailemailINCLUDED_COL2015-12-12 12:10:11.130


So as you can see this is a much more useful piece of code for finding strings within a database.

Obviously if you have professional tools you should be able to use them but it's always good to know the nuts n bolts behind a system and the system views are a great way of finding out information that can be very useful to you.

Why would you use this piece of code?

Well I have used it for a number of reasons including.
  1. Finding certain words that needed replacing in stored procedures e.g when moving from 32bit to 64bit servers the ADO connection string changed and so did the provider and I needed to ensure all stored procedures had SET NOCOUNT ON at the top of them. This code allowed me to find all procedures that didn't have those words inside the procs with a tweak of the LIKE statements and highlighting stored procedures only.
  2. When we changed some column names I needed to find all occurrences of their use across the database, table columns, use in code and indexes.
  3. To find new columns that have been added and the date they were modified. Change the ORDER BY statement and you can find recently added columns of a certain name ordered by the date they were added.
  4. If your system has been hacked you may want to search the columns of tables for the injected string (if you know it) e.g <script src="//hack.ru"></script> and with some tweaked code which is on my main site www.strictly-software.com or this old article about finding text inside a database (SQL 2000 & 2005) you could find and clean up your system without backup/restore methods.


And those are just a few of the reasons I have found code like this useful.

I am sure you will find many more.

Let me know how you use it or would improve it.

By Strictly-Software

© 2016 Strictly-Software