
Wednesday, 18 March 2020

MS SQL Collation Issues

Fixing MS SQL Collation Differences

By Strictly-Software

With the quiet due to the COVID-19 virus scare, and all major sporting events being cancelled for the foreseeable future, I decided now would be a good time to get my old automatic betting BOT back up and working.

It used to run on a dedicated Windows 2003 server and MS SQL 2012 database before the hosting was outsourced to a French company, OVH, which doesn't allow HTTP connections to online betting sites, basically making the BOT useless. That decision wasn't mine in any way, but it stopped the BOT from working.

It used to run as a Windows Service using Betfair's Exchange API to automatically create betting systems and then place bets based on systems with a rolling ROI > 5% over the last 30 and 100 days.

I am trying to get the BOT working on a local laptop. The database had already been backed up, the files FTP'd down to my laptop, and restored into a local MS SQL Express database.

However it seems I had some issues when I previously attempted this: the database now had a different collation, a number of tables were empty and had been scripted across without indexes, and there was a collation difference between many columns and the new database collation.

Just changing the database's collation isn't a simple act on its own if numerous table column collations are different as well, especially those used in indexes or primary keys, as those references need to be removed before the collation can be changed.

A simple fix, when it's only a small issue with one column, might be to just edit the SQL where the issue arises and use a COLLATE statement so that any WHERE clause or JOIN uses the same collation, e.g.


SELECT  MemberID, MemberName
FROM MEMBERS 
WHERE MemberName COLLATE SQL_Latin1_General_CP1_CI_AI = @Name

However, when you have a large database and the issue is that half your tables are one collation and the rest another, and you need to decide which to move to, it becomes more difficult.

My new database was using the newer SQL_Latin1_General_CP1_CI_AI collation whilst a subset of the tables were still using the older collation from the server's copy of the database, Latin1_General_CI_AS.

Therefore finding out which columns and tables were using this collation was the first task, which can easily be done by querying a system view with some SQL like so:

-- find out table name and columns using the collation Latin1_General_CI_AS
SELECT table_name, column_name, collation_name
FROM information_schema.columns
WHERE collation_name = 'Latin1_General_CI_AS'
ORDER BY table_name, column_name

From this I could see that the majority of my tables were using the older collation Latin1_General_CI_AS and that it would be easier to change the database collation to this and then fix the remaining column collations.

However as there were very few indexes or keys I decided to do the reverse and use the newer collation SQL_Latin1_General_CP1_CI_AI, changing all columns still using the older Latin1_General_CI_AS collation to this new version.

However, as I want to show you what to do if you need to change your MS SQL database and columns over to a new collation, just imagine I am doing the opposite, e.g. changing my whole system from SQL_Latin1_General_CP1_CI_AI to Latin1_General_CI_AS.

If you want to change the database collation after it has been created, it is not as simple as just opening the database's properties window and selecting a new collation, or just running the following query to ALTER the database. Just read the error messages I was getting when attempting this.

USE master;
GO

ALTER DATABASE MyDatabase
COLLATE Latin1_General_CI_AS ;
GO

Msg 5030, Level 16, State 5, Line 18
The database could not be exclusively locked to perform the operation.

Msg 5072, Level 16, State 1, Line 18
ALTER DATABASE failed. The default collation of database 'MyDatabase' cannot be set to Latin1_General_CI_AS.

Apparently this locking issue is due to the fact that SSMS opens a second connection for Intellisense.

Therefore the correct way to do this, even if you are the only person using the database, as I was, is to put it into single user mode, carry out the ALTER statement, then put it back into multi user mode.

This query does that.

ALTER DATABASE MYDATABASE SET SINGLE_USER WITH ROLLBACK IMMEDIATE

ALTER DATABASE MYDATABASE COLLATE Latin1_General_CI_AS 

ALTER DATABASE MYDATABASE  SET MULTI_USER

Once the database has been changed to the right collation you will need to change all the tables with columns using the differing collation. However any columns being used in Indexes or Keys will need to have their references removed first. Luckily for me I didn't have many indexes at this point.

However I still needed to script out any Indexes and Key Constraints I had and save them into an SQL file so that they could easily be re-created afterwards.
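
If you are not sure which indexes and key constraints touch the affected columns, a query along these lines will list them so you know what to script out and drop first. This is just a minimal sketch against the standard system views, using the Latin1_General_CI_AS target from this walkthrough:

-- list indexes and key constraints that include columns not yet in the target collation
SELECT DISTINCT
       QUOTENAME(OBJECT_SCHEMA_NAME(i.object_id)) + '.' +
       QUOTENAME(OBJECT_NAME(i.object_id)) AS TableName,
       i.name                              AS IndexName,
       i.is_primary_key,
       i.is_unique_constraint
FROM sys.indexes i
JOIN sys.index_columns ic ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns c        ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE c.collation_name IS NOT NULL
  AND c.collation_name <> 'Latin1_General_CI_AS' -- the collation you are changing to
ORDER BY TableName, IndexName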

I then dropped all the Indexes and Keys that needed to be removed and then ran the following piece of SQL which outputs a load of ALTER TABLE statements.

You may find, like I did, that you actually need to add a COLLATE statement into the WHERE clause to get it to work without erroring, as the system tables themselves may have a different collation from the one you are searching for.

SELECT 'ALTER TABLE ' + QUOTENAME(s.name) + '.' + QUOTENAME(o.name) +
       ' ALTER COLUMN ' + QUOTENAME(c.name) + ' ' + TYPE_NAME(c.system_type_id)
     + CASE
         WHEN c.max_length = -1 THEN '(max)'
         WHEN TYPE_NAME(c.system_type_id) = 'nvarchar' THEN '(' + CONVERT(varchar(5), c.max_length / 2) + ')'
         ELSE '(' + CONVERT(varchar(5), c.max_length) + ')'
       END
     -- use the collation you are changing TO, otherwise the column would keep its old collation
     + CASE WHEN c.[precision] = 0 THEN ' COLLATE Latin1_General_CI_AS ' ELSE ' ' END
     + LTRIM(IIF(c.is_nullable = 1, '', 'NOT ') + 'NULL')
FROM  sys.objects o
JOIN  sys.columns c ON o.object_id = c.object_id
JOIN  sys.schemas s ON o.schema_id = s.schema_id
WHERE o.type = 'U'
  -- only list columns not already using the target collation; add COLLATE DATABASE_DEFAULT
  -- to this comparison if you hit a collation conflict error as mentioned above
  AND c.collation_name <> 'Latin1_General_CI_AS'
ORDER BY o.[name]

You should get a load of SQL statements that can be run in a new query window to change all your columns.

It is best to run the previous SQL with results output to text so that the generated statements can easily be copied and pasted.

ALTER TABLE [dbo].[TRAINER_PERFORMANCE] ALTER COLUMN [RatingType] varchar(20) COLLATE Latin1_General_CI_AS
ALTER TABLE [dbo].[CANCEL_KILLED] ALTER COLUMN [RecordDetails] nvarchar(max) COLLATE Latin1_General_CI_AS

Once you have run all these ALTER statements your whole database should be the collation that you require.

You can then take the copy of the Indexes and Keys that you scripted out earlier and run it to recreate any missing Indexes and Constraints. You should then be able to remove any COLLATE statements from WHERE clauses that you had used as a quick fix.

Collation differences are a right pain to resolve, but if you do everything in order, and keep saved copies of the ALTER and CREATE statements you need along the way, it can be fixed without too much work.

There are some large scripts out there to automate the process of dropping and re-creating objects if you want to use them, however I cannot attest to their reliability as I chose to do everything bit by bit, checking and researching along the way.

By Strictly-Software

Tuesday, 9 September 2008

SQL Performance Top Tips

Another SQL Performance Top Tips list?

I know there are lots of top tips for improving SQL performance out there on the web, but you can never have too much of a good thing, so I have created my own list of issues to identify and resolve when trying to improve the performance of an SQL database.


1. Investigate and resolve deadlocks or blocked processes.


All it takes is one blocked process on a commonly hit table for your whole site to hang and start reporting timeout issues.

On my jobboard sites I had locking issues with my main JOBS table that had to be joined to a number of other tables to get the data needed on results pages that were hit constantly by crawlers. These tables were also being updated constantly by clients and automated feeds from multi-posters which meant that locks were likely.

As I use one database for multiple sites it also meant that a lock caused by one site would cause timeout issues for the other sites.

To resolve this I created some de-normalised flat tables per site that contain all the fields needed for reporting and searching which meant:

  • The JOB table is only updated at regular intervals and not instantly. SQL 2005 introduced the synonym feature which I demonstrate here. This is used to swap between tables being built and tables in use by the site so that a new table can be built behind the scenes with up to date data.
  • Clustered indexes can be site specific depending on the columns the site searches on the most.
  • No joins needed when accessing the table and results are returned very fast.
  • Any problems accessing this table do not affect other sites.
Another trick to handle instances where you are selecting from tables that may be updated constantly, such as job or banner hit tables, is to look at the following:
  1. If the table you are selecting from is never updated or deleted from during the times you are selecting, then you can use the WITH (NOLOCK) hint on your SQL SELECTs. You won't have to worry about dirty reads as you are not actually updating the data, and you bypass all the overhead SQL has to go through to maintain the locking.
  2. Use the LOCK_TIMEOUT statement to reduce the time LOCKS are held for.
  3. Use the TRY / CATCH statement to catch deadlocks and other blocking errors in combination with a retry loop, returning from the proc after X retries (a sketch of this pattern is shown after this list). View my example of using LOCK_TIMEOUT to handle DEADLOCKS and BLOCKING.
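
As a rough illustration of points 1 to 3, here is a minimal sketch of that retry pattern combining LOCK_TIMEOUT, WITH (NOLOCK) and TRY / CATCH. The DAILY_HITS table name, retry count and delay are placeholder values only:

SET LOCK_TIMEOUT 2000; -- give up waiting for a lock after 2 seconds (error 1222)

DECLARE @Retries INT, @ErrMsg NVARCHAR(2048);
SET @Retries = 0;

WHILE @Retries < 3
BEGIN
    BEGIN TRY
        -- reporting style read; dirty reads are acceptable here so NOLOCK is used
        SELECT COUNT(*)
        FROM dbo.DAILY_HITS WITH (NOLOCK);

        BREAK; -- success, leave the retry loop
    END TRY
    BEGIN CATCH
        -- 1205 = chosen as deadlock victim, 1222 = lock request timeout
        IF ERROR_NUMBER() IN (1205, 1222)
        BEGIN
            SET @Retries = @Retries + 1;
            WAITFOR DELAY '00:00:01'; -- back off briefly before retrying
        END
        ELSE
        BEGIN
            SET @ErrMsg = ERROR_MESSAGE();
            RAISERROR(@ErrMsg, 16, 1); -- re-raise anything that is not a lock error
            BREAK;
        END
    END CATCH
END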



2. Look into large batch processes that have long run times, take up I/O, have long wait times or cause blocking.

A good example is where you have to transfer lots of records from one table to another e.g from a daily table that only receives INSERTS (e.g job / banner hits) into a historical table for reporting.

If you are deleting large numbers of records from a table that is also being used then you don't want to cause blocks that will freeze your site. In conjunction with point 1 you should look into how you handle your DELETE / UPDATE statements so that they are done in small BATCHES using the SQL 2005+ TOP command.

Read this article for an example of updating or deleting in batches to prevent blocking.
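
As a rough guide, a batched delete looks something like the following minimal sketch. The BANNER_HITS table, the 5,000 batch size and the 30 day cut-off are illustrative values only:

DECLARE @Rows INT;
SET @Rows = 1;

WHILE @Rows > 0
BEGIN
    -- delete in small chunks so locks are short lived and the site is not blocked
    DELETE TOP (5000)
    FROM dbo.BANNER_HITS
    WHERE HitDate < DATEADD(DAY, -30, GETDATE());

    SET @Rows = @@ROWCOUNT; -- loop ends once nothing is left to delete
END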

To find problematic queries after the fact you can utilise the SQL 2005+ Dynamic Management Views (DMVs) which hold key information about the system.

Read this article on my top queries for performance tuning to help identify those queries that need tuning.


3. Re-Index and De-Frag tables

Make sure you regularly re-index and de-fragment your tables, as over time fragmentation will build up and performance will be affected. I tend to set up a weekly SQL Server Agent job that runs a de-frag / re-organise on tables with fragmentation over a set percentage, as well as updating index statistics.

I then also try to schedule a full re-index of all my main tables once a quarter at scheduled down times, as during a full rebuild the tables can be taken offline depending on your SQL setup.
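
If you want to see where the fragmentation actually is before deciding what to re-organise or rebuild, a query along these lines against sys.dm_db_index_physical_stats (SQL 2005+) will list the worst offenders. The 5% / 30% thresholds are common rules of thumb rather than fixed values:

SELECT QUOTENAME(OBJECT_SCHEMA_NAME(ips.object_id)) + '.' +
       QUOTENAME(OBJECT_NAME(ips.object_id))        AS TableName,
       i.name                                       AS IndexName,
       ips.avg_fragmentation_in_percent,
       CASE
           WHEN ips.avg_fragmentation_in_percent >= 30 THEN 'REBUILD'
           ELSE 'REORGANIZE'
       END                                          AS SuggestedAction
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ips
JOIN sys.indexes i ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.index_id > 0                              -- skip heaps
  AND ips.avg_fragmentation_in_percent >= 5         -- ignore trivial fragmentation
ORDER BY ips.avg_fragmentation_in_percent DESC;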


4. Identify Slow Queries

Investigate your slowest running queries to make sure they are covered adequately by indexes, but also try not to overuse indexes. Try to cover as many queries with as few indexes as possible.

Try running this SQL performance report which will identify a dozen or so areas which could be improved on your system including:


  • Causes of the server waits
  • Databases using the most IO
  • Count of missing indexes, by database
  • Most important missing indexes
  • Unused Indexes
  • Most costly indexes (high maintenance)
  • Most used indexes
  • Most fragmented indexes
  • Most costly queries, by average IO
  • Most costly queries, by average CPU
  • Most costly CLR queries, by average CLR time
  • Most executed queries
  • Queries suffering most from blocking
  • Queries with the lowest plan reuse

This is an invaluable tool for any SQL DBA or SQL performance tuning developer.
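
To give a flavour of the kind of DMV query behind a report like this, the following minimal sketch pulls the most costly cached statements by average CPU. It is a generic plan cache query, not the report itself:

SELECT TOP 10
       qs.total_worker_time / qs.execution_count AS AvgCpuMicroseconds,
       qs.execution_count,
       SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset
                     WHEN -1 THEN DATALENGTH(st.text)
                     ELSE qs.statement_end_offset
                   END - qs.statement_start_offset) / 2) + 1) AS StatementText
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY AvgCpuMicroseconds DESC;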



5. Split tables

If your tables are used for inserting/updating as well as selection then for each index you have on a table that's an extra update required when a record is saved. On some of my big tables that are used heavily for reporting I split the data into daily and historical data. The daily table will allow updates but the historical table will not.

At night a job will transfer the daily data into the historical table, dropping all current indexes, populating the new data and then rebuilding the indexes with a 100% fill factor. You need to balance out whether speed of retrieval or speed of update is more important if the table is used for both. You should also look into points 1 and 2 about managing blocking and batch updates/deletes to handle instances where a table is being deleted from whilst being accessed at the same time.
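
As a rough outline, the nightly transfer described above looks something like this. The JOB_HITS_DAILY / JOB_HITS_HISTORY tables and the index name are illustrative, not my actual schema:

-- 1. drop the nonclustered index on the historical table before the bulk load
DROP INDEX IX_JobHitsHistory_HitDate ON dbo.JOB_HITS_HISTORY;

-- 2. move the day's rows across and clear the daily table
INSERT INTO dbo.JOB_HITS_HISTORY (JobID, HitDate)
SELECT JobID, HitDate
FROM dbo.JOB_HITS_DAILY;

TRUNCATE TABLE dbo.JOB_HITS_DAILY;

-- 3. recreate the index with a 100% fill factor as the table is read-only between loads
CREATE NONCLUSTERED INDEX IX_JobHitsHistory_HitDate
    ON dbo.JOB_HITS_HISTORY (HitDate)
    WITH (FILLFACTOR = 100);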


6. Re-write complex queries and make WHERE clauses SARGABLE

Can you rewrite complex queries to be more efficient? Can you remove left joins to large tables by populating temporary tables first? Are you putting functions in the WHERE or HAVING clause on the column and negating any index usage? If so, can you rewrite the clause and make it SARGABLE so that you make use of the index? For example a WHERE clause like so

WHERE DateDiff(day,CreateDate,GetDate()) = 0

This clause, which filters by today's date, should be rewritten to:


WHERE CreateDate > '2008-sep-09 00:00:00' --where this is the previous midnight

Put the value from GetDate() into a variable before the SELECT and then use that in the filter
so that no function is applied to the column and any index on CreateDate can be used.
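
In practice that looks something like the following minimal sketch, using a hypothetical JOBS table with an index on CreateDate:

-- compute midnight once, outside the WHERE clause
DECLARE @Midnight DATETIME;
SET @Midnight = DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0); -- today at 00:00:00

SELECT JobID, JobTitle
FROM dbo.JOBS                  -- hypothetical table with an index on CreateDate
WHERE CreateDate >= @Midnight; -- no function on the column, so the index can be used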

Read this article of mine which proves the benefits of SARGABLE clauses in SQL.


7. Query Plan Caching

Check that you are benefiting from query plan caching. If you are using stored procedures that contain branching through use of IF statements that run different SQL depending on the values passed in by parameters then the cached query plan for this procedure may not be the best for all parameter variations.

You can either rewrite the procedure so that each IF branch calls another stored procedure that contains the query required as it will be this plan that gets used.

Or you could rewrite the query to use dynamic SQL so that you build up a string containing the appropriate syntax and parameters and then execute it using sp_executesql, which will take advantage of plan re-use. Do not use EXEC as this will not take advantage of plan re-use and it's not as safe in terms of SQL injection.
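
Here is a minimal sketch of the sp_executesql approach, again using a hypothetical JOBS table. The same cached plan is re-used whatever value is passed in for @SiteID:

DECLARE @SQL NVARCHAR(MAX);

SET @SQL = N'SELECT JobID, JobTitle
             FROM dbo.JOBS
             WHERE SiteID = @SiteID';

-- the parameter definition string is what allows the plan to be cached and re-used
EXEC sp_executesql @SQL,
                   N'@SiteID INT',
                   @SiteID = 5;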

You should also look into whether your query plans are not getting re-used due to your SQL Server settings, as the default mode is SIMPLE PARAMETERIZATION rather than FORCED PARAMETERIZATION.

This feature takes AdHoc queries containing literal values and replaces those values with parameters. This means the query plan that gets cached can be re-used for similar queries with different values, which can aid performance as it reduces compilation time.

When the system is set to SIMPLE mode, only AdHoc query plans that contain the same literal values get cached and re-used, which in many cases is not good enough as the values for most queries change all the time. For example, in SIMPLE mode only the plan for the exact query below will be cached.


SELECT *
FROM PRODUCTS
WHERE ProductID = 10

This means only products with a ProductID of 10 will benefit from the cached plan, however with FORCED PARAMETERIZATION enabled you would have a plan with parameters so that any ProductID can benefit from it e.g


SELECT *
FROM PRODUCTS
WHERE ProductID = @1 -- name of SQL parameter


For a more detailed look at the benefits of this method read my article on forced parameterization.
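
Switching a database over to forced parameterization is a one line change, sketched below. Test it first as it changes plan caching behaviour for every AdHoc query in the database:

-- enable forced parameterization for a database (SQL 2005+)
ALTER DATABASE MyDatabase SET PARAMETERIZATION FORCED;

-- check the current setting
SELECT name, is_parameterization_forced
FROM sys.databases
WHERE name = 'MyDatabase';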


8. Use the appropriate table for the situation

SQL has a variety of table types, from fixed permanent tables to global and local temporary tables to table variables, the latter two of which are stored in tempdb.

I have found a number of times now that table variables start off being used in stored procedures as very useful, memory efficient storage mechanisms, but once the dataset stored within them rises above some threshold (I have not found the exact threshold yet) the performance drops incredibly.

Whilst useful for array-like behaviour within procs and quick for small datasets, they should not be used with record sets of many thousands or millions of records. The reason is that no indexes can be added to them, so any joins or lookups will result in table scans, which are fine with 10-100 records but with 20 million will cause huge problems.

Swapping table variables for either fixed tables or temporary tables with indexes should be considered when the size of data is too great for a table variable.
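
For example, a table variable could be swapped for an indexed temporary table along these lines. The #JobResults name and columns are illustrative only:

CREATE TABLE #JobResults
(
    JobID    INT          NOT NULL PRIMARY KEY, -- indexes and keys are allowed on temp tables
    JobTitle VARCHAR(200) NOT NULL
);

INSERT INTO #JobResults (JobID, JobTitle)
SELECT JobID, JobTitle
FROM dbo.JOBS;            -- hypothetical source table

CREATE NONCLUSTERED INDEX IX_JobResults_Title ON #JobResults (JobTitle);

-- joins against #JobResults can now use the indexes rather than scanning
DROP TABLE #JobResults;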

If your procedures are being called by website logins with execute permission you will need to impersonate a login with higher privileges to allow for the DDL statements CREATE and DROP as a basic website login should have nothing more than read privileges to prevent SQL injection attacks.

If you don't want to risk the rise in privileges then consider using a fixed table and the use of a stamp and unique key so that multiple calls don't clash with each other. Appropriate indexes should be used to ensure speedy data retrieval from these tables.

A real world example I have found is explained in this article on speeding up batch processes that use table variables.


9. Tuning for MySQL

MS SQL has a vast array of useful tools to help you performance tune it, from its Dynamic Management Views and Activity Monitor to its detailed query execution plans. However if you are using MySQL and have to suffer using a tool like NAVICAT then you are not in such a good position to tune your queries.

The EXPLAIN option is nowhere near as useful as the MS SQL Query Execution Plan but it can be used to ensure the appropriate indexes are added if missing and the Slow Query Log is a useful place to identify problematic queries.

If you are using LINUX or Wordpress and running your system on MySQL then read this article of mine on performance tuning for MySQL.


These are just some tips to look at when trying to improve back end database performance and I will add to this list when I get some more time. Feel free to post your own tips for identifying problematic SQL and then fixing it.