Sunday, 24 July 2011

TSQL Batch Updates SQL 2005 - 2008

Updating tables in Batches to prevent locking in SQL

There are times when you may need to carry out UPDATES on large tables that are in use and constantly being inserted, deleted or updated.

If you carry out a large UPDATE that affects all of the rows in the table then the table will be locked for the duration of the update and any other processes that may need to carry out DML statements will be BLOCKED from doing so.

You may even experience deadlocks but you will most definitely experience performance issues and if any SELECT statements that access the data don't use a WITH (NOLOCK) statement they too will have to wait in line for the UPDATE to finish.

Obviously wrapping WITH (NOLOCK) onto every SELECT statement is not a good solution unless you know what you are doing as it will provide dirty reads and you may end up giving your users old data. This might be fine for some scenarios but in critical applications where data integrity is key then you need another solution that provides data integrity and allows you to UPDATE the table without the performance problems.

When I find myself requiring the need to UPDATE every record in a large table I use a BATCH UPDATE process which cuts the large UPDATE statement down into lots of small UPDATES that affect only a few rows at a time.

For example instead of the whole table being locked for an hour with lots of blocked processes building up behind waiting for it to finish it would instead only be locked for lots of little time periods.

These smaller locking periods allow other process in to do their work and if the batch size is small enough and you have appropriate indexes you might find that you won't experience a full table lock anyway.

There are various methods for carrying out this approach and you should tailor your BATCH SIZE to your own requirements. Before SQL 2005 you could use the: SET NOCOUNT 50 command to set the size of the batch but in SQL 2005 and beyond you can use a variable directly with an UPDATE TOP (@VAR) command.


SET NOCOUNT ON
SET DATEFORMAT YMD

DECLARE @ROWS INT, @TOTALROWS INT, @BATCHSIZE INT

SELECT @ROWS = 1,
@TOTALROWS = 0,
@BATCHSIZE = 50

--now delete all data from main table that we have added in batches to prevent locks
WHILE @Rows > 0
BEGIN

--delete data in table in batches to prevent blocks
UPDATE TOP(@BATCHSIZE) MyTable
SET MyColumn = dbo.udf_SOME_FUNCTION(MyPK)
WHERE SomeDate > '2011-JAN-01'

SELECT @ROWS = @@ROWCOUNT, @TOTALROWS = @TOTALROWS + @ROWS


PRINT 'Updated ' + CAST(@ROWS as varchar) + ' in batch'

END

PRINT 'Updated ' + CAST(@TOTALROWS as varchar) + ' total rows'



I am currently using this process now to update a table that is constantly being accessed by a busy website that has over a million rows in it and it isn't causing any BLOCKING or performance issues at all.

2 comments:

Carlos said...

With the code u supply

SELECT @ROWS = @@ROWCOUNT, @TOTALROWS = @TOTALROWS + @ROWS
PRINT 'Updated ' + CAST(@ROWS as varchar) + ' in batch'

I would suggest the following changes to your post:

SET @ROWS = @@ROWCOUNT

WAITFOR DELAY '00:00:01'
declare @msg nvarchar(200)
set @msg = 'Updated ' + CAST(@ROWS as varchar) + ' in batch'
RAISERROR (@msg, 0, 1) WITH NOWAIT

The @@ROWCOUNT will only give a count of 1
when used as you have stated, at least in SQL 2008 it will.
So, it's best practice to "SET" the value to @ROWS seperately
not inconjunction with a compound SELECT statement.

That was giving a Val or 1 with SQL 2008

Also, instead of Print, I would use RaiseError.
This will give the immediate results, without
breaking your process...

R Reid said...

Hi Carlos

I take on board your comment about RAISEERROR but I think you are wrong about the comment you said re:

"The @@ROWCOUNT will only give a count of 1
when used as you have stated, at least in SQL 2008 it will.
So, it's best practice to "SET" the value to @ROWS seperately
not inconjunction with a compound SELECT statement."


As I have tested this in both SQL 2005 and SQL 2008 and with a PRINT statement within the loop showing a rolling count of both the current batch and total batch you get the correct numbers.

E.G if you edited the code to put a break after 150 rows e.g:


SELECT @ROWS = @@ROWCOUNT, @TOTALROWS = @TOTALROWS + @ROWS


PRINT 'Updated ' + CAST(@ROWS as varchar) + ' in batch'
PRINT 'Updated ' + CAST(@TOTALROWS as varchar) + ' so far'

IF @TOTALROWS > 100
BREAK


Then in both SQL 2005 and 2008 I get the following message output:


Updated 50 in batch
Updated 50 so far
Updated 50 in batch
Updated 100 so far
Updated 50 in batch
Updated 150 so far
Updated 150 total rows


So the joining of the @@ROWCOUNT doesn't have any effect on the variable @TotalRows.

The wait idea is good if you want to allow more room for other processes and have problems running batch updates but so far with proper indexing I haven't found a need for a WAIT statement yet. However it something to think about.

Thanks for commenting.