We recently moved our production system to SQL Server 2008. Everything seemed fine until we realised, a fortnight later, that a particular SQL Server Agent job had not been completing correctly.
One of the steps within the job had been failing due to a missing SQL login. The job transferred large amounts of data from daily tables to historical ones, and a backlog numbering in the tens of millions of rows had now built up.
Because our batch delete process removed transferred data in small chunks (500 rows) to reduce locking, we had reached the point where new records were being added almost as fast as we were removing them.
When I looked at the query that did the insert and then the delete, I saw that it used a TABLE variable to hold the data whilst it was being transformed. With 20 million plus records this was a problem in its own right, due to the limitations of TABLE variables.
This is a common SQL Server performance bottleneck and one I have come across a few times now.
SQL Server's TABLE variables are very good for small datasets (a few hundred rows at most) and are very useful for array-like usage within stored procedures. However, when you move to large numbers of records they become a performance nightmare.
Unlike proper temporary tables or permanent tables, you cannot run CREATE INDEX against a TABLE variable (the only indexes it can have are those created by a PRIMARY KEY or UNIQUE constraint declared inline), and with large record counts an index is usually a must. The constant table scans the process had to perform to find rows must have been a major cause of the slow performance.
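To illustrate the difference (the table and column names here are made up for the example), SQL Server 2008 rejects CREATE INDEX on a TABLE variable but accepts it on a temporary table:

```sql
-- TABLE variable: no CREATE INDEX allowed in SQL Server 2008
DECLARE @Hits TABLE (JobFK int, Stamp datetime)
-- CREATE CLUSTERED INDEX clidx_Hits ON @Hits (JobFK)  -- syntax error

-- Temporary table: indexes can be added after creation
CREATE TABLE #Hits (JobFK int, Stamp datetime)
CREATE CLUSTERED INDEX clidx_Hits ON #Hits (JobFK, Stamp)

DROP TABLE #Hits
```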
I rewrote the stored procedure to use a temporary table, added a clustered index on the columns used to join to the real table, and used a batch DELETE process with the TOP (@X) clause to remove old records.
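The general shape of that batch DELETE pattern (with a made-up table name and cut-off value) is:

```sql
-- Delete in small batches so each statement holds its locks only briefly
DECLARE @X int, @Rows int
SELECT @X = 500, @Rows = 1

WHILE @Rows > 0
BEGIN
    -- filter on an indexed column so each batch is found with a seek
    DELETE TOP (@X) FROM dbo.SomeDailyTable
    WHERE Stamp < '2008-01-01'

    SET @Rows = @@ROWCOUNT  -- 0 when nothing is left to delete
END
```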
This has sped the process up immensely, but creating indexes and creating and dropping tables requires permissions higher than those granted to the login my website uses.
Therefore, to ensure everything ran smoothly, I had to use the WITH EXECUTE AS clause so that the procedure runs under a database user with higher privileges.
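In its simplest form (assuming a database user called admin_user exists and has the required rights), the clause looks like this:

```sql
CREATE PROCEDURE dbo.usp_elevated_example
WITH EXECUTE AS 'admin_user' -- assumed database user with DDL rights
AS
BEGIN
    -- the body runs under admin_user's permissions, regardless of the caller,
    -- so DDL statements such as CREATE and DROP succeed
    SELECT USER_NAME() AS running_as
END
```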
A cut-down example of the proc is below. It shows the following:
- Impersonating a user with higher permissions to allow DDL statements, e.g. CREATE and DROP.
- Checking tempdb for existing temporary tables and indexes and dropping them if they exist.
- Deleting large numbers of records in batches to prevent blocking.
- Ensuring a SARGABLE WHERE clause is used instead of DATEDIFF to get yesterday's data.
CREATE PROCEDURE [dbo].[usp_sql_job_data_transfer]
    @Success BIT = 0 OUTPUT,
    @RowsCopied INT = 0 OUTPUT,
    @RowsDeleted INT = 0 OUTPUT
WITH EXECUTE AS 'admin_user'
AS
BEGIN

SET NOCOUNT ON
SET DATEFORMAT YMD

DECLARE @Stamp datetime,
        @TmpRows int,
        @Rows int,
        @RowsInserted int,
        @Error int

-- check for the existence of an old temp table; a local temp table won't persist,
-- but we may need to use a global one (example usage)
IF OBJECT_ID('tempdb..#JobHits') IS NOT NULL
BEGIN
    -- drop it
    DROP TABLE #JobHits
END

-- create temp table
CREATE TABLE #JobHits(
    JobFK int,
    Stamp datetime,
    ViewHits int,
    SearchHits int
)

-- the job runs after midnight and we want a SARGABLE WHERE clause
SELECT @Stamp = CONVERT(datetime, CONVERT(varchar, GETDATE(), 23))

SELECT @TmpRows = 0, @RowsInserted = 0, @RowsDeleted = 0

-- insert the search hits by date into the temp table
-- if this proves to be slow or causes blocking then try inserting in batches of 1000 or 20-minute chunks
INSERT INTO #JobHits (JobFK, Stamp, ViewHits, SearchHits)
SELECT JobFK, CONVERT(datetime, CONVERT(varchar, Stamp, 23)), 0, COUNT(JobFK)
FROM JOBS_DATA WITH (NOLOCK)
WHERE HitType = 'S'
  AND Stamp < @Stamp
GROUP BY JobFK, CONVERT(datetime, CONVERT(varchar, Stamp, 23))

SELECT @TmpRows = @@ROWCOUNT, @Error = @@ERROR

IF @Error <> 0
    GOTO HANDLE_ERROR

-- insert the view hits into the temp table
INSERT INTO #JobHits (JobFK, Stamp, ViewHits, SearchHits)
SELECT JobFK, CONVERT(datetime, CONVERT(varchar, Stamp, 23)), COUNT(JobFK), 0
FROM JOBS_DATA WITH (NOLOCK)
WHERE HitType = 'V'
  AND Stamp < @Stamp -- SARGABLE WHERE clause - note I am not using DATEDIFF
GROUP BY JobFK, CONVERT(datetime, CONVERT(varchar, Stamp, 23))

SELECT @TmpRows = @TmpRows + @@ROWCOUNT, @Error = @@ERROR

-- if an error occurred jump to the error handler
IF @Error <> 0
    GOTO HANDLE_ERROR
-- if there was no error but no rows then it's just a bad day for the site with no hits
ELSE IF @TmpRows = 0
    GOTO EXIT_PROC
ELSE
BEGIN
    -- an index is not an object, so OBJECT_ID cannot be used to check for one;
    -- query tempdb.sys.indexes to see if it already exists
    IF EXISTS (SELECT * FROM tempdb.sys.indexes
               WHERE object_id = OBJECT_ID('tempdb..#JobHits')
                 AND name = 'clidx_#JobHits')
    BEGIN
        DROP INDEX clidx_#JobHits ON #JobHits
    END

    -- add a clustered index to help searching - cover the main join column JobFK
    -- and the GROUP BY column Stamp
    CREATE CLUSTERED INDEX [clidx_#JobHits] ON #JobHits
    (
        [JobFK] ASC,
        [Stamp] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
            IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,
            ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100)
END

-- join the temp table to the main table to get info about the company the job belonged to
INSERT INTO JOBS_DATA_HISTORY (JobFK, Stamp, ViewHits, SearchHits, ClientFK)
SELECT a.JobFK, a.Stamp, SUM(a.ViewHits), SUM(a.SearchHits), j.ClientFK
FROM #JobHits AS a
JOIN JOBS AS j WITH (NOLOCK)
    ON a.JobFK = j.JobPK
GROUP BY a.JobFK, a.Stamp, j.ClientFK

SELECT @RowsInserted = @@ROWCOUNT, @Error = @@ERROR

-- if we are here then there must be > 0 rows, otherwise we would have exited earlier
IF @RowsInserted = 0 OR @Error <> 0
    GOTO HANDLE_ERROR

-- now that we have copied our data we need to delete it all from the daily table
-- as the site is still using this table we delete in batches to prevent blocking locks
SELECT @Rows = 1

-- loop until no more rows are left
WHILE @Rows > 0
BEGIN
    -- delete in batches using TOP, filtering only on our indexed column
    DELETE TOP(1000) FROM JOBS_DATA
    WHERE Stamp < @Stamp -- indexed column in JOBS_DATA

    SELECT @Rows = @@ROWCOUNT, @RowsDeleted = @RowsDeleted + @Rows, @Error = @@ERROR

    -- should I wait for a second in between batches? I cannot find a definitive answer
    -- if problems occur (which they haven't so far) then try this
    --WAITFOR DELAY '00:00:01'

    IF @Error <> 0
        GOTO HANDLE_ERROR
END

-- if no rows were deleted then something went pete tong
IF @RowsDeleted = 0
    GOTO HANDLE_ERROR

-- clean up
CLEAN_UP:

-- dropping the temp table also drops any indexes on it, so the explicit
-- DROP INDEX here is just belt and braces
IF EXISTS (SELECT * FROM tempdb.sys.indexes
           WHERE object_id = OBJECT_ID('tempdb..#JobHits')
             AND name = 'clidx_#JobHits')
BEGIN
    -- drop our clustered index
    DROP INDEX clidx_#JobHits ON #JobHits
END

IF OBJECT_ID('tempdb..#JobHits') IS NOT NULL
BEGIN
    -- drop our temp table
    DROP TABLE #JobHits
END

-- jump over the error handler and exit
GOTO EXIT_PROC

-- handle errors
HANDLE_ERROR:

SELECT @Success = 0

SELECT @Success AS success,
       COALESCE(@RowsCopied, 0) AS rowscopied,
       COALESCE(@RowsDeleted, 0) AS rowsDeleted

-- exit
RETURN 0

-- handle a successful exit
EXIT_PROC:

SELECT @Success = 1, @RowsCopied = @RowsInserted

-- exit
RETURN 1

END
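For completeness, the procedure above could be called like this from another batch or an Agent job step, with the OUTPUT parameters capturing the result:

```sql
DECLARE @Success bit, @RowsCopied int, @RowsDeleted int

EXEC dbo.usp_sql_job_data_transfer
     @Success = @Success OUTPUT,
     @RowsCopied = @RowsCopied OUTPUT,
     @RowsDeleted = @RowsDeleted OUTPUT

-- inspect the outcome of the transfer
SELECT @Success AS success, @RowsCopied AS rowscopied, @RowsDeleted AS rowsdeleted
```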