Tuesday, 23 September 2008

Latest SQL Injection URLs

Cleaning up a site infected with multiple SQL injected URLs

I have just had to clean up an ancient system that had been successfully hacked by automated hack bots. The site was a small news system, written about 7 years ago and rarely visited. The code was ASP classic and the SQL was all client side, created using string concatenation with poor parameter sanitization and no thought paid to SQL injection at all. Luckily the site owner is moving to a new system this week, however I still had to clean the database up, and the main affected table contained at least 20 different script tags, some appearing over 5 times, all referencing dodgy URIs. In fact, by the looks of things, the majority of the site's traffic over the last month was purely from hack bots, which just goes to show that no matter how small a site is, if it can be found on the web then a hack bot is going to try its luck. Luckily I managed to remove all traces of the hack using my clean up script and there was no need for a database backup restore.
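To give a rough idea of the kind of clean up involved (this is an illustrative sketch with made up table and column names, not the actual clean up script), each injected script tag gets stripped out of every affected text column with a REPLACE, e.g:

-- Sketch only: strip one of the injected script tags from one affected column
UPDATE dbo.NEWS_ARTICLES
SET ArticleBody = REPLACE(CAST(ArticleBody AS varchar(max)), '<script src="http://www.usaadp.com/ngg.js"></script>', '')
WHERE ArticleBody LIKE '%www.usaadp.com/ngg.js%'

Repeat per injected URI and per affected column, then check the table again for any remaining script tags.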

However I thought it would be helpful to list out all the URIs injected into the system.
As you can see most are Russian with a few Chinese thrown in for good measure, so nothing new there. They all caused Google's vulnerable site report to raise a flag and I believe the JS is the standard hack that makes use of the well known iframe vulnerabilities in old browsers.

http://www0.douhunqn.cn/csrss/w.js
http://www.usaadp.com/ngg.js
http://www.bnsdrv.com/ngg.js
http://www.cdport.eu/ngg.js
http://www.movaddw.com/ngg.js
http://www.lodse.ru/ngg.js
http://www.sdkj.ru/ngg.js
http://www.kc43.ru/ngg.js
http://www.jex5.ru/ngg.js
http://www.bnrc.ru/ngg.js
http://www.bts5.ru/ngg.js
http://www.d5sg.ru/ngg.js
http://www.nemr.ru/ngg.js
http://www.kr92.ru/ngg.js
http://www.bjxt.ru/ngg.js
http://sdo.1000mg.cn/csrss/w.js
http://www.ujnc.ru/js.js
http://www.cnld.ru/js.js
http://www.juc8.ru/js.js
http://www.3njx.ru/js.js
http://www.19ssl.net/script.js
http://www.vtg43.ru/script.js

See my recovering from an SQL injection attack post for more details about clean ups and quick plasters that can be applied to prevent further injections.

Thursday, 18 September 2008

What browser support do you offer your users?

Graceful degradation

In an ideal world you would want any website you develop to have the largest audience possible. Sites these days can rely on multiple technologies to work (JavaScript, Flash, Java, ActiveX, CSS), most of which the user can disable through choice even when their browser supports them. This means that when it comes to developing a site you have a decision to make. You can draw a line in the sand and say you must use this browser above this version, with JavaScript, Flash and ActiveX enabled, for the site to work, which will mean a small and very annoyed audience. Or you can take the better route, which is to build the site from the bottom up, starting with a good HTML core structure that should work in all browsers and then adding layers of functionality on top.

This means making sure all links have href attributes that point to URIs and not javascript: calls, images have alt attributes, client side validation is duplicated on the server (which is good practice anyway as it prevents hackers bypassing it by disabling JavaScript) and that the core navigation is not reliant on JavaScript to work. This is called graceful degradation and it ensures that your site's core functionality is available to as many users as possible.

For example, if you were displaying Flash banners you would start with an image that had an alt attribute defined, so that a user with a text based browser reads the content of the alt attribute, a user with image support sees the image, and a user with Flash enabled views the movie.

The site I am currently working on divides browser support up into 3 levels, and to help the user we include a browser compatibility page that lets them see what level of support their browser and settings are providing, along with links to download the latest version of their browser if they are using an old one. As well as showing the browser support level (see below), the page lists a number of key settings and features that the site makes use of, with an indicator of whether each feature is enabled or not. Settings displayed include Flash, JavaScript, AJAX, Java, Cookies etc. Having one or more items on this feature list unavailable does not mean the site will not function, only that certain parts of the site use that technology, so to get the richest experience from the site and enjoy all the functionality they should enable those missing features. Obviously there are some items that have to be available for the site to work, such as forms and session cookies, and we let the user know whether their browser and settings are compatible with the site by showing crosses or ticks next to each item and an overall compatibility rating.

Browser support levels

The browser support grades are defined in the following way.

Level 1 Support
This means that the site should be fully functional and display correctly. We will test the site fully in level 1 browsers to make sure any bugs are fixed as soon as possible. Currently our level 1 browsers are IE 7 and 6, Firefox 2 (3 is still buggy), Safari 3 and Opera 9.

Level 2 Support
This means browsers that should work without any problem with our software but, as we cannot test every single available browser, we cannot guarantee that everything will work. For example, if the browser is the latest version of SeaMonkey or Firebird, which is based on the same Gecko rendering engine as Firefox (a level 1 browser), then the system should work perfectly depending on your browser settings. However, as we do not fully test the system with this browser we do not guarantee it will work 100%. Current level 2 browsers would be Firefox 3, Opera 8, SeaMonkey and Konqueror.

Level 3 Support
This means older or niche browsers, such as those on PDAs and mobile phones, that we do not test the system on. This is not to say the system won't work, as it might do depending on your browser settings, but we are not going to test our system on Netscape Navigator 4 or every mobile phone to ensure it works.

A fully functional site means no JavaScript errors, although the JavaScript is not really a problem as most code that works in IE 7 is going to work in IE 5, and we all try to write cross browser code nowadays, which is made easier with all the libraries available. The issue is more to do with CSS and all the quirks between IE and Mozilla and the various versions. IE 6 for instance has a number of major differences which mean special stylesheets or hacks have to be implemented. As much as we would like to downgrade IE 6 to a level 2 browser we cannot, due to the simple fact that a number of our largest clients still use IE 6 as their browser. We have enquired about the possibility of them upgrading but were actually told that their technical support team would not allow it, which is understandable if you consider how much support time would be taken up by requests for help because of the new browser layout. I can imagine they don't want to suffer the hundreds of phone calls asking "where has the history menu disappeared to?"

I also have Firefox 3 in level 2 due to the fact that there are still a number of outstanding issues with the site which need resolving before we would make it level 1. Some of these issues are definitely a problem with the browser, such as the well known problem of playing Flash videos, for which the only workaround I have found is an upgrade to Flash version 10. There is also an issue with the execCommand function which I am not sure about, as the code worked fine in Firefox 2 and other browsers but has suddenly started giving me "Component returned failure code: 0x80004003 (NS_ERROR_INVALID_POINTER) [nsIDOMNSHTMLDocument.execCommand]" errors in an iframe based WYSIWYG editor the site uses.

Current browser usage

I like to occasionally look at the traffic statistics to see whether our grading of browsers over 3 levels is in keeping with actual Internet use.

The system I am logging against is recording on average 350,000 page loads a day at the moment, roughly 50-60% of which are from crawlers, which is nothing out of the ordinary. So, looking at the traffic from those site members who have logged into the website (as opposed to just visitors), the browser breakdown for this month so far is:

Browser %
IE 7.0 61.56
IE 6.0 27.21
Firefox 3.0 5.44
Firefox 2.0 3.06
Safari 3.1 2.72

This is no great surprise and shows how much market share IE 6 still has, even after 2 years of IE 7 being around.

As for obscure and old browser versions, would you be surprised or not to know that for today alone the following browsers appeared in the stats for non-crawler traffic:

  • 62 different people used IE 5.0 and 1 came along in IE 4!
  • 5 visited on various versions of Netscape Navigator 4.
  • Across overall traffic to the site, Yandex (the Russian search engine) came third after IE 7 and 6.
  • There were over 100 different types of browser/version recorded today alone.

Although not earth shatteringly exciting, it does show the breadth of browser types out there and also that a large majority of users do not seem to upgrade very often, some not at all by the looks of things. With new versions of browsers rolling out all the time it would be physically impossible to test a site in each one, but if your site works in the major 4 rendering engines (IE/Trident, Firefox/Gecko, Safari/WebKit, Opera/Presto) then you will be covering the majority of those users who do keep up to date with browser revisions.

Sunday, 14 September 2008

My growing love for Javascript

A shaky start to the love affair

I have no shame in admitting that I am currently in love with JavaScript. It may be that for the last eon or two I have been working with clunky old scripting languages such as ASP classic, and whenever I get a chance to do some JavaScript I grab it with open arms as an opportunity to do what seems like proper programming. I will be the first to admit that I never used to see JavaScript in the way I do now, and as well as never understanding its full potential I never even thought of it as a proper object orientated language, which it most certainly is. When I first swapped over from client / server applications in the early 90's to web development using ASP/COM/SQL Server 6, JavaScript was just a nice little scripting language that could set the focus of inputs, validate input client side and do other little tweaks that didn't seem of much importance in the grand scheme of things. The "proper" coding was done server side and I would have traded my ciggies by the dozen with anyone to work on a stored procedure or COM component rather than have to fiddle round with a scripting language that I had little time or patience for.

Coming from a VB background I hated the problems of case sensitivity and having to remember that equality tests involved more than one equals sign, sometimes even three. Trying to debug a script was always a nightmare, having to sit there clicking away all those alert boxes before realising you were in some sort of never ending loop and having to kill the browser and start again. Yes, I really didn't think much of it, and I am certainly not alone in feeling like that.


The love that grew from necessity.

So over the years my views on JavaScript changed from hatred to a mild respect, still outweighed by all the annoyances that come with trying to write a fully functional script that is complex and cross browser at the same time. I still didn't have to work with it that much apart from replicating any server side form validation on the client and some mild DOM manipulation. My annoyance with the language itself had disappeared after I had learnt Java and C#, and the scripts that I did have to knock out were not that complex. However I still had an attitude that if it worked in the browser that I was using, which was always the latest version of Internet Explorer, then the script was fine by me. If it didn't work in Netscape or Safari then I would just ask the office "JavaScript Guru" to have a look, and the code I was given usually seemed to work even if I didn't know what it was doing.

Then the other year I wanted to implement a WYSIWYG editor for the system I was working on. The system was currently using the FCKEditor and I wanted to implement what seemed like a simple request at the time: a character counter, so that as the user typed, the number of characters used in the HTML source was available for viewing. I remember trying to edit the FCK source and realising what a huge beast it was. The size of its directory was ten times the size of the rest of the site and I was sure that half the code and possible functionality was not required for my system's requirements. I had a look at some other widgets, including OpenWYSIWYG and another one that our JavaScript guru had used, and then I decided to write my own, combining the best bits of each, stripping out anything not needed, adding my character counter and making it as flexible as possible. It seemed like a straightforward task on paper but it was the start of a painstaking development and, more importantly, the start of a long learning process which, although extremely painful at the time, opened my eyes to the wonders of cross browser coding and all the different caveats and pitfalls that were waiting for me to discover.


Items of interest discovered along the way.

Whilst developing this widget some of the most seemingly simple things turned out to be some of the most complex. Who would have thought that just putting the cursor at the end of any content in the iframe editor would be such a tall order? So as well as learning far too much about browser differences, the history of Netscape and Mozilla, and why User-Agent strings seem to make little sense, I found out some very important information and came across some specific problems that everyone hits at some stage when developing cross browser script.

1. How Firefox is indeed a wonderful invention with all those extensions especially Firebug which made my debugging life pain free once again. Not only that but Firebug lite brings most of that joy to debugging in IE. No more tired fingers from closing alert buttons.

2. The problems with relative URIs displayed within Iframes in Internet Explorer. Read this article for an explanation. The solution was to write out the Iframe content with document.write.

3. Different implementations of content editable HTML between browsers. Issues setting design mode on in Mozilla and disappearing event listeners. All good fun for the clueless I can assure you.

4. All the fun involved in learning about the event model and the problem of "this" keyword in IE as well as the memory leakage in older IE versions and the illogical ordering IE fires events in.

5. Differences between browsers when trying to calculate the size of the viewport and window dimensions for making my editor appear in a floating div.

6. Trying to make the content outputted by the editor as consistent as possible across browsers and XHTML compliant. IE seems to love capital letters and forgetting to close LI and DT elements for some reason.

7. Much much more.

So as you can see, if you have yourself covered all those topics in detail, which means you will most certainly have read Dean Edwards' competition blog article from start to finish as well as followed most of the links it leads to, this is a lot of information to take in and understand. However rather than put me off JavaScript for life it's actually made me come to love the bloody thing.


Conclusion

So whereas in the 90's I used to hate all those cross browser problems, they are more of a challenge to be overcome now, and I love it when I get a complicated piece of code working in the main 4 browsers as well as in as many old versions as possible. In fact I may get a little too keen sometimes and often need my colleagues to tell me that the widget doesn't actually need to work in Netscape Navigator 4 or IE 4.

I am one of those people who will readily admit that I don't know everything, but I like finding out about those missing chunks of knowledge, and when given the choice of an easy life by implementing someone else's code as is, I will now often choose the more painful but also more enjoyable option of trying to write my own version. I will have a look at some of the best examples out on the web and try to put them all together, which is usually the best way of learning about all those nice little cross browser intricacies along the way.

As the saying goes nothing worthwhile in life comes easily and this seems to be particularly true with writing cross browser JavaScript code.

Tuesday, 9 September 2008

SQL Performance Tuning Queries

SQL 2005 Performance Tuning Queries

Here are some of my favourite queries that help with SQL performance tuning.
I have gathered most of these from the web and 3rd party sources over the years, with the odd tweak here and there where needed, and they form the bulk of my performance tuning toolkit. You should have a file containing a number of similar queries always close at hand as they are invaluable for hunting down problematic queries or even SQL injection or denial of service attacks.

Update: 07-Dec-08
In response to the comment about filtering by db_id() I have removed that filter from a few of the examples, as the commenter is correct in stating that filtering by db_id() will not return all the possible data and will miss out queries run from the client or from within stored procs that use dynamic SQL. Also, if you are in a hurry and want to skip to a very useful procedure that will output nearly 20 different performance related reports for one or more databases, then go straight to this download of mine: Download SQL 2005 Performance Reports.


Top 50 worst queries for I/O

There are 2 different kinds of I/O that are tracked within SQL Server: logical and physical. Logical I/Os account for data that is processed from the buffer pool, which resides in memory, hence the phrase logical I/O. Physical I/Os are those associated with accessing data directly from the physical disks that SQL Server uses to store databases. Physical I/Os are more expensive, meaning they take longer to process. I/O is generally the single most expensive operation that impacts the overall performance of a TSQL statement, so when you are tuning your queries you want to minimise the number of logical and physical I/O operations performed to produce a result set.

If your query shows physical I/O on the first run but none on the second, it's due to SQL Server holding the data in memory, so you need to clear your buffer cache using DBCC DROPCLEANBUFFERS.
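For reference, clearing the cache on a dev or test box (not a busy production server) is just:

CHECKPOINT -- write any dirty pages to disk first
DBCC DROPCLEANBUFFERS -- then empty the clean buffer cache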

SELECT TOP 50
(qs.total_logical_reads + qs.total_logical_writes) /qs.execution_count as [Avg IO],
substring (qt.text,qs.statement_start_offset/2,
(case when qs.statement_end_offset = -1
then len(convert(nvarchar(max), qt.text)) * 2
else qs.statement_end_offset end - qs.statement_start_offset)/2)
as query_text,
qt.dbid,
qt.objectid
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text (qs.sql_handle) as qt
ORDER BY [Avg IO] DESC


Top 50 worst queries by average running time

The following query will list the worst performing queries based on their average elapsed running time. It will give you the times, logical & physical reads and the TSQL of the statement in question.

SELECT TOP 50
creation_time,last_execution_time,total_physical_reads
,total_logical_reads,total_logical_writes, execution_count
,total_worker_time, total_elapsed_time, total_elapsed_time / execution_count
avg_elapsed_time, SUBSTRING(st.text, (qs.statement_start_offset/2) + 1,
((CASE statement_end_offset WHEN -1 THEN DATALENGTH(st.text) ELSE
qs.statement_end_offset END - qs.statement_start_offset)/2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY total_elapsed_time / execution_count DESC;


Top 50 worst queries for CPU

This query is similar to the previous ones in the DMVs (dynamic management views) it uses, but it returns the most CPU intensive queries in the system. If you are seeing high CPU spikes or top end flatlining on your DB server then this query will help you find out whether you are being hit with a SQL denial of service attack, or whether you have some queries that use pattern matching techniques (LIKE, PATINDEX, CLR regular expressions) and have been given a complex string to search for that has caused the server to max out on CPU.

SELECT TOP 50 (a.total_worker_time/a.execution_count) AS [Avg_CPU_Time],
Convert(Varchar,Last_Execution_Time) AS 'Last_execution_Time',
Total_Physical_Reads,
SUBSTRING(b.text,a.statement_start_offset/2,
(case when a.statement_end_offset = -1 then len(convert(nvarchar(max), b.text)) * 2
else
a.statement_end_offset end - a.statement_start_offset)/2) AS Query_Text,
db_name(b.dbid)as DatabaseName,
b.objectid AS 'Object_ID'
FROM sys.dm_exec_query_stats a
CROSS APPLY sys.dm_exec_sql_text(a.sql_handle) AS b
ORDER BY [Avg_CPU_Time] DESC


Suggested index columns and usage

SELECT d.*
,s.avg_total_user_cost
,s.avg_user_impact
,s.last_user_seek
,s.unique_compiles
FROM sys.dm_db_missing_index_group_stats s
,sys.dm_db_missing_index_groups g
,sys.dm_db_missing_index_details d
WHERE s.group_handle = g.index_group_handle
AND d.index_handle = g.index_handle
AND database_id = db_id()
ORDER BY s.avg_user_impact DESC

View index usage by table

Ideally you want high figures in the user_seeks column.
If you have no seeks and lots of lookups then you should investigate. Also if you have indexes with no seeks, scans or lookups then they should be dropped as the index is not being used.

SELECT i.Name, i.Type_Desc, i.Is_unique, d.*
FROM sys.dm_db_index_usage_stats as d
JOIN sys.indexes as i
ON i.object_id = d.Object_id and
i.index_id = d.Index_id
WHERE database_id=db_id()
AND d.object_Id=object_id('JOBS')

View the fragmentation levels of your In Row (non blob/text) indexes


SELECT object_name(d.object_id), d.object_id, d.index_id, i.name as IndexName,
avg_fragmentation_in_percent, alloc_unit_type_desc
FROM sys.dm_db_index_physical_stats (db_id(), NULL, NULL, NULL, NULL) as d
JOIN sys.indexes as i
ON i.object_id = d.object_id and
d.index_id = i.index_id
WHERE d.Index_Type_Desc <> 'HEAP' AND
i.Name is not null AND
avg_fragmentation_in_percent>= 30 --change this figure to reflect your own requirements
ORDER BY object_name(d.object_id)


Finds the degree of selectivity for a specific column in a table

The higher the ratio the more selective the column is and an index should be considered. There is no point having indexes on columns that have low selectivity as they would not help differentiate between rows when 50% of rows have one value and 50% another. This is the reason why it's not a good idea to place indexes on bit columns, as there are only two possible values. The only exception to this would be when you have a very high percentage of rows with one value (99% true) and you want to find the 1% that have the opposite value (false), however even then I would advise against it.


Declare @total_unique float
Declare @total_rows float
Declare @selectivity_ratio float

SELECT @total_unique = 0
SELECT @total_rows = 0
SELECT @selectivity_ratio = 0

/*
Finds the Total Number of Unique Values in the Column
Be sure to replace ClientFK below with the name of your column
Be sure to replace JOBS below with your table name
*/
SELECT @total_unique = (SELECT COUNT(DISTINCT ClientFK) FROM JOBS)

/*
Calculates Total Number of Rows in the Table
Be sure to replace JOBS below with your table name
*/
SELECT @total_rows = (SELECT COUNT(*) FROM JOBS)

--Calculates Selectivity Ratio for a Specific Column
SELECT @selectivity_ratio = ROUND((SELECT @total_unique/@total_rows),2,2)
SELECT @selectivity_ratio as 'Selectivity Ratio'



So those are a few of my favourite queries that I use very frequently to help hunt down and resolve back end performance bottlenecks. Since SQL 2005 came out with all these really cool DMVs a lot of people have spent time putting these sorts of queries together into one report.

If you want to get your hands on one of these all encompassing queries then check out the following stored procedure which uses a query I came across and extended to make it more customisable:


It's a beast of a report but gives you everything you need to know in one place. Read the comments before each section carefully so you understand what the data is showing you. Also, if you run the report as it is on a server with lots of databases, tables, indexes etc it will take some considerable time to execute, so I suggest using one of the modes that enables you to filter only on those databases you require statistics for. So call it like so:

EXEC dbo.usp_sql_rpt_database_performance_stats 3, 'strictlysoftware%'

will only report on databases on that server that have a name (sys.databases) that starts with strictlysoftware. You will most likely also want to exclude the system databases (master, msdb, model), although you may want to report on tempdb, so call it like so:

EXEC dbo.usp_sql_rpt_database_performance_stats 1, NULL

Another script that contains some useful code related to blocking and wait times can be found at Microsoft's site. The code is near the bottom of the page.



So there you go, a one stop shop of goodies that should help you on your way. Please let me know if you have other useful DMV related queries or anything specific to SQL 2008 that is coming our way soon.

SQL Performance Top Tips

Another SQL Performance Top Tips?

I know there are lots of Top Tips for improving SQL performance out there on the web but you can never have too much of a good thing so I have created my own list of issues to identify and resolve when trying to improve the performance of an SQL database.


1. Investigate and resolve deadlocks or blocked processes.


All it takes is one blocked process on a commonly hit table for your whole site to hang and start reporting timeout issues.

On my job board sites I had locking issues with my main JOBS table, which had to be joined to a number of other tables to get the data needed on results pages that were hit constantly by crawlers. These tables were also being updated constantly by clients and automated feeds from multi-posters, which meant that locks were likely.

As I use one database for multiple sites it also meant that a lock caused by one site would cause timeout issues for the other sites.

To resolve this I created some de-normalised flat tables per site that contain all the fields needed for reporting and searching which meant:

  • The JOB table is only updated at regular intervals and not instantly. SQL 2005 introduced the synonym feature which I demonstrate here. This is used to swap between tables being built and tables in use by the site so that a new table can be built behind the scenes with up to date data.
  • Clustered indexes can be site specific depending on the columns the site searches on the most.
  • No joins needed when accessing the table and results are returned very fast.
  • Any problems accessing this table do not affect other sites.
Another trick to handle instances where you are selecting from tables that may be updated constantly, such as job or banner hit tables, is to look at the following:
  1. If the table you are selecting from is never updated or deleted from during the times you are selecting from it, then you can use the WITH (NOLOCK) hint on your SELECT statements. You won't have to worry about dirty reads as you are not actually updating the data, and you bypass all the overhead SQL Server has to go through to maintain the locking.
  2. Use SET LOCK_TIMEOUT to limit how long a statement waits for a blocking lock to be released.
  3. Use a TRY / CATCH block to catch deadlocks and other blocking errors in combination with a retry loop, returning from the proc after X retries. View my example of using LOCK_TIMEOUT to handle DEADLOCKS and BLOCKING, or see the sketch below.
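Below is a minimal sketch of points 2 and 3 combined (made up table and column names, not production code): a short lock timeout plus a TRY / CATCH retry loop.

SET LOCK_TIMEOUT 2000 -- wait no more than 2 seconds for a blocking lock to clear

DECLARE @Retry int
SET @Retry = 1

WHILE @Retry <= 3
BEGIN
  BEGIN TRY
    SELECT JobID, Title
    FROM dbo.JOBS
    WHERE CategoryFK = 10

    BREAK -- success so leave the retry loop
  END TRY
  BEGIN CATCH
    -- 1222 = lock request timed out, 1205 = chosen as a deadlock victim
    IF ERROR_NUMBER() IN (1222, 1205) AND @Retry < 3
      SET @Retry = @Retry + 1
    ELSE
      RETURN
  END CATCH
END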



2. Look into large batch processes that have long run times, take up I/O, have long wait times or cause blocking.

A good example is where you have to transfer lots of records from one table to another e.g from a daily table that only receives INSERTS (e.g job / banner hits) into a historical table for reporting.

If you are deleting large numbers of records from a table that is also being used then you don't want to cause blocks that will freeze your site. In conjunction with point 1 you should look into how you handle your DELETE / UPDATE statements so that they are done in small BATCHES using the SQL 2005+ TOP command.

Read this article for an example of updating or deleting in batches to prevent blocking.
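As a rough illustration (the table, column and batch size are made up), a batched delete looks something like this:

DECLARE @Rows int
SET @Rows = 1

WHILE @Rows > 0
BEGIN
  DELETE TOP (5000)
  FROM dbo.DAILY_HITS
  WHERE HitDate < DATEADD(day, -1, GETDATE())

  SET @Rows = @@ROWCOUNT

  WAITFOR DELAY '00:00:01' -- give other queries a chance to get in between batches
END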

To find out problematic queries after the fact you can utilise the SQL 2005+ Dynamic Management Views (DMVs), which hold key information about the system.

Read this article on my top queries for performance tuning to help identify those queries that need tuning.


3. Re-Index and De-Frag tables

Make sure you regularly re-index and de-frag your tables, as over time fragmentation will build up and performance will be affected. I tend to set up a weekly MS Agent job that runs a de-frag / re-organise on tables with fragmentation over a set percentage, as well as updating index statistics.

I then also try to schedule a full re-index of all my main tables once a quarter at scheduled down times as during a full rebuild the tables can be taken off line depending on your SQL setup.
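For reference, the underlying commands such a job runs are along these lines (the JOBS table and the choice between the two options are just for illustration):

ALTER INDEX ALL ON dbo.JOBS REORGANIZE -- lighter fragmentation, stays online
ALTER INDEX ALL ON dbo.JOBS REBUILD -- heavier fragmentation, can take the index offline
UPDATE STATISTICS dbo.JOBS WITH FULLSCAN -- refresh statistics (mainly needed after a reorganise)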


4. Identify Slow Queries

Investigate your slowest running queries to make sure they are covered adequately by indexes but also try not to over use indexes. Try to cover as many queries with as few indexes as possible.

Try running this SQL performance report which will identify a dozen or so areas which could be improved on your system including:


  • Causes of the server waits
  • Databases using the most IO
  • Count of missing indexes, by database
  • Most important missing indexes
  • Unused Indexes
  • Most costly indexes (high maintenance)
  • Most used indexes
  • Most fragmented indexes
  • Most costly queries, by average IO
  • Most costly queries, by average CPU
  • Most costly CLR queries, by average CLR time
  • Most executed queries
  • Queries suffering most from blocking
  • Queries with the lowest plan reuse

This is an invaluable tool for any SQL DBA or SQL performance tuning developer.



5. Split tables

If your tables are used for inserting/updating as well as selection then for each index you have on a table that's an extra update required when a record is saved. On some of my big tables that are used heavily for reporting I split the data into daily and historical data. The daily table will allow updates but the historical table will not.

At night a job will transfer the daily data into the historical table, dropping all current indexes, populating the new data and then rebuilding the indexes with a 100% fill factor. You need to balance out whether speed on data retrieval or on update is more important if the table is used for both. You should also look into points 1 and 2 about managing blocking and batch updates/deletes to handle instances where a table is being deleted whilst accessed at the same time.


6. Re-write complex queries and make WHERE clauses SARGABLE

Can you rewrite complex queries to be more efficient? Can you remove left joins to large tables by populating temporary tables first? Are you putting functions in the WHERE or HAVING clause on the column and negating any index usage? If so, can you rewrite the clause and make it SARGABLE so that you make use of the index? For example, a WHERE clause like so

WHERE DateDiff(day,CreateDate,GetDate()) = 0

This clause, which filters by today's date, should be rewritten to:


WHERE CreateDate >= '2008-sep-09 00:00:00' --where this is the previous midnight

Put the value from GetDate() into a variable before the SELECT and then use that in the filter, so that no function is used on the column and any index on CreateDate will be available.
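A minimal sketch of that variable approach (the JOBS table and its columns are just examples):

DECLARE @Midnight datetime
SET @Midnight = DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0) -- today at 00:00:00

SELECT JobID, Title
FROM dbo.JOBS
WHERE CreateDate >= @Midnight -- SARGABLE, so an index on CreateDate can be used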

Read this article of mine which proves the benefits of SARGABLE clauses in SQL.


7. Query Plan Caching

Check that you are benefiting from query plan caching. If you are using stored procedures that contain branching through use of IF statements that run different SQL depending on the values passed in by parameters then the cached query plan for this procedure may not be the best for all parameter variations.

You can either rewrite the procedure so that each IF branch calls another stored procedure that contains the query required as it will be this plan that gets used.

Or you could rewrite the query to use dynamic SQL, so that you build up a string containing the appropriate syntax and parameters and then execute it using sp_executesql, which will take advantage of plan re-use. Do not use EXEC as this will not take advantage of plan re-use and it's not as safe in terms of SQL injection.
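For example, a minimal sketch of the sp_executesql approach, using the PRODUCTS example from below:

DECLARE @SQL nvarchar(500)
SET @SQL = N'SELECT * FROM PRODUCTS WHERE ProductID = @ProductID'

EXEC sp_executesql @SQL, N'@ProductID int', @ProductID = 10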

You should also look into whether your query plans are not getting used due to your SQL Server settings as the default mode is to use SIMPLE PARAMETERIZATION and not FORCED PARAMETERIZATION.

This feature takes ad-hoc queries containing literal values and replaces those values with parameters. This means that the query plan which gets cached can be re-used for similar queries that have different values, which can aid performance as it reduces compilation time.

When the system is set to SIMPLE mode, only ad-hoc query plans that contain the same literal values get cached and re-used, which in many cases is not good enough as the values for most queries change all the time. For example, in SIMPLE mode only the plan for the exact query below will be cached.


SELECT *
FROM PRODUCTS
WHERE ProductID = 10

This means only products with a ProductID of 10 will benefit from a cached plan, however with FORCED PARAMETERIZATION enabled you would have a plan with parameters so that any ProductID can benefit from it e.g


SELECT *
FROM PRODUCTS
WHERE ProductID = @1 -- name of SQL parameter


For a more detailed look at the benefits of this method read my article on forced parameterization.
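If you do want to try it, the setting is changed per database, e.g. (substitute your own database name):

ALTER DATABASE MyDatabase SET PARAMETERIZATION FORCED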


8. Use the appropriate table for the situation

SQL has a variety of tables from fixed permanent tables to global and local temporary tables to table variables that are both stored in tempdb.

I have found a number of times now that table variables start off being used in stored procedures as very useful, memory efficient storage mechanisms, but once the dataset stored within them grows above some threshold (I have not found the exact threshold amount yet) the performance drops incredibly.

Whilst useful for array like behaviour within procs and quick for small datasets, they should not be used with record sets of many thousands or millions of records. The reason is that no indexes can be added to them, so any joins or lookups will result in table scans, which are fine with 10-100 records but with 20 million will cause huge problems.

Swapping table variables for either fixed tables or temporary tables with indexes should be considered when the size of data is too great for a table variable.
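As a rough illustration (made up table and column names), the swap might look like this:

-- Sketch only: a temp table with an index copes far better with large row counts
-- than a table variable, which has no statistics for the optimiser to use
CREATE TABLE #JobResults
(
  JobID int NOT NULL,
  ClientFK int NOT NULL,
  Title varchar(200) NULL
)

CREATE NONCLUSTERED INDEX IX_JobResults_ClientFK ON #JobResults (ClientFK)

INSERT INTO #JobResults (JobID, ClientFK, Title)
SELECT JobID, ClientFK, Title
FROM dbo.JOBS

-- joins and lookups against #JobResults can now seek on ClientFK

DROP TABLE #JobResults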

If your procedures are being called by website logins with execute permission you will need to impersonate a login with higher privileges to allow for the DDL statements CREATE and DROP as a basic website login should have nothing more than read privileges to prevent SQL injection attacks.

If you don't want to risk the rise in privileges then consider using a fixed table and the use of a stamp and unique key so that multiple calls don't clash with each other. Appropriate indexes should be used to ensure speedy data retrieval from these tables.

A real world example I have found is explained in this article on speeding up batch processes that use table variables.


9. Tuning for MySQL

MS SQL has a vast array of useful tools to help you performance tune it, from its Dynamic Management Views and Activity Monitor to its detailed query execution plans. However, if you are using MySQL and have to suffer using a tool like NAVICAT then you are not in such a good position to tune your queries.

The EXPLAIN option is nowhere near as useful as the MS SQL Query Execution Plan but it can be used to ensure the appropriate indexes are added if missing and the Slow Query Log is a useful place to identify problematic queries.

If you are using LINUX or Wordpress and running your system on MySQL then read this article of mine on performance tuning for MySQL.


These are just some tips to look at when trying to improve back end database performance and I will add to this list when I get some more time. Feel free to post your own tips for identifying problematic SQL and then fixing it.

Monday, 8 September 2008

Top 10 Tips for improving ASP performance

Top 10 Tips For Speeding up ASP Classic Sites

1. Look for nested loops and replace them with SQL stored procs. It's surprising how many times I have looked at sites that, when outputting product hierarchies or menus on those old catalogue sites, use nested loops to build up the content. It's usually people who got into programming on the web side with little or no SQL knowledge, which was then learnt later only as and when they required somewhere to persist the data. Replace nested loops with SQL stacks (see BOL) or, in SQL 2005, a recursive CTE. From tests I have found the stack to perform slightly faster than the CTE but both will outperform nested loops. The data will most likely be in an adjacency table anyway so it's just the method of outputting it. I remember replacing one such beast of nested loops with a stored procedure stack and it increased performance by a factor of 20.
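As a rough sketch of the SQL 2005 recursive CTE approach against a typical adjacency table (the CATEGORIES table and its CategoryID / ParentID columns are made up for illustration):

WITH CategoryTree (CategoryID, ParentID, Name, Depth)
AS
(
  SELECT CategoryID, ParentID, Name, 0
  FROM dbo.CATEGORIES
  WHERE ParentID IS NULL -- top level categories

  UNION ALL

  SELECT c.CategoryID, c.ParentID, c.Name, t.Depth + 1
  FROM dbo.CATEGORIES c
  JOIN CategoryTree t ON c.ParentID = t.CategoryID
)
SELECT CategoryID, ParentID, Name, Depth
FROM CategoryTree
ORDER BY Depth, Name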

2. Re-use those objects. Do you have helper functions that carry out regular expressions or file system object actions and instantiate the object within the function? If you call these functions more than once on a page you are creating an unnecessary overhead. Have a global include file where you declare some often used variables, then the first time you call the function initialise the object if it doesn't already exist, otherwise use the existing object pointer. Then destroy them all in your global footer.

3. Simplify any complex regular expressions by breaking them down into multiple parts. I have had problems with badly written regular expressions causing CPU on our webserver to max out at 100%. On our quad it would jump to 25%; four hits at the same time meant 100% and then no other pages would load. Also, complex expressions that do lots of lookbacks will cause overhead, and the longer the string you are testing against the more overhead you will get. Remember VBScript's regular expression engine is not as good as some others, so it's better to break those long complex expressions up into smaller tests if possible.

4. String concatenation. In ASP.NET you have the StringBuilder class, but in ASP classic you should create your own using arrays to store the strings and then do a JOIN at the end to return the result. Doing something like the following

For x = 1 to 100
strNew = strNew & strOtherString
Next
may work okay if the size of the string in strOtherString is small and you have few iterations, but once those strings get larger the amount of memory used to copy the existing string and then append the new one grows very quickly.

5. Try to redim arrays as few times as possible, especially if you are doing a Preserve. It's always best to dim to the exact size beforehand, but if you don't know how big your array will be then dim it to a large number first and use a counter to hold how many elements you actually add. Then at the end you redim preserve once, back down to the number of elements you actually used. This uses a lot less memory than having to redim preserve on each addition to the array.

6. Use stored procedures for complex SQL and try to reduce database calls by calling multiple SQL statements together and then using NextRecordset() to move to the next set of records when you need to. Using stored procedures will also reduce network traffic, as it's much easier to connect to the DB once and pass through the names of 3 stored procedures than connect 3 times passing through the X lines of SELECT statements that each recordset requires.
strSQL = "EXEC dbo.usp_asp_get_records1; usp_asp_get_recorddetails; usp_asp_get_country_list;"

Set objRS = objConnection.Execute(strSQL)

If Not(objRS.BOF And objRS.EOF) Then
arrRecords = objRS.GetRows()
End If

'always move on to the next recordset, even if the previous one returned no rows
Set objRS = objRS.NextRecordset()

If Not(objRS.BOF And objRS.EOF) Then
arrRecordDetails = objRS.GetRows()
End If

Set objRS = objRS.NextRecordset()

If Not(objRS.BOF And objRS.EOF) Then
arrCountryList = objRS.GetRows()
End If

7. Make sure that you don't put multiple conditions in IF statements. ASP classic doesn't support short circuiting, which means it evaluates every condition in an IF statement even if the first one is false. So rewrite code such as:

If intCount = 1 AND strName = "CBE" AND strDataType = "R" Then
'Do something
End If

should be rewritten as:

If intCount = 1 Then
If strName = "CBE" Then
If strDataType = "R" Then
'Do something
End If
End If
End If
Also make sure you evaluate conditions in the correct order. If you are checking a value for True or False then you don't need to waste a negation when you can just change the order of the branches.

If Not(bVal) Then
Response.Write("NO")
Else
Response.Write("YES")
End If
Should obviously be

If (bVal) Then
Response.Write("YES")
Else
Response.Write("NO")
End If
8. Cache as much data as possible if it doesn't change frequently. If you have lists that come from a database but never or only very infrequently change, then you could either store them in memory and reload every so often, or write the array to a file and then use that static array, only re-creating the file when the data changes. I always add an admin only function into my sites that, at the press of a button, will reload everything I need into files and cache. This prevents unnecessary database lookups.

9. Pages that contain database content added through the website, which only changes when the user saves that content, can be generated as static HTML files if possible. Rather than rebuilding the page from the DB each time someone views it you can use the static version, and only rebuild the page when the user updates the content. For example, a lot of CMS systems store the content within the DB but create a static version of the page for quick loading.

10. Encode and format content on the way into the database, not each time it's outputted for display. It's more cost efficient to do something once on the way in than hundreds of times on the way out.

These are just a few of the things I have found can be very beneficial in speeding up a classic ASP site on the web side of things. As well as any performance tweaks that can be done application side, you should take a look at the database to make sure that's performing as well as possible. If your site is data driven then a lot of the performance will be related to how quickly you can build those recordsets up and deliver them to the client.

Migrating IIS web application from 32 bit to 64 bit server.

Changing Servers from a 32 to 64 bit environment

By Strictly-Software

I recently had to go through the process of moving a website from a 32 bit windows to 64 bit windows server.

It was one of those tasks that I presumed would be pretty simple and not involve much work but turned into a proper nightmare.

I thought there might have been some sort of “gotcha” guide out on the web somewhere but if there was at the time I couldn’t find it!

I could either find bits and bobs about some individual problems or nothing at all. So I promised myself that on completion I would create an article myself detailing all of the major problems I had to overcome and the solutions I used. So here it is!

The original system was a Windows 2003 server hosting IIS 6.0 and running ASP classic sites.

We had got to the limits in terms of memory allocation and had boosted performance as much as we could in the application (see top 10 tips for boosting ASP classic site performance) therefore we wanted to try moving to a new 64 bit server before considering any large scale application rewrite to .NET.

1. COM objects and compiled dll libraries.

We found that most of our 3rd party COM objects needed upgrading to 64 bit.

You should check each object in turn to see if you need new installations or licences. Objects that we used that needed upgrading included:
  • ASPEmail
  • ASPJPEG
  • ASPUpload
  • DTSearchEngine
  • ISAPI_Rewrite

2. Running scripts.

Scheduled jobs (.bat, .cmd, .vbs) all need to be able to run in a 64 bit environment.

A 64 bit script host cannot load 32 bit COM objects, so if your script tries to instantiate them you will get errors unless you upgrade those COM objects to 64 bit (see above).

Another option is to run the script with the 32 bit version of cscript found in C:\Windows\SysWOW64\ (e.g. C:\Windows\SysWOW64\cscript.exe myscript.vbs), which enables you to run scripts that use 32 bit components in a 64 bit environment.

3. ISAPI Rewrite

The name of the rewrite configuration file changes from httpd.ini to .htaccess, and if you compare the format of the files they differ in syntax, e.g. the flag [I] for "Ignore Case" becomes [NC] "No Case" and the 301 permanent redirect flag [RP] becomes [R=301].

Also, for rewrite rules the 64 bit version, which uses a syntax similar to that used on Apache servers, requires the URL pattern in the rule to be anchored with starting (^) and closing ($) characters, e.g.

# 32 BIT RULE

RewriteRule /some-old-page.htm http://www.newsite.com/newpage.htm [I,O,RP,L]

Becomes

#64 BIT RULE

RewriteRule ^/some-old-page.htm$ http://www.newsite.com/newpage.htm [NC,R=301,L]


There is a helpful conversion tool within IIS that appears in a new tab under each site to convert your current files to the new format if you don't want to do it by hand.

4. ADO Connection Strings

If you are using an MDAC connection string to connect to an MS SQL database such as:

DRIVER={SQL Server}; SERVER=server_name_or_address;
DATABASE=database_name; UID=username; PWD=password;

You will need to change it to SQLOLEDB as MDAC is not supported on 64 bit Windows systems.

PROVIDER=SQLOLEDB; SERVER=server_name_or_address;
DATABASE=database_name; UID=username; PWD=password;


5. Issues related to changing to an SQLOLEDB connection string.

Changing to the SQLOLEDB provider will mean that you may have problems with SQL that returns multiple recordsets, either client side or server side SQL or stored procedures.

For example with the MDAC connection string, a client side SQL statement like the one below in an example piece of ASP code would work fine.

NOTE the two SELECT statements in the SQL and the objRS.NextRecordset statement against the resulting recordset.


strSQL = "DECLARE @vals varchar(2000); " &_
  "SELECT @vals = COALESCE(@vals + ', ', '') + CAST(a.CategoryFK AS varchar(10)) " &_
  "FROM DATA_CATEGORIES_VALUES as a " &_
  "WHERE a.IDFK = 3556 AND " &_
  "a.DataTypeFK = 'JEB';" &_
  "SELECT @vals as val;"

Set objRS = objConnection.Execute(strSQL)

'* move to the next recordset as the first one just builds the string,
'* but it's the 2nd one that returns the values
Set objRS = objRS.NextRecordset

If Not(objRS.BOF AND objRS.EOF) Then
 val = objRS("val")
End If

However using the SQLOLEDB provider will result in an error such as

“Operation is not allowed when the object is closed”

when it tries to move to the next recordset.

I believe that this is caused by the fact that the temporary statements created to run client side SQL under MDAC don't include a “SET NOCOUNT ON” statement, so the rows-affected count from the first SELECT comes back as the first recordset. The SQLOLEDB provider includes this line automatically, so only one recordset is returned and there is no need to move to the next one, making the following line unnecessary:

Set objRS = objRS.NextRecordset

Using “SET NOCOUNT ON” means that the first recordset implicitly returned by SQL Server, containing the number of rows affected by the first SELECT statement, is not returned.

This would be the case for any SELECT statement that does not itself return a dataset.

6. Stored Procedures and SQLOLEDB

Stored procedures which do not have the SQL statement “SET NOCOUNT ON” at the top of their code block may also now cause problems if they return multiple recordsets with data, and you may get “Operation is not allowed” or “Item does not exist with this name or ordinal” errors when trying to reference the recordset object in your client side code.

Including “SET NOCOUNT ON” at the top of stored procedures is good practice anyway, and since SQL 2005 it's part of the default stored procedure template. Therefore any procs that don't include it should be updated to prevent these sorts of issues.
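For illustration the pattern is simply this (the proc body is made up; only the SET NOCOUNT ON line matters):

CREATE PROCEDURE dbo.usp_asp_get_country_list
AS
SET NOCOUNT ON

SELECT CountryID, CountryName
FROM dbo.COUNTRIES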

7. Returning BLOBS (varchar(max), nvarchar(max), text, ntext etc)

A benefit to changing to the SQLOLEDB provider is that you no longer have to put your BLOBS at the end of your SELECT statements (e.g nvarchar(max), varchar(max), ntext, text).

Previously you had to put these columns at the end of the SELECT statement after any non BLOB columns to prevent them from being returned empty e.g:


SELECT CandidateID, Name, DOB, Town, County, Blob1, Blob2, Blob3


With SQLOLEDB you do not have to do this as it seems to handle them better than MDAC so the following will return values for the BLOB columns even though they are not at the end of the SELECT.


SELECT CandID, Name, Blob1, Blob2, Blob3,  DOB, Town, County

These are the main configuration issues I found when moving to a 64 bit environment but please post any comments if there are other issues related to migrating IIS web applications from 32 to 64 bit platforms.

I am sure there are plenty more!

By Strictly-Software


© 2008 Strictly-Software