Tuesday, 21 October 2008

Dangers of Pattern Matching

Regular Expressions and Pattern Matching

Most modern systems use some form of pattern matching in there day to day running whether its client side Javascript replace or match statements, server side regular expression objects or database residing LIKE and PATINDEX statements and since SQL 2005 the ability to run CLR based regular expressions. Now although these are very useful tools to use you must be aware that its possible to max out your web or database servers CPU by running certain types of patterns.

I will tell you a short story about a very long weekend spent in the office at the beginning of 2007. We were migrating a big system to another server and we had copied all the code and database settings up and were watching the performance of the systems. Something strange was happening every minute or so. The CPU would jump from hardly anything up to 25%. After 30 seconds it would go back down again. Sometimes it would jump to 25% and then a few seconds later to 50%. In the daytime with all the sites running full throttle at intermittent stages throughout the day the CPU would jump in blocks of 25% up to 100% and all the sites on the server would become unusable causing lots of error messages and customer phone calls. We were at a loss to what was going on. It took quite a while before someone discovered that clicking on a link on a certain company profile on a certain site would always cause the CPU to spike. The site was a jobboard and the link was to display all the job results for that company. I copied down that sites job data to our dev box and ran the same test. Instead of causing the CPU to jump to 25% it maxed straight out to 100%. This made sense as our live server had a quad core. I debugged the code and found the cause was a regular expression that was in a function that re-formatted the snippet of job description shown on the results page. Although this function worked on 99% of the data in the system this company had a couple of jobs that contained certain Unicode characters and it was obviously causing the regular expression engine to freak out. We put it down to a problem inherit in Microsoft's Regular Expression engine and rewrote the code to break the one pattern down into 3 smaller steps which worked fine. However I then read an article about catastrophic backtracking and how certain patterns that can be matched in multiple ways can cause these sorts of CPU issues as the complexity of the pattern grows exponentially. If you have a very small pattern you may not notice it but if you have long strings then you can run into these CPU nightmares.

As well as problems that maybe built into your application due to patterns that could cause these high CPU issues when a certain replace or match is carried out in a specific instance you may be opening your site up to deliberate attacks by users that take advantage of an online searching tool that uses SQL and LIKE to search records in your database. If you are not careful and screen out wildcards and other symbols used within LIKEs pattern matching or regular expression symbols (+*^?$) if you are using a CLR build proc or function in SQL 2005 then you could be vulnerable to what is known as SQL Denial of Service attacks. Again its possible to max out your database servers CPU whilst a complex LIKE statement is run against your database.


Ways to avoid complex pattern matching affecting your system

  • Avoid backtracking.
  • If you have overly complex expressions consider breaking them down into multiple smaller expressions. Expressions that include lots of nested quantifiers ?*+.
  • Avoid expressions that can be matched by multiple string patterns.
  • If you are re-using an expression then compile it if possible.
  • If you are using an inherently useless regular expression engine (VB) then make sure you re-use your objects. Create a global regular expression object variable that's instantiated on first use and then re-used on subsequent calls and then destroyed at the end of the page.
  • If you have a page that is hit frequently that uses lots of expressions for formatting such as the job results example I mentioned then consider rewriting your code so that any complex formatting is done on input rather than output. Consider creating a separate column in your table that stores the formatted version which can be accessed quickly whenever it needs to be viewed. The performance gained by not having to run complex pattern matching constantly on each page load will be considerable.
  • If you are taking input from a user and using it in pattern matching in SQL LIKE or regular expression database searches then sanitise all input to make sure you do not fall victim to an SQL denial of service attack.
  • Also consider rewriting any database searching to use a full text index instead of LIKE searches. You will be able to offer your users a more feature rich and comprehensive search facility.

So now you know to be on the lookout for spikes in CPU. If you see the CPU in performance monitor jump up to 25% on a quad, 50% on dual or 100% in one leap then it could well be down to a regular expression pattern somewhere in your application that needs investigating.


Links

SQL Server Denial of Service Attack (LIKE Attack)

Detailed description of how backtracking can cause CPU issues

Catastrophic Backtracking

Tuesday, 7 October 2008

Is there any point anymore using Javascript to output Flash?

Using Javascript to load Flash.

Because of the well known Eolas patent issue regarding Internet Explorer and users having to click to activate Flash content a number of Javascript libraries has emerged to get round this problem e.g swfobject, UFO. The reason being that any Flash content loaded up through script was not affected by this patent issue and the user did not have to click anything.

I along with many others implemented generic functions to get round the "click to activate" problem. However since March of 2008 the issue has been resolved and current versions of Internet Explorer and any previous versions that have had patches applied to them do not have this issue. Therefore I am wondering whether there is any point in having Flash content delivered through Javascript rather than Server side code?

The main reason for doing it this way has been removed and by using Javascript to load all Flash you are narrowing your user base. More and more users are crawling the web with Javascript de-activated and I imagine that in this security conscious times we live in that this percentage will only increase. Checking my logged traffic statistics for today I can see that 10% of my sites non crawler users have Javascript disabled so this is quite a large number of users that won't be able to see all those Flash adverts and banners that your customers have paid £££ to place on your site.

Therefore in any new sites that I create I am reverting back to generating the Flash HTML server side unless there is a need for client side user interaction such as when I use flash for text counters. I am using the same XHTML that SWFobject and others have started doing which checks the users browser type and outputs the relevant OBJECT code accordingly i.e classid and codebase for IE and type for the others. This removes the need for embed tags and is still XHTML compliant plus its done server side so there is no issue when users turn Javascript off.

<object type="application/x-shockwave-flash" data="/images/Banner.swf" id="BANNER643" align="middle" width="468" height="60">
<param name="movie" value="/images/Banner.swf">
<param name="allowScriptAccess" value="sameDomain">
<param name="quality" value="high" />
<param name="wmode" value="transparent">
</object>


If the users browser is Internet Explorer I output a Javascript function call at the bottom of the page that will "fix" any OBJECT tags so that any users that don't have patched versions of IE and so still ask the user to "click to activate" will not have this problem as long as they have Javascript enabled.

A simple version of the EOLAS fix code is below although I am currently using a more advanced option that handles complex OBJECT params alot better.

objects = document.getElementsByTagName("object");for (var i = 0; i < objects.length; i++){objects[i].outerHTML = objects[i].outerHTML;}


All my users should get this content and only the unpatched Internet Explorer users who have Javascript disabled will have the inconvenience of being asked to "click to activate".

It would be nice to be able to detect whether Internet Explorer users had patched or unpatched versions so that I could only run the Javascript fix code if it was absolutely necessary but I cannot think of a way to determine this. I was thinking of maybe trying to access the first flash movie on a page and check the PercentLoaded property which must be unavailable if the flash movie is unactivated. However using the multiple IE version hack to run IE 6, 5 etc you always get a pop up saying "Press okay to continue loading content" when a page contains Flash so I cannot test this out. Either way this seems the best way to code Flash for the future as it ensures the maximum user coverage and still handles the EOLAS patent problem for those IE users who have not updated.

If anyone can think of any other reasons to be using Javascript and limiting your user coverage then please let me know.


SQL Denial of Service Attacks

Using SQL for Denial of Service (DOS) attacks

You should have heard about SQL injection attacks and most certainly DOS (Denial of Service) attacks but you may not have heard about SQL Denial of Service attacks as something to be on the look out when considering website security. The idea behind the attack is pretty simple and that is to use any search forms available on your website to get your SQL database to execute a number of long running and CPU intensive queries so that the site becomes unusable to other users. The reasons being either:
-The CPU on the database server is maxed out running these intensive queries.
-All connections to the database are inuse by the attacker and therefore no other connections can be spawned therefore locking up the site to other users.


Who is vulnerable to this form of attack?

Most websites have an SQL backend and the majority of sites have some user interface that allow visitors to enter textual keywords that are then used as criteria to filter an SQL SELECT statement to return matching records. A few examples of search forms are:
-Searching for content in a CMS system.
-Searching for jobs, members or properties.
-Searching for news articles or comments posted on a message board.

If you are using a database rather than a textual index to return these results you may be at risk if you are using the LIKE keyword or in SQL 2005 a CLR regular expression function to pattern match.

As an example I will use one of my jobboard sites and imagine that we used the database to handle keyword searches. We have a search criteria form that allows the user to enter search terms which we will try and match within the job description column of our JOB table using the LIKE keyword.

( N.B. The database is SQL 2005 running on Windows 2003 and the table in question has 646,681 rows. )

If like a lot of sites we treat gaps in the search term to indicate multiple terms that then get treated as an OR search then if the user had entered "ASP SQL" in the keywords box we would run a search using the following SQL:


SELECT TOP 10 JobPK,JobDescription
FROM JOBS as j
WHERE JobDescription LIKE '%ASP%'
OR JobDescription LIKE '%SQL%'
ORDER BY JobPK

This particular search returned me 10 rows worth of data in under 1 second.

Now if the user had entered something like this for their search keywords

_[^|?$%"*[(Z*m1_=]-%RT$)|[{34}\?_]||%TY-3(*.>?_!]_

Then the SQL will be


SELECT TOP 10 JobPK,JobDescription
FROM JOBS as j
WHERE JobDescription LIKE '%_[^|?$%"*[(Z*m1_=]-%RT$)|[{34}\?_]||%TY-3(*.>?_!]_%'
ORDER BY JobPK

This SQL when run returned 0 rows and took 2 minutes 12 seconds to return! This means that every single row in the database has been searched which is exactly the aim of the SQL DOS. Also during the execution of this one query the CPU on the SQL server was maxed out at 100% ! Whilst this one query was running other users of the system had problems connecting to the database server and I logged the following errors:



[DBNETLIB][ConnectionRead (recv()).]General network error. Check your network documentation.
[Microsoft][ODBC SQL Server Driver][DBNETLIB]ConnectionRead (recv()).[Microsoft][ODBC SQL Server Driver][DBNETLIB]ConnectionWrite (send()).


These errors are down to the SQL Server being so busy that it cannot open new TCP/IP ports.

This mayhem was due to 1 query causing the SQL server to max out. Now if you can imagine that the attacker could spawn multiple queries that consumed all available pooled connections each running a query taking 2+ minutes then you can see the scale of the problem. Any part of your website that used a database connection would be out of action for some time.


Changing effectiveness of the attack

The LIKE DOS attack can be modified in a number of ways:

  1. The longer the string being matched the longer the query execution time.
  2. Query plans get cached and re-used to increase performance so changing the pattern by a character or two each time will cause a recompile and prevent a cached plan being used.
  3. Combining patterns in an OR statement will increase the execution time. Make sure each pattern is different.
  4. Adding % around the search string will increase the effectiveness especially at the beginning of the string as it will prevent any available index from being used.

Solutions to the SQL DOS attack

In general you should always set a timeout limit for
-Connections (the length of time allowed for connecting to a database)
-Commands (the length of time allowed for a query/procedure to run)

The defaults for ADO are 30 seconds each.

If you must use a database LIKE search for your search forms then validate the user input in the following ways:
  • Strip all non alpha numerics from the input.
  • Set a maximum length for terms entered. If allowing long strings make sure there are an appropriate amount of spaces for the length e.g one space every 15 characters. If you have a string of 100 characters and only one space then something is wrong!!
  • If allowing non alpha characters then make sure characters such as % _ [ ] that are also used as wildcards in LIKE statements are either escaped or stripped out.
  • Also strip out concurrent non alphanumeric characters.
  • Make sure you have words that consist of 3 concurrent alphanumeric characters or more.

You could also try and prevent automated search requests by using one of the following:
  • Only allow logged in registered users to use the search form.
  • Use session variables to limit the number of searches a user can carry out in a set time period. If the user does not have sessions enabled prevent them from searching.
  • Use Javascript to run the search. Most bots will not be able to run script so the search will be disabled.
  • Use session variables again so that only one search at a time per user can be requested. Set a flag on search submission and also when the results are returned. If another search is requested within that time prevent it from executing. Again if sessions are disabled prevent searching.

A better option is to not use LIKE statements but use a full text index to handle searching. As well as being more secure due to the fact that symbols are treated as noise characters and therefore ignored you can offer your users a more feature rich search function. You can use SQL Servers inbuilt full text indexing or a 3rd party indexer such as DTSearch.

Setting up full text indexing on a table in SQL 2005 is very easy for example
  1. Make sure the Full Text Index service is installed on the server.
  2. Right click the table you wish to create an index for and select "Define Full-Text Index"
  3. Select the columns you wish to create the index on. In my jobboard example it will be the JobDescription column.
  4. Select how you want changes to the underlying data to be updated within the index. I chose "Automatic" as it means that whenever the data in the JOB table changes the index will get updated at the same time. You could set manually and then create a schedule to update the index.
  5. Give the full text catalog a name. I create a new catalog per table that needs an index. If you have multiple indexes that are quite small then you can share a catalog.
  6. If you have not chosen to automatically update the index when the data changes then create a schedule to do this for you.
  7. Create the index!

Once created you can access the data in a variety of ways by joining your index to your other database tables. However this is not an article on Full Text indexing so for a quick example that will replace our earlier search SQL that used LIKEs and OR's.


SELECT TOP 10 JobPK, JobDescription,Rank
FROM JOBS as j
JOIN FREETEXTTABLE(JOBS,JobDescription,'SQL ASP') as K
ON K.[KEY] = j.JobPK
ORDER BY Rank DESC


This query will search the index I just created for the search term "SQL ASP" and return the top 10 results ordered by the Rank value SQL gives each result depending on how close it matches the term. There are many features built into SQL Servers Full Text Indexing implementation which I won't cover here. However this serves as an example of how easy it is to convert a LIKE based search to an index based search.


Conclusion

As you can see it could be very easy depending on how you validate any search terms entered by your users for a malicious user or bot to create havoc by executing long running and CPU intensive queries against your database effectively causing your system to become unavailable for a period of time. Any part of your site that offers a search facility should be tested for vulnerabilities by entering various search terms and then measuring the effect. Although deadly at the time of execution these attacks are temporary and can be prevented with good application design and input validation.


Further Information

If you are wanting to create a rich Google like searching system for your site I suggest reading the following article which is very detailed:

http://www.sqlservercentral.com/articles/Full-Text+Search+(2008)/64248/

For more details about SQL Denial of Service attacks read the following article:

http://www.slideshare.net/fmavituna/dos-attacks-using-sql-wildcards/

For more information about pattern matching in general and the problems with regular expressions and CPU read my blog article:

http://blog.strictly-software.com/2008/10/dangers-of-pattern-matching.html

Thursday, 2 October 2008

Using Javascript to Parse a Querystring

Javascript has no support for querystring parsing

There maybe times when you need to parse a querystring using Javascript. I have recently found myself requiring to do this when rewriting a web application as lots of places were using inline Javascript script due to parameters being outputed into the HTML source through server side code. I wanted to move as much inline script out of the page source into external files and I found that a lot of cases I could only do this by using Javascript to access the querystring instead of server side code.

As Javascript has no inbuilt objects that you can reference you will either have to create your own one making use of the location.href and location.search objects which refer to the current URL and the querystring portion of the URL.

A lot of scripts will use string parsing to split the querystring up on & and = which is fine if you remember about the # anchor portion of the URL. However my script uses regular expressions to split the querystring up into its constituant parts.

I have created a little object that can be accessed on any page that references it. The features of this object are:

  • Parse the current location's querystring or supply your own querystring. It maybe that you have passed a URL encoded querystring as a value within the querystring and once you have accessed this parameter you need to parse it on its own.
  • Returns the number of parameters within the querystring.
  • Returns the parameters as key/value in an associative array.
  • Option to output the parameters as a formatted string.
  • Ability to access the anchor value seperatley if it exists.
  • Handles parameter values specified multiple times by comma seperating them.

The code:


function PageQuery(qry){

this.ParamValues = {};
this.ParamNo = 0;

var CurrentQuery, AnchorValue = "";

//if no querystring passed to constructor default to current location
if(qry && qry.length>0){
CurrentQuery = qry;
}else{
if(location.search.length>0){
CurrentQuery = location.href;
}else{
CurrentQuery = "";
}
}

//may want to parse a query that is not the current window.location
this.ParseQuery = function(qry){
var rex = /[?&]([^=]+)(?:=([^&#]*))?/g;
var rexa = /(\#.*$)/;
var qmatch, key, amatch, cnt=0;

//parse querystring storing key/values in the ParamValues associative array
while(qmatch = rex.exec(qry)){
key = denc(qmatch[1]);//get decoded key
val = denc(qmatch[2]);//get decoded value

if(this.ParamValues[key]){ //if we already have this key then update it if it has a value
if(key&&key!="") this.ParamValues[key] = this.ParamValues[key] + ","+val;
}else{
this.ParamValues[key] = val;
cnt++;
}
}
//as no length property with associative arrays
this.ParamNo = cnt;

//store anchor value if there is one
amatch = rexa.exec( qry );
if(amatch) AnchorValue = amatch[0].replace("#","");
}

//run function to parse querystring and store array of key/values and any anchor tag
if(CurrentQuery.length){
this.ParseQuery( CurrentQuery );
}

this.GetValue = function(key){ if(!this.ParamValues[key]) return ""; return this.ParamValues[key]; }
this.GetAnchor = AnchorValue;

// Output a string for display purposes
this.OutputParams = function(){
var Params = "";
if(this.ParamValues && this.ParamNo>0){
for(var key in this.ParamValues){
Params+= key + ": " + this.ParamValues[key] + "\n";
}
}
if(AnchorValue!="") Params+= "Anchor: " + AnchorValue + "\n";
return Params;
}
}

//Functions for encoding/decoding URL used in object

//encode
function enc(val){
if (typeof(encodeURIComponent)=="function"){
return encodeURIComponent(val);
}else{
return escape(val);
}
}
//decode
function denc(val){
if (typeof(decodeURIComponent)=="function"){
return decodeURIComponent(val);
}else{
return unescape(val);
}
}


How to call the code

To make use of the objects functions you can instantiate the PageQuery object by either passing the string that you want to parse as the constructors only parameter or you can pass nothing in which case it will default to the current location's querystring if there is one. Once you have created the object you can then reference the properties you require to return the relevant information.

//create new instance of the object
var qry = new PageQuery(); //defaults to current location if nothing passed to the constuctor
//var qry = new PageQuery("?id=1044&name=Rob+Reid#56"); //parse a specific string instead of using location.search
var id = qry.GetValue("id"); //get value for a parameter called id
var anc = qry.GetAnchor; //get the anchor # value if exists
var no = qry.ParamNo; //get the number of parameters
var s = qry.OutputParams();//return a formatted string for display purposes
var p = qry.ParamValues; //return the array of parameters

//loop through array of parameters
if(p){
for(var z in p){
alert("Query Parameter["+z+"] = " +p[z]);
}
}


So as you can see its a pretty simple but very flexible function that provides me with all the necessary functionality I require when handling querystrings with Javascript.

Click here to download the ParseQuery object script.