Monday 24 November 2008

Trying to detect spoofed user-agents

User Agent Spoofing

A lot of traffic comes from browsers either masking their real identity by using a different user agent than the real one associated with the browser or a random string which relates to no known browser. The purpose of doing this is many fold from malicious users trying to mask their real identity to get round code that may ban on user-agents or get round client side code that may be blocking certain browsers from using certain functionality which is all a good reason for using object detection rather than browser sniffing when deciding which code branch to run.

However as you will probably know if you have tried doing anything apart from simple JavaScript coding there are still times when you need to know the browser because object detection just isn't feasible and trying to use object detection to work out the browser is just as bad in my opinion as using browser sniffing to work out the object.

Therefore when someone is using an agent switcher to mask the browsers agent and you come across one of these moments then it may cause you to run code that will raise errors. There is no foolproof way to spot whether an agent is spoofed but one of the things you can do if you do require this information is compare the agent with known objects that should be supported by that browser and if they don't match then you can confirm the spoof.

This form of spoof detection will only work if its only the user agent string that has been changed but an example of some of the checks you can do include for agents that say they are Opera make sure it supports window.opera as well as both event models document.addEventListener && document.attachEvent as far as I know its the only browser that does support both. For IE you shouldn't check document.all by itself as you will actually find Firefox will return true for this but you can check for window.ActiveXObject the non existence of addEventListener and use conditional comments to test for JScript. Firefox should obviously not support JScript as it uses Javascript.

Those are just a few checks you could do and you are basically using object detection as well as agent sniffing together to make sure they match. They may not tell you the real browser being masked but they can be used to tell you what its not. 

The idea of this is to make sure that in those cases where you have to branch on browser rather than object (see this previous article) that you make the right choice and don't cause errors. Obviously you may decide that if the user is going to spoof the agent then leave them to suffer any errors that may come their way.

If you do require a lightweight browser detector that checks for user agent spoofing amongst the main browsers as well as support for advanced CSS, Flash and other properties then see this article.

Tuesday 18 November 2008

Problems upgrading to Firefox 3

Invalid Null Pointer Error relating to DesignMode and ContentEditable

I finally upgraded to Firefox 3 the other day on my work PC but only after I managed to fix an annoying bug that had suddenly appeared in my CMS WYSIWYG editor when I had upgraded at home.

The bug was occurring whenever I closed my editor (display:none;) and then re-opened it again on the same page. I finally tracked down the bug and it was related to code that initialises the editor on opening where it sets designmode and contentEditable on. In Firefox 2 this didn't seem to cause any problems if these properties had previously been set but in FF3 it caused invalid null pointer errors. A quick check to make sure I wasn't trying to set something on that was already on fixed this problem e.g:

if(self.IE && theIframe.contentWindow.document.body.contentEditable!=true){
theIframe.contentWindow.document.body.contentEditable = true;
if(!self.IE && theIframe.contentWindow.document.designMode!="on" ){
//in moz you cannot change desgin mode straight away so try and then retry
theIframe.contentWindow.document.designMode = "on";
theIframe.contentWindow.document.execCommand("redo", false, null);
return false;

UserAgentButton is null Error in Firefox 3

Anyway I upgraded to Firefox 3 and then I started getting these "UserAgentButton is null" errors appear on my main site all over the place. The error in Firebug was referring to lines in my own code so it was quite confusing at first as I do not have any code that mentions UserAgentButton.

I disabled Firebug and still the problem persisted until I remembered I had installed another add on the other day that allows you switch useragent. The plug in is called "User Agent Switcher" version 0.6.11 and is very useful for testing sites. After disabling this add on the problem went away. So if you see a similar problem try disabling any add ons first before spending any time debugging your code.

Flash Not Playing Movies Correctly

Another problem I came across after upgrading to FF3 was with Flash movies not playing correctly. They would start playing but only for a few seconds before stopping and freezing.
The problem was definitely related to FF3 as they played correctly in IE and other browsers.
I read a lot of "fixes" for this problem on the web but the only thing that worked successfully
for me was to upgrade Flash to version 10.

I am not sure what other problems will come up and surprise me but the flash problem alone was reason enough not to make FF3 a level 1 supported browser on my main site. If anyone else has come across any issues relating to upgrading to Firefox 3 please let me know.

Tuesday 4 November 2008

Foreign Keys

In my ten years of database development I have come across a number of people who did not believe in using foreign key constraints in their database systems. Some of these people have actually been my senior at the time and its been very frustrating when I have asked why they are against enforcing referential integrity and they give replies along the lines of:

"When I was first starting out and didn't know any better I used to use foreign keys but now I know what I am doing so I let the application handle it all."

What it actually means is that they were taught by someone who thought they knew better than everyone else and unfortunately have not come to realise that rather than being an SQL guru their teacher was just another one of those people who thought they knew better than everyone else and they have foolishly followed in the same footsteps.

The way I see it is that the whole point of a is to enforce data integrity if you just want to persist data then there are probably better and faster ways but if you remove the relationships and constraints then you are just using your fancy and maybe expensive RDBMS as a storage system.

The analogy I use for is a car where the constraints are the safety features within the car i.e seat belts, airbags, ABS etc. You can build a car without all these safety features and it may run perfectly well but if you're unfortunate to have a crash you are going to be in a lot of trouble and you would wish that you hadn't removed those safety features for some imaginary performance gain. The data is your cars passenger and the constraints are there to ensure that the passenger arrives in one piece. If you accidentally bump into the car in front of you the air bag will prevent a broken nose just like if you have a bug in your code a check constraint will prevent dodgy data getting inserted into your table. You can live with bumps and bruises just like you can live with dodgy data however life is better without a bent nose and you will get less customer complaints when the data they report on is guaranteed to be accurate.

The reasons I have been given for not using foreign key constraints are usually one of the following two:

"It slows your system down and you should handle your data integrity checks in the application."

"It makes my life difficult when setting up sites and transferring data between systems due to all those referential key constraint errors when I try to insert data."

I will deal with each one in turn.

1. You are not going to be able to beat the RDBMS for speed at its own game. If you are going to be checking data for integrity then the best place to do it is as close to the data as possible. Therefore if you care about the quality of your data and want to check it let the RDBMS handle it. For example lets imagine a website that lets a user store a number of selected list options against their profile. Lets imagine we are going to let the application handle the integrity checks rather than the RDBMS and see what problems we may have.

Lets say the system has a table that holds the users who are on the website and allowed to save profiles.

CREATE TABLE USERS(UserPK int PRIMARY KEY, UserName varchar(50))

We also have a table that stores a number of locations which are displayed to the user on the profile page. The primary key is using a surrogate key which is an integer.

CREATE TABLE LOCATIONS(LocationPK int PRIMARY KEY, Location varchar(10))

The user can select multiple options so rather than hold the chosen Location value in the actual USERS table we have created a separate table that enables us to have a ONE to MANY relationship between an individual user and their selected locations.

LocationFK int,

As you can see we have created a table that has columns that reference the primary keys in both the USERS and LOCATIONS table but we have not enforced referential integrity by defining them as FOREIGN KEY constraints. I have however defined a PRIMARY KEY as this is still usually followed by the majority of people even if foreign keys are forgotten about or deliberately left off.

So we are going to let our application handle all data checks. Therefore we have a web page with a list that lets our USER choose the locations to save.

<select id="lstLocation" size="1" multiple="multiple">
<option value="1">Greater London</option>
<option value="2">South East</option>
<option value="3">South West</option>

Point 1. If the list only shows options that exist in the LOCATIONS table then we do not have to worry about invalid values being saved? WRONG! HTML can be manipulated hacked and HTTP requests forged. Anyone with a web developer toolbar could change this list and therefore the values that the form submitted.

Therefore we have a tampered POST request from the form that has supplied us with the following values: 454,676

We call our stored procedure that handles UPDATES/INSERTS to the USER_LOCATION table:

EXEC dbo.usp_save_user_locations @LocationIDs = '454,676';

Point 2. Within our stored procedure we only INSERT new values into the USER_LOCATION table if they exist in the LOCATIONS table by either doing a LEFT JOIN or an IF EXISTS statement. Yes we could do this but we are trying to carry out the same check that a FOREIGN KEY constraint would do anyway and at a higher level.

Point 3. We could use TRIGGERS to ensure that the data we want to insert is correct but this is the same as point 2. We are replicating behaviour best left to the RDBMS. Some people use TRIGGERS to handle CASCADING DELETES however the discussion of CASCADES is another topic that is unrelated to this basic discussion. In that case it may be a good idea as it would give you more control over the data you are removing. However any code written by the developer to replicate basic functionality that the RDBMS can handle is always open to coding errors.

Even if you decide that using the application is the way to go then you are still left with the problem of the management console. Who has access to your database server? Even if access to your system is limited to a few trusted people the fact remains that its possible to bypass any checks that your application may have by opening the table and editing/deleting the data by hand whether by accident or on purpose. Using FOREIGN KEY constraints will prevent this from occurring and this is the key point in my opinion going back to my car analogy. Its your safety belt / airbag preventing your data from getting crushed and mangled up in any database accident. Whether this accident happens on purpose or by a bug in your code its your
last line of defense and its the best kind you can get because the checks are as close to the actual data as possible and therefore cover all possible points of corruption.

The correct way to save all this trouble and potential pitfalls is to define our USER_LOCATIONS table with FOREIGN KEY relationships such as below


As you can see the primary key consists of both foreign keys. One relates to the LOCATIONS table and holds the ID for the chosen location. The other key relates to the USERS table and holds the ID of the user who made the selection.

We can test this by populating our tables with some test data.

SELECT 101, 'Mr Smith'

SELECT 1,'Greater London'
SELECT 2, 'South East'
SELECT 3, 'South West'

Now running the following will be okay as both a USER with a USERPK of 101 and a LOCATION with a LocationPK of 1 exists in the tables referenced by the FOREIGN KEY relationships.

SELECT 101, 1

However running the following SQL will result in an error being raised as neither the USER or LOCATION exist in the relevant tables with the corresponding values.

SELECT 102, 12

Msg 547, Level 16, State 0, Line 1
The INSERT statement conflicted with the FOREIGN KEY constraint "fk_USER_LOCATIONS_UseFK". The conflict occurred in database "XXX", table "dbo.USERS", column 'UserPK'.
The statement has been terminated.

As for making life difficult well its suppose to if by making life difficult you mean its preventing you from adding or deleting data willy nilly. You cannot TRUNCATE a table engaged in FOREIGN KEY constraints which may cause problems but then in a production environment why would you want to without taking into consideration related data. You can easily create scripts to drop all your constraints and then reapply them if you needed to do this. There are also easy ways to disable your constraints if you do not want to enforce checks when inserting data for example:

--Disable all table constraints

-- Enable all table constraints

-- Disable single constraint

-- Enable single constraint

If you're the DBA and giving me this argument then I would say that you're not a proper DBA in the first place if you don't understand the importance of data in a relational database. Also whatDBA worth the name does not have to hand or the ability to write a simple script to drop and reapply constraints. Surely this trouble is worth doing if it means your key business resource i.e the data is left intact and uncompromised. If you take away the relations all you have left are just isolated tables that persist data.

I will reminisce for a moment about a job I had some time back working for a telecoms company that specialised in predictive dialer software for call centers. My boss designed the database system and it was not in any way what you could call a normalised relational database. There was plenty of redundancy and lots of flat tables without a relationship in sight. His argument was that it was a real time database that had to process thousands of calculations and DML statements a second and therefore performance was the key issue. Having relations would have slowed down INSERTS and therefore the denormalised schema was appropriate to the system. My job was to develop the OLTP reporting and management systems and he used to joke about me enforcing referential integrity and how it was a rookies mistake that experts like himself did not make. However my other job at the company was to actually write the numerous reports for the call center customers that had to use the dialler data from his "unrelational" system. What a nightmare that was! The amount of hours I spent hunting for missing records trying to work out why I had data for one time slot in table A but no corresponding data in table B when it should be there. Yes his system may have been fast but because there was no defined relationships the data integrity was poor. I can understand wanting to make the system fast but having a well defined relational system should not mean the opposite.

Apart from the two reasons mentioned as reasons for not using referential integrity I think the main reason is purely laziness or just the fact that on a lot of sites you can get away with not doing it and no-one notices. A lot of sites are actually developed without RI and extra coding is carried out to handle the fact that a percentage of the data is either rubbish, orphaned records or duplicates due to not having defined constraints. Systems like this if they still care about the quality of the data will have scheduled jobs to clean up these records or they will have to spend extra time on reports like I had to do to accommodate the fact that you are reporting on data that you cannot guarantee to be 100%.

Remember the old saying "Rubbish in Rubbish Out" well if this saying was not invented to describe relational database design then it should have been done.

Further Reading

A helpful how to guide on Foreign Keys with useful SQL related to reporting on existing keys and disabling and enabling keys across databases.

A discussion on using foreign keys or triggers for referential integrity