
Sunday, 23 January 2011

Why I hate debugging in Internet Explorer 8

Debugging Javascript in Internet Explorer

With IE 8 and its new developer toolbar, debugging JavaScript should have become a lot easier. There is no longer any need to load up Firebug Lite to get a console, or to use bookmarklets to inspect the DOM or view the generated source, and in theory this is all well and good.

However, in practice I have found IE 8 to be so unusable that I have literally stopped using it during development unless I really have to.

When I do have to test some code to ensure it works in IE, I say a little prayer to the God of Geekdom before opening it up, because I know that within the next five minutes I will have found myself killing the IE process in Task Manager a couple of times at the very least.

Not only does IE 8 consume a large portion of my CPU cycles, its very ill-thought-out event model makes debugging any kind of DOM manipulation a slow, painful chore.

Unlike proper browsers, IE's event model is single threaded, which means that only one event can be processed at any point during a page's lifetime. This is why IE has the window.event object, as it holds the current event being processed at any given moment.
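To see what this means in practice, here is a minimal sketch of the kind of shim the IE model forces on you (the handler and element here are made up purely for illustration):

// Standards-compliant browsers pass the event object into the handler;
// IE instead exposes the current event on the global window.event.
document.onclick = function (e) {
    e = e || window.event;                 // fall back to IE's global event object
    var target = e.target || e.srcElement; // DOM 2 property vs IE's equivalent
    if (window.console) {
        console.log("Clicked on a " + target.tagName + " element");
    }
};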

Many developers have moaned at IE for this over the years and have hoped that each new release would fix this odd behaviour. However, every time a new version is rolled out, many web developers are bitterly disappointed, because Microsoft apparently regards this quirky event model as a design feature to be enjoyed rather than a bug to be suffered, and they don't seem to have any intention whatsoever of fixing it or making it DOM 2 compliant at the very least.

I don't know why they cannot do what Opera does and just implement both event models. At least that way they could make all the proper web developers happy at the same time as all the sadomasochists who enjoy IE development.

This event model really comes into its own when you are trying to debug something using the new developer toolbar, as very often you want to pipe debug messages to the console.

If you are just outputting the odd message whenever a button is clicked or a form is loaded then this can be okay, but if you attempt anything that involves fast-moving events, such as moving elements around the page or tracking rapidly incrementing counters, then you will soon suffer the pain of waiting many minutes for the console to catch up with the browser's actions.

Whereas other browsers such as Chrome or Firefox are perfectly capable of outputting lots of debug messages to the console while mousemove events are being fired, IE seems either to batch them all up and spit them out when the CPU drops below 50% (which might be never), or occasionally the whole browser will just crash.

At first I thought it was just me, as my work PC is not the fastest machine in the office, but I have seen this problem on many other people's computers and I have also experienced it at home on my Sony VAIO laptop.

As a test I have created the following page which can be found here:


Try it out for yourself in a couple of browsers, including IE 8, and see what you think. I haven't had the chance to test IE 9 yet, so I don't know whether the problem is the same in that version, but I would be interested to know if anyone could test this for me.

The test is simple: it attaches a mousemove event that collects the current mouse co-ordinates and pipes them to the console. It does this for a number of iterations, which can be set with the input box.
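For those who cannot load the page, the core of the test looks roughly like this (a simplified reconstruction; the element id and default count are made up, and the real page may differ slightly):

// Log the mouse co-ordinates to the console on every mousemove
// until the chosen number of iterations has been reached.
var maxLogs = parseInt(document.getElementById("iterations").value, 10) || 100;
var count = 0;

document.onmousemove = function (e) {
    e = e || window.event; // IE fallback again
    if (count < maxLogs && window.console) {
        count++;
        console.log("x=" + e.clientX + " y=" + e.clientY + " (" + count + " of " + maxLogs + ")");
    }
};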

I have found that IE will manage when the counter is set to anything less than 100, but setting it to 1000 or above just crashes or freezes my own browser as well as maxing out the CPU.

Let me know what you think of this test and whether anyone else has major issues with IE and its console logging capabilities.

Wednesday, 9 June 2010

SQL Injection attack from Googlebot

SQL Injection Hack By Googlebot Proxy

Earlier today, on arriving at work, I was faced with worried colleagues and angry customers complaining that Googlebot had been banned from their site. I was tasked with finding out why.

First of all, my large systems run with a custom-built logger database that I created to help track visitors, page requests, traffic trends and so on.

It also has a number of security features that constantly analyse recent traffic looking for signs of malicious intent such as spammers, scrapers and hackers.

If my system identifies a hacker it logs the details and bans the user. If a user comes to my site and is already in my banned table then they are met with a 403 error.

Today I found out that Googlebot had been hacking my site using known SQL Injection techniques.

The IP address was a legitimate Google IP from the 66.249 subnet, and there were 20 or so records from one site in which SQL injection attack vectors had been passed in the querystring.

Why this happened I do not know, as an examination of the page in question found no trace of the logged links; however, I can think of a theoretical scenario which may explain it.

1. A malicious user has either created a page containing links to my site that contain SQL injection attack vectors, or has added such content through a blog, message board or other form of user-generated CMS that has not sanitised the input correctly.

2. This content has then been indexed by Google or even just appeared in a sitemap somewhere.

3. Googlebot has visited this content and crawled it, following the links containing the attack vectors, which have then been logged by my site.

This "attack by SERP proxy" has left no trace of the actual attacker and the trail only leads back to Google who I cannot believe tried to hack me on purpose.

This makes it a very clever little trick, as websites are rarely inclined to block the world's foremost search engine.

I was therefore faced with the difficult choice of either adding this IP to my exception list of users never to block under any circumstances, or blocking it from my site.

Obviously my site's database is secure, and its security policy is such that even if a hackbot found an exploitable hole, updates couldn't be carried out by the website's login; however, this does not rule out an XSS attack vector being crafted and then exploited in the future.

Do I risk the wrath of customers and let my security system carry on doing its job, blocking anyone trying to do my site harm even if it's a Google-by-proxy attack, or do I risk a potential future attack by ignoring attacks coming from supposedly safe IP addresses?

Answer

The answer to the problem came from the now standard way of testing whether a BOT really is a BOT. You can read about this on my page 4 Simple Rules Robots Won't Follow. It basically means doing a two-step verification process to ensure the IP address the BOT is crawling from belongs to the actual crawler and not to someone else.
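The two steps are the widely documented reverse-then-forward DNS check: do a reverse lookup on the crawling IP, confirm the host name belongs to the crawler's domain, then do a forward lookup on that host name and confirm it resolves back to the same IP. My own logger does this server-side, but the following Node-style JavaScript sketch shows the principle (the function name and IP are just examples):

var dns = require("dns");

function verifyGooglebot(ip, callback) {
    // Step 1: reverse DNS - the host name must end in googlebot.com or google.com
    dns.reverse(ip, function (err, hostnames) {
        if (err || !hostnames.length) { return callback(false); }
        var host = hostnames[0];
        if (!/\.(googlebot|google)\.com$/i.test(host)) { return callback(false); }

        // Step 2: forward DNS - the host name must resolve back to the original IP
        dns.lookup(host, function (err2, address) {
            callback(!err2 && address === ip);
        });
    });
}

// Only treat the request as genuine Googlebot if both checks pass
verifyGooglebot("66.249.66.1", function (isReal) {
    console.log(isReal ? "Genuine Googlebot" : "Imposter - block it");
});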

This method is also great if you have a table of IP/User-Agent pairs that you whitelist but the BOT suddenly starts crawling from a new IP range. Without updating your table, you still need to make sure the BOT really is who it says it is.

Obviously it would be nice if Googlebot analysed all links before crawling them to ensure it is not hacking by proxy, but I cannot sit around waiting for them to do that.

I would be interested to know what other people think about this.

Saturday, 27 February 2010

The difference between a fast running script and a slow one

When logging errors is not a good idea

I like to log errors and debug messages when running scripts, as they help diagnose problems that are not immediately apparent when the script is started. There is nothing worse than setting a script running and then coming into work on a Monday morning to find that it hasn't done what it was supposed to do and not knowing why. A helpful log file with details of the operation that failed and the input parameter values can be worth its weight in gold.

However, logging is an overhead and can literally be the difference between a script that takes days to run and one that takes minutes. I recently had to run a script on a number of webservers that segmented hundreds of thousands of files (documents in the < 500KB range) into a new folder structure.

My ideal solution was to collate a list of the physical files in the folder I wanted to segment and pass that into my SELECT statement, so that I could return a recordset containing only the files that actually existed. However, that solution was thwarted by an error message I had never come across before, which claimed the DB server didn't have the resources available to compile the necessary query plan. Apparently this was due to the complexity of my query, and it recommended reducing the number of joins. As I only had one join this was no help, and after a few more attempts with covering indexes that also failed, I tried another approach.

That approach was simply to attempt the move, catch any error that was raised and log it to a file. The script was set running, and a day later it was still going, along with a large log file containing lots of "File not found" errors.

The script was eventually killed for an unrelated reason, which pissed me off as I had to get the segmented folder structure up and running sharpish. I modified the script so that it checked whether the file existed before each move. I had initially thought this check would itself be an overhead, as the move method would be doing the same thing at a lower level, so I would effectively be searching the large folder twice per file; I had reckoned that just attempting the move and catching any error would be faster.

However, because I was logging each error to a text file, this was causing an I/O bottleneck and slowing the procedure down immensely. Adding the FileExists check around each move ensured that only files that definitely existed were moved, so no errors were raised, which meant no logging and no I/O overhead.
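The real script was a server-side batch job, but a minimal sketch of the change looks something like this (Node-style JavaScript with made-up paths, just to show the idea):

var fs = require("fs");

function moveFile(oldPath, newPath) {
    // Skip files that no longer exist - no error raised, no log entry, no I/O overhead
    if (!fs.existsSync(oldPath)) {
        return false;
    }
    fs.renameSync(oldPath, newPath); // move the file into the new folder structure
    return true;
}

// Example usage with hypothetical paths
moveFile("C:/docs/12345.pdf", "C:/docs/segment-12/12345.pdf");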

I had forgone the nicety of knowing which files were no longer on the server, but I had also reduced the script's running time to a mere 25 minutes. The lesson to be learned is that although logging is useful, it can also be a major overhead, and if you can do without it you may just speed up your scripts.