Wednesday, 27 July 2011

My JavaScript HTML Encoder Function

Why did I write my JavaScript HTML Encoder object

One of my most popular scripts, which continues to be downloaded a lot and to spark many an email from companies and users, is the HTML Encoder script I wrote some years back.

You can see the script in action here: www.strictly-software.com/htmlencoder

Earlier I was emailed another function that claimed to do the same job, and I thought it would be interesting for other readers to know why I created the script in the first place, and why many of the examples out on the net that claim to do the same job are just not up to scratch.

On many sites nowadays people obtain content from various sources through XML and RSS feeds, often using JavaScript and AJAX to do part of the job. I know that when I was creating my football site www.hattrickheaven.com, which was basically an exercise in using the Google AJAX API, I came across the following problems:

  1. Content is obtained from a multitude of sources and there is no common standard that can be relied on 100%.
  2. Content is often mixed together through feed blenders like Yahoo Pipes or Blastcasta.
  3. Content is often loaded in on the fly using AJAX, a Scraper Proxy to do cross domain content jacks and other formatting duties such as on the fly translations etc.
  4. There is no built in function for HTML encoding in the JavaScript language.

Now, I looked at many an HTML Encode solution before writing my own, but the main problem I needed to overcome was that content could already be fully or partially HTML encoded, and the functions I came across did nothing to handle this.

For example, take this bit of text; you will notice that it has an encoded ampersand in the middle but the quotes are not encoded.

"Rob says hello &amp; goodbye"

If you run this through many of the HTML Encode functions out there, including many server side ones, they will double encode the & in the &amp; so that you end up with this:

"Rob says hello &amp;amp; goodbye"

Now you may want this to happen but I doubt it.

Not if you want to run all your content through the same function without worrying about double encoding issues like this on your website.
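The sort of naive function I mean looks something like this (a made-up illustrative example, not anyone's actual code, showing why blind replacement double encodes):

```javascript
// A typical naive encoder found in many snippets online: it blindly
// replaces every special character with its entity, with no regard for
// whether the input already contains entities.
function naiveHtmlEncode(s) {
  return s.replace(/&/g, "&amp;")
          .replace(/</g, "&lt;")
          .replace(/>/g, "&gt;")
          .replace(/"/g, "&quot;")
          .replace(/'/g, "&#39;");
}

// Feed it input that is already partially encoded and the ampersand
// inside the existing entity gets encoded again:
naiveHtmlEncode('Rob says hello &amp; goodbye');
// → 'Rob says hello &amp;amp; goodbye' — double encoded
```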

This is why I wrote my HTML Encoder object, as it puts into practice the technique I used whilst writing classic ASP sites that supported the UTF-8 character set.

With multi-lingual sites that display Arabic or Japanese character sets you don't want to be using the inbuilt Server.HTMLEncode function on all your textual input, as it will triple the size of everything you store by converting every non-ASCII character into an &#XXXX; encoded string.

When your database is set up to store your text correctly as Unicode (nvarchar etc.) and you are outputting it with the correct codepage, you need something a bit more clever to do the job, as you still need to encode the characters that can cause you damage: the naughty four " ' < > as well as the ampersand & (in URLs, querystrings etc.) to make your page validate.

This is obviously for security reasons as you don't want people to be able to malform your input and break your layout or insert XSS hack vectors into your system.

However running the standard HTML Encode function will encode everything as well as cause double encoding issues. Therefore the HTML Encoding needs to be done in stages and the ampersand has to be handled correctly as it makes up part of the HTML encoded &#XXXX; format.
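A minimal sketch of that staged idea is below (my actual encoder object does more than this; the function name and the exact entity pattern here are just for illustration). The trick is to encode the ampersand first, but only when it is not already the start of an entity:

```javascript
// Staged encoding: handle the ampersand first with a negative lookahead
// so that existing entities (&amp;, &#169;, &#x20AC; etc.) are left
// alone, then encode the remaining four dangerous characters.
function safeHtmlEncode(s) {
  return s
    // & NOT followed by a named or numeric entity pattern
    .replace(/&(?!(?:[a-zA-Z]+|#\d+|#x[0-9a-fA-F]+);)/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

safeHtmlEncode('Rob says hello &amp; "goodbye"');
// → 'Rob says hello &amp; &quot;goodbye&quot;' — no double encoding
```

Because the ampersand pass runs first, the entities produced by the later passes are never re-encoded, so the function can safely be run over content in any state of encoding.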

So whilst it comes as quite a surprise to see how popular this script is (#1 on Google for "HTML Encode with Javascript") it is not really surprising when one thinks about the problem this script solves.

Whilst I don't claim to be any kind of coding genius, and I am sure the code can be improved, it does do what it says on the tin. If you are thinking of using another JavaScript function to HTML encode your content, compare it first with my tool to see whether it handles the double encoding issue; if it doesn't, you might find yourself having problems down the road.

Monday, 29 March 2010

My Hundredth Article

An overview of the last 102 articles

I really can't believe that I have managed to write 102 articles for this blog in the last year and a bit. When I first started the blog I only imagined writing the odd bit here and there and saw the site purely as a place to make public some of my more useful coding tips. I never imagined that I could output this amount of content by myself.

A hundred articles has come and gone pretty fast, and as with all magazines, TV shows and bloggers stuck for an idea, I thought I would celebrate my 102nd article by reviewing my work so far.

Recovering from an SQL Injection Attack

This was the article that started it all and it's one that still gets read quite a bit. It's a very detailed look at how to recover an infected system from an SQL Injection Attack and includes numerous ways of avoiding future attacks as well as quick sticking plasters, security tips and methods for cleaning up an infected database.

Linked to this article is one of my most downloaded SQL scripts which helps identify injected strings inside a database as well as removing them. This article was written after a large site at work was hacked and I was tasked with cleaning up the mess so it all comes from experience.

Performance Tuning Tips

I have written quite a few articles on performance tuning systems, both client and server side, and some of my earliest articles were on top tips for tuning SQL databases and ASP Classic sites. As well as general tips which can be applied to any system, I have also delved into more detail regarding specific SQL queries for tuning SQL 2005 databases.

Regarding network issues, I also wrote an extensive how-to guide on troubleshooting your PC and Internet connection, which covered everything from TCP/IP settings to tips on the best tools for cleaning up your system and diagnosing issues. On top of that I collated a number of tweaks and configuration options which can speed up Firefox.


Dealing with Hackers, Spammers and Bad Bots

My job means that I constantly have to deal with users trying to bring my systems down, and I have spent considerable time developing custom solutions to log, identify and automatically ban users who try to cause harm to my sites. Over the last year I have written about SQL Denial of Service attacks, which involve users making use of web based search forms and long running queries to bring a database driven system to a halt. I have also investigated new hacking techniques such as the two stage injection technique and the case insensitive technique, looked at methods of client side security and why it's almost pointless, and detailed bad bots such as Job Rapists and the 4 rules I employ when dealing with them.

I have also detailed the various methods of using CAPTCHAs, as well as ways to prevent bots from stealing your content and bandwidth through hot linking by using ISAPI rewriting rules.

Issues with Browsers and Add-Ons

I have also tried to bring up-to-date information on the latest issues with browsers and new version releases, and have covered problems and bugs related to major upgrades of Firefox, Chrome, Opera and IE. When IE 8 was released I was one of the first bloggers to detail the various browser and document modes as well as techniques for identifying them through JavaScript.
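The core of that IE 8 detection technique can be sketched like this (the function name is just for illustration; document.documentMode is the real property IE 8 introduced):

```javascript
// IE 8 introduced document.documentMode, which reports the mode the page
// is actually rendering in. This can differ from the browser version:
// IE 8 in Compatibility View reports 7, IE 8 standards mode reports 8.
function getIEDocumentMode() {
  if (typeof document !== "undefined" && document.documentMode) {
    return document.documentMode; // e.g. 8, or 7 in Compatibility View
  }
  return null; // not IE, or an IE version before 8
}
```

Checking the document mode rather than the user agent string tells you how the page will actually behave, which is what matters when you are working around rendering bugs.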

I have also reported on current browser usage by revealing statistics taken from my network of 200+ large systems with regular updates every few months. This culminated in my Browser survey which I carried out over Christmas which looked at the browsers and add-ons that web developers themselves used.


Scripts, Tools, Downloads and Free Code

I have created a number of online tools, add-ons and scripts for download over the last year that range from C# to PHP and JavaScript.

Downloadable Scripts Include:

SQL Scripts include:

Search Engine Optimisation

As well as writing about coding I also run a number of my own sites and have had to learn SEO the hard way. I have written about my experiences, and the successful techniques I have found that worked, in a couple of articles printed on the blog:
So there you go: an overview of the last year or so of Strictly-Software's technical blog. Hopefully you have found the site a good resource and maybe even used one or two of the scripts I have posted. Let me know whether you have enjoyed the blog or not.

Saturday, 27 February 2010

The difference between a fast running script and a slow one

When logging errors is not a good idea

I like to log errors and debug messages when running scripts as they help diagnose problems not immediately apparent when the script is started. There is nothing worse than starting a script running and then coming into work on a Monday morning to find that it hasn't done what it was supposed to and not know why. A helpful log file with details of the operation that failed and the input parameter values can be worth its weight in gold.

However logging is an overhead, and it can literally be the difference between a script that takes days to run and one that takes minutes. I recently had a script to run on a number of web servers that had to segment hundreds of thousands of files (documents in the < 500KB range) into a new folder structure.

My ideal solution was to collate a list of physical files in the folder I wanted to segment and pass that into my SELECT statement so that I could return a recordset containing only the files that actually existed. However my ideal solution was thwarted by an error message I had never come across before, which claimed the DB server didn't have the resources available to compile the necessary query plan. Apparently this was due to the complexity of my query, and it recommended reducing the number of joins. As I only had one join this was not a solution, and after a few more attempts with some covering indexes, which also failed, I tried something else.

The solution was just to try to move each file, catch any error that was raised and log it to a file. The script was set running, and a day later it was still running, along with a large log file containing lots of "File not found" errors.

The script was eventually killed for another unrelated reason, which pissed me off as I had to get the segmented folder structure up and running sharpish. I modified the script so that before each move it checked whether the file existed. I had initially thought that this check might itself be an overhead: the move method would be doing the same check at a lower level, so I would be searching a large folder for the same file twice, and I had reckoned that just attempting the move and catching any error would be quicker.

However, because I was logging each error to a text file, the logging was causing an I/O bottleneck and slowing down the procedure immensely. Adding the FileExists check round each move ensured that the script only attempted to move files that definitely existed, so no errors were raised, which meant no logging and no I/O overhead.

I had forgone the nicety of knowing which files were no longer on the server, but I had also reduced the script's running time to a mere 25 minutes. The lesson to be learned is that although logging is useful, it can also be a major overhead, and if you can do without it you may just speed up your scripts.