Wednesday, 27 July 2011

My JavaScript HTML Encoder Function

Why did I write my JavaScript HTML Encoder object

One of my most popular scripts, which continues to be downloaded a lot and still sparks many an email from companies and users, is the HTML Encoder script that I wrote many a year back.

You can see the script in action here: www.strictly-software.com/htmlencoder

I was emailed another function that claimed to do the same work earlier and I just thought it would be interesting to other readers to know why I created the script in the first place and why many of the examples out on the net that claim to do the same job are just not up to scratch.

On many sites nowadays people are obtaining content from various sources through XML and RSS feeds and they often use JavaScript and AJAX to do part of the job. I know that when I was creating my football site www.hattrickheaven.com, which was basically an exercise in using the Google AJAX API, I came across the following problems:

  1. Content is obtained from a multitude of sources and there is no common standard that can be relied on 100%.
  2. Content is often mixed together through feed blenders like Yahoo Pipes or Blastcasta.
  3. Content is often loaded in on the fly using AJAX, a Scraper Proxy to do cross domain content jacks and other formatting duties such as on the fly translations etc.
  4. There is no built in function for HTML encoding in the JavaScript language.

Now I looked at many an HTML Encode solution before writing my own but the main problem I needed to overcome was that content could already be HTML encoded or partially HTML encoded and functions I came across did nothing to handle this problem.

For example, take this bit of text. You will notice that it has an encoded ampersand in the middle but the quotes are not encoded.

"Rob says hello &amp; goodbye"

If you ran this through many of the HTML Encode functions out there, including many server side ones, they would double encode the & in the &amp; so that you end up with this.

"Rob says hello &amp;amp; goodbye"

Now you may want this to happen but I doubt it.

Not if you want to run all your content through the same function without worrying about double encoding issues like this on your website.

This is why I wrote my HTML Encoder object as it puts into practice the technique I used whilst writing ASP classic sites that supported the UTF-8 character set.

With multi-lingual sites that display Arabic or Japanese character sets you don't want to be using the inbuilt Server.HTMLEncode function on all your textual input as it will triple the size of everything you store by converting every non ASCII character into a &#XXXX; encoded string.

When your database is set up to correctly store your text as Unicode (nvarchar etc) and you are outputting it with the correct character encoding then you need something a bit more clever to do the job as you still need to encode characters that can cause you damage such as the naughty 4 " ' < > as well as the ampersand & to make your page validate (URLs, Querystrings etc).

This is obviously for security reasons as you don't want people to be able to malform your input and break your layout or insert XSS hack vectors into your system.

However running the standard HTML Encode function will encode everything as well as cause double encoding issues. Therefore the HTML Encoding needs to be done in stages and the ampersand has to be handled correctly as it makes up part of the HTML encoded &#XXXX; format.
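
The staged approach described above can be sketched in a few lines of JavaScript. This is only a minimal illustration and not the code of my HTML Encoder object: the function name htmlEncodeOnce is made up for the example, and it assumes that anything shaped like &name; or &#123; or &#x1F; is an existing entity that should be left alone.

```javascript
// Hypothetical helper for illustration only, not the Strictly Software encoder.
function htmlEncodeOnce(text) {
  return String(text)
    // Stage 1: encode only those ampersands that do not already start an entity
    .replace(/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)/g, "&amp;")
    // Stage 2: encode the four dangerous characters < > " '
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// The partially encoded sample text: the quotes are encoded,
// the existing &amp; is left untouched.
console.log(htmlEncodeOnce('"Rob says hello &amp; goodbye"'));
```

Because existing entities are skipped in stage 1, running the function a second time over its own output should change nothing, which is the property you want if all your content goes through the same function.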

So whilst it comes as quite a surprise to see how popular this script is (#1 on Google for "HTML Encode with Javascript") it is not really surprising when one thinks about the problem this script solves.

Whilst I don't claim to be any kind of coding genius and I am sure the code can be improved, it does do what it says on the tin. If you are thinking of using another JavaScript function to HTML Encode your content then compare it first with my tool to see if it handles the double encoding issue. If it doesn't then you might find yourself having problems down the road.

Tuesday, 26 July 2011

Wordpress posts_per_page Query Option Not Working

Fixing the posts_per_page query parameter bug in Carrington Themes

This is just a quick but annoying bug fix for anyone using the Carrington Theme for Wordpress.

If you are creating your own custom templates or pages and want to create your own custom query for the loop then you will probably want to specify how many articles to show.

This is controlled by the posts_per_page query parameter, which sets how many articles are displayed during the loop output e.g

query_posts('cat=234&posts_per_page=5');

However when I was trying to create a custom template earlier this value was being ignored and I couldn't work out why.

I eventually stumbled across a thread on the Wordpress forums that gave me the solution and it relates to a "feature" in the Carrington theme. Apparently it creates its own function which ignores any value you may give the posts_per_page query option.

The solution is to disable this "feature" by removing the filter that sets it in motion. Once this is done your loop queries will take into account any value you give the posts_per_page query option.

You can disable this feature on the page you are building by adding the following code before creating your query loop.

remove_filter('pre_get_posts', 'cfct_posts_per_archive_page');
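
If the hook mechanism is unfamiliar, the toy model below shows the idea in JavaScript. It is not Wordpress or Carrington code: addFilter, removeFilter, applyFilters and themePostsPerPage are made-up names, and the forced value of 10 is purely an assumption for the demonstration.

```javascript
// Toy model of a filter hook registry. NOT Wordpress code, illustration only.
const filters = {};

function addFilter(hook, fn) {
  (filters[hook] = filters[hook] || []).push(fn);
}

function removeFilter(hook, fn) {
  filters[hook] = (filters[hook] || []).filter(function (f) { return f !== fn; });
}

function applyFilters(hook, value) {
  return (filters[hook] || []).reduce(function (v, fn) { return fn(v); }, value);
}

// Stand-in for the theme callback: it overwrites whatever page size was asked for.
// The value 10 is an assumption made for this example.
function themePostsPerPage(query) {
  return Object.assign({}, query, { posts_per_page: 10 });
}

addFilter("pre_get_posts", themePostsPerPage);

const wanted = { cat: 234, posts_per_page: 5 };
const before = applyFilters("pre_get_posts", wanted); // theme value wins

removeFilter("pre_get_posts", themePostsPerPage);
const after = applyFilters("pre_get_posts", wanted);  // our value survives

console.log(before.posts_per_page, after.posts_per_page);
```

While the callback is registered, the requested page size is replaced on the way through. Once it is removed, the query reaches the loop with the value you asked for.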


A full example of a custom template page based on the Carrington theme that uses this feature can be seen below.


<?php
/*
Template Name: SomethingTemplate
*/

if (__FILE__ == $_SERVER['SCRIPT_FILENAME']) { die(); }
if (CFCT_DEBUG) { cfct_banner(__FILE__); }

get_header();

?>

<div id="content">

<?php


// get category id by name - if you know it then skip this and add it directly to the query
$id = get_cat_id('Some Category');

// remove the carrington blog filter that overwrites the posts_per_page query filter parameter
remove_filter('pre_get_posts', 'cfct_posts_per_archive_page');

// create a query to filter by the desired category and show the number of posts as specified in the global settings and handle paging
query_posts('cat='.$id.'&posts_per_page='.get_option('posts_per_page') . '&paged='.$paged);


if (have_posts()) {
while (have_posts()) {
the_post();?>
<h1><a href="<?php the_permalink(); ?>" title="This is my blog article: <?php the_title_attribute(); ?>"><?php the_title(); ?></a></h1>

<?php the_content(); ?>

<?php
}
}

?>

<div class="pagination">
<span class="previous"><?php previous_posts_link('« Previous') ?></span>
<span class="next"><?php next_posts_link('Next »') ?></span>
</div>

</div>

Sunday, 24 July 2011

TSQL Batch Updates SQL 2005 - 2008

Updating tables in Batches to prevent locking in SQL

There are times when you may need to carry out UPDATES on large tables that are in use and constantly being inserted, deleted or updated.

If you carry out a large UPDATE that affects all of the rows in the table then the table will be locked for the duration of the update and any other processes that may need to carry out DML statements will be BLOCKED from doing so.

You may even experience deadlocks but you will most definitely experience performance issues and if any SELECT statements that access the data don't use a WITH (NOLOCK) hint they too will have to wait in line for the UPDATE to finish.

Obviously adding WITH (NOLOCK) to every SELECT statement is not a good solution unless you know what you are doing as it allows dirty reads and you may end up giving your users uncommitted or inconsistent data. This might be fine for some scenarios but in critical applications where data integrity is key you need another solution that preserves data integrity and allows you to UPDATE the table without the performance problems.

When I find myself needing to UPDATE every record in a large table I use a BATCH UPDATE process which cuts the large UPDATE statement down into lots of small UPDATES that affect only a few rows at a time.

For example instead of the whole table being locked for an hour with lots of blocked processes building up behind waiting for it to finish it would instead only be locked for lots of little time periods.

These smaller locking periods allow other processes in to do their work and if the batch size is small enough and you have appropriate indexes you might find that you won't experience a full table lock anyway.

There are various methods for carrying out this approach and you should tailor your BATCH SIZE to your own requirements. Before SQL 2005 you could use the SET ROWCOUNT 50 command to set the size of the batch but in SQL 2005 and beyond you can use a variable directly with an UPDATE TOP (@VAR) command.


SET NOCOUNT ON
SET DATEFORMAT YMD

DECLARE @ROWS INT, @TOTALROWS INT, @BATCHSIZE INT

SELECT @ROWS = 1,
@TOTALROWS = 0,
@BATCHSIZE = 50

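--now update the main table in batches to prevent locks
WHILE @ROWS > 0
BEGIN

--update data in table in batches to prevent blocks
--NB: the WHERE clause must exclude rows that have already been updated
--otherwise the same rows keep matching and the loop never ends.
--The extra AND condition below is an assumption, adjust it to suit your own data
UPDATE TOP(@BATCHSIZE) MyTable
SET MyColumn = dbo.udf_SOME_FUNCTION(MyPK)
WHERE SomeDate > '2011-JAN-01'
AND (MyColumn IS NULL OR MyColumn <> dbo.udf_SOME_FUNCTION(MyPK))

--capture the row count straight away before another statement resets it
SELECT @ROWS = @@ROWCOUNT
SELECT @TOTALROWS = @TOTALROWS + @ROWS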


PRINT 'Updated ' + CAST(@ROWS as varchar) + ' in batch'

END

PRINT 'Updated ' + CAST(@TOTALROWS as varchar) + ' total rows'



I am currently using this process to update a table with over a million rows that is constantly being accessed by a busy website and it isn't causing any BLOCKING or performance issues at all.
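
The control flow of the batch loop can also be modelled outside the database. The JavaScript below is only a sketch of the idea using an in-memory array instead of a table: updateInBatches, needsUpdate and applyUpdate are hypothetical names, and the point it illustrates is that the loop only ends because each pass selects rows that have not been updated yet.

```javascript
// In-memory model of the batch loop. Hypothetical names, illustration only.
function updateInBatches(rows, batchSize, needsUpdate, applyUpdate) {
  const batchSizes = [];
  let affected = 1;
  while (affected > 0) {
    // Equivalent of UPDATE TOP(@BATCHSIZE) ... WHERE <row still needs updating>
    const batch = rows.filter(needsUpdate).slice(0, batchSize);
    batch.forEach(applyUpdate);
    // Equivalent of reading @@ROWCOUNT straight after the UPDATE
    affected = batch.length;
    batchSizes.push(affected);
  }
  return batchSizes;
}

// 120 rows that all start without a value
const table = [];
for (let id = 1; id <= 120; id++) {
  table.push({ id: id, value: null });
}

const sizes = updateInBatches(
  table,
  50,
  function (row) { return row.value === null; },
  function (row) { row.value = row.id * 2; }
);

console.log(sizes); // sizes of each batch, ending with the empty batch that stops the loop
```

If needsUpdate always returned true the loop would never finish, which is the same trap an UPDATE TOP loop falls into when its WHERE clause keeps matching rows it has already changed.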

Wednesday, 13 July 2011

PHP for obtaining the follower count of a Twitter Account

Get Twitter Follower Count using Regular Expressions

I came across this bit of PHP code the other day which is aimed at getting the follower count of a Twitter user.

It seems like overkill to me and is a mixture of regular expressions, string parsing, callback functions and a lot of head scratching.

The user obviously knows that the follower count HAS to reside within an element within the DOM with the id of follower_count so why not just use one single regular expression to target that element and return its guts instead of all the DOM loading, callbacks and string parsing?

I might be missing something that someone could tell me but this seemed like a long way to go about a simple scrape job.


// Get the number of twitter followers

function string_getInsertedString($long_string,$short_string,$is_html=false){
if($short_string>=strlen($long_string))return false;
$insertion_length=strlen($long_string)-strlen($short_string);
for($i=0;$i<strlen($short_string);++$i){
if($long_string[$i]!=$short_string[$i])break;
}
$inserted_string=substr($long_string,$i,$insertion_length);
if($is_html && $inserted_string[$insertion_length-1]=='<'){
$inserted_string='<'.substr($inserted_string,0,$insertion_length-1);
}
return $inserted_string;
}

function DOMElement_getOuterHTML($document,$element){
$html=$document->saveHTML();
$element->parentNode->removeChild($element);
$html2=$document->saveHTML();
return string_getInsertedString($html,$html2,true);
}

function getFollowers($username){
$x = file_get_contents("http://twitter.com/".$username);
$doc = new DomDocument;
@$doc->loadHTML($x);
$ele = $doc->getElementById('follower_count');
$innerHTML=preg_replace('/^<[^>]*>(.*)<[^>]*>$/',"\\1",DOMElement_getOuterHTML($doc,$ele));
return $innerHTML;
}


// To display it

<?php echo getFollowers("username"); ?>



Here is the much shorter version I wrote. It still works just as well returning the follower count of the Twitter Account username passed into it.

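function getFollowers($username){
$url = "http://twitter.com/".$username;
$count=0;

$x = file_get_contents($url);

// file_get_contents returns false on failure so bail out early
if($x === false){
return $count;
}

// [^>]* (rather than [^>]+?) also matches when the id is the element's last attribute
preg_match("@<([a-z][^ ]*) id=\"follower_count\"[^>]*>([0-9,]+)\s*</\\1>@i",$x,$match);

if($match){
$count = $match[2];
}

return $count;
}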

// let's check how poorly my twitter account is followed!
echo "StrictlyTweets: " . getFollowers("StrictlyTweets") . "<br><br>";



As you can see from the regular expression I am matching HTML tags (they all start with < and then a letter) and capturing the tag name so that it can be used in the backreference later on. This means that if the HTML changes from a SPAN to a DIV there will still be a match as long as the element has the id="follower_count" attribute.

I could have loaded up the DOM, targeted the ID and then done some regex but why bother when you can go straight for the jugular!
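
For anyone who wants to play with the backreference idea without hitting Twitter, here is a rough JavaScript equivalent run against sample strings. The function name extractFollowerCount and the sample markup are made up for the example, and the pattern is slightly looser than the PHP one in that it allows other attributes before the id.

```javascript
// Illustration only: the markup below is invented, not real Twitter HTML.
function extractFollowerCount(html) {
  // Group 1 captures the tag name, \1 then demands the same name in the closing tag
  const re = /<([a-z][^\s>]*)\s[^>]*?id="follower_count"[^>]*>\s*([0-9,]+)\s*<\/\1>/i;
  const match = re.exec(html);
  return match ? match[2] : null;
}

// Works whether the element is a SPAN or a DIV
console.log(extractFollowerCount('<span id="follower_count" class="stats">1,234</span>'));
console.log(extractFollowerCount('<div class="x" id="follower_count">987 </div>'));

// A closing tag that does not match the opening tag gives no match
console.log(extractFollowerCount('<span id="follower_count">12</div>'));
```

Returning null rather than 0 when nothing matches makes it easier to tell "no followers" apart from "the page layout has changed".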

Tuesday, 12 July 2011

Firefox 5.0 Firebug 1.8 Memory Leak Problem

Firefox 5.0 with Firebug 1.8.0b5 on Windows 7 64 bit causes Memory Leak

I came across a problem with Firefox 5.0 and Firebug 1.8.0b5 today that basically leaked memory like an open oil well and hung my PC every time I tried to open up a basic popup window from a page using Javascript.

My PC is running the following:

Windows 7 - Dual Core - 64 Bit
Firefox.exe *32 - version 5.0
Firebug 1.8.0b5

This problem has only started occurring since my latest upgrade and I narrowed it down to Firebug by disabling each plugin one by one.

I can replicate the problem on one particular page quite easily which has a link with a simple javascript: window.open(url) function that opens up a little static html page.

When I now click the link the popup opens full size rather than the dimensions set 400px * 400px and there is nothing on the page - just a blank screen. Viewing the source shows nothing at all.

This page used to work fine before the update.

It still works on all other browsers (Chrome, Safari, IE 6 to 9) and it still works in Firefox 5.0 on 32 bit machines so I am not sure whether it's a 64 bit problem or not.

Watching Task Manager, I can open Firefox and within 5 minutes get it to use over 1GB of memory and 50% CPU (the whole of one of my two cores) with only a single window and tab open.

Earlier today it was using over 3GB of memory!

Another clue is the fact that the error console is permanently full of these error messages:

attempt to run compile-and-go on a cleared scope
resource://firebug_rjs/console/errors.js
line 156

Hitting clear just fills up the whole of Firebug's console log instantly with the same message.

The only way I have found to solve this problem is to disable Firebug and not use the "Restart Firefox" link as this seems to just ramp the memory up without shutting the process down first.

I usually have to kill the process through Task Manager as clicking the close button doesn't do anything the majority of the time.

If anyone else has similar problems please let me know.


Saturday, 9 July 2011

URLEncode Problem with .NET 4.0

Visual Studio 2010 Problem with "The name HttpUtility does not exist in the current context".

I usually use Visual Studio 2010 at work and the Express edition at home. Tonight I was knocking up a quick C# Windows Forms application to crawl some specific URLs and I required the ability to URL Encode my URI Components e.g

string url = "http://somesite.com/search?qry=" + System.Web.HttpUtility.UrlEncode(val);

However even though I had included:

using System.Web;

At the top of my class it was saying that HttpUtility could not be found in System.Web.

This was driving me crazy as I had another project open at the same time which was using the exact same code and it worked.

The advice I found on the web said to ensure that a reference to System.Web was added with the "Project" > "Add References" option.

However when I was searching for this DLL all I could find were System.Web.Services and System.Web.ApplicationServices which were obviously no good.

However after a lot of head scratching I went back to the main Project Properties panel and under Target Framework I had ".NET Framework 4 Client Profile" selected.

I changed this first to .NET Framework 3.5 and then looked again for the reference and it worked!

I added it and then changed it back again to .NET Framework 4. On re-opening the project all my problems had been solved.

Don't ask me why System.Web doesn't appear as a reference under the Client Profile versions (presumably because the Client Profile is a cut-down subset of the full framework) but it doesn't, and you need to add a proper DLL reference to use this sort of functionality if you are building non web based apps that require web based functionality like URL or HTML Encode etc.

I thought I would just write a few notes about this in case anyone else experiences the same problem.
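
As a side note for anyone doing the same kind of URI component encoding on the client, JavaScript has it built in. The snippet below is just a sketch: buildSearchUrl is a made-up helper and somesite.com is the same placeholder domain as in the C# line above. Be aware that encodeURIComponent writes a space as %20 whereas HttpUtility.UrlEncode writes it as +.

```javascript
// Hypothetical helper: builds a search URL with an encoded query value.
function buildSearchUrl(term) {
  return "http://somesite.com/search?qry=" + encodeURIComponent(term);
}

// The space, ampersand and question mark are all percent-encoded
console.log(buildSearchUrl("rock & roll?"));
```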

Tuesday, 5 July 2011

TSQL UDF to return useful dates

A User Defined Function to return useful dates

I had to come up with some calculations for working out the starting and end weekday for a given date earlier and I wrote this UDF for SQL 2000, 2005, 2008.

It returns a number of useful values including dates and strings (which is why the return value is a varchar).

If you want to know the last working day for the current month, last month or the last weekday for a month then this function will help.

You can pass in the current date e.g GETDATE() or pass in your own datetime value.


SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

-- =============================================================================
-- Author: Rob Reid
-- Create date: 05-JUL-2011
-- Description: Returns useful dates for calculations and formatting
/*


-- example usage

DECLARE @dte datetime
SELECT @dte = GETDATE() --OR pass in a literal date e.g '2011-Jun-03 03:54:00'

SELECT dbo.udf_GET_DATE_OF('FIRST DAY OF LAST MONTH',@dte) as 'First Day of Last Month',
dbo.udf_GET_DATE_OF('FIRST DAY OF MONTH',@dte) as 'First Day of Month',
dbo.udf_GET_DATE_OF('LAST DAY OF MONTH',@dte) as 'Last Day of Month',
dbo.udf_GET_DATE_OF('LAST DAY OF WEEK',@dte) as 'Last Day of Week',
dbo.udf_GET_DATE_OF('FIRST DAY OF WEEK',@dte) as 'First Day of Week',
dbo.udf_GET_DATE_OF('LAST WORKING DAY OF MONTH',@dte) as 'Last Working Day of Month',
dbo.udf_GET_DATE_OF('LAST WORKING DAY OF LAST MONTH',@dte) as 'Last Working Day of Last Month',
dbo.udf_GET_DATE_OF('FIRST WEEKDAY OF MONTH',@dte) as 'First Week Day of Month',
dbo.udf_GET_DATE_OF('LAST WEEKDAY OF MONTH',@dte) as 'Last Week Day of Month'



*/
-- =============================================================================


CREATE FUNCTION [dbo].[udf_GET_DATE_OF]
(
@rule varchar(30),
@dte datetime
)
RETURNS VARCHAR(30) AS
BEGIN

DECLARE @ret varchar(30)

IF @rule = 'FIRST DAY OF LAST MONTH'
BEGIN
SELECT @ret = '01/' + UPPER(LEFT(DATENAME(MONTH,DATEADD(MONTH,-1,@dte)),3)) + '/' + CAST(YEAR( DATEADD(MONTH,-1,@dte) ) as varchar(4))
END
ELSE IF @rule = 'FIRST DAY OF WEEK'
BEGIN
SELECT @ret = DATEADD(dd,-(DATEPART(dw, @dte) - 1),@dte)
END
ELSE IF @rule = 'LAST DAY OF WEEK'
BEGIN
SELECT @ret = DATEADD(dd,-(DATEPART(dw, @dte) - 7),@dte)
END
ELSE IF @rule = 'FIRST DAY OF MONTH'
BEGIN
SELECT @ret = DATEADD(dd,-(DAY(@dte)-1),@dte)
END
ELSE IF @rule = 'LAST DAY OF MONTH'
BEGIN
SELECT @ret = DATEADD(d, -DAY(DATEADD(m,1,@dte)),DATEADD(m,1,@dte))
END
ELSE IF @rule = 'FIRST WEEKDAY OF MONTH'
BEGIN
SELECT @ret = DATENAME(dw, DATEADD(dd, - DATEPART(dd, @dte) + 1, @dte))
END
ELSE IF @rule = 'LAST WEEKDAY OF MONTH'
BEGIN
SELECT @dte = DATEADD(dd,-(DAY(@dte)-1),DATEADD(MONTH,1,@dte)),
@ret = DATENAME(dw,CONVERT(VARCHAR, DATEADD(DAY, 0 - ((DATEPART(DAY, @dte)) +
CASE WHEN DATENAME(WEEKDAY, DATEADD(DAY, 0 - (DATEPART(DAY, @dte)), @dte)) = 'SUNDAY' THEN 2
WHEN DATENAME(WEEKDAY, DATEADD(DAY, 0 - (DATEPART(DAY, @dte)), @dte)) = 'SATURDAY' THEN 1
ELSE 0 END
), @dte), 113))

END
ELSE IF @rule = 'LAST WORKING DAY OF LAST MONTH'
BEGIN
SELECT @ret = CONVERT(VARCHAR, DATEADD(DAY, 0 - ((DATEPART(DAY, @dte)) +
CASE WHEN DATENAME(WEEKDAY, DATEADD(DAY, 0 - (DATEPART(DAY, @dte)), @dte)) = 'SUNDAY' THEN 2
WHEN DATENAME(WEEKDAY, DATEADD(DAY, 0 - (DATEPART(DAY, @dte)), @dte)) = 'SATURDAY' THEN 1
ELSE 0 END
), @dte), 113)
END
ELSE IF @rule = 'LAST WORKING DAY OF MONTH'
BEGIN
SELECT @dte = DATEADD(dd,-(DAY(@dte)-1),DATEADD(MONTH,1,@dte)),
@ret = CONVERT(VARCHAR, DATEADD(DAY, 0 - ((DATEPART(DAY, @dte)) +
CASE WHEN DATENAME(WEEKDAY, DATEADD(DAY, 0 - (DATEPART(DAY, @dte)), @dte)) = 'SUNDAY' THEN 2
WHEN DATENAME(WEEKDAY, DATEADD(DAY, 0 - (DATEPART(DAY, @dte)), @dte)) = 'SATURDAY' THEN 1
ELSE 0 END
), @dte), 113)
END

RETURN @ret


END


The example usage is given in the UDF definition e.g

DECLARE @dte datetime
SELECT @dte = GETDATE() --OR pass in a literal date e.g '2011-Jun-03 03:54:00'

SELECT dbo.udf_GET_DATE_OF('FIRST DAY OF LAST MONTH',@dte) as 'First Day of Last Month',
dbo.udf_GET_DATE_OF('FIRST DAY OF MONTH',@dte) as 'First Day of Month',
dbo.udf_GET_DATE_OF('LAST DAY OF MONTH',@dte) as 'Last Day of Month',
dbo.udf_GET_DATE_OF('LAST DAY OF WEEK',@dte) as 'Last Day of Week',
dbo.udf_GET_DATE_OF('FIRST DAY OF WEEK',@dte) as 'First Day of Week',
dbo.udf_GET_DATE_OF('LAST WORKING DAY OF MONTH',@dte) as 'Last Working Day of Month',
dbo.udf_GET_DATE_OF('LAST WORKING DAY OF LAST MONTH',@dte) as 'Last Working Day of Last Month',
dbo.udf_GET_DATE_OF('FIRST WEEKDAY OF MONTH',@dte) as 'First Week Day of Month',
dbo.udf_GET_DATE_OF('LAST WEEKDAY OF MONTH',@dte) as 'Last Week Day of Month'



I have found this very useful lately when calculating certain statistical reports and maybe some of you will as well.
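
If you ever need the same sort of calculation on the client, the date arithmetic behind rules such as LAST DAY OF MONTH and LAST WORKING DAY OF MONTH can be sketched in JavaScript. This is not a port of the UDF: the function names are made up, everything is done in UTC to keep the example free of time zone surprises, and like the UDF it only steps back over Saturdays and Sundays without knowing anything about public holidays.

```javascript
// Hypothetical helpers, all working in UTC. Illustration only.
function firstDayOfMonth(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), 1));
}

function lastDayOfMonth(date) {
  // Day 0 of the next month is the last day of this month
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth() + 1, 0));
}

function lastWorkingDayOfMonth(date) {
  const last = lastDayOfMonth(date);
  const dayOfWeek = last.getUTCDay(); // 0 = Sunday, 6 = Saturday
  const stepBack = dayOfWeek === 0 ? 2 : dayOfWeek === 6 ? 1 : 0;
  last.setUTCDate(last.getUTCDate() - stepBack);
  return last;
}

// Format as YYYY-MM-DD for display
function ymd(date) {
  return date.toISOString().slice(0, 10);
}

const sample = new Date(Date.UTC(2011, 6, 5)); // 5 July 2011
console.log(ymd(firstDayOfMonth(sample)));
console.log(ymd(lastDayOfMonth(sample)));
console.log(ymd(lastWorkingDayOfMonth(sample)));
```

The 31st of July 2011 fell on a Sunday, so the last working day for the sample date comes out two days earlier than the last day of the month.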