Most languages have a Trim function that removes white space from either side of a string.
However, a lot of the time these Trim functions only remove standard white space, e.g. what you get if you hit the space bar a couple of times. Some also remove tabs and new lines (PHP's trim() does by default), but they generally do not remove HTML space characters such as non-breaking spaces, whether they are HTML entity encoded (&nbsp;) or numerically encoded (&#160;).
Therefore sometimes you may need a "Super Trim" function that will handle the removal of all types of space characters including HTML entities.
The following functions are written in PHP but can easily be converted into C# or VB. The main part to take away is the regular expression used within each function, which matches a run of one or more space characters, whether they be control characters or HTML entities, at the start or end of the string so that it can be removed.
// wrapper function to trim both sides
function HTMLTrim($text){
    // trim the right side first, then the left
    return HTMLLeftTrim(HTMLRightTrim($text));
}
// removes spaces and HTML space entities at the beginning of strings
function HTMLLeftTrim($text){
    // remove a run of &nbsp;, &#160; or white space anchored at the start of the text
    return preg_replace('@^(?:&nbsp;|&#160;|\s)+@', '', $text);
}
// removes spaces and HTML space entities at the end of strings
function HTMLRightTrim($text){
    // remove a run of &nbsp;, &#160; or white space anchored at the end of the text
    return preg_replace('@(?:&nbsp;|&#160;|\s)+$@', '', $text);
}
You can test this out in a simple PHP page with the following code:
$str = "&nbsp;hello there &nbsp; &#160; ";
echo "before trim its '" . $str . "'";
echo "<br><br>now its '" . HTMLTrim($str) . "'";
When viewed in a browser, this produces the following output:
before trim its ' hello there '
now its 'hello there'
I find it very useful when I am scraping content from the web and need to handle the removal of a mixture of standard spaces and HTML spaces.
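Scraped pages sometimes contain other forms of the same character, such as the hex entity &#xA0; or the raw non-breaking space character itself (U+00A0). As a rough sketch only, the same idea could be stretched to cover those as well. The function name HTMLTrimExtended is made up for this example, and it assumes the input is UTF-8:

```php
<?php
// Hypothetical helper for illustration: trims the same space forms as the
// functions above, plus the hex entity &#xA0; and the raw U+00A0 character.
// Assumes $text is UTF-8 encoded.
function HTMLTrimExtended($text){
    // one run of space-like tokens, reused for both ends of the string
    $space = '(?:&nbsp;|&#160;|&#[xX][aA]0;|\x{00A0}|\s)+';
    // the u modifier makes PCRE read pattern and subject as UTF-8,
    // so \x{00A0} can match the raw non-breaking space
    $result = preg_replace('@^' . $space . '|' . $space . '$@u', '', $text);
    // preg_replace returns null on failure (e.g. invalid UTF-8), so fall back to the input
    return $result === null ? $text : $result;
}

// "\xC2\xA0" is the UTF-8 byte sequence for U+00A0
echo "'" . HTMLTrimExtended("&#xA0;\t hello there&nbsp;\xC2\xA0\n") . "'";
```

Only the two ends are touched, so an entity in the middle of the text (for example "a&nbsp;b") is left alone.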