Showing posts with label c#. Show all posts

Thursday, 11 August 2022

Testing For Internet Connectivity

Setup Measures For A Big API System

By Strictly-Software

I have a BOT that runs nightly, and in its SetUp() method it runs a number of tests to ensure everything is okay before logging the results to an API table in my database. This is so I know whether the API I am using is available, whether I needed to log in to it, when the last job started and stopped, and when I last logged into the API.

It just helps in the backend to see if there are any issues with the API without debugging it. Therefore the SetUp job has to be quite comprehensive.

The things I need to find out in my SetUp job are listed below. They are all logged into the logfile.log that I keep building until the job is finished, along with major method input and return parameters, handled errors, changed system behaviour, and SQL statements that return an error, e.g. a timeout or an unexpected divide-by-zero error.

1. Whether I can access the Internet. This is done with a test against a page that just returns an IPv4 or IPv6 address. The main HTTP status code I look for is 200. If I get nowhere and instead get a WebExceptionStatus.NameResolutionFailure (enum value 1), the DNS could not resolve the host, which is usually down to either Flight Mode being turned on (disabling your WiFi) or your network adapter having issues.
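As a rough illustration of that first check, here is a minimal hedged sketch of telling a DNS failure apart from an ordinary HTTP error; the URL is the same test page, but the rest is illustrative and not my production BOT code (requires using System; and using System.Net;):

try
{
    var request = (HttpWebRequest)WebRequest.Create("https://ipv4.icanhazip.com/");
    using (var response = (HttpWebResponse)request.GetResponse())
    {
        // 200 here means the Internet is reachable
        Console.WriteLine("HTTP status: {0}", (int)response.StatusCode);
    }
}
catch (WebException ex)
{
    if (ex.Status == WebExceptionStatus.NameResolutionFailure)
    {
        // DNS lookup failed - usually Flight Mode is on or the network adapter is broken
        Console.WriteLine("DNS resolution failed - check WiFi/adapter");
    }
    else
    {
        Console.WriteLine("Web error: {0}", ex.Status);
    }
}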

I test for web access with a simple HTTP request to a page that returns my IP address, plus two simple regular expressions that identify whether it is IPv4 or IPv6.


public string TestIPAddress(bool ipv4 = true)
{
    string IPAddress = "";
    // if we are using an IPv6 address the normal page will return it; if we want to
    // force an IPv4 address we go to the IPv4 page, which is the default behaviour
    // because VPNs and proxies usually use IPv4
    string remotePageURL = ((ipv4) ? "https://ipv4.icanhazip.com/" : "https://icanhazip.com/");

    this.UserAgent = this.BrainiacUserAgent; // use our BOT UserAgent as no need to spoof
    this.Referer = "LOCALHOST"; // use our localhost as we are running from our PC
    this.StoreResponse = true; // store the response from the page
    this.Timeout = (30000); // ms 30 seconds

    // a simple BOT that just does a GET request; you can find a C# BOT example on my blog or tons on the web
    this.MakeHTTPRequest(remotePageURL);

    // if status code is between 200 & 300 we should have data
    if (this.StatusCode >= 200 && this.StatusCode < 300)
    {
        IPAddress = this.ResponseContent; // get the IP Address

        // quick test for an IP address match, IPv4 addresses have 3 dots
        if (IPAddress.CountChars('.') == 3)
        {
            // an IPv4 address with the dots removed should be all numbers
            string ip2 = IPAddress.Replace(".", "");
            if (Regex.IsMatch(ip2, @"^\d+$"))
            {
                this.HelperLib.LogMsg("IP Address is IPv4: " + IPAddress + ";", "MEDIUM"); // log to log file
            }
        }
        // otherwise if it contains colons (and numbers and certain HEX letters)
        else if (IPAddress.Contains(':'))
        {
            // could be an IPv6 address; check for only numbers, colons and hex letters
            if (Regex.IsMatch(IPAddress, @"^[:0-9A-Fa-f]+$"))
            {
                this.HelperLib.LogMsg("IP Address is IPv6: " + IPAddress + ";", "MEDIUM"); // log to log file
            }
        }
    }
    else
    {
        // no response, flight mode button enabled?
        this.HelperLib.LogMsg("We could not access the Internet URL: " + remotePageURL + "; Maybe Flight Mode is on or the Adapter is broken and needs a refresh", "LOW");

        IPAddress = "No IP Address returned; HTTP Error: " + this.StatusCode.ToString() + "; " + this.StatusDesc + ";";

        // call my IsNetworkOn function to test we have a network
        if (this.IsNetworkOn())
        {
            this.LastErrorMessage = "The Internet Network Is On but we cannot access the Internet";
        }
        else
        {
            this.LastErrorMessage = "The Internet Network Is Off and we cannot access the Internet";
        }

        this.HelperLib.LogMsg(this.LastErrorMessage, "LOW"); // log error

        // throw; the error will be caught in a lower method and stop the script
        throw new System.Exception(this.LastErrorMessage);
    }

    return IPAddress; // the method must return the address it found
}


If that fails then I run this method to see if the Network is available. You can see in the code above that I call it if there is no 200-300 HTTP status response or IP address returned. 

I test it by running the console script with and without flight mode enabled on my laptop.

public bool IsNetworkOn()
{
    // requires using System.Net.NetworkInformation;
    // returns true if any network interface (other than loopback) is up
    return NetworkInterface.GetIsNetworkAvailable();
}
I also have a series of proxy addresses I can use to get round blocks on IP addresses, although I do try and use Karmic Scraping: no hammering of the site, caching pages I need so I don't re-fetch them, and gaps between retries if there is a temporary error, where I change the referer, user-agent and proxy (if required) before trying again after so many seconds.
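As a hedged sketch of that retry idea (every helper here is a hypothetical stand-in for my own BOT methods, not a public API):

using System;
using System.Threading;

// Sketch of "karmic" retrying: back off, rotate identity, then try again.
public static class RetrySketch
{
    public static string FetchWithRetries(string url, int maxRetries = 3)
    {
        for (int attempt = 1; attempt <= maxRetries; attempt++)
        {
            string html = TryFetch(url);              // hypothetical HTTP GET helper
            if (html != null) return html;            // success, stop retrying

            // temporary error: wait a growing number of seconds, then change
            // the referer/user-agent (and the proxy if not behind a global VPN)
            Thread.Sleep(TimeSpan.FromSeconds(5 * attempt));
            RotateRefererUserAgentAndProxy();         // hypothetical rotation helper
        }
        return null;                                  // give up after maxRetries
    }

    static string TryFetch(string url) { return null; /* placeholder */ }
    static void RotateRefererUserAgentAndProxy() { /* placeholder */ }
}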

Also, before I turn on a global proxy setting in my HelperLib class, which is the main class all other objects refer to and pass from one to another, I test to see whether the computer I am on is using a global proxy or not. I do an HTTP request to a page that returns ISP info, and I have a list of ISPs which I can check against.

If the right ISP for my home comes back, I know I am not on a global proxy (and the earlier test has already told me whether my IP address is IPv4 or IPv6). If Sky or Virgin is returned, I know my VPN is enabled on my laptop and that I am using a global proxy.

If I am using a global proxy, all the tests and checks for getting a new random proxy:port when doing an HTTP request are cancelled, as I don't need to go through a VPN and then a proxy on top of it. If the global proxy is turned off, I might choose to get a random proxy before each HTTP call, if that setting is enabled in my Settings class.

As you can't have a config page in console scripts that hook into a DLL, my settings code in the DLL is just a class called Settings with a big switch statement where I get or set the setting I need.
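A hedged sketch of what such a switch-based Settings class might look like (the setting names are made up for illustration, they are not my real ones):

using System;

// Sketch: a DLL-friendly settings class driven by one big switch statement,
// since console apps hooking into a DLL have no web.config style settings page.
public class Settings
{
    public object Get(string name)
    {
        switch (name)
        {
            case "UseRandomProxy": return true;      // illustrative setting
            case "HTTPTimeoutMS":  return 30000;     // illustrative setting
            case "LogLevel":       return "MEDIUM";  // illustrative setting
            default:
                throw new ArgumentException("Unknown setting: " + name);
        }
    }
}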

Once I have checked my Internet access, I then do an HTTP call to the Betfair API operational page, which tells me whether the API is working or not. If it isn't, then I cannot log in to it. If it is working, I do a login test to ensure I can log in. This involves:
  • Using 1 of 2 JSON methods to log in to the 2 endpoints that give me API access with a hashed API code, plus my username & password.
  • If I can log in, I update the API table in the DB with the time I logged in and the fact that I am logged in. There may be times I don't need to log in to the API, such as when I am just scraping all the Races and Runner info.
  • I then re-get a Session. I hold my session in a file so I don't have to get a new one each time, but on a run I like to get a new Session and extract the session code from the JSON return to pass back in later JSON calls to get data etc.
  • I then do a database test using my default connection string: I just run SELECT TOP(1) name FROM sys.objects and ensure I get a response (a minimal sketch of this check follows the list).
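The database test at the end of that list can be as small as this hedged sketch (the connection string is a placeholder for my default one):

using System.Data.SqlClient;

// Sketch: run a trivial query against sys.objects and make sure a row comes back.
public static bool TestDatabase(string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("SELECT TOP(1) name FROM sys.objects", conn))
    {
        conn.Open();
        object result = cmd.ExecuteScalar(); // null means no row came back
        return result != null;
    }
}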

Once I know the API is operational, the DB connectivity is working, and I can scrape pages outside the Betfair API, I can set all the properties, such as whether to use a random proxy/referer/user-agent if one of my calls errors (with a 5 second wait). If I am using a global proxy I don't bother with the proxies.

Believe it or not, that is a lot of code across 5 classes just to ensure everything is okay when running the job from the console or a Windows Service.

I throw my own errors if I cannot HTTP scrape, not just log them, so I can see what needs fixing quickly, and I create a little report with the time, BOT version, whether it is live, testing or placing real bets, whether it is running from a service or console, and other important info I might need to read quickly.

So this is just a quick article on how I go about ensuring a project that uses a big DLL (with all the HTTP, HelperLib, API etc code) is working when I run it.

Who knows, you might want to do similar checks on your own large automated systems. I am just making bank from my AutoBOT, which TRADES on Betfair for me!

© 2022 Strictly-Software

Wednesday, 3 August 2022

Extending Try/Catch To Handle Custom Exceptions - Part 2

Extending Core C# Object Try/Catch To Display Custom Regular Expression Exceptions - Part 2

By Strictly-Software

This is the second part of our custom Try/Catch extension program, which aims to raise Regular Expression errors that would normally just be signalled by !IsMatch (doesn't match) and so on.
 
So why would you want to do this when the C# .NET Regex functions already return true or false, or 0 matches, which you could use for error detection?
 
Well, you can read the first article for the background; however, you may have a case where you have a BOT automatically running every day, all day, on certain sites where the HTML has been worked out and expressions specially made to match it.

If, for example, the HTML source changes and the piece of data you are collecting suddenly stops being returned because your expression no longer matches anything, then just logging to a file means scanning and checking for errors, and you may only notice your fully automated 24/7/365 system has stopped working 2 months later.

Therefore, instead of just logging the results of IsMatch, you may want to throw an error and stop the program so that a new expression can be made to match the HTML ASAP without lots of data being missed.
 
If you want more info on the Try/Catch class, read the first article. Here we build a basic BOT that utilises the RegExException class we made last time. It is not a perfect BOT and you may get some compiler warnings; I did it in Visual Studio 2022, and it is a specific BOT: its aim is to get the META Title, META Description, and the full source of the URL passed into it.

Any errors caused by the regular expressions no longer matching are thrown up using our RegExException object, so they detail the expression, the text, or whichever of the 3 constructor formats we made last time. If you want to use it as the basis of your own BOT you can, but it has been specifically designed to demonstrate this RegEx TryCatch example, so you may want to edit or remove a lot of the content before building back on top of it to make your own BOT in C#.
 
So here is the BOT

using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public class HTML
{
	public string URL  // property
	{ get; set; }

	public string HTMLSource  
	{ get; set; }

	public string HTMLTitle
	{ get; set; }

	public string HTMLDesc
	{ get; set; }

	public void GetHTML()
	{
	    // if no URL then all properties are blank
	    if (String.IsNullOrWhiteSpace(this.URL))
	    {
			this.HTMLTitle = this.HTMLDesc = this.HTMLSource = "";
			return;
	    }

	    try
	    {
			HttpWebRequest request = (HttpWebRequest)WebRequest.Create(this.URL);
			HttpWebResponse response = (HttpWebResponse)request.GetResponse();

			if(response.StatusCode == HttpStatusCode.Forbidden)
			{
		    	throw new Exception(String.Format("We are not allowed to visit this URL {0} 403 - Forbidden", this.URL));
			}
			else if(response.StatusCode == HttpStatusCode.NotFound)
			{
		    	throw new Exception(String.Format("This URL cannot be found {0} 404 - Not Found", this.URL));
			}
            // 200 = OK, we have a response to analyse
            else if (response.StatusCode == HttpStatusCode.OK)
            {
                Stream receiveStream = response.GetResponseStream();
                StreamReader readStream = null;
                if (response.CharacterSet == null)
                {
                    readStream = new StreamReader(receiveStream);
                }
                else
                {
                    readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));
                }

                string source = this.HTMLSource = readStream.ReadToEnd();

                if (!String.IsNullOrWhiteSpace(source))
                {
                    // extract title with a simple regex                        
                    string reTitle = @"^[\S\s]+?<title>([\S\s]+?)</title>[\S\s]+$";
                    // Get the match for META TITLE if we can using a wrapper method that takes a regex string, source string and group to return
                    string retval = this.GetMatch(reTitle, source, 1);
                    if (!String.IsNullOrWhiteSpace(retval))
                    {
                        this.HTMLTitle = retval;
                    }
                    else // failed to find the <title>Page Title</title> in the HTML source so throw a Regex Exception
                    {
                        throw new RegExException(reTitle, new Exception(String.Format("No match could be found for META Title with {0}", reTitle)));
                    }

                    // META DESCRIPTION
                    string reDesc = @"^[\s\S]+?<meta content=[""']([^>]+?)['""]\s*?name='description'\s*?\/>[\s\S]*$";
                    // Get the match for META DESC if we can using a wrapper method that takes a regex string, source string and group to return
                    retval = this.GetMatch(reDesc, source, 1);
                    if(!String.IsNullOrWhiteSpace(retval))
                    {
                        this.HTMLDesc = retval;
                    }
                    else // failed to find the META description in the HTML source so throw a Regex Exception
                    {
                      throw new RegExException(reDesc, new Exception(String.Format("No match could be found for META Description with {0}", reDesc)));
                    }
                }

                response.Close();
                readStream.Close();
            }
	    }
	    catch(WebException ex)
	    {
          	// handle 2 possible errors
          	if(ex.Status==WebExceptionStatus.ProtocolError)
          	{
              Console.WriteLine("Could not access {0} due to a protocol error", this.URL);
          	}
          	else if(ex.Status==WebExceptionStatus.NameResolutionFailure)
          	{
              Console.WriteLine("Could not access {0} due to a bad domain error", this.URL);
          	}
	    } // catch anything we haven't handled and throw so we can code in a way to handle it
	    catch (Exception) // no variable needed as we just re-throw
	    {
			throw;
	    }
	}


	private string GetMatch(string re,string source,int group)
	{
	    string ret = "";

	    // sanity test
	    if (String.IsNullOrWhiteSpace(re) || String.IsNullOrWhiteSpace(source) || group < 1)
	    {
			return "";
	    }
	    // use common flags these could be passed in to method if need be
	    Regex regex = new Regex(re, RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Multiline);

	    if (regex.IsMatch(source))
	    {
          	MatchCollection matches = regex.Matches(source);

          	Console.WriteLine(String.Format("We have {0} matches", matches.Count.ToString()));

		foreach (Match r in matches)
		{
		    if (r.Groups[group].Success)
		    {
				ret = r.Groups[group].Value.ToString().Trim();
				break;
		    }
		}                
	    }
	    // return matched value
	    return ret;
	}
}
 
So as you can see, this HTML class is my basic BOT. It has 4 properties: URL (which just sets the URL of the page to get), then HTMLSource, HTMLTitle, and HTMLDesc, which hold the 3 main bits of content we want from our BOT: the META Title, the META Description, and the whole HTML source. We set these using the code in our HTML class's GetHTML() method, which does the main work but uses another method, GetMatch(), to try and get a match from the parameters passed in, e.g. the expression, the source code to match against, and the number of the group to return.
 
For example, there is bad XHTML/HTML all over the web where people have used IDs multiple times or repeated elements that should only exist once. With this last parameter you can ensure you are getting the match you want, usually 1 for the first group.
 
Hopefully you can see what the class is doing, how the result of the IsMatch method can fire off a custom RegExException or not, and how the exception message is formatted.
 
In the next article, we will show the code that uses this class to return data and put it all together.
 
 
© 2022 By Strictly-Software 

Thursday, 26 May 2022

Extending Try/Catch To Handle Custom Exceptions - Part 1

Extending Core C# Object Try/Catch To Display Custom Regular Expression Exceptions - Part 1

By Strictly-Software

Have you ever wanted to throw custom exception errors, whether for business logic such as a customer not being found in your database, or because more complex logic has failed that you don't want to handle silently but raise so that you know about it?
 
For me, I required this solution due to a BOT I use that collects information from various sites every day and uses regular expressions to find and extract the pieces of data I require. The problem is that a site will often change its HTML source code, which breaks a regular expression I have specifically crafted for it.
 
For example, I could have a very simple regular expression that is getting the ordinal listing of a name using the following C# regular expression:
 
string regex = @"<span class=""ordinal_1"">(\d+?)</span>";
Then one day I will run the BOT and it won't return any ordinals for my values, and when I look inside the log file I find my own custom message due to finding no match, such as "Unable to extract value from HTML source". When I go and check the source, it's because they have changed the HTML to something like this:
<span class="ordinal_1__swrf" class="o1 klmn">"1"<sup>st</sup></span> 
 
This is obviously gibberish many designers add into their HTML to stop BOTS crawling their data, hoping that the BOT developer has written a very specific regex that will break and return no data when met with such guff.
 
Obviously the first expression, whilst matching the original HTML fine, is too tight to handle extra guff within the source.
 
Therefore it required a change in the expression to something a bit more flexible in case they added even more guff into the HTML:
 
string regex = @"<span class=""ordinal_+?[\s\S]+?"">""?(\d+?)""?<sup";
 
As you can see, I have made the expression looser, with extra question marks ? in case quotes are or are not wrapped around values, and non-greedy match-anything expressions like [\s\S]+? to handle the gibberish from the point it appears to where I know it has to end, at the closing quote or bracket.
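A quick hedged check of the two expressions against the new HTML shows the difference (this little demo is mine, not part of the original BOT):

using System;
using System.Text.RegularExpressions;

class RegexDemo
{
    static void Main()
    {
        string html  = @"<span class=""ordinal_1__swrf"" class=""o1 klmn"">""1""<sup>st</sup></span>";
        string tight = @"<span class=""ordinal_1"">(\d+?)</span>";
        string loose = @"<span class=""ordinal_+?[\s\S]+?"">""?(\d+?)""?<sup";

        Console.WriteLine(Regex.IsMatch(html, tight));               // False - the HTML changed
        Console.WriteLine(Regex.Match(html, loose).Groups[1].Value); // "1" - still extracted
    }
}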
 
So instead of just logging the fact that I have missing data from crawls, I wanted to raise the errors with TRY/CATCH and make a specific piece of HTML no longer matching my expression an exception that gets raised so I can see it as soon as it happens.
 
Well, with C# you can extend the base Exception object so that your own exceptions, based upon your own logic, can be used with TRY/CATCH. In future articles we will build up to a full C# project with a couple of classes, a simple BOT and some regular expressions we can use to test what happens when trying to extract common values from the HTML source of various URLs, to throw exceptions.
 

Creating The TRY CATCH CLASS

First off, the TRY/CATCH C# class, where I am extending the base Exception object and using String.Format so that I can pass in specific messages. I have put numbers at the start of each message so that when running the code later we can see which exceptions get called.
 
I have just created a new solution in Visual Studio 2022 called TRYCATCH and then added my RegExException class that extends the Exception object. You can call your solution what you want, but for a test just following what I am doing is fine.

[Serializable]
public class RegExException : Exception
{
    public RegExException() { }

    public RegExException(string regex, string msg)
        : base(String.Format("1 Regular Expression Exception: Regular Expression: {0}; {1}", regex, msg)) { }

    public RegExException(string msg)
        : base(String.Format("2 {0}", msg)) { }

    public RegExException(string regex, Exception ex)
        : base(String.Format("3 Regular Expression is no longer working: {0} {1}", regex, ex.Message.ToString())) { }
}
 
As you can see, the class has 3 overloaded constructors which take either a single message, a regular expression string and a message, or a regular expression and an exception. These values are placed in specific places within the exception messages.
 
You would cause a Regular Expression Exception to be thrown by placing something like this in your code:
 
throw new RegExException(String.Format("No match could be found with Regex {0} on supplied value '{1}'", re, val));
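When the calling code catches it, the custom type can then be handled separately from everything else; a minimal hedged sketch:

try
{
    // ... run the extraction code that may throw a RegExException ...
}
catch (RegExException rex)
{
    // an expression has broken - stop the run so missing data is noticed immediately
    Console.WriteLine("Expression broke, fix it ASAP: " + rex.Message);
    throw;
}
catch (Exception ex)
{
    // anything else, e.g. network failures
    Console.WriteLine("Some other failure: " + ex.Message);
}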
 
Hopefully, you can see what I am getting at, and we will build on this in a future post.
 
© 2022 By Strictly-Software 
 
 

Wednesday, 21 July 2021

Making A Super Trim Function

Using Regular Expressions To Make a SuperTrim() Function


By Strictly-Software

How many times have you had two bits of text that you extracted from various websites, feeds or even databases, tried to compare them, and found they would not match?

I know I wrote a little example the other day about when two different ASCII space characters are used within SQL, and how to check for and remove them to make a match. But what about all the various ways you can HTML-encode spaces, like &nbsp; &#32; and &#x20;, plus the characters that make up a CrLf or just a Cr or Lf (like the VbCrLf constant for a carriage return and line feed, or Environment.NewLine, or constants that hold values for \r and \n), and maybe a tab \t as well?

All these are spaces that need removing, and with a special function that uses regular expressions they can all easily be removed.

I use this function in MS SQL as a CLR C# UDF, as well as extending C# projects with a new SuperTrim() method like so:

// requires: using System.Text.RegularExpressions;
// extension methods must live in a static class
public static class StringExtensions
{
    public static string SuperTrim(this string value)
    {
        // match each type of space from the start of the input up to the first word character,
        // capture the real content in the middle (e.g. a sentence like Hello There John),
        // then match the same space characters on the right up to the end of the string
        string re = @"(^(?:&nbsp;|&#32;|&#x20;|\s|\t|\r|\n)+?)(\w+[\s\S]+?\w+)((?:&nbsp;|&#32;|&#x20;|\s|\t|\r|\n)+?$)";
        Regex regex = new Regex(re, RegexOptions.Compiled | RegexOptions.IgnoreCase);

        // keep the middle group ($2) and drop the leading and trailing space characters;
        // replacing with "" would delete the whole match, content included
        return regex.Replace(value, "$2");
    }
}


It is easy enough to create yourself a test page in HTML using JavaScript, with a couple of textarea input boxes: one for the test value containing encoded spaces, a button to run a JS function that runs the regex as seen in the C# example, and another box to output the result.

The regular expression is interchangeable between languages. That's what I love about Regular Expressions: they can be tested and played about with on a simple HTML page with JavaScript, and once the expression works you can easily move it into whatever language you are working in, e.g. C# or PHP.

For example this encoded text:

&nbsp;  &#x20;&#x20; Rob Reid &#x20;&#x20;&#x20;   

Then after running the regular expression, or the string newValue = EncodedValue.SuperTrim(); method in C# or JavaScript, you should get this value with no encoded characters left:
Rob Reid

I find extending whatever language I am writing in to include a SuperTrim() function very handy. If you were handling URLs you might want to remove %20 and the + sign as well; you can always add more or less into the expression depending on your needs, such as values for nulls or \v for vertical tabs, depending on the content you are handling.
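For example, a variant that also treats URL-encoded spaces and plus signs as trimmable (an illustrative tweak, adjust to your own content, and remember to keep the $2 replacement):

// illustrative variant: also strip %20 and + from either end
string re = @"(^(?:&nbsp;|&#32;|&#x20;|%20|\+|\s)+?)(\w+[\s\S]+?\w+)((?:&nbsp;|&#32;|&#x20;|%20|\+|\s)+?$)";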


By Strictly-Software

© 2021 Strictly-Software

Thursday, 9 January 2014

Download and Decompress a GZIP file with C Sharp

Using C# to download a remote GZIP file and then decompress it

By Strictly-Software

Recently I had the task of writing a program in C# that would obtain a list of proxies from various sources and then run code to ensure that they were working so I could then use the useful ones.

In this project I had a window that showed the proxy IP address, Country it came from and whether it was Anonymous, High Anonymous, Transparent etc.

I also had a check button which once the proxy list had loaded would run a test against each IP address and Port No to see the time that it took to do the following:
  • Ping the proxy if possible e.g 408 ms
  • HTTP ping the proxy by using the details to request one of a randomly selected number of fast loading pages e.g www.google.com, www.bing.com etc.
Personally I think the HTTP ping is more important when dealing with proxies than a normal PING.

A simple ping to an IP address could respond very quickly or not at all, but when you are using proxies to request HTML pages you want to know how long it takes to return such a page.

Anyway the whole point of the exercise was that I needed to have a list of countries that I could check the IP addresses against.

Luckily, the great site http://geolite.maxmind.com has a free GeoIP.dat.gz file that you can download and use that is pretty accurate (though not as accurate as the paid-for version). However, the free version was good enough for what I needed.

The issue was that the .dat file came as a GZipped file, and once I had downloaded it I needed to decompress it. This isn't the normal .zip decompression, but in .NET 4.5 it is pretty easy to accomplish.

I have shown a basic example of the class at the bottom of the page, but the most important part is the method which does the Gzip decompression.


/// <summary>
/// Decompress a gzipped file; to compress we can just use the CompressionMode.Compress parameter instead
/// </summary>
/// <param name="fileToDecompress">the .gz file to decompress</param>
public static void Decompress(FileInfo fileToDecompress)
{
    using (FileStream originalFileStream = fileToDecompress.OpenRead())
    {
        string currentFileName = fileToDecompress.FullName;
        string newFileName = currentFileName.Remove(currentFileName.Length - fileToDecompress.Extension.Length);

        using (FileStream decompressedFileStream = File.Create(newFileName))
        {
            using (GZipStream decompressionStream = new GZipStream(originalFileStream, CompressionMode.Decompress))
            {
                decompressionStream.CopyTo(decompressedFileStream);
            }
        }
    }
}
The three libraries you will need to accomplish all this apart from anything else you intend to do will be:

using System.Net;
using System.IO;
using System.IO.Compression;

System.Net is required for the WebClient class to do its work downloading the remote file to our computer, and System.IO is needed for checking that files and folders exist.

The last one is the most important, System.IO.Compression as it's the library that lets us decompress the file.

You might have to add this in as a reference in Visual Studio. Just go to: Project > Add Reference > Framework > and tick the box next to System.IO.Compression.

Also note that I am using .NET 4.5 on a Windows 7 64-bit machine. In Windows 7, for security's sake (I presume), most applications that need to read and write files, or download and hold data of some sort, do so in the newer C:\ProgramData folder.

You will notice that this directory is full of well known names like Microsoft, Skype, Sun, Apple and any many other software producers that needs somewhere to log data in a safe place.

In the old days people could just write programs that saved files all over the place, which obviously wasn't safe, especially if you were the admin of the computer and hit a button on a program that you thought was going to do one thing but was actually adding or deleting files all over your computer's hard drive.

Anyway the whole code is below. Make of it what you will but it's pretty simple and I found it very useful.


using System;
using System.Linq;
using System.Text;
// we need this to download the file from the web
using System.Net;
// these are the two we need to do our decompression job
using System.IO;
using System.IO.Compression;

// this will hold any error message in case we get one and need to return it to the calling program
private string ErrorMessage = "";

// the folder we can write to, plus the data file names (inferred from the GeoIP.dat.gz download)
private string dataFolder = "";
private string GeoLiteCityDataFile = "GeoIP.dat";
private string ZippedGeoLiteCityDataFile = "GeoIP.dat.gz";

/// <summary>
/// Ensure our special folder in C:\ProgramData\ exists e.g C:\ProgramData\MyProgram
/// Then check the file we need to relate countries to IPs exists from http://geolite.maxmind.com, and if it doesn't, download it
/// and copy it to this folder. Then we need to decompress it as it's a gzip file e.g GeoIP.dat.gz, so we get the uncompressed GeoIP.dat file to work with
/// </summary>
public void SetUp()
{
    // ensure a folder we can write to exists - named after my program
    dataFolder = Environment.GetFolderPath(Environment.SpecialFolder.CommonApplicationData) + @"\MyProgramName";

    if (!Directory.Exists(dataFolder))
    {
        try
        {
            // The folder doesn't exist so try and create it now
            Directory.CreateDirectory(dataFolder);
        }
        catch (Exception ex)
        {
            // set a global error message we can return to the calling object
            this.ErrorMessage = "The data folder could not be created: " + ex.Message.ToString();

            return;
        }
    }

    // we have a folder but do we have an uncompressed .dat file?

    // set up the paths
    // first the path of the uncompressed .dat file in case we already have it
    string geoCityDataPath = dataFolder + @"\" + this.GeoLiteCityDataFile;

    // then the path of the .dat.gz compressed file in case we need to uncompress
    string zipFilePath = dataFolder + @"\" + this.ZippedGeoLiteCityDataFile;

    // check for an uncompressed data file
    if (!File.Exists(geoCityDataPath))
    {
        try
        {
            // we don't have a file so download it from the website and copy it to our folder
            // we could schedule this behaviour to get the latest file by checking the dates or just doing a download once a week/month
            WebClient webClient = new WebClient();
            webClient.DownloadFile("http://geolite.maxmind.com/download/geoip/database/GeoLiteCountry/GeoIP.dat.gz", zipFilePath);

            // now we have our file create a FileInfo object from it to pass to our gzip decompress method
            FileInfo gzFileInfo = new FileInfo(zipFilePath);

            // Call the static method to decompress the gzip file
            Decompress(gzFileInfo);
        }
        catch (Exception ex)
        {
            // set a global error message we can return to the calling object
            this.ErrorMessage = "The GeoIP.dat.gz file could not be downloaded or decompressed: " + ex.Message.ToString();

            return;
        }
    }
}


/// <summary>
/// Decompress a gzipped file; to compress we can just use the CompressionMode.Compress parameter instead
/// </summary>
/// <param name="fileToDecompress">the .gz file to decompress</param>
public static void Decompress(FileInfo fileToDecompress)
{
    using (FileStream originalFileStream = fileToDecompress.OpenRead())
    {
        string currentFileName = fileToDecompress.FullName;
        string newFileName = currentFileName.Remove(currentFileName.Length - fileToDecompress.Extension.Length);

        using (FileStream decompressedFileStream = File.Create(newFileName))
        {
            using (GZipStream decompressionStream = new GZipStream(originalFileStream, CompressionMode.Decompress))
            {
                decompressionStream.CopyTo(decompressedFileStream);
            }
        }
    }
}

Tuesday, 14 May 2013

Handling unassigned local variable errors with struct objects in C#

Handling non assigned struct objects in C#

If you have ever used structs and had "use of unassigned local variable" errors from your editor, i.e. Visual Studio, then there is a simple solution.

The problem comes about because the compiler is not clever enough to realise that the struct object will always be initialised before it is used.

This is usually because the struct object is initialised within an IF statement or other code branch which makes the compiler believe that a similar situation to the "unreachable code" error has been detected.

As the compiler cannot tell for certain that the struct object will always be initialised before it gets used, it will raise a compile error.

In Visual Studio it will usually show up with a red line under the code in question with the error message "use of unassigned local variable ..."

Here is a simple example where the struct object is populated by a method and starts off unassigned in the calling method.

However because of the nature of the code and the fact that on the first loop iteration oldID will never be the same as currentID (as oldID starts off as 0 and currentID as 1) then the IF statement will always cause the this.FillObject method to run on each iteration.

Therefore the myvar variable which is based on a struct called myStructObj will always get populated with new values from the loop.

However the compiler cannot tell this from the code and will raise the "use of unassigned local variable myvar" error when I try to pass the object as a parameter into the this.OutputObject(myvar) method, which just outputs the current property values from the object.
public class Test
{

 /* example of a method that believes the struct object won't get assigned even though due to the if statement it always will */
 public void RunTest() /* renamed: a method cannot share its enclosing class's name */
 {

  myStructObj myvar;
  int oldID = 0; 

  /* just a basic loop from 1 to 9 */
  for(int currentID = 1; currentID < 10; currentID++)
  {
   /* as the oldID starts as 0 and currentID starts as 1 on the first loop iteration we will always populate the struct object with values */
   if(oldID != currentID)
   {
    /* populate our struct object using our FillObject method */
    myvar = this.FillObject(currentID, "ID: " + currentID.ToString());

    oldID = currentID;
   }

   /* try and parse our struct to a method to output the values - this is where we would get our red line under the myvar parameter being passed into the OutputObject method e.g. "use of unassigned local variable myvar" */
   this.OutputObject(myvar);
  }

 }

 /* Simple method to output the properties of the object to the console */
 private void OutputObject(myStructObj myvar)
 {
  Console.WriteLine(myvar.prop1);
  Console.WriteLine(myvar.prop2);
 }

 /* Simple method to populate the struct object with an integer and a string value for its two properties */
 private myStructObj FillObject(int id, string label)
 {
  myStructObj myvar = new myStructObj();

  myvar.prop1 = label;
  myvar.prop2 = id;

  return myvar;
 }

 /* my struct object definition - using non nullable types */
 public struct myStructObj
 {
  public string prop1;

  public int prop2;
 }
}

Solution to use of unassigned local struct variable

The solution is either to always initialise the object before you start the loop, or to just use the default keyword to ensure your struct object variable is always set up with default values.

Example Fix

myStructObj myvar = default(myStructObj);

This will get rid of those annoying red lines and use of unassigned local variable errors.

If the type is a value type (like our struct) then default gives you a zero-initialised value, and if it's a reference type you get a null that you can then test for before using it.
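In other words, a couple of quick illustrative lines:

myStructObj m = default(myStructObj); // struct: prop1 = null, prop2 = 0
int i = default(int);                 // value type: 0
string s = default(string);           // reference type: null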

Simples!

Monday, 24 September 2012

Using String Builders to speed up string concatenation

Using String Builders to speed up string concatenation

If you are using a modern language then a string builder object, like the one in C#, is a standard tool for building up strings without the overhead of concatenation, which can be a performance killer.

The reason is simple.

When you do this

a = "hello I am Rob";

a = a + " and I would like to say thank you";

a = a + " and good night";


A lot of languages have to make a copy of the string built so far and store it in memory before creating the new string.

This means that the longer the string gets the more memory is used up as two copies have to be held at the same time before being joined together.
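In C#, the same three lines become appends on a single buffer, with one final copy at the end; a minimal example:

using System.Text;

StringBuilder sb = new StringBuilder();
sb.Append("hello I am Rob");
sb.Append(" and I would like to say thank you");
sb.Append(" and good night");
string a = sb.ToString(); // one final string instead of a copy per concatenation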

I have actually seen ASP classic sites crash with out of memory errors caused by people using string concatenation to build up large RSS feeds.

The reason I am mentioning this is because of a comment I was given about my popular HTML Encoder object, which handles double encoding, numerical and entity encoding, and decoding of partially and fully encoded strings.

I have updated the numEncode function after the comment from Alex Oss to use a simple string builder which in JavaScript is very simple.

You just have an empty array, push the new strings onto the end of it, and then join it together at the end to get the full string out. You can see the new function below.


// Numerically encodes all unicode characters
numEncode : function(s){
 if(this.isEmpty(s)) return "";

 var a = [],
  l = s.length;

 for (var i=0; i<l; i++){
  var c = s.charAt(i);
  // anything outside the printable ASCII range gets numerically encoded
  if (c < " " || c > "~"){
   a.push("&#");
   a.push(c.charCodeAt()); //numeric value of code point
   a.push(";");
  }else{
   a.push(c);
  }
 }

 return a.join("");
},

You can download the latest version of my HTML Encoder Script for JavaScript here.

However in old languages like ASP classic you are stuck with either string concatenation or making your own string builder class.

I have made one which can be downloaded from my main website ASP String Builder Class.

You will notice that it ReDim's the array in chunks of 128 (which can be changed) and once 128 elements have been used it then ReDim's by another large chunk.

A counter is kept so we know how many elements we have actually added, and once we want to return the whole string we can either just RTRIM it (if we are joining with a blank space) or ReDim it back down to the right array size before joining it together.

This is just an example of how a string builder class is used and you could make a similar one in JavaScript that lets you access specific elements, the previous or next slot, update specific slots and set the delimiter like this ASP version.

Most modern languages have a String Builder class, but if you are using old languages or scripting languages like PHP or ASP classic, then adding strings to an array before joining them together is the way to go for performance's sake.
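To illustrate the chunked-growth idea from the ASP class in C# terms (a sketch of the technique only, not a port of the ASP code):

using System;

// Sketch: grow the backing array in chunks of 128 and join once at the end,
// mirroring the ReDim approach described above.
public class ChunkedBuilder
{
    private string[] parts = new string[128];
    private int count = 0;

    public void Add(string s)
    {
        if (count == parts.Length)
            Array.Resize(ref parts, parts.Length + 128); // "ReDim" by another chunk

        parts[count++] = s;
    }

    public string Join(string delimiter)
    {
        Array.Resize(ref parts, count); // trim back down to the used size
        return String.Join(delimiter, parts);
    }
}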

Saturday, 2 June 2012

C# Betfair API Code for identifying WIN only markets

Identifying WIN only markets on the BETFAIR API

Updated 02-JUN-2012 

As it's Derby day again, I ran into the same problem as last year with this article, except that this time the wrong market identified as WIN ONLY was the FAV SP market, one in which you bet on the favourite's starting price. Therefore I have updated the BOT code for identifying a market from the Betfair API from HorseName, Racedatetime and Course alone.


If you don't know, I developed the www.fromthestables.com website, which allows members to access UK horse trainer information every day about their runners.

As a side line I have also developed my own AutoBOT which uses the BETFAIR Free API to place bets automatically using my own ranking system which I have developed. You can follow my tips on Twitter at @HorseRaceInfo.

One of the problems I have come across during the development of my AutoBOT is that if you have the name of the Horse, the Course and the time of the race, and want to find the Market ID that Betfair uses to identify each race, there is a costly mistake that can occur due to all the various markets that are available.

I really got hooked on racing (not betting but actually watching horse racing) when I had a bet on Workforce in the 2010 Derby.

It came from the back of the field to storm past everyone else and won the Derby in record course time and in astonishing style.

Watching him apply the same tactics in the Prix de l'Arc de Triomphe to become the champion of Europe that same year instilled the racing bug, and then watching Frankel win the 2000 Guineas this year in such amazing style has ensured that something I used to have no interest in watching whatsoever has become a TV channel turner.

Therefore when Frankel won the St James's Palace Stakes this year at Royal Ascot, I was happy knowing that the AutoBOT I had written had placed a WIN bet on this horse early enough to get a decent price (for what was on offer for an almost 100% guaranteed win).

However, when I found out that I had actually lost this bet that my BOT had placed, I spent more than a few minutes scratching my head and cursing the PC I was sat in front of. It turned out that the market my application had put the bet on was a special WIN market in which the winner had to win by at least 4 clear lengths. Because Frankel had won by less than a length, I had lost the bet, and I wanted to know why.

I was annoyed.

I was quite pissed off actually, and when I looked into it I found that placing a WIN only bet on the main WIN market in Betfair is quite a pain in the arse to achieve if you don't know the Market ID upfront, as there is nothing in the compressed data given to you to identify that a market is the main WIN market and not some special market such as the one in which I lost that bet.

Instead, all you can do is run through a series of negative tests to ensure that the market is not a PLACE market, a Reverse Forecast, or a Horse A versus Horse B market.

In fact since then I have found that there are so many possible markets it can be quite a nightmare to get the right one if you don't already have the Market ID.

For example, today at 15:50 there was a race at Ascot, the Betfair Summer Double First Leg International Stakes, that alongside the usual markets had a FIVE TO BE PLACED and a TEN TO BE PLACED market. This was in a race with 23 runners!

The prices were obviously minimal and you would have had to put down a tenner to win 70p on the favourite Hawkeythenoo, but it meant that my original code to identify the main WIN market required updating, as it was returning these new market IDs instead of the one that I wanted.

I have outputted the code for my Betfair API Unpack class below and this is just the part of my AutoBOT that returns a Market ID when provided with the compressed string of data that Betfair provides along with the Course name, the market type (WIN or PLACE) and the Race Date and Time.

You will see that I am using LINQ to filter out my data and I am using a custom function in my WHERE clause to return a match. It is this function that is the key as it has to check all the possible Betfair Market types to rule them out when looking for the main WIN market.

If you don't use C#, LINQ is one of the cool tools that makes it such a great language, as it enables you to apply SQL-like queries to any type of object that implements IEnumerable.

Obviously, if you don't bet or don't use Betfair you might be wondering what the heck this has to interest you, and you would be right, apart from this bit of code being a nice example of how to use LINQ to return a custom list that can be iterated through like any array or list of objects.

Remember: Betfair may introduce even more markets in the future and if anyone knows of any markets I have missed then please let me know as I don't want to lose any more money by accident because of some weird market Betfair decides to trade on.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace BetfairUnpack
{
 // This is my object that holds data about the Betfair market
 public class MarketDataType
 {
     public int marketId;
     public string marketName;
     public string marketType;
     public string marketStatus;
     public DateTime eventDate;
     public string menuPath;
     public string eventHeirachy;
     public int betDelay;
     public int exchangeId;
     public string countryCode;
     public DateTime lastRefresh;
     public int noOfRunners;
     public int noOfWinners;
     public double totalAmountMatched;
     public bool bspMarket;
     public bool turningInPlay;
 }

 public class UnpackMarket
 {

     // Use my own class and make a list object we can loop through like an array
     public List<MarketDataType> marketData;

     private string BaseDateVal = "1/1/1970";
     private string ColonCode = "&%^@"; // The substitute code for "\:"
     private int DaylightSavings = 3600000;

// This method unpacks a compressed string and returns the correct MarketID filtering by the Course, Date and Market type
     public UnpackMarket(string MarketString, string racecourse, DateTime racedatetime, string marketType)
     {

         string[] Mdata;

    // Betfair uses it's own format and we need to split on a colon
         Mdata = MarketString.Replace(@"\:", ColonCode).Split(':');

    // get our date and time
         DateTime BaseDate = Convert.ToDateTime(BaseDateVal);

         // if we are not currently in daylight savings then set that property to 0 so we get the correct time
    // I have had instances where the correct market is not returned due to Daylight savings time
         if (!DateTime.Now.IsDaylightSavingTime())
         {
             DaylightSavings = 0;
         }

    // Use LINQ on our IEnumerable object to query our list of markets filtering by our custom function MatchMarket
         IEnumerable<MarketDataType> queryMarkets =
             from m in Mdata
             where !String.IsNullOrEmpty(m)
             let field = m.Split('~')
             where (MatchMarket(field[5], BaseDate.AddMilliseconds(DaylightSavings + Convert.ToDouble(field[4])), field[1], racecourse, racedatetime, marketType))
             select new MarketDataType()
             {
                 marketId = Convert.ToInt32(field[0]),
                 marketName = field[1].Replace(ColonCode, ":"),
                 marketType = field[2],
                 marketStatus = field[3],
                 eventDate = BaseDate.AddMilliseconds(DaylightSavings + Convert.ToDouble(field[4])),
                 menuPath = field[5].Replace(ColonCode, ":"),
                 eventHeirachy = field[6],
                 betDelay = Convert.ToInt32(field[7]),
                 exchangeId = Convert.ToInt32(field[8]),
                 countryCode = field[9],
                 lastRefresh = BaseDate.AddMilliseconds(DaylightSavings + Convert.ToDouble(field[10])),
                 noOfRunners = Convert.ToInt32(field[11]),
                 noOfWinners = Convert.ToInt32(field[12]),
                 totalAmountMatched = Convert.ToDouble(field[13]),
                 bspMarket = (field[14] == "Y"),
                 turningInPlay = (field[15] == "Y")
             };

    // convert into a nice easy to iterate list
         marketData = queryMarkets.ToList();

     }

// return a Market if the values provided match
     private bool MatchMarket(string menuPath, DateTime eventDate, string marketName, string racecourse, DateTime racedatetime, string marketType)
     {
         bool success = false;

    // do some cleaning as Betfair's format isn't the prettiest!
         menuPath = menuPath.Replace(ColonCode, ":");
         marketName = marketName.Trim();

         // does the path contain the market abbreviation - we keep a list of Courses and their Betfair abbreviation code
         if (menuPath.Contains(racecourse))
         {
             // check the date is also in the string
             string day = racedatetime.Day.ToString();
             string month = racedatetime.ToString("MMM");

             // we don't want 15:00 matching 17:15:00 so add :00 to the end of our time
             string time = racedatetime.ToString("HH:mm:ss");

             if (menuPath.Contains(day) && menuPath.Contains(month) && eventDate.ToString().Contains(time))
             {
                 // if no bet type supplied returned all types
                 if (String.IsNullOrEmpty(marketType))
                 {
                     success = true;
                 }
                 else
                 {
                     if (marketType == "PLACE")
                     {
                         // place bet so look for the standard To Be Placed market (change if you want specific markets e.g. the 10 Place market = "10 TBP")
                         if (marketName.Contains("To Be Placed"))
                         {
                             return true;
                         }
                         else
                         {
                             return false;
                         }
                     }
                     // we can only identify the main WIN market by ruling out all other possibilities if Betfair adds new markets then this
  // can cost us some severe money!
                     else if (marketType == "WIN")                                                                                                                                                                                                                                                  
                     {
      // rule out all the various PLACE markets which seem to go up to ten horses! Just look for TBP e.g 10 TBP or 5 TBP
                         if (marketName.Contains("To Be Placed") || marketName.Contains("Place Market") || marketName.Contains(" TBP"))
                         {
                             return false;
                         }
                         // ignore forecast & reverse forecast and horseA v horseB markets                            
                         else if (marketName.Contains("forecast") || marketName.Contains("reverse") || marketName.Contains(" v ") || marketName.Contains("without ") || marketName.Contains("winning stall") || marketName.Contains(" vs ") || marketName.Contains(" rfc ") || marketName.Contains(" fc ") || marketName.Contains("less than") || marketName.Contains("more than") || marketName.Contains("lengths") || marketName.Contains("winning dist") || marketName.Contains("top jockey") || marketName.Contains("dist") || marketName.Contains("finish") || marketName.Contains("isp %") || marketName.Contains("irish") || marketName.Contains("french") || marketName.Contains("welsh") || marketName.Contains("australian") || marketName.Contains("italian") || marketName.Contains("winbsp") || marketName.Contains("fav sp") || marketName.Contains("the field"))
                         {
                             return false;
                         }
                         else
                         {
                             return true;
                         }
                     }
                     else
                     {
                         // I cannot match anything!
                         return false;
                     }

                 }
             }
         }

         return success;
     }
 }
}
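Usage is then just a matter of constructing the class and iterating the list; a hedged sketch (compressedData would come from the Betfair API response, the other values are placeholders):

// requires using System; and using BetfairUnpack;
UnpackMarket unpacked = new UnpackMarket(compressedData, "Ascot", new DateTime(2012, 6, 2, 15, 50, 0), "WIN");

foreach (MarketDataType market in unpacked.marketData)
{
    Console.WriteLine("{0} {1} {2}", market.marketId, market.marketName, market.eventDate);
}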

Thursday, 19 April 2012

I was recently asked how much I think I am worth in monetary value to a company. This got me thinking: in this global economy, in which India and China are churning out IT developers at factory rates, all willing to put bids in for huge pieces of work on vworker.com for $200 and then spend their whole time on sites like this asking for help, I wonder what my readers think someone with a CV like my own would be worth in today's IT environment. My online CV can be found here, but here is a cut down version of it.

Overall Skillset

  • Over 15 years of SQL development work. From relational to real-time VLDB's I have designed, developed and maintained systems from MySQL to MS SQL 2008 (and soon SQL 2012)
  • Experience in a wide variety of languages including (in alphabetical order) ASP (classic), ASP.NET, C#, CSS, HTML, Java, JavaScript (server and client side), .NET, PHP, XHTML, XML and VXML.
  • Developed a large number of systems (200+) that are still running and that use 3 versions of a system I personally developed (back and front end). These systems offer the sometimes unobtainable mix of a high turnaround to increase sales through very short development times, high customisability, good performance and an ease of maintenance that means most of the bugs and setup errors are fixed at the press of a button. These systems currently compete with the market leader in our field at a fraction of the cost.
  • I have a solid grounding in both object orientated and procedural development methodologies.
  • Caching, minification, compression and other optimisation techniques both database and front-end side.
  • JavaScript widget development including creating a number of custom JS widgets as well as creating JavaScript reliant sites that are progressively enhanced.
  • Automated tasks to report, analyse and fix potential issues all without a finger being lifted due to data driven database systems.
  • Good coding practises that can improve old systems written in legacy languages like ASP Classic including how to limit 3rd party COM object and other object re-use, reduction of logging, regular expressions that don't create catastrophic backtracking and other well known but sadly untaught tricks of the trade such as how to debug problems.
  • A search on Google for "Strictly-Software" will show you the wide variety of tools, sites and skills I have available, and these include:
    1. Developing and now selling my own Windows Applications including the Twitter HashTag Hunter Application that allows new site owners to find the @accounts and #hashtags they should be following and using by scanning Twitter for certain keywords without getting you blocked.
    2. Developing 5 WordPress plugins that have been well received by the SEO community as well as other WordPress users. These include plugins to Automatically add tags to posts imported into site (AutoBlogging) without using 3rd party plugins. Another favourite is the Strictly-TweetBot that allows users to post multiple Tweets to multiple Twitter accounts whenever posts are published. The options for each Tweet include the ability to add tracking codes, content analysis to block or allow the post and the use of tags or categories as #hashtags in the Tweet. I have even fixed problems in other well known WordPress plugins that were key to integrating with WordPress.
    3. Still on the subject of WordPress I have written a 3 part "Survival Guide" for Microsoft programmers new to Linux, Apache, MySQL and PHP get their heads round the many problems WordPress and a LAMP system can throw at you that covers basic SSH terminal commands, Performance tools, Plugins to install and to avoid and security options to prevent your site from being hacked.
    4. Being able to understand, de-construct and find problems in well known and well used frameworks such as jQuery and Prototype. I have also created my own lightweight JavaScript framework Getme.js which offers selectors, chaining, Sizzle compatibility and a few important functions but leaves the majority of the coding up to the developer. This prevents the sometimes annoying choice of having to go "all X framework" or "no framework".
    5. Developing and releasing a large number of free scripts, projects and functions for readers of my www.strictly-software.com site. From HTML Encoders (that encode properly), to SQL performance tuning and SQL injection clean up scripts my site is a key source of information for techies around the world.
  • I am also a developer of a wide range of free online tools which can be found at tools.strictly-software.com including de-packers and reformaters, encoders, compressors, scanners and one of the first online Twitter Translators.
  • I am an expert in Regular Expressions and SQL Injection detection. I also was one of the first people to discover the SQL Denial of Service attack that is possible on certain sites that allow users to enter complex search patterns. I also regularly list common hack vectors and de-encrypt SQL injection attacks so that people know what they are doing.
  • Having to defend critical systems from constant hackbots, scanners, spammers and content scrapers I have over the years become an expert in ways to reduce "bad" traffic through various means including .htaccess rules, trick robot.txt files, free advertising through blocking image hotlinking and using real time data analysis to determine spoofers from humans and BOTS.
  • On the other side of this coin I have also created many SEO tools (both white and blackhat) that include apps for proxy hunting and checking performance, content scraping without overloading servers or being blocked as well developing directive based languages for scraping with ease.
  • I have developed a number of BOTS, Web and Windows Services, as well as writing regular expressions to parse HTML from external sites, creating my own two step CAPTCHA's to beat BOTS and many other techniques.
  • As well as being a keen sportsman into martial arts, badminton and football, I am also the creator of the terms "Techies Law" and "Job Rapist".
I would suggest taking the time to read my full online CV, or a cut down version on LinkedIn, before taking a minute out of your day to tell me what you think someone with this wide ranging skill set should be paid as an annual salary in the UK.

Remember £1 is about $1.60 or €1.22. If you have another answer please write a comment, and remember you can only vote once as the poll is limited by IP and cookie! Thank you.

Saturday, 9 July 2011

URLEncode Problem with .NET 4.0

Visual Studio 2010 Problem with "The name HttpUtility does not exist in the current context".

I usually use Visual Studio 2010 at work and the Express edition at home. Tonight I was knocking up a quick C# Windows Forms application to crawl some specific URLs and I required the ability to URL encode my URI components, e.g.

string url = "http://somesite.com/search?qry=" + System.Web.HttpUtility.UrlEncode(val);

However even though I had included:

using System.Web;

At the top of my class it was saying that HttpUtility could not be found in System.Web.

This was driving me crazy as I had another project open at the same time which was using the exact same code and it worked.

The advice I found on the web said to ensure that a reference to System.Web was added with the "Project" > "Add References" option.

However when I was searching for this DLL all I could find were System.WebServices and System.ApplicationServices which were obviously no good.

However after a lot of head scratching I went back to the main Project Properties panel and under Target Framework I had ".NET Framework 4 Client Profile" Selected.

I changed this first to .NET Framework 3.5 and then looked again for the reference and it worked!

I added it and then changed it back again to .NET Framework 4. On re-opening the project all my problems had been solved.

Don't ask me why System.Web doesn't appear as a reference under the Client Profile versions, but it doesn't, and you need to add a proper DLL reference to use this sort of functionality if you are writing non-web-based apps that require web-based features like URL or HTML encoding.
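As an aside, if you are stuck on the Client Profile and cannot add the reference, I believe Uri.EscapeDataString will do the job for URI components, as the Uri class lives in System.dll which the Client Profile does include. A minimal sketch (note its output is not byte-for-byte identical to UrlEncode, e.g. spaces become %20 rather than +):

using System;

class UrlEncodeExample
{
    static void Main()
    {
        string val = "C# & .NET 4.0";

        // Uri.EscapeDataString escapes the component without needing System.Web
        string url = "http://somesite.com/search?qry=" + Uri.EscapeDataString(val);

        Console.WriteLine(url); // http://somesite.com/search?qry=C%23%20%26%20.NET%204.0
    }
}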

I thought I would just write a few notes about this in-case anyone else experiences the same problem.

Thursday, 7 April 2011

Moving a C# Windows Service to Windows 7

Problems moving a Windows Service to Windows 7

I have just been given a new PC at work that runs 64 bit Windows 7. I have to say, so far I am quite impressed with the OS, and the great thing about Windows 7 is that I can now install IE 9.0, which is blindingly fast; they have finally seen the light and made it standards compliant with support for the DOM 2 event model.

One of the jobs I struggled with today though was making a Windows Service that I had created on my old XP machine in Visual Studio 2010 run on my new box.

The service is basically a BOT that makes SOAP requests behind the scenes to a 3rd party server.

The code was working fine as I had moved it all from a standalone console application that worked but whenever my service harness EXE tried to run I would just get bland _COMPlusExceptionCode errors and nothing of any meaning in the event log.

Debugging a Windows Service was a right pain in the behind and after many uninstalls, installs and lots of shouting I finally got it working.
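As an aside, one trick that can take some of the pain out of debugging a service is to run its logic inline as a console app when started interactively, and only hand over to the Service Control Manager when run for real. This is not my exact harness, just a minimal sketch with a hypothetical MyService class and wrapper methods:

using System;
using System.ServiceProcess;

public class MyService : ServiceBase // stand-in for my real SOAP BOT service
{
    protected override void OnStart(string[] args) { /* start the BOT work here */ }
    protected override void OnStop() { /* stop the BOT work here */ }

    // public wrappers so the console path can reach the protected methods
    public void StartAsConsole() { OnStart(new string[0]); }
    public void StopAsConsole() { OnStop(); }
}

static class Program
{
    static void Main()
    {
        MyService service = new MyService();

        if (Environment.UserInteractive)
        {
            // started from a command prompt or Visual Studio: run inline so we can step through it
            service.StartAsConsole();
            Console.WriteLine("Service running as a console app, press ENTER to stop");
            Console.ReadLine();
            service.StopAsConsole();
        }
        else
        {
            // started by the Service Control Manager: run as a normal service
            ServiceBase.Run(service);
        }
    }
}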

These steps might not work for everyone but I had to do all of the following to get my service working.

1. Access Rights. Even though I supposedly had administrative rights on my PC I couldn't even install the service with the installUtil [path to exe] command without having to right click on the correct Visual Studio command prompt and choosing "Run as administrator".

However even after doing this I was still having problems starting the service and I was getting the standard

The service did not respond to the start or control request in a timely fashion.

Error message whenever I tried starting the service from the service administrative tool or from my desktop wrapper application.

After some step-through debugging I found an "Access Denied" error message was occurring on the ServiceController.Start() method call, so I went into the User Account Control settings and turned the slider right down to "Never Notify". As I wasn't allowed the logon details for the Local System account I was using my own login details for the service, and this seemed to be the only way to get round the error message, after a reboot.
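For context, the wrapper application was doing little more than this kind of call (a minimal sketch, the service name is hypothetical):

using System;
using System.ServiceProcess;

class ServiceStarter
{
    static void Main()
    {
        ServiceController sc = new ServiceController("MyBotService"); // hypothetical service name

        sc.Start(); // this was the call throwing "Access Denied" until UAC was turned down
        sc.WaitForStatus(ServiceControllerStatus.Running, TimeSpan.FromSeconds(30));
    }
}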

2. Rebuilding the Service as an x64 solution and not x86. I tried this due to a weird win32 error I was seeing and just put 2 and 2 together and presumed that something was not being run correctly due to it now being on a 64 bit machine.

3. Adding the following code to the config file:

<runtime>
  <generatePublisherEvidence enabled="false"/>
</runtime>


I obtained this clue from the following site: stackoverflow.com, and it seemed to be the final part of the jigsaw that finally got my service and the wrapper application working on my new machine. From what I have read since, this setting stops the CLR trying to generate Authenticode publisher evidence at start-up, a check that can hang while the machine tries to contact the certificate revocation servers and cause exactly this kind of start-up timeout.

I am fully aware that I am not describing all the reasons behind the decisions I made, but I am not exactly sure of the whys and hows at this point in time. However, if I come up with further answers I will let you know, and if anyone else has any useful links or tips about this issue please add them to the comment section.

Thursday, 10 March 2011

Problems and solutions whilst creating an ASP.NET Web Service

Problems setting up an ASP.NET Web Service

I was recently following a guide to setting up a .NET web service in Visual Studio 2010 and I ran into a few problems.

The guide I was using was Microsoft's own Knowledge Base article which runs through the setting up of a very simple Maths based web service.

Missing Web Service Project Template

The first problem was trying to find the ASP.NET Web Service project template on the start up page of Visual Studio. Step 2 of the guide says:

On the File menu, click New and then click Project. Under Project types click Visual C# Projects, then click ASP.NET Web Service under Templates. Type MathService in the Location text box to change the default name (WebService1) to MathService.
However in Visual Studio 2010 this option was missing. I found out that the reason was the default framework being set to .NET 4.0. The solution is to change the framework to .NET 3.5 and, lo and behold, the Web Service template option was available to select.

I don't know the reasons behind this, so don't ask; maybe it's possible to create a web service in .NET 4.0 some other way. If I find out I will update this article.


How to resolve "Could not create type MathServices.service1" error

The second problem I ran into was due to the advice given in step 3 which was to rename the default web service to something else.

Change the name of the default Web service that is created from Service1.asmx to MathService.asmx.
This is fine until you actually build the project and try and access the service on your localhost as it says to do in Step 7.

Browse to the MathService.asmx Web service page to test the Web service. If you set the local computer to host the page, the URL is http://localhost/MathService/MathService.asmx.
When the page loads I was met with an ASP.NET error along the following lines:

Server Error in '/' Application.

Parser Error

Description: An error occurred during the parsing of a resource required to service this request. Please review the following specific parse error details and modify your source file appropriately.

Parser Error Message: Could not create type 'MathService.Service1'.

Source Error:


Line 1: <%@ WebService Language="C#" CodeBehind="MathService.asmx.cs" Class="MathService.Service1" %>

Source File: /MathService.asmx Line: 1



The reason behind this was that even though I had changed the name of the service I had not changed the code in the .ASMX file to point to the new codebehind. Therefore you need to right click on the .ASMX file in the Solution Explorer and choose the "View Markup" option. Once that is open just change the directive to point to the new class like so.

<%@ WebService Language="C#" CodeBehind="MathService.asmx.cs" Class="MathService.MathService" %>
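
For completeness, the class in the codebehind must then agree with that Class attribute. A minimal sketch of what the renamed MathService.asmx.cs might look like, based on the KB's maths example (the method is illustrative):

using System.Web.Services;

namespace MathService
{
    [WebService(Namespace = "http://tempuri.org/")]
    public class MathService : WebService
    {
        // the Class attribute in the .asmx must read "MathService.MathService",
        // i.e. this namespace plus this class name
        [WebMethod]
        public int Add(int a, int b)
        {
            return a + b;
        }
    }
}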


Incorrect location for the web service

The knowledge base guide I was following said that the web service would be located on my local machine at the following location:
http://localhost/MathService/MathService.asmx

However, trying this gave me a 404 error. So instead, after building the application I clicked the play button, which took me to the following location:
http://localhost:1271/MathService.asmx

As you can see this differs due to the port number and missing sub directory. I am pretty sure I haven't missed any steps out along the way and the namespace for my Web Service was set up correctly, so I suspect the difference is just that the project runs under the built-in Visual Studio development server, which picks its own port, rather than under IIS as the article assumes.


Creating a Consumer application for the web service

The previous step was important because the next part of the guide was on creating a consumer console application to make use of the web service.

In Visual Studio you select the "Add Service Reference" option from the Project menu and then click the "Advanced" button.

On the next page choose the "Add Web Reference" button at the bottom and then on the next page you need to enter the location of your web service in the "URL" box at the top and then hit the little green arrow to the side of it. This will search for the reference at the specified location.

Make sure you know where the web service is located before trying this as the location of my own local web service was not where the knowledge base article said it would be:
http://localhost/MathService/MathService.asmx
instead it was located at
http://localhost:1271/MathService.asmx
Once the web service is found you can add it in the normal manner.
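
Once the reference is added, calling the service from the console application is just a case of newing up the generated proxy. A minimal sketch, where "localhost" is whatever name you gave the web reference (the wizard's default):

class Consumer
{
    static void Main()
    {
        // the Add Web Reference wizard generates a proxy class under the
        // reference's namespace; calling a method on it makes the SOAP call
        localhost.MathService svc = new localhost.MathService();
        System.Console.WriteLine(svc.Add(2, 3)); // should print 5
    }
}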

After that change, a re-save and a re-build everything worked fine.

Hopefully this might help others out there who are following the same guide if they run into the same problems.


Monday, 1 December 2008

Adding Remote URL content to Index

Indexing a remote URL for use in a knowledge base

I have just completed work on a small knowledge base that I built in ASP.NET, which includes a few quite funky features, one of which is the ability to add an article into the system that lives at a remote location. Most of the articles revolve around written content or files which are attached to the articles, but sometimes users may come across an article on the web that they think would be great to add to the system, and they want it to be indexed and searchable just like any other article. In the previous incarnation, which I hastily wrote one night back in the late 90's in classic ASP, you could add a URL but the only indexable content that could be used to find it in the knowledge base was the tag words I allowed the user to add alongside the URL. Obviously this isn't really good enough, so in the latest version, on saving the article I do the following:

  1. Check the URL looks valid using a regular expression.
  2. Access the URL through a proxy server and return the HTML source.
  3. Locate and store the META keywords, description and title if they exist.
  4. Remove everything apart from content between the start and close BODY tags.
  5. From the body I strip any SCRIPT tags and anything between them.
  6. Remove all HTML tags.
  7. Clean the remaining content by removing noise words, numbers and swear words.
  8. I add the remaining content which consists of good descriptive wording to the META keywords, description and title which I stored earlier.
  9. I save this content to the database which then updates the Full Text Index so that it becomes searchable by the site users.

Following this process means that I get all the benefits of having the remote article indexed and searchable without the downside of having to store the whole HTML source code. After cleaning I am left with only the core descriptive wording that is useful, and can do away with all the rubbish.

I will show you the two main methods, written in C#, that retrieve the URL content and clean the source.


1. Method to access remote URL through proxy server.




public static string GetURLHTML(string remoteURL, string proxyServer)
{
    string remoteURLContent = "";

    WebProxy proxy = new WebProxy(proxyServer, true); // pass the name of the proxy server
    WebRequest webReq = WebRequest.Create(remoteURL);
    webReq.Proxy = proxy; // set request to use proxy

    // Set the HTTP-specific UserAgent property so those sites know who's come and ripped them up
    if (webReq is HttpWebRequest)
    {
        ((HttpWebRequest)webReq).UserAgent = ".NET Framework Strategies Knowledge Base Article Parser v1.0"; // set up my useragent
    }

    WebResponse webResp;
    int responseStatusCode = 0;

    try
    {
        // Get the response instance
        webResp = (HttpWebResponse)webReq.GetResponse();

        // Read an HTTP-specific property
        if (webResp is HttpWebResponse)
        {
            responseStatusCode = (int)((HttpWebResponse)webResp).StatusCode;
        }
    }
    catch (Exception)
    {
        // the request failed, e.g. a DNS failure or timeout, so return the empty string
        return remoteURLContent;
    }

    // we can only collect HTML from valid responses so ignore 404s and 500s
    if (responseStatusCode != 200)
    {
        webResp.Close();
        return remoteURLContent;
    }

    // Get the response stream and read the HTML source into a string
    Stream respStream = webResp.GetResponseStream();
    StreamReader reader = new StreamReader(respStream, Encoding.ASCII);
    remoteURLContent = reader.ReadToEnd();

    // Close the reader, response and response stream
    reader.Close();
    webResp.Close();

    return remoteURLContent;
}



The reason I use a proxy is down to the security policy set on our web servers.


2. Method to gather the main content.



//When the article poster wants us to save a remote URL as the KB article content we need to get the content and parse it
protected string IndexURL(string remoteURL)
{
    string METAKeywords = "", METADescription = "", METATitle = "";
    string cleanHTML = "";
    StringBuilder indexText = new StringBuilder();

    //As I have to access all remote URLs through a proxy server I read my application setting from the web.config file
    string proxyServer = ConfigurationManager.AppSettings["ProxyServer"].ToString();

    //now access the remote URL and return the HTML source code if we can
    string remoteURLHTML = UtilLibrary.GetURLHTML(remoteURL, proxyServer);

    //if we have some HTML content to parse and clean
    if (!String.IsNullOrEmpty(remoteURLHTML))
    {
        remoteURLHTML = remoteURLHTML.ToLower(); //lower case it all as a) it doesn't matter and b) it means no need for ignore-case options in the regular expressions

        //Set up some regular expressions to help identify the META content we want to index in the source
        Regex HasKeywords = new Regex("<meta\\s+name=\"keywords\"");
        Regex HasDescription = new Regex("<meta\\s+name=\"description\"");
        Regex HasTitle = new Regex("<title>");

        //As I am using replaces to quickly return the content I require, I do a test first for the relevant tag, otherwise if the source
        //doesn't contain the META tag we would be left with the whole HTML source, which we obviously don't want!!
        if (HasKeywords.IsMatch(remoteURLHTML))
        {
            //get the data we require by replacing anything either side of the tag
            METAKeywords = "KEYWORDS = " + Regex.Replace(remoteURLHTML, "((?:.|\n)+?<meta\\s+name=\"keywords\"\\s+content=\")(.+)(\"(?:.|\n)+)", "$2");
        }
        if (HasDescription.IsMatch(remoteURLHTML))
        {
            METADescription = "DESCRIPTION = " + Regex.Replace(remoteURLHTML, "((?:.|\n)+?<meta\\s+name=\"description\"\\s+content=\")(.+)(\"(?:.|\n)+)", "$2");
        }
        if (HasTitle.IsMatch(remoteURLHTML))
        {
            METATitle = "TITLE = " + Regex.Replace(remoteURLHTML, "((?:.|\n)+?<title>)(.+)(<\\/title>(?:.|\n)+)", "$2");
        }

        cleanHTML = remoteURLHTML;

        //now get the main content, which is between the open and close body tags
        cleanHTML = Regex.Replace(cleanHTML, "((?:.|\n)+?<body.*?>)((?:.|\n)+?)(<\\/body>(?:.|\n)+)", "$2");

        //strip any client side script by removing anything between open and close script tags
        cleanHTML = Regex.Replace(cleanHTML, "<script.*?</script>", "", RegexOptions.Singleline | RegexOptions.IgnoreCase);

        //put a gap before words that appear just before closing tags so that we keep gaps between values from listboxes
        cleanHTML = Regex.Replace(cleanHTML, "(\\w)(<\\/\\w)", "$1 $2");

        //strip HTML tags
        cleanHTML = Regex.Replace(cleanHTML, "<[^>]+?>", "");

        //Decode the HTML so that any encoded HTML entities get stripped
        cleanHTML = HttpUtility.HtmlDecode(cleanHTML);

        //now add all the content we want to index back together
        if (!String.IsNullOrEmpty(METAKeywords))
        {
            indexText.Append(METAKeywords + " ");
        }
        if (!String.IsNullOrEmpty(METADescription))
        {
            indexText.Append(METADescription + " ");
        }
        if (!String.IsNullOrEmpty(METATitle))
        {
            indexText.Append(METATitle + " ");
        }
        if (!String.IsNullOrEmpty(cleanHTML))
        {
            indexText.Append(cleanHTML);
        }
    }

    return indexText.ToString();
}


I have left out the other function that strips noise words, numbers and swear words as it's nothing special, just a couple of loops that check some arrays containing the noise words that need removing.
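
For anyone who wants an idea of the shape of it, here is a rough sketch under my own assumptions (the real arrays are much longer and also cover swear words; the content is already lower-cased by IndexURL so a case-sensitive comparison is fine):

//a cut-down illustration of the clean-up loop; NoiseWords is just a tiny sample
private static readonly string[] NoiseWords = { "the", "and", "a", "an", "of", "to" };

protected string RemoveNoise(string content)
{
    StringBuilder sb = new StringBuilder();

    //split on whitespace, dropping pure numbers and anything in the noise list
    foreach (string word in Regex.Split(content, @"\s+"))
    {
        if (word.Length == 0) continue;
        if (Regex.IsMatch(word, @"^\d+$")) continue;        //skip numbers
        if (Array.IndexOf(NoiseWords, word) >= 0) continue; //skip noise words
        sb.Append(word).Append(' ');
    }

    return sb.ToString().Trim();
}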

The performance of this method varies slightly depending on the size of the content being parsed. Also, it's possible to leave any noise words and numbers in the content, as these will not get added to a Full Text Index anyway; SQL Server automatically ignores most noise words and numbers. However, if data storage is an issue you may still want to strip them so that you only save the core content to the database table.