Thursday 11 August 2022

Testing For Internet Connectivity

Setup Measures For A Big API System

By Strictly-Software

I have a BOT that runs nightly. Its SetUp() methods run a number of tests to ensure everything is okay before logging to an API table in my database. This lets me know whether the API I am using is available, whether I needed to log in to it, when the last job was started and stopped, and when I last logged in to the API.

It just helps in the backend to see if there are any issues with the API without debugging it. Therefore the SetUp job has to be quite comprehensive.

The results of these checks go into the logfile.log I keep building until the job is finished. It also records the major method input params, return params, handled errors, changed system behaviour, and SQL statements that return an error, e.g. a timeout, an unexpected divide by zero error and the like. These are the things my SetUp job needs to find out:

1. Whether I can access the Internet. This is done with a request to a page that just returns an IPv4 or IPv6 address. The HTTP status code I look for is 200. If I get nowhere and instead get a status code of 1, that is WebExceptionStatus.NameResolutionFailure, which means DNS could not resolve the host. That is probably due to either your WiFi being turned off (Flight Mode on) or your network adapter having issues.

I test for web access with a simple HTTP request to an IP page that returns my address, then use 2 simple regular expressions to identify whether it is IPv4 or IPv6.


public string TestIPAddress(bool ipv4=true)
{
    string IPAddress = "";
    // the normal page returns an IPv6 address if we are using one; to force
    // an IPv4 address we go to the IPv4 page, which is the default behaviour
    // as VPNs and proxies usually use IPv4
    string remotePageURL = ((ipv4) ? "https://ipv4.icanhazip.com/" : "https://icanhazip.com/");
   
    this.UserAgent = this.BrainiacUserAgent; // use our BOT UserAgent as no need to spoof
    this.Referer = "LOCALHOST"; // use our localhost as we are running from our PC
    this.StoreResponse = true; // store the response from the page
    this.Timeout = (30000); // ms 30 seconds            

    // a simple BOT that just does a GET request I am sure you can find a C# BOT example on my blog or tons on the web
    this.MakeHTTPRequest(remotePageURL);            

    // if status code is between 200 & 300 we should have data
    if (this.StatusCode >= 200 && this.StatusCode < 300)
    {
		IPAddress = (this.ResponseContent ?? "").Trim(); // get the IP Address, trimming any trailing newline from the page

		// quick test for IPAddress Match, IPv4 have 3 dots
		if(IPAddress.CountChars('.')==3)
		{
		   // An IpV4 address can replace dots and if all numbers then it is IPv4
		   string ip2 = IPAddress.Replace(".", "");
		   if(Regex.IsMatch(ip2, @"^\d+$"))
		   {
				this.HelperLib.LogMsg("IP Address is IPv4: " + IPAddress + ";","MEDIUM"); // log to log file
		   }
		}
		// otherwise if it contains : colons (plus numbers and HEX letters)
		else if (IPAddress.Contains(':'))
		{
		   // could be an IPv6 address, check it holds only colons, numbers and HEX letters in either case
		   if (Regex.IsMatch(IPAddress, @"^[:0-9A-Fa-f]+$"))
		   {
				this.HelperLib.LogMsg("IP Address is IPv6: " + IPAddress + ";","MEDIUM"); // log to log file
		   }
		}
    }
    else
    {
		// no response, flight mode button enabled?
		this.HelperLib.LogMsg("We could not access the Internet URL: " + remotePageURL + "; Maybe Flight Mode is on or Adapter is broken and needs a refresh", "LOW");

		IPAddress = "No IP Address returned; HTTP Error: " + this.StatusCode.ToString() + "; " + this.StatusDesc + ";";

		// call my IsNetworkOn function to test we have a network
		if(this.IsNetworkOn())
		{
		   this.LastErrorMessage = "The Internet Network Is On but we cannot access the Internet";
		}
		else
		{
		   this.LastErrorMessage = "The Internet Network Is Off and we cannot access the Internet";
		}

		this.HelperLib.LogMsg(this.LastErrorMessage,"LOW"); // log error

		// throw the error, which will be caught in a lower method and stop the script
		throw new System.Exception(this.LastErrorMessage);
    }

    return IPAddress; // return the IP address on success
}
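
As an alternative to the two regular expressions, .NET's own System.Net.IPAddress.TryParse can do the classification. This is only a rough sketch and not the code from my BOT: the IpCheck class and Classify method names are made up for illustration. TryParse is lenient with shorthand IPv4 forms such as "1", so the sketch still insists on 4 dot-separated parts before calling something IPv4.

```csharp
using System.Net;
using System.Net.Sockets;

// Hypothetical helper, not part of the original BOT
public static class IpCheck
{
    // Returns "IPv4", "IPv6" or "Unknown" for the text returned by the IP page
    public static string Classify(string responseText)
    {
        if (string.IsNullOrWhiteSpace(responseText))
        {
            return "Unknown";
        }

        // the page body may end with a newline so trim it first
        string candidate = responseText.Trim();

        if (IPAddress.TryParse(candidate, out IPAddress parsed))
        {
            // TryParse accepts shorthand IPv4 forms, so also require 4 parts
            if (parsed.AddressFamily == AddressFamily.InterNetwork
                && candidate.Split('.').Length == 4)
            {
                return "IPv4";
            }
            if (parsed.AddressFamily == AddressFamily.InterNetworkV6)
            {
                return "IPv6";
            }
        }

        return "Unknown";
    }
}
```

Calling IpCheck.Classify(this.ResponseContent) in place of the two regex tests would then give a single value to log.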


If the IP test fails then I run the method below to see if the Network is available. You can see in TestIPAddress() that I call it when no 200-300 HTTP status response or IP address is returned.

I test it by running the console script with and without the flight mode button on my laptop.

public bool IsNetworkOn()
{
    // true if any network interface is marked "up" and is not a loopback or tunnel interface
    bool isNetworkOn = System.Net.NetworkInformation.NetworkInterface.GetIsNetworkAvailable();

    return isNetworkOn;
}

I also have a series of proxy addresses I can use to get round blocks on IP addresses, although I do try to use Karmic Scraping, e.g. no hammering of the site, caching pages I need to prevent re-getting them, and leaving gaps between retries if there is a temporary error. On a retry I change the referer, user-agent and proxy (if required) before trying again after so many seconds.
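
The "gaps between retries" idea can be sketched roughly like this. It is an illustration only, not my actual BOT code: PoliteRetry, Run and their parameters are hypothetical names, and the real request logic would live inside the delegate that is passed in.

```csharp
using System;
using System.Threading;

// Hypothetical helper showing a retry loop with a pause between tries
public static class PoliteRetry
{
    // Calls attempt(tryNumber) until it returns true or maxTries is used up,
    // sleeping waitMs between tries so the remote site is not hammered
    public static bool Run(Func<int, bool> attempt, int maxTries, int waitMs)
    {
        for (int tryNo = 1; tryNo <= maxTries; tryNo++)
        {
            if (attempt(tryNo))
            {
                return true; // success, no more tries needed
            }

            // only wait if there is another try to come
            if (tryNo < maxTries)
            {
                Thread.Sleep(waitMs);
            }
        }

        return false; // every try failed
    }
}
```

Inside the delegate you would swap the referer, user-agent and proxy whenever the try number is above 1, then make the HTTP request and return whether it worked.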

Also, before I turn on a global setting in my HelperLib class, which is the main class all other objects refer to and pass from one to another, I test to see whether the computer I am on is using a global proxy. I do an HTTP request to a page that returns ISP info, and I have a list of ISPs which I can check it against.

If the page comes back with the right ISP for my home, I know I am not on a global proxy. It also tells me whether the IP address from earlier is IPv4 or IPv6. If Sky or Virgin is returned I know I have my VPN enabled on my laptop and that I am using a global proxy.
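
A minimal sketch of that ISP comparison, assuming the ISP name has already been pulled out of the lookup page's response. ProxyCheck and IsGlobalProxy are names invented for this example, and the list of names passed in is whatever you know your VPN shows up as.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not the HelperLib code itself
public static class ProxyCheck
{
    // True when the reported ISP contains one of the names that
    // only appear when the VPN (global proxy) is switched on
    public static bool IsGlobalProxy(string reportedIsp, IEnumerable<string> vpnIspNames)
    {
        if (string.IsNullOrWhiteSpace(reportedIsp))
        {
            return false; // nothing to compare so assume no global proxy
        }

        foreach (string name in vpnIspNames)
        {
            // case-insensitive "contains" test
            if (reportedIsp.IndexOf(name, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return true;
            }
        }

        return false;
    }
}
```

For my setup the call would look something like ProxyCheck.IsGlobalProxy(ispFromPage, new[] { "Sky", "Virgin" }), where ispFromPage is a placeholder for the value scraped from the ISP page.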

If I am using a global proxy, all the tests and checks for "getting a new random proxy:port" when doing an HTTP request are cancelled, as I don't need to go through a VPN and then a proxy. If the global proxy is turned off I might choose to get a random proxy before each HTTP call, if that setting is enabled in my Settings class.

As you can't have a config page in console scripts that hook into a DLL, my settings code in the DLL is just a class called Settings with a big switch statement where I get the setting I need or set it.
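
A cut-down sketch of a switch-based settings class might look like the following. The setting names and default values here are invented for the example and the class is called SettingsSketch to make clear it is not my real Settings class.

```csharp
using System;

// Hypothetical cut-down version of a switch-based settings class
public class SettingsSketch
{
    // example settings with made-up defaults
    private bool useGlobalProxy = false;
    private bool useRandomProxy = true;
    private int retryWaitSeconds = 5;

    // Get a setting by name, always returned as a string
    public string Get(string name)
    {
        switch (name)
        {
            case "UseGlobalProxy": return useGlobalProxy.ToString();
            case "UseRandomProxy": return useRandomProxy.ToString();
            case "RetryWaitSeconds": return retryWaitSeconds.ToString();
            default: throw new ArgumentException("Unknown setting: " + name);
        }
    }

    // Set a setting by name from a string value
    public void Set(string name, string value)
    {
        switch (name)
        {
            case "UseGlobalProxy": useGlobalProxy = bool.Parse(value); break;
            case "UseRandomProxy": useRandomProxy = bool.Parse(value); break;
            case "RetryWaitSeconds": retryWaitSeconds = int.Parse(value); break;
            default: throw new ArgumentException("Unknown setting: " + name);
        }
    }
}
```

Throwing on an unknown name means a typo in a setting name shows up straight away instead of silently returning a blank.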

Once I have checked my Internet access I do an HTTP call to the Betfair API Operational page, which tells me if the API is working or not. If it isn't then I cannot log in to it. If it is working I do a login test to ensure I can log in. This involves:
  • Using 1 of 2 JSON methods to login to the 2 endpoints that give me API access with a hashed API code, and my username & password.
  • If I can log in, I update the API table in the DB with the time I logged in and that I am logged in. There may be times I don't need to log in to the API, such as when I am just scraping all the Races and Runner info.
  • I then do a Session re-get. I hold my session in a file so I don't have to get a new one each time, but on a run I like to get a new Session and extract the session code from the JSON return to pass back in JSON calls to get data etc.
  • I then do a database test using my default connection string. I just run SELECT TOP (1) * FROM sys.objects and ensure I get a response.

Once I know the API is operational, the DB connectivity is working, and that I can scrape pages outside the Betfair API, I can set all the properties, such as whether to use a random proxy/referer/user-agent if one of my calls errors (with a 5 second wait). If I am using a global proxy I don't bother with the proxies.

Believe it or not, that is a lot of code across 5 classes just to ensure everything is okay when running the job from the console or a Windows Service.
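
The overall shape of a SetUp job like this, running each check in order and stopping at the first failure, could be sketched as below. SetUpCheck and SetUpRunner are hypothetical names for this example only, and each real check (Internet, API status, login, database) would be passed in as a delegate.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pairing of a check name with the test to run
public class SetUpCheck
{
    public string Name { get; }
    public Func<bool> Test { get; }

    public SetUpCheck(string name, Func<bool> test)
    {
        Name = name;
        Test = test;
    }
}

public static class SetUpRunner
{
    // Runs the checks in order and throws on the first one that fails,
    // so later checks are never run once something is broken.
    // Returns the names of the checks that passed.
    public static List<string> RunAll(IEnumerable<SetUpCheck> checks)
    {
        var passed = new List<string>();

        foreach (SetUpCheck check in checks)
        {
            if (!check.Test())
            {
                throw new InvalidOperationException("SetUp check failed: " + check.Name);
            }
            passed.Add(check.Name);
        }

        return passed;
    }
}
```

Throwing instead of returning false keeps the same behaviour as the rest of the SetUp code, where a failed check stops the job rather than just being logged.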

I throw my own errors if I cannot HTTP scrape, rather than just logging them, so I can see what needs fixing quickly. I also create a little report with the time, BOT version, whether it is live, testing, placing real bets, running from a service or console, and other important info I might need to read quickly.

So this is just a quick article on how I go about ensuring that a project using a big DLL (with all the HTTP, HelperLib, API etc. code) is working when I run it.

Who knows, you might want to do similar checks on your large automated systems.

Who knows. I am just making bank from my AutoBOT which TRADES on Betfair for me!

© 2022 Strictly-Software

Wednesday 3 August 2022

Extending Try/Catch To Handle Custom Exceptions - Part 2

Extending Core C# Object Try/Catch To Display Custom Regular Expression Exceptions - Part 2

By Strictly-Software

This is the second part of our custom Try/Catch extension, which aims to catch Regular Expression failures that would normally only show up through !IsMatch (doesn't match) and so on.
 
So why would you want to do this when the C# .NET RegEx functions already return a true or false or 0 matches etc which you could use for error detection?
 
Well, you can read the first article for more info. However, you may have a case where a BOT is automatically running all day, every day on certain sites where the HTML has been worked out and expressions specially written to match it.

For example, the HTML source may change so that the piece of data you are collecting suddenly stops being returned because your expression no longer matches anything. If you only log that to a file, it means scanning and checking for errors once you notice, perhaps 2 months later, that your fully automated 24/7/365 system has stopped working.

Therefore, instead of just logging the results of IsMatch, you may want to throw an error and stop the program so that a new expression can be written to match the HTML ASAP without lots of data being missed.
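
In its simplest form the idea looks something like this. It is only a sketch: PatternNotMatchedException is a made-up stand-in for the RegExException class from the first article, and Extract.FirstGroupOrThrow is a hypothetical helper, not a method from the BOT.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the RegExException class from part 1
public class PatternNotMatchedException : Exception
{
    public string Pattern { get; }

    public PatternNotMatchedException(string pattern)
        : base("No match found for expression: " + pattern)
    {
        Pattern = pattern;
    }
}

public static class Extract
{
    // Returns capture group 1 of the first match, or throws if the
    // expression no longer matches the source
    public static string FirstGroupOrThrow(string pattern, string source)
    {
        Match m = Regex.Match(source, pattern, RegexOptions.IgnoreCase);

        if (!m.Success || !m.Groups[1].Success)
        {
            // stop the program instead of quietly logging the miss
            throw new PatternNotMatchedException(pattern);
        }

        return m.Groups[1].Value.Trim();
    }
}
```

Because the exception carries the expression that failed, whoever reads the error knows exactly which pattern needs rewriting.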
 
If you want more info on the Try/Catch class, read the first example article. Here we build just a basic BOT that is going to utilise the RegExException class we made last time. It is not a perfect BOT and you may get some warnings come up (I did it in VS 2021). It's also a specific BOT, as its aim is to get the META Title, Description, and the full source of the URL that is passed into it.

Any errors with the regular expressions not matching anymore are thrown up using our RegExException Try/Catch object so that they detail the expression, text, or whichever of the 3 methods we made. If you want to use it as the basis of your own BOT you can, but it has been specifically designed to demonstrate this RegEx Try/Catch example, so you may want to edit or remove a lot of content first before building back on top of it if you want to make your own BOT in C#.

So here is the BOT:

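using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public class HTML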
{
	public string URL  // property
	{ get; set; }

	public string HTMLSource  
	{ get; set; }

	public string HTMLTitle
	{ get; set; }

	public string HTMLDesc
	{ get; set; }

	public void GetHTML()
	{
	    // if no URL then all properties are blank
	    if (String.IsNullOrWhiteSpace(this.URL))
	    {
			this.HTMLTitle = this.HTMLDesc = this.HTMLSource = "";
			return;
	    }

	    try
	    {
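			HttpWebRequest request = (HttpWebRequest)WebRequest.Create(this.URL);
			// note: GetResponse() throws a WebException (ProtocolError) for 4xx/5xx statuses,
			// so the 403/404 tests below are only a safety net and the catch block does the real handling
			HttpWebResponse response = (HttpWebResponse)request.GetResponse();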

			if(response.StatusCode == HttpStatusCode.Forbidden)
			{
		    	throw new Exception(String.Format("We are not allowed to visit this URL {0} 403 - Forbidden", this.URL));
			}
			else if(response.StatusCode == HttpStatusCode.NotFound)
			{
		    	throw new Exception(String.Format("This URL cannot be found {0} 404 - Not Found", this.URL));
			}
            // 200 = OK, we have a response to analyse
            else if (response.StatusCode == HttpStatusCode.OK)
            {
                Stream receiveStream = response.GetResponseStream();
                StreamReader readStream = null;
                if (response.CharacterSet == null)
                {
                    readStream = new StreamReader(receiveStream);
                }
                else
                {
                    readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));
                }

                string source = this.HTMLSource = readStream.ReadToEnd();

                if (!String.IsNullOrWhiteSpace(source))
                {
                    // extract title with a simple regex                        
                    string reTitle = @"^[\S\s]+?<title>([\S\s]+?)</title>[\S\s]+$";
                    // Get the match for META TITLE if we can using a wrapper method that takes a regex string, source string and group to return
                    string retval = this.GetMatch(reTitle, source, 1);
                    if (!String.IsNullOrWhiteSpace(retval))
                    {
                        this.HTMLTitle = retval;
                    }
                    else // failed to find the <title>Page Title</title> in the HTML source so throw a Regex Exception
                    {
                        throw new RegExException(reTitle, new Exception(String.Format("No match could be found for META Title with {0}", reTitle)));
                    }

                    // META DESCRIPTION
                    string reDesc = @"^[\s\S]+?<meta content=[""']([^>]+?)['""]\s*?name='description'\s*?\/>[\s\S]*$";
                    // Get the match for META DESC if we can using a wrapper method that takes a regex string, source string and group to return
                    retval = this.GetMatch(reDesc, source, 1);
                    if(!String.IsNullOrWhiteSpace(retval))
                    {
                        this.HTMLDesc = retval;
                    }
                    else // failed to find the META description in the HTML source so throw a Regex Exception
                    {
                      throw new RegExException(reDesc, new Exception(String.Format("No match could be found for META Description with {0}", reDesc)));
                    }
                }

                response.Close();
                readStream.Close();
            }
	    }
	    catch(WebException ex)
	    {
          	// handle 2 possible errors
          	if(ex.Status==WebExceptionStatus.ProtocolError)
          	{
              Console.WriteLine("Could not access {0} due to a protocol error", this.URL);
          	}
          	else if(ex.Status==WebExceptionStatus.NameResolutionFailure)
          	{
              Console.WriteLine("Could not access {0} due to a bad domain error", this.URL);
          	}
	    } // catch anything we haven't handled and rethrow it so the caller can handle it
	    catch (Exception)
	    {
			throw;
	    }
	}


	private string GetMatch(string re,string source,int group)
	{
	    string ret = "";

	    // sanity test, group 0 is the whole match so we need a group of 1 or above
	    if (String.IsNullOrWhiteSpace(re) || String.IsNullOrWhiteSpace(source) || group < 1)
	    {
			return "";
	    }
	    // use common flags these could be passed in to method if need be
	    Regex regex = new Regex(re, RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Multiline);

	    if (regex.IsMatch(source))
	    {
          	MatchCollection matches = regex.Matches(source);

          	Console.WriteLine("We have {0} matches", matches.Count);

          	// return the requested group from the first match where it succeeded
          	foreach (Match r in matches)
          	{
          	    if (r.Groups[group].Success)
          	    {
          	        ret = r.Groups[group].Value.Trim();
          	        break;
          	    }
          	}
	    }
	    // return matched value
	    return ret;
	}
}
 
So as you can see, this HTML class is my basic BOT. It has 4 properties: URL (the URL of the page to get), then HTMLSource, HTMLTitle, and HTMLDesc, which hold the 3 main bits of content we want from our BOT: the whole HTML source, the META Title, and the META Description. We set these in the HTML class's GetHTML() method, which does the main work but uses another method, GetMatch(), to try to get a match from the parameters passed in, i.e. the expression, the source code to match against, and the number of the capture group to return.
 
For example, there may be bad XHTML/HTML all over the web where people have used IDs multiple times or put in elements that should only exist once multiple times. GetMatch() returns the value from the first match in which the requested group succeeded, so later duplicates are ignored, and the last parameter selects which bracketed group of the expression you get back, usually 1 for the first.
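
To make the difference between a match and a capture group concrete, here is a small sketch. GroupDemo and AllGroupValues are hypothetical names used only for this illustration. It collects the chosen group from every match, so with a page that wrongly contains two title tags you can see that the first item is the one GetMatch() would hand back.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the HTML BOT class
public static class GroupDemo
{
    // Returns the value of the given capture group from every match, in page order
    public static List<string> AllGroupValues(string pattern, string source, int group)
    {
        var values = new List<string>();

        foreach (Match m in Regex.Matches(source, pattern, RegexOptions.IgnoreCase))
        {
            // a group that did not take part in the match is skipped
            if (m.Groups[group].Success)
            {
                values.Add(m.Groups[group].Value.Trim());
            }
        }

        return values;
    }
}
```

With the expression <title>([^<]+)</title> there is one bracketed group, so group 1 is the text between the tags, while each title tag found in the page is a separate match.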
 
Hopefully, you can see what the class is doing and how the results of the IsMatch method can fire off a custom RegExException or not and how it is formatted.
 
In the next article, we will show the code that uses this class to return data and put it all together.
 
 
© 2022 By Strictly-Software