Extending Core C# Object Try/Catch To Display Custom Regular Expression Exceptions - Part 1
By Strictly-Software
Have you ever wanted to throw custom exceptions, whether for business logic, such as a customer not being found in your database, or because some more complex logic has failed that you don't want to handle but do want raised so that you know about it?
For me, I required this solution due to a BOT I use that collects information from various sites every day and uses regular expressions to find and extract the pieces of data I require. The problem is that a site will often change its HTML source code, which breaks a regular expression I have specifically crafted to extract data from it.
For example, I could have a very simple regular expression that is getting the ordinal listing of a name using the following C# regular expression:
string regex = @"<span class=""ordinal_1"">(\d+?)</span>";
Then one day I will run the BOT and it won't return any ordinals for my values. When I look inside the log file, I find my own custom message for when no match is found, such as "Unable to extract value from HTML source", and when I go and check the source it's because they have changed the HTML to something like this:
<span class="ordinal_1__swrf" class="o1 klmn">"1"<sup>st</sup></span>
This is obviously gibberish that many designers add into HTML to stop BOTS crawling their data, hoping that the BOT developer has written a very specific regex that will break and return no data when met with such guff HTML.
Obviously, the first expression, whilst matching the original HTML alright, is too tight to handle extra guff within the source.
Therefore it required a change in the expression to something a bit more flexible in case they added even more guff into the HTML:
string regex = @"<span class=""ordinal_+?[\s\S]+?"">""?(\d+?)""?<sup";
As you can see, I have made the expression looser, with extra question marks ? in case quotes are or are not wrapped around values, and with the non-greedy [\s\S]+? (match any character, whitespace or non-whitespace) to handle the gibberish from the point it appears to where I know it has to end, at the closing quote and bracket of the tag. The match now finishes at the <sup tag that follows the number.
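To see the difference between the two expressions, here is a minimal sketch that runs both against the old and the changed HTML. The class and method names (OrdinalRegexDemo, TryExtract) are made up for illustration, and the "old" HTML is my assumption of what the original source looked like; only the two expressions and the changed HTML come from above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OrdinalRegexDemo
{
    // The two expressions from the article.
    public const string StrictRegex = @"<span class=""ordinal_1"">(\d+?)</span>";
    public const string LooseRegex = @"<span class=""ordinal_+?[\s\S]+?"">""?(\d+?)""?<sup";

    // Assumed shape of the original HTML (not shown in the article).
    public const string OldHtml = @"<span class=""ordinal_1"">1</span>";

    // The changed HTML from the article.
    public const string NewHtml =
        @"<span class=""ordinal_1__swrf"" class=""o1 klmn"">""1""<sup>st</sup></span>";

    // Returns true and the first captured group when the pattern matches;
    // otherwise returns false and an empty string.
    public static bool TryExtract(string html, string pattern, out string value)
    {
        Match m = Regex.Match(html, pattern);
        value = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }
}
```

Calling OrdinalRegexDemo.TryExtract with NewHtml and StrictRegex returns false, whereas NewHtml with LooseRegex returns true with the value 1. Bear in mind the looser expression expects a <sup tag after the number, so it would not match the assumed old HTML.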
So instead of just logging the fact that I have missing data from crawls, I wanted to raise the errors with TRY/CATCH, so that a specific piece of HTML no longer matching my expression becomes an exception I can see as soon as it happens.
Well, with C# you can extend the base Exception class so that your own exceptions, based upon your own logic, can be thrown and caught with TRY/CATCH. In future articles, we will build up to a full C# Project with a couple of classes, a simple BOT and some regular expressions we can use to test what happens when trying to extract common values from the HTML source on various URLs to throw exceptions.
Creating The TRY CATCH CLASS
First off, the TRY / CATCH C# Class, where I am extending the base Exception class to produce the usual messages but using String.Format so that I can pass in specific messages. I have put numbers at the start of each message so that when running the code later we can see which constructor was called.
I have just created a new solution in Visual Studio 2022 called TRYCATCH and then named my class RegExException, which extends the Exception class. You can call your solution what you want, but for a test I don't see why just following what I am doing is not okay.
[Serializable]
public class RegExException : Exception
{
    public RegExException() { }

    // 1: the regular expression plus a custom message
    public RegExException(string regex, string msg)
        : base(String.Format("1 Regular Expression Exception: Regular Expression: {0}; {1}", regex, msg)) { }

    // 2: a custom message only
    public RegExException(string msg)
        : base(String.Format("2 {0}", msg)) { }

    // 3: the regular expression plus the original exception, which is also kept as InnerException
    public RegExException(string regex, Exception ex)
        : base(String.Format("3 Regular Exception is no longer working: {0} {1}", regex, ex.Message), ex) { }
}
As you can see, besides the empty default constructor the Class has 3 overloaded constructors, which take either a single message, a regular expression string and a message, or a regular expression and an exception. These values are placed in specific places within the exception messages.
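As a quick check of which overload produces which message, here is a small sketch. ConstructorDemo, the pattern (\d+) and the sample messages are made up for illustration; the class is a compact copy of the one above so that the snippet compiles on its own.

```csharp
using System;

// Compact copy of the RegExException class from the article.
[Serializable]
public class RegExException : Exception
{
    public RegExException() { }
    public RegExException(string regex, string msg)
        : base(String.Format("1 Regular Expression Exception: Regular Expression: {0}; {1}", regex, msg)) { }
    public RegExException(string msg)
        : base(String.Format("2 {0}", msg)) { }
    public RegExException(string regex, Exception ex)
        : base(String.Format("3 Regular Exception is no longer working: {0} {1}", regex, ex.Message)) { }
}

public static class ConstructorDemo
{
    // Builds one exception per overload and returns their messages in numbered order.
    public static string[] Messages()
    {
        string pattern = @"(\d+)"; // hypothetical pattern, for illustration only
        return new[]
        {
            new RegExException(pattern, "no match").Message,
            new RegExException("no match").Message,
            new RegExException(pattern, new FormatException("bad input")).Message
        };
    }
}
```

ConstructorDemo.Messages() returns three strings starting with 1, 2 and 3 respectively, for example "2 no match" for the single-message overload, which is how the numbers tell you which constructor was used.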
You would cause a Regular Expression Exception to be thrown by placing something like this in your code:
throw new RegExException(String.Format("No match could be found with Regex {0} on supplied value '{1}'", re, val));
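Putting that throw inside a TRY/CATCH might look something like the sketch below. This is only an assumption of how it could be wired up, not the final BOT code: ExtractDemo, Extract and Run are hypothetical names, and the class is trimmed to the single-message overload so that the snippet compiles on its own.

```csharp
using System;
using System.Text.RegularExpressions;

// Trimmed copy of RegExException: only the single-message overload is needed here.
[Serializable]
public class RegExException : Exception
{
    public RegExException(string msg)
        : base(String.Format("2 {0}", msg)) { }
}

public static class ExtractDemo
{
    // Hypothetical helper: returns the first captured group or throws RegExException.
    public static string Extract(string val, string re)
    {
        Match m = Regex.Match(val, re);
        if (!m.Success)
        {
            throw new RegExException(String.Format(
                "No match could be found with Regex {0} on supplied value '{1}'", re, val));
        }
        return m.Groups[1].Value;
    }

    // Runs the extraction and returns either the value or the exception message.
    public static string Run(string val, string re)
    {
        try
        {
            return Extract(val, re);
        }
        catch (RegExException ex)
        {
            // A real BOT would log or re-throw here; the sketch just hands back the message.
            return ex.Message;
        }
    }
}
```

For example, ExtractDemo.Run("<b>42</b>", @"<b>(\d+)</b>") returns 42, while the same pattern run against "<b>abc</b>" returns the custom message beginning "2 No match could be found with Regex".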
Hopefully, you can see what I am getting at, and we will build on this in a future post.
© 2022 By Strictly-Software