Wednesday 23 January 2019

SQL Regular Expressions - Extracting Income From PayPal IPN Data

Extracting Payment Amounts From PayPal IPN Data Using SQL 

By Strictly-Software

Recently I had to come up with a way on one of my sites,, to find the total amount of income received by certain members from their PayPal subscription information. 

This is all stored in an MS SQL Payments table along with the Member ID, the PayPal IPN data and Payment type e.g P for Payment, C for Cancelled, W for Waiting, E for EOT etc.

I also have a table of Member Subscriptions which stores details of every subscription they have ever had including their Member ID, a custom GUID to identify the subscription as they could have had multiple subscriptions. Also I store their PayPal Subscription ID and details of their Sign Up Date, Cancellation Date and Next Payment Date alongside other relevant information.

If you don't know about PayPal and how to make a payment system using their shopping cart or a custom payment system then you can read up about it on their site here.

However the main thing to know is that when payments are made, recurring subscriptions started or cancelled then PayPal will send your site an Instant Payment Notification (IPN) to a callback page for you to handle.

This IPN data is a long string of text including the payers name, email, address, recurring payment information such as the number of days payments are made, plus any custom values you may need for your site such as a unique identifier (GUID).

This is useful in website systems so that you can save the new member in a database table with an ID before they go off to PayPal. When they return to your site this information can be sent back to your IPN callback page so that you can analyse the IPN data and ensure it's a valid callback attempt by making a handshake with PayPals system. Once you know it's valid information you can link the callback and IPN data string to a member by matching their Member ID with the one you stored earlier in the database.

An example of an IPN string looks like the text below and contains all the information about an IPN. There are many types of IPN strings such as those that handle cancellations, waiting, EOT (End Of Term) etc but the one I am interested in that contains two bits of information I require is the Payment Notification. 

You can see the two pieces of information in the IPN text below as they are highlighted in blue and red. The start of the string contains one part and the other is halfway through the data.


As you can see the amount I want is in the first part is at the beginning of the string, just after mc_gross e.g mc_gross=30.00.

However due to legacy issues of the system and I also need to connect the users PayPal subscription ID to the one I store in the members subscription table.

This is highlighted further along in the string e.g subscr_id=I-PLD46W039GB.

An example of the CLR User Defined Function I use to allow for Regular Expression Replacements can be seen below. It is connected to a DLL that contains the C# code that runs the regular expression replacing, however that is another topic altogether. You just need to know the format of the parameters for the SQL I show you later on.

Personally I store this function in the MSDB system database so that all my databases on the server can utilise it for regular expression replacements.

ALTER FUNCTION [dbo].[udf_SQLRegExReplace](@Pattern [nvarchar](500), @MatchString [nvarchar](max), @ReplaceString [nvarchar](2000))
EXTERNAL NAME [asbl_SQLRegExpr].[UserDefinedFunctions].[RegExReplace]

The SQL code I use for totaling up all amounts for a certain member is below.

I use two embedded SQL UDF Replacement Functions to extract just the monetary amount and remove everything after it. I also use the same function to join the Subcriber ID in my table to the subscr_id value in the string. 

This involves using a sub-query that does the cleaning and replacing and returns the monetary value. Then an outer query uses that value and a SUM function to total all the values up.

  SELECT CAST(msdb.dbo.udf_SQLRegExReplace('&.+?$',msdb.dbo.udf_SQLRegExReplace('^.*mc_gross=',PayPalDetails,''),'') as money) as val 
  FROM MEMBERS_PAYMENTS as p with (nolock) 
  JOIN MEMBER_SUBSCRIPTIONS as ms with (nolock) 
   ON ms.MemberFK = p.MemberFK
    AND ms.PaypalSubscriptionID = dbo.udf_SQLRegExReplace('&.+?$',dbo.udf_SQLRegExReplace('^.*subscr_id=',PayPalDetails,''),'')
    AND ms.MemberFK = 10342
  WHERE PaymentStatus = 'P'
 ) as t 

As you can see the SQL function calls are wrapped in an inner SQL statement so that it just returns payment values and then the outer SQL statement returns the SUM amount for a particular member ID value.

This is a good example of having to use SQL CLR Regular Expressions as well as the use of embedded functions to extract hard to get values from a long complicated string of data.

You could attempt to write the regular expression calls so that only one function call is made for the monetary amount and another one for the subscriber ID. However the IPN strings are not always similar in format and sometimes it is better for performance to break your regular expressions down into multiple replacement calls instead of writing a complicated expression to cover all possible formats.

Try it and see for yourself. See if you can re-write the query with one regular expression per extraction, and then compare the performance of the two queries with SQL Performance Monitor.

By Strictly-Software

© 2019 Stictly-Software


Paul Bowers said...

Great article, your regular expression skills are spot on. I would change the parameters in the UDF though so that they match those in C#, VB, ASP etc as you usually get the string input parameter first, then the expression then replacement value or grouping e.g "$1$3" etc.

Dave Elliot said...

Love it, I have always had issues trying to extract parameters out of IPN strings from PayPal and usually use code to split on the = like a querystring and make an array before accessing a param by name to get the second array part which is the value. However this is long winded and if you are not storing the actual payment amount as a seperate column value in the Payments table then I suppose using a regular expression is the only way to go about getting it out and getting a total amount for a member, date range or IPN type (cancellation total amount for example).

Rob Reid said...

Paul, yes I should change it around to be Microsoft compliant really but the params of the UDF have to match those of the DLL Class and I have lost that code unfortunately. I could re-do it obviously but I think at the time I was working a lot on PHP and they have functions with params all over the place back n forth so I think that's why they are in that order

Rob Reid said...

Yes, really I should normalise the Payments table a bit more and have a column called Amount but in reality there was no need for it until recently when I needed to write a report for totaling up amounts from the table for certain promo codes. If I had needed an amount column when first creating the code and site I would have done so.