Wednesday 23 January 2019

SQL Regular Expressions - Extracting Income From PayPal IPN Data

Extracting Payment Amounts From PayPal IPN Data Using SQL 

By Strictly-Software

Recently I had to come up with a way on one of my sites, www.fromthestables.com, to find the total amount of income received by certain members from their PayPal subscription information. 

This is all stored in an MS SQL Payments table along with the Member ID, the PayPal IPN data and Payment type e.g P for Payment, C for Cancelled, W for Waiting, E for EOT etc.

I also have a table of Member Subscriptions which stores details of every subscription they have ever had including their Member ID, a custom GUID to identify the subscription as they could have had multiple subscriptions. Also I store their PayPal Subscription ID and details of their Sign Up Date, Cancellation Date and Next Payment Date alongside other relevant information.

If you don't know about PayPal and how to make a payment system using their shopping cart or a custom payment system then you can read up about it on their site here.

However the main thing to know is that when payments are made, recurring subscriptions started or cancelled then PayPal will send your site an Instant Payment Notification (IPN) to a callback page for you to handle.

This IPN data is a long string of text including the payers name, email, address, recurring payment information such as the number of days payments are made, plus any custom values you may need for your site such as a unique identifier (GUID).

This is useful in website systems so that you can save the new member in a database table with an ID before they go off to PayPal. When they return to your site this information can be sent back to your IPN callback page so that you can analyse the IPN data and ensure it's a valid callback attempt by making a handshake with PayPals system. Once you know it's valid information you can link the callback and IPN data string to a member by matching their Member ID with the one you stored earlier in the database.

An example of an IPN string looks like the text below and contains all the information about an IPN. There are many types of IPN strings such as those that handle cancellations, waiting, EOT (End Of Term) etc but the one I am interested in that contains two bits of information I require is the Payment Notification. 

You can see the two pieces of information in the IPN text below as they are highlighted in blue and red. The start of the string contains one part and the other is halfway through the data.

mc_gross=30.00&protection_eligibility=Eligible&address_status=confirmed&payer_id=NW2XBXPZ6C78W&address_street=Some+Street&payment_date=13%3A27%3A02+Jan+17%2C+2019+PST&payment_status=Completed&charset=windows-1252&address_zip=D02TX83&first_name=Paul&mc_fee=1.37&address_country_code=IE&address_name=Paul+Hickey¬ify_version=3.9&subscr_id=I-PLD46W039GB&custom=%7B7CD66296-0664-4A51-9ACD-05FE46821D44%7D&payer_status=verified&business=rob%40somesite.com&address_country=Ireland&address_city=Dublin&verify_sign=AczUU94BLMilZ9uHs3gDJVFDFmnrAhoedso-UI.71KEbRJ9deMwoa8KS&payer_email=paulhickey1234%40hotmail.com&txn_id=91X75169EH810362V&payment_type=instant&last_name=Hickey&address_state=DUBLIN&receiver_email=rob%40somesite.com&payment_fee=&receiver_id=5Z7P6UFG56B3P&txn_type=subscr_payment&item_name=Strictly+Software&mc_currency=GBP&item_number=%7B24C686D0-BBF5-54B6-A1D6-215AP099CD43%7D&residence_country=IE&transaction_subject=Membership&payment_gross=&ipn_track_id=98d91af71234

As you can see the amount I want is in the first part is at the beginning of the string, just after mc_gross e.g mc_gross=30.00.

However due to legacy issues of the system and I also need to connect the users PayPal subscription ID to the one I store in the members subscription table.

This is highlighted further along in the string e.g subscr_id=I-PLD46W039GB.

An example of the CLR User Defined Function I use to allow for Regular Expression Replacements can be seen below. It is connected to a DLL that contains the C# code that runs the regular expression replacing, however that is another topic altogether. You just need to know the format of the parameters for the SQL I show you later on.

Personally I store this function in the MSDB system database so that all my databases on the server can utilise it for regular expression replacements.

SET ANSI_NULLS OFF
GO
SET QUOTED_IDENTIFIER OFF
GO
ALTER FUNCTION [dbo].[udf_SQLRegExReplace](@Pattern [nvarchar](500), @MatchString [nvarchar](max), @ReplaceString [nvarchar](2000))
RETURNS [nvarchar](max) WITH EXECUTE AS CALLER
AS 
EXTERNAL NAME [asbl_SQLRegExpr].[UserDefinedFunctions].[RegExReplace]

The SQL code I use for totaling up all amounts for a certain member is below.

I use two embedded SQL UDF Replacement Functions to extract just the monetary amount and remove everything after it. I also use the same function to join the Subcriber ID in my table to the subscr_id value in the string. 

This involves using a sub-query that does the cleaning and replacing and returns the monetary value. Then an outer query uses that value and a SUM function to total all the values up.


SET ANSI_NULLS OFF
SELECT SUM(VAL) as Amount
FROM (
  SELECT CAST(msdb.dbo.udf_SQLRegExReplace('&.+?$',msdb.dbo.udf_SQLRegExReplace('^.*mc_gross=',PayPalDetails,''),'') as money) as val 
  --*
  FROM MEMBERS_PAYMENTS as p with (nolock) 
  JOIN MEMBER_SUBSCRIPTIONS as ms with (nolock) 
   ON ms.MemberFK = p.MemberFK
    AND ms.PaypalSubscriptionID = dbo.udf_SQLRegExReplace('&.+?$',dbo.udf_SQLRegExReplace('^.*subscr_id=',PayPalDetails,''),'')
    AND ms.MemberFK = 10342
  WHERE PaymentStatus = 'P'
 ) as t 
WHERE ISNUMERIC(t.VAL)=1

As you can see the SQL function calls are wrapped in an inner SQL statement so that it just returns payment values and then the outer SQL statement returns the SUM amount for a particular member ID value.

This is a good example of having to use SQL CLR Regular Expressions as well as the use of embedded functions to extract hard to get values from a long complicated string of data.

You could attempt to write the regular expression calls so that only one function call is made for the monetary amount and another one for the subscriber ID. However the IPN strings are not always similar in format and sometimes it is better for performance to break your regular expressions down into multiple replacement calls instead of writing a complicated expression to cover all possible formats.

Try it and see for yourself. See if you can re-write the query with one regular expression per extraction, and then compare the performance of the two queries with SQL Performance Monitor.

By Strictly-Software

© 2019 Stictly-Software