Friday, 31 August 2012

New Version of the Twitter Hash Tag Hunter Application

Twitter HashTag Hunter Version 1.0.3 Released

Version 1.0.3 of the popular Twitter HashTag Hunter application has just been released.

This version has been updated to keep on top of the ever-changing Twitter API and fixes a bug that was returning a follower count of 0 for some users.

You can find more out about the application over on my main site: Twitter HashTag Hunter Application.

Any existing users who want a new copy should email me with their reference/invoice number, or other details proving they bought the application previously, and I will email them a new version.

Wednesday, 29 August 2012

Shrinking an MS SQL Database MDF file after a TRUNCATE or big DELETE


I am not a DBA and we don't have a dedicated one at our company, so when I found a database file with a huge table containing 80 million rows of data and taking up 35 GB, I had to work out how to remove that data myself.

I re-planned my usage of the table and decided that it would only keep a day's worth of data instead of every day (going back 3 years), and re-jigged some code to suit.

I then set up a nightly job with a TRUNCATE TABLE statement to remove the data quickly at midnight each day.
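For reference, the job step itself is just a single statement; the table name below is a placeholder for the real one:

-- Nightly SQL Server Agent job step; [dbo].[MyBigTable] is a placeholder name
TRUNCATE TABLE [dbo].[MyBigTable];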

However, doing this alone does not reduce the size of the database file (MDF); you will still have reserved but unused space that you may want to reclaim.

The size of the database file before running any clean-up was 35 GB, and as I was using the SIMPLE recovery model the size of the transaction log file was negligible. Therefore, after much research, I went through the following routine to reduce the size of the database file.

A lot of people will warn you not to shrink a database file, as it will only grow again and cause disk fragmentation, which is correct. However, if you have just removed a large table, as I had, then this is one case where a DBCC SHRINKFILE command is useful.

Therefore this is the approach I followed this morning.

I first ran this SQL to find out the current size of tables and indexes, plus any reserved and unused space within each database table. It also lists the tables in order of reserved size, so I could see which had the most space to reclaim and whether it matched the table I thought was causing problems - which it did.

-- Temp table to hold the raw output of sp_spaceused for every table
CREATE TABLE #t (name SYSNAME, rows CHAR(11), reserved VARCHAR(18),
data VARCHAR(18), index_size VARCHAR(18), unused VARCHAR(18))

DECLARE @Sizes TABLE(name SYSNAME, rows int, reserved int, data int, index_size int, unused int)

-- ensure correct values are being returned by using @updateusage, see http://msdn.microsoft.com/en-us/library/ms188776.aspx
EXEC sp_msforeachtable 'INSERT INTO #t EXEC sp_spaceused ''?'', @updateusage = N''TRUE'';'

-- strip the trailing ' KB' from each value and cast to INT
-- (the column order must match the @Sizes definition: reserved before data)
INSERT INTO @Sizes
SELECT name, rows, CAST(SUBSTRING(reserved, 1, LEN(reserved)-3) as INT), CAST(SUBSTRING(data, 1, LEN(data)-3) as INT),
  CAST(SUBSTRING(index_size, 1, LEN(index_size)-3) as INT), CAST(SUBSTRING(unused, 1, LEN(unused)-3) as INT)
FROM #t

DROP TABLE #t

-- totals across the whole database
SELECT CAST(SUM(data) as varchar)+' KB' as 'Data Size',
  CAST(SUM(reserved) as varchar)+' KB' as 'Reserved Size',
  CAST(SUM(index_size) as varchar)+' KB' as 'Index Size',
  CAST(SUM(unused) as varchar)+' KB' as 'Unused Size'
FROM @Sizes

-- per-table breakdown, largest reserved size first
SELECT *
FROM @Sizes
ORDER BY reserved DESC

Then I ran this script, which I found online, to check whether there was space that could be removed.

-- size is reported in 8 KB pages, so dividing by 128 converts it to MB
SELECT name, size/128.0 - CAST(FILEPROPERTY(name, 'SpaceUsed') AS int)/128.0 AS AvailableSpaceInMB
FROM sys.database_files;

This showed me that there was indeed a good 30 GB or so that could be removed.

Once I had worked out the new size in megabytes that I wanted to shrink the MDF file down to, I ceased all activity on the database to help speed up the process and prevent any errors.

I then ran a DBCC SHRINKFILE command (details here).

To find the logical file name that you need, whether you are shrinking an MDF (data file) or LDF (log file), run this command.

sp_helpdb [database name]
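Alternatively, the logical and physical file names can be pulled from sys.database_files while connected to the database in question; a quick sketch, with [database name] as a placeholder:

USE [database name];

-- logical name, physical path and file type (ROWS = data file, LOG = log file)
SELECT name AS logical_name, physical_name, type_desc
FROM sys.database_files;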

Once everything was ready I ran the following command to shrink the data file to 6 GB (the value has to be passed in as MB), which left some room for the database to grow into. Note that DBCC SHRINKFILE takes the logical file name found above, not the database name, and must be run in the context of the database that owns the file.

USE [database name];
DBCC SHRINKFILE([logical file name], 6144);

After running the command I did not see an immediate difference when I checked the properties of the database in MS Management Studio.

Therefore I ran a job to re-index my tables and rebuild statistics.

If you don't already have a nightly/weekly MS Agent job set up to do this for you then the following script of mine might be useful for you: ReIndex or ReOrganize your MS SQL indexes.

This script allows you to set a percentage, e.g. 30%; if an index is fragmented above that threshold a full REBUILD of the index is carried out, otherwise a REORGANIZE is performed.

After the rebuilding is done, a call to EXEC sp_updatestats is made to update the system's statistics.

If you are going to use this SQL code, check that the parameters within the stored procedure are to your liking and turn the DEBUG flag on if you require it.

The script will show you the fragmentation level before and after the script has run so that you can see how much difference it has made.
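As a rough illustration of the logic such a script uses (this is a sketch, not the linked script itself), something along these lines, driven by sys.dm_db_index_physical_stats, generates the appropriate ALTER INDEX statement for each index:

-- Illustrative sketch only; the 30% threshold is an example value
DECLARE @FragThreshold FLOAT
SET @FragThreshold = 30.0

SELECT 'ALTER INDEX [' + i.name + '] ON [' + s.name + '].[' + t.name + '] '
     + CASE WHEN ps.avg_fragmentation_in_percent > @FragThreshold
            THEN 'REBUILD;'    -- heavily fragmented: full rebuild
            ELSE 'REORGANIZE;' -- lightly fragmented: cheaper reorganise
       END AS FixSQL,
       ps.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ps
JOIN sys.indexes i ON i.object_id = ps.object_id AND i.index_id = ps.index_id
JOIN sys.tables t ON t.object_id = ps.object_id
JOIN sys.schemas s ON s.schema_id = t.schema_id
WHERE ps.index_id > 0 -- skip heaps
ORDER BY ps.avg_fragmentation_in_percent DESC;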

I found that after doing this, when I went to Database > Properties > Files, I could see the new sizes.

Logical Name | File Type | Filegroup      | Initial Size (MB) | Autogrowth                       | Path                                                                 | File Name
MyDB         | Rows Data | PRIMARY        | 6144              | By 1 MB, unrestricted growth     | C:\Program Files (x86)\Microsoft SQL Server\MSSQL.1\MSSQL\DATA       | MyDB.mdf
MyDB_log     | Log       | Not Applicable | 7377              | By 10 Percent, restricted growth | C:\Program Files (x86)\Microsoft SQL Server\MSSQL.1\MSSQL\Data_logs  | MyDB_log.ldf

This clearly shows that my database MDF file has been reduced from 35 GB to 6 GB.

If anyone else has any comments or notes on best practice I would be glad to hear them.

Sunday, 19 August 2012

PayPal and Instant Payment Notification (IPN) Messages

Blocking bad bots and issues with PayPal's IPN service

If you are someone like me who likes to ban as much bad bot traffic as possible, then you might have added a rule to your .htaccess or httpd.ini file that blocks blank user-agents. Something like this:


# Block blank or very short user-agents. If they cannot be bothered to tell me who they are, or provide gibberish, then they are not welcome!
RewriteCond %{HTTP_USER_AGENT} ^(?:-?|[a-z0-9\-_]{1,10})$ [NC]
RewriteRule .* - [F,L]


This is because many bad bots won't supply a user-agent, for the following reasons:
  1. They think websites will check for user-agent strings and that by not supplying one they will slip under the radar.
  2. The people crawling are using simple methods, e.g. they have just grabbed bits of code off the web that don't actually set a user-agent and used them as-is.
  3. They are using simple HTTP functions like file_get_contents('http://www.mysite.com'); to get the HTTP response and are not passing in any context data such as user-agent, timeouts, headers etc (see the sketch after this list).
  4. They are making requests from websites that use CURL or WGet, from places like WebMin to test Cron jobs or run crawl scripts, without supplying the user-agent parameters which are available.
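As mentioned in point 3, PHP's file_get_contents can be given a stream context that sets these values. A minimal sketch, with a made-up agent string:

<?php
// Supply a user-agent and a timeout instead of sending a blank agent string.
// The agent string below is only an example - use one that identifies your code.
$context = stream_context_create(array(
    'http' => array(
        'method'     => 'GET',
        'user_agent' => 'MyCrawler/1.0 (+http://www.mysite.com/bot)',
        'timeout'    => 10 // seconds
    )
));

$html = file_get_contents('http://www.mysite.com', false, $context);
?>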
Obviously it's always best to supply a user-agent as it identifies (or masks) your real code in any log file and prevents you getting blocked by people who ban blank user-agent strings.

However, there are times when even big companies like PayPal send requests without user-agent details.

One instance of this is the IPN notification messages they send (if set up) on sites that are using PayPal as their payment gateway.

When a customer makes a payment or cancels a subscription, PayPal sends a POST request to a special page on the website, which then checks the validity of the request to make sure it really comes from PayPal. It does this by sending the data back along with a special key so that PayPal can confirm it made the original request, and the website owner can be sure it's not a spoofer trying to get free items.

Once this handshake is done, the relevant IPN data, such as payment information, can be obtained from the PayPal IPN response.
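For anyone writing their own handler, the handshake looks roughly like this in PHP. This is a bare-bones sketch with no error handling, posting back to PayPal's documented live endpoint:

<?php
// Post the received variables straight back to PayPal, prefixed with
// cmd=_notify-validate. PayPal replies VERIFIED if it sent the original
// message, or INVALID if it did not.
$req = 'cmd=_notify-validate';
foreach ($_POST as $key => $value) {
    $req .= '&' . $key . '=' . urlencode($value);
}

$context = stream_context_create(array(
    'http' => array(
        'method'  => 'POST',
        'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
        'content' => $req
    )
));

$response = file_get_contents('https://www.paypal.com/cgi-bin/webscr', false, $context);

if (strpos($response, 'VERIFIED') !== false) {
    // Genuine PayPal message - safe to act on $_POST (payment status etc.)
} else {
    // INVALID - possible spoof attempt; log it and do nothing
}
?>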

However, today I received an email in my inbox that said:

Dear Rob Reid,

Please check your server that handles PayPal Instant Payment Notifications (IPN). Instant Payment Notifications sent to the following URL(s) are failing:

http://www.mysite.com/?myIPN_paypal_handler

If you do not recognize this URL, you may be using a service provider that is using IPN on your behalf. Please contact your service provider with the above information. If this problem continues, IPNs may be disabled for your account. 

Thank you for your prompt attention to this issue.


Yours sincerely, 

PayPal 


After checking my PayPal account and the IPN history page I could see that a message was stuck in the queue and being re-sent every so often:


Message ID: 1W03166E61323432D
Date/time created: 18/08/2012 09:01 BST
Notification type: Original
Latest delivery attempt: ---
Notification URL: http://www.mysite.com/?myIPN_paypal_handler
HTTP response code: 403
Delivery status: Resending
No. of retries: 14
Transaction ID: 0U508347GT8609TY0
Event: Transaction made

Therefore I checked my website logs for any instance of this request and I found that they were all returning 403 / Forbidden status codes.



66.211.170.66 - - [18/Aug/2012:08:54:47 +0000] "POST /?myIPN_paypal_handler HTTP/1.0" 403 417 "-" "-" 0/55005
66.211.170.66 - - [18/Aug/2012:09:27:11 +0000] "POST /?myIPN_paypal_handler HTTP/1.0" 403 417 "-" "-" 0/55010

As you can see - no user-agent!

Therefore I changed my .htaccess file to allow blank user-agents, re-sent the IPN message from the PayPal console, and lo and behold it was allowed.

Obviously, if you are not using PayPal and their IPN system, or not using any payment service at all, then you might consider continuing to block blank agents, given the amount of bandwidth you will save (a lot, judging from my own stats).

Otherwise, you should check the payment system's own requests to your site to ensure you are not blocking it from doing anything important, like sending payment information to your website.
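If you want to keep blocking blank user-agents for everyone else, another option is to exempt just the IPN handler from the rule. A minimal sketch based on the rule above, assuming the handler is identified by the myIPN_paypal_handler query string shown in the log lines:

# Skip the blank/short user-agent block for requests to the PayPal IPN handler
RewriteCond %{QUERY_STRING} !(^|&)myIPN_paypal_handler [NC]
RewriteCond %{HTTP_USER_AGENT} ^(?:-?|[a-z0-9\-_]{1,10})$ [NC]
RewriteRule .* - [F,L]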

Just be warned, in case you get a similar email from PayPal.