Tuesday, 29 September 2015

IE Bug with non Protocol Specific URLS

IE Bug with non Protocol Specific URLS

By Strictly-Software

I have recently come across a problem that seems to affect only IE and non-protocol-specific URLs (also known as protocol-relative URLs).

These are becoming more and more common as they prevent warnings about insecure content and ensure that when you click on a link you go to the same protocol as the page you are on.

This can obviously be an issue if the site you are on is on an SSL but the site you want to go to doesn't have an HTTPS domain or vice-versa. However most big sites have both domains and will handle the transfer by redirecting the user and the posted data from one domain to another.

An example would be a PayPal button whose form posts to

action="//www.paypal.com/"

In the old days, if you had a page on an SSL, e.g. https://www.mysite.com, and it linked to an image or script from a non-secure domain, e.g. http://www.somethirdpartysite.com/images/myimg.jpg, you would get pop-ups or warning messages in the browser about "non-secure content" and would have to confirm that you wanted to load it.

Nowadays, whilst you don't see these popups in modern browsers, if you check the console (F12 in most browsers) you will still see JavaScript or network errors when you try to load content cross-domain and cross-protocol.

S2 Membership Plugin

I came across this problem when a customer trying to join one of my WordPress subscription sites (which uses the great free S2 Membership Plugin) complained that when he clicked a payment button, which I have on many pages, he was being taken to PayPal's homepage rather than the standard payment page that details the type of purchase, the price and the payment options.

It was working for him in Chrome and FireFox but not IE.

I tested this myself on IE 11 on both Windows 7 and Windows 8 and found that this was indeed true.

After hunting through the network steps in the developer toolbar (F12) and comparing them to Chrome, I found that the problem seemed to be IE doing a 301 redirect from PayPal's HTTP domain to their HTTPS one.

After analysing the request and response headers I suspect it is something to do with the non-UTF-8 response that PayPal was returning to IE, probably because Internet Explorer wasn't requesting a UTF-8 character set in the first place.

Debugging The Problem


For the techies, this is a breakdown of the problem, with the network steps from both Chrome and IE and the relevant headers.

First, the PayPal button code, which is encrypted by S2Member on the page. You are given the button content as standard square-bracket shortcodes which get converted into HTML on output. Looking at the source of one button in both browsers I could see the following.

1. Even though the button outputs the form action as https://www.paypal.com, it seems that one of my plugins or WordPress itself (I suspect a caching plugin but haven't been able to narrow it down) is removing any protocol conflicts by rewriting links as non-protocol-specific URLs.

So, as my site doesn't have an SSL certificate, any HREF, SRC or ACTION that pointed to an HTTPS URL was having its protocol replaced with //, e.g. https://www.paypal.com on my page http://www.mysite.com/join became //www.paypal.com in both the source and the generated source.

2. Examining the HTML of one of the buttons you can see this in any browser. I have cut short the encrypted button code as it's pointless outputting it all.


<form action="//www.paypal.com/cgi-bin/webscr" method="post">
<input type="hidden" name="cmd" value="_s-xclick">
<input name="encrypted" type="hidden" value="-----BEGIN PKCS7-----MIILQQYJKoZIhvcNAQcEoIILMjCCCy4CAQExgg...">


3. Outputting a test HTML page on my local computer and running it in IE 11 WORKED. This was probably because I explicitly set the URL to https://www.paypal.com so no redirects were needed.

4. Therefore logically the problem was due to the lack of an HTTPS in the URL.

5. Comparing the network jumps.

1. Chrome

Name   - Method - Status - Type                  - Initiator
webscr - POST   - 307    - x-www-form-urlencoded - Other
webscr - POST   - 200    - document              - https://www.paypal.com/cgi-bin/webscr

2. IE

URL                         - Protocol - Method - Result - Type      - Initiator
/cgi-bin/webscr             - HTTP     - POST   - 301    -           - click
/cgi-bin/webscr             - HTTPS    - POST   - 302    - text/html - click
https://www.paypal.com/home - HTTPS    - POST   - 200    - text/html - click

Although the column titles are slightly different they are just different words for the same thing, e.g. Status in Chrome and Result in IE both refer to the HTTP status code the response returned.

As you can see, Chrome also had to do a 307 redirect (the HTTP 1.1 successor to the 302 temporary redirect, which preserves the POST method) from HTTP to HTTPS, but it ended up on the correct page. In IE, however, clicking the button posted to the payment page over HTTP, which then did a 301 (permanent) redirect to the HTTPS version and finally a 302 (temporary) redirect to PayPal's home page.

If you want to know more about these 3 redirect status codes this is a good page to read.

The question was why couldn't IE take me to the correct payment page?

Well when I looked at the actual POST data that was being passed along to PayPal from IE on the first network hop I could see the following problem.

cmd=_s-xclick&encrypted=-----BEGIN----MIILQQYJKoZIhvcNAQcEoIILMjCCCy4CAQExgg...

Notice the Chinese character after the BEGIN where it should say PKCS7?

In Chrome however this data was exactly the same as the form e.g

cmd:_s-xclick
encrypted:-----BEGIN PKCS7-----MIILQQYJKoZIhvcNAQcEoIILMjCCCy4CAQExgg...

It looked like the posted data was being misinterpreted by IE, whereas in Chrome it was not, so I needed to check what character sets were being sent and returned.

Examining Request and Response Headers

When looking at the HTTP request headers on the first POST to PayPal in IE I could see that the Accept-Language header was only asking for en-GB, i.e. a basic ASCII language set. There were also noticeably fewer request headers compared to Chrome. I have just copied the relevant ones that can be compared between browsers.

IE Request Headers

Key         Value
Content-Type: application/x-www-form-urlencoded
Accept-Language: en-GB
Accept-Encoding: gzip, deflate
Referer: http://www.mysite.com/join-now/
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko
Request: POST /cgi-bin/webscr HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Host: www.paypal.com

Chrome Request Headers

Key                 Value
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6
Cache-Control: max-age=0
Connection: keep-alive
Content-Length: 4080
Content-Type: application/x-www-form-urlencoded


And the responses for the Content-Type header which I think is key.

IE

Content-Type: text/html

Chrome

Content-Type: text/html; charset=UTF-8


So whilst Chrome says it will accept more language sets and gets back a charset of UTF-8, IE says it will only accept en-GB and gets back plain text/html with no charset.

I even tried seeing if I could add UTF-8 as an accepted language in IE, but there was no option to, so I tried adding Chinese, which obviously uses an extended character set and matched the problematic character.
However this made no difference, even though the Accept-Language header was now:

Accept-Language: en-GB,zh-Hans;q=0.9,zh-CN;q=0.8,zh-SG;q=0.7,zh-Hant;q=0.6,zh-HK;q=0.4,zh-MO;q=0.3,zh-TW;q=0.2,zh;q=0.1

Conclusion


Therefore I came to the conclusion that I could not force IE to change its behaviour, and I doubt any phone calls to IE HQ or even PayPal would solve the issue. So, to allow IE users to still be able to pay on my site, I needed a workaround.

1. I added a test for IE, plus a warning about possible problems, to the technical test page I make all users check before paying. This went alongside tests to ensure JavaScript and Cookies were enabled, as both are needed for any modern JavaScript site.

2. I added some JavaScript code to my footer that runs on DOM load, loops through all FORM elements and checks the ACTION attribute. When examined in the console the actions showed http://www.paypal.com rather than the //www.paypal.com I saw in the source (the browser resolves the protocol-relative URL against the current page's protocol), so I added code to ensure they always use HTTPS.

The function if you are interested is below and it seems to have fixed the problem in IE. If I view the generated source now I can see all form actions have HTTPS protocols.



// on DOM load loop through all FORM elements on the page
jQuery(document).ready(function () {	
	// get all form elements
	var o, e=document.getElementsByTagName("FORM");
	for(var i=0,l=e.length;i<l;i++)
	{
		// get the action attribute
		o = e[i].action;

		// skip blank actions
		if(o && o!="")
		{
			// if the action starts with http (as non protocol specific URLs show up as http)
			// then replace the http with https
			if( /^http:\/\/www\.paypal\.com/.test(o) )
			{
				e[i].action = o.replace("http:","https:");
			}		
		
		}
	}	
});


So whilst this is just a workaround for the IE bug, it does solve the issue until Internet Explorer sorts itself out. Why they have this problem I have no idea.

I am outputting all my content as a UTF-8 charset and Chrome is obviously handling it correctly (along with Firefox and Safari).

So I can only presume it's an IE bug, which isn't helped by an as-yet-unknown plugin (or WordPress) changing cross-protocol URLs to the now standard //www.mysite.com format.

Therefore if you come across similar problems with redirects taking you to the wrong place, check your headers, compare browsers and if you spot something strange going on, try a JavaScript workaround to modify the DOM on page load.


© 2015 Strictly-Software

Friday, 18 September 2015

What is the point of client side security

Is hacking the DOM really hacking?

By Strictly-Software

The nature of the modern web browser is that it's a client side tool.

Web pages that are stored on web-servers when viewed in Chrome or FireFox are downloaded file by file (CSS, JavaScript, HTML, Images etc), and stored temporarily on your computer whilst your browser application puts them together so you can view the webpage.

This is where your "browser cache" comes from. It is good to have commonly downloaded files such as the jQuery script or common images from frequently visited pages in your cache but when this folder gets too big it can become slow to traverse and load from. This is why a regular clean out is recommended by a lot of computer performance tools.

So because of this putting any kind of security on the client side is pointless as anyone with a small working knowledge of Internet technology can bypass it. I don't want to link to a certain site in particular but it appeared as a google advert on my site the other day claiming to protect your whole website from theft, including your HTML source code.

However, if you have a spare 30 minutes on your hands, have Firebug installed (or any modern browser that allows you to visit and edit the DOM) and did a search for "code to protect HTML", you would be able to bypass the majority of that site's wonderful security claims with ease.

Examples of such attempts to use client side code to protect code or content include:

1. Trying to protect the HTML source code from being viewed or stolen. 

This will include the original right mouse click event blocker.

This was used in the old days in the vain hope that people didn't realise that they could just go to Tools > View Source instead of using the context menu which is opened with a right click on your mouse.

The other option was just to save the whole web page from the File menu. 

However you can now just view the whole generated source with most developer tools e.g Firebug - or hitting F12 in Chrome.

Some sites will also generate their whole HTML source code with JavaScript in the first place.
Not only is this really bad for SEO but it is easily bypassed.

A lot of these tools pack, encode and obfuscate the code on the way. The code is then run through a function to evaluate it and write it to the DOM.

It's such a shame that this can all be viewed without much effort once the page loads in the DOM. Just open your browsers Developer Toolbar and view the Generated Source and hey presto the outputted HTML is there.

Plus there are many tools that let you run your scripts on any page e.g someone at work the other day didn't like the way news sites like the BBC always showed large monetary numbers as £10BN and added a regular expression into one of these tools to automatically change all occurrences to £10,000,000,000 as he thought the number looked bigger and more correct.  Stupid example I know but it shows that with tools like Fiddler etc that you can control the browser output.

2. Using special classes to prevent users from selecting content

This is commonly used on music lyric sites to prevent people copying and pasting the lyrics straight off the page by selecting the content and using the copy button.

Shame that if you can modify the DOM on the fly you can just find the class in question with the inspect tool, blank it out and negate its effect.

3. Multimedia sites that show content from TV shows that will remain unnamed but only allow users from the USA to view them. 

Using a proxy server sometimes works but for those flash loaded videos that don't play through a proxy you can use YSlow to find the base URI that the movie is loaded from and just load that up directly.

To be honest I think these companies have got wise to the fact that people will try this as they now insert location specific adverts into the movies which they never used to do. However it's still better than moving to the states!

4. Sites that pack and obfuscate their Javascript in the hope of preventing users from stealing their code. 

Obviously minification is good practice for reducing file size, but if you want to unpack some JavaScript then you have a couple of options, and there may be valid reasons other than just wanting to see the code being run, e.g. preventing XSS attacks.

Option 1 is to use my script unpacker form, which lets you paste the packed code into a textarea, hit a button and then view the unpacked version in another textarea for you to copy out and use. It will also decode any encoded characters, as well as formatting the code and handling code that has been packed multiple times.

If you don't want to use my wonderful form and I have no idea why you wouldn't then Firefox comes to the rescue again. Copy the packed code, open the Javascript error console and paste the code into the input box at the top with the following added to the start of it:

//add to the beginning eval=alert;
eval=alert;eval(function(p,a,c,k,e,r){e=String;if(!''.replace(/^/,String)){while(c--)r[c]=k[c]||c;k=[function(e){return r[e]}];e=function(){return'\\w+'};c=1};while(c--)if(k[c])p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c]);return p}('3(0,1){4(0===1){2("5 6")}7{2("8 9")}',10,10,'myvar1|myvar2|alert|function|if|well|done|else|not|bad'.split('|'),0,{}))

// unpacked returns
function(myvar1,myvar2){if(myvar1===myvar2){alert("well done")}else{alert("not bad")}



Then hit evaluate and the unpacked code will open in an alert box which you can then copy from.

What the code is doing is changing the meaning of the function eval to alert so that when the packed code runs within its eval statement instead of executing the evaluated code it will show it in the alert message box.

There are many more techniques which I won't go in to but the question then is why do people do it?

Well the main reason is that people spend a lot of time creating websites and they don't want some clever script kiddy or professional site ripper to come along steal their content and use it without permission.

People will also include whole sites nowadays within frames on their own sites or just rip the whole thing CSS, images, scripts and everything else with a click of a button. There are too many available tools to count and a lot of phishing sites will copy a banks layout but then change the functionality so that it records your account login details.

I have personally seen 2 sites now, that I either worked on or know the person who did the work, appear on the net under a different URL with the same design, images and JS code, all identical apart from the wording, which was in Chinese.

The problem is that every modern browser now has a developer tool set, like Firebug, Chrome's developer tools or Internet Explorer's developer toolbar. For older browsers there are Opera's Dragonfly and even Firebug Lite, which replicates Firebug functionality for those of you wanting to use it on older browsers like IE 6.

Therefore with all these built in tools to override client side security techniques it seems pretty pointless trying to put any sort of security into your site on the client side.

Even if you didn't want to be malicious and steal or inject anything you can still modify the DOM, run your own Javascript, change the CSS and remove x y and z.

All security measures related to user input should be handled on the server to prevent SQL injection and XSS hacks but that's not to say that duplicating validation checks on the client isn't a good idea.

For one thing it saves time if you can inform a user that they have inputted something incorrectly before the page is submitted.

No one likes to fill in a long form submit it and wait whilst the slow network connection and bogged down server takes too long to respond only to show another page that says one of the following:
  • That user name is already in use please choose another one.
  • Your email confirmation does not match.
  • Your password is too short.
  • You did not complete blah or blah.
Things like this should be done client side if possible, using Ajax for checks that need database look ups such as user name availability tests. Using JavaScript to test whether the user has JavaScript enabled is a good technique for deciding whether to rely purely on server side validation or to load in functions that allow for client side validation if possible.
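
To give a rough idea, here is a minimal sketch of the kind of client side checks described above, using jQuery as elsewhere on this blog. The form field IDs and the /check-username endpoint are made up purely for illustration; the real lookup still has to happen on the server.

// run once the DOM is ready
jQuery(document).ready(function () {

	// cheap client side checks before the form is posted to the server
	jQuery("#signup-form").on("submit", function (e) {

		var pwd     = jQuery("#password").val(),
		    email   = jQuery("#email").val(),
		    confirm = jQuery("#email-confirm").val();

		if (pwd.length < 8) {
			alert("Your password is too short.");
			e.preventDefault();
		} else if (email !== confirm) {
			alert("Your email confirmation does not match.");
			e.preventDefault();
		}
	});

	// Ajax user name availability check - needs a server side page to do the real database lookup
	jQuery("#username").on("blur", function () {
		jQuery.get("/check-username", { name: jQuery(this).val() }, function (data) {
			if (data.taken) {
				alert("That user name is already in use please choose another one.");
			}
		});
	});
});

The server must of course repeat every one of these checks; the client side version is only there to save the user a round trip.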

However client side code that is purely there to prevent content from being accessed without consent seems pointless in the age of any modern browser.

Obviously there is a large percentage of web users out there that wouldn't know the first thing to do when it comes to bypassing client side security code and the blocking of the right click context menu would seem like black magic to them.

Unfortunately for the people who are still wanting to protect their client side code the people that do want to steal the content will have the skills to bypass all your client side cleverness.

It may impress your boss and seem worth the $50 for about 10 minutes until someone shows you how you can add your own Javascript to a page to override any functions already there for blocking events and checking for iframe positioning.

My only question would be is it really hacking to modify the DOM to access or bypass certain features meant to keep the content on that page?

I don't know what other people think about this but I would say no, it's not.

The HTML, images, JavaScript and CSS are ALL on my computer at the point of me viewing them on whatever browser I am using. Therefore unless I am trying to change or inject anything into the web or database server to affect future site visitors or trying to bypass challenge responses then I am not really hacking the site just modifying the DOM.

I'd be interested to know what others think about that question.

By Strictly-Software

© 2015 Strictly-Software

Sunday, 23 August 2015

Web Developing - My 10 Golden Rules For Developers

Ten Golden Rules for any Web Developer

When you have developed code for a living for the best part of 18 years you soon come up with a list of rules that help keep you sane and prepare you for the next disastrous project coming your way.

As Bismarck said: "The wise man learns from the mistakes of others."

So, don't learn from your own mistakes, try and get it right first time.

Here are my own little rules to save you from having to learn from your own mistakes.


1. Always add comments.

The more comments the better. There is nothing worse than coming back to a bit of complicated code you or someone else wrote years ago and have no idea why they wrote it that way.

Comments can always be stripped before roll out but on a development system they are a life saver especially if you have had to do something out of the ordinary because of some uncommon situation.


2. Format your code.

Proper indentation and casing is most definitely above cleanliness when compared to Godliness.

Not only does it make the code easier to read but it looks better and prevents developers with mild OCD from stressing out and having to spend too much time re-formatting before they can even help you debug that mess you call work.


3. Don't re-develop the wheel.

Having said that don't put up with someone else's flat tyres.

People use libraries because it saves them time, but every library has an author and no-one is infallible. The benefit of using your own code is that you know how every line works and can fix bugs immediately. I have not yet come across a 3rd party library that didn't have at least one major flaw, and there is a reason new libraries constantly appear in the first place.

Not only will you learn more by writing your own code but you won't have to rely on someone else to fix it all when it goes wrong.


4. Cherish the anally retentive customer.

Whilst anally retentive customers may seem like a pain in the behind in reality you should cherish the fact that they actually know what they want (see point 5).

A customer that knows exactly what he wants up front and is prepared to sign off a spec and keep to it is a rare occurrence and you should be prepared to put up with their constant emails and calls during development.

You should do this because you know that once you have delivered everything they want, even if it looks like a bag of shite and acts like a bag of shite, if it was a bag of shite that they really really wanted in the first place then it will be your bag of shite that makes them happy and brings home the bacon.

A happy customer is one that doesn't ring you constantly months after the live date asking for little changes and wondering if they just have X, Y and Z all for free because they didn't think ahead earlier.


5. Most customers know Jack

The majority of customers will know next to nothing about the system they want and expect you to know how their business is run.

Getting a properly defined spec out of some customers is harder than milking a male cow.

Customers will give you definites and then complain about the lack of flexibility.

They will expect gold plating but only want to pay for metallic.

They will expect you to see into the future and know their business requirements and have them programmed before they do.

You should be prepared for this by building in ultimate flexibility at the earliest opportunity so that when they eventually change their minds you don't have to re-develop their system from the ground up.


6. Ignore No and plan for Yes

When you ask your boss if the system you are working on needs to do X and he says categorically NO. Be prepared for him to change his mind a month or two down the line when the next sale depends on that feature.

Sales people don't care how long it took you to develop something or how hard it was to develop it. All they care about is getting their next commission.

Be prepared to spend months developing features that are used purely for selling a product rather than for their usefulness. Take that opportunity to develop said feature in a new language or use the opportunity to learn a new API and incorporate it into your code. Even if the feature isn't used at least you have learnt a new skill for your CV.


7. Automate, automate, automate.

There is nothing worse as a developer spending time doing all the monotonous tasks like building Class structures, input forms and CRUD stored procs when you could be working on a cool widget, stealthy scraper or funky functionality that actually engages your mind.

For this reason you should automate the CRUD so you can spend your time on the interesting work. There are many tools available to automate code outlines based on database structures and if you always code to a template you should be able to rattle off the basics quickly so that your time is better spent on the interesting aspects of a project.


8. A Metro with a Ferrari engine still looks like a Metro. A Ferrari with a Metro engine still pulls the birds.

Most end users, customers and bosses don't care about the cleverness of your code or how many lines you have had to write to make that balloon go pop. All they care about is how the site looks.

Front ends matter because that's the part the user sees, whereas back ends are important because they make it all work. However good the database, business logic and code are under the covers, if the front end looks like a four year old's first drawing then no-one will care.

In the same way a sales person takes all the money from your hard work, your high-performing back end is just the engine that props up the front end everyone looks at and goes "ooh, isn't that nice", or, more importantly, decides your site is rubbish purely because it looks bad.

Remember you could have the best code in the world but if it has a crap design people won't appreciate it.


9. Be prepared to be unappreciated and under paid.

Sales people and bosses will milk your talent to enrich themselves whilst giving you platitudes and praise but keeping the hard cash for themselves.

Making good money from web development is very hard unless you come up with the newest idea that Google might purchase down the road or work for a big, well-paying company.

It's very easy to steal code from the web and with open source it's increasingly hard to make good money when people are throwing their code around for free in the hope that someone likes it enough to pay them for an upgrade or support.


10. Learn to code for fun.

If you don't enjoy coding then you won't be prepared to put the spare time in to make your own projects on the side. No-one ever got rich working for someone else but being able to give up your job to work on your code is impossible unless you have money to back your work. Being able to code for pleasure makes working on your own projects in your spare time a whole lot easier.


Those are my top 10 tips for helping you get through life as a web developer. If you have your own tips please respond using the comment section.

Tuesday, 4 August 2015

Turning off Windows 10 Privacy Features

Turning off Windows 10 Privacy Features

By Strictly-Software 

If you are like me and on Windows 8.1 then you will probably be constantly bombarded by messages from Microsoft about how you can obtain a free upgrade to Windows 10.

However, before you upgrade you should beware of all the security and privacy concerns people have about this operating system.

With a little research it is well known that Microsoft have a close relationship with the US security services such as the CIA / NSA.

They have even bought up businesses such as Skype to avoid having to hack in back doors when they can instead have a wide-open front door for all your personal traffic, calls and texts.

As the video states.

This is a quick guide to fixing privacy concerns in Microsoft's Windows 10 operating system. 

The default settings and applications in Windows 10 come with numerous security flaws, privacy problems and keystroke monitors installed and turned on by default.

This guide will help you fix the problems in a simple way.

Be sure to read this article as it will explain all the security holes in detail if you want more information on what Microsoft are monitoring and why.

However this quick overview video should help you on the way.





By Strictly-Software

© 2015 Strictly-Software

Thursday, 9 July 2015

DEBUG Stored Procedures Using PRINT or RAISERROR in TSQL

DEBUG Stored Procedures Using PRINT or RAISERROR in TSQL

By Strictly Software 

Usually when I have to write stored procedures with many sections within a transaction, I set a @DEBUG BIT variable at the top of the procedure to help with debugging.

Then, at various points in the procedure, usually at the start and end plus before and after each block of code, I check whether it is enabled and output some debug telling me what is going on.

This really helps me when I have problems with the procedure and need to know where any bug is happening.

An example of a basic stored procedure with transactions so I can rollback the code if something goes wrong, plus the use of a TRY CATCH so that I can log error details to a table is shown below.

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE PROCEDURE [dbo].[usp_net_clean_up]
	@SitePK int
AS
BEGIN
	
	SET NOCOUNT ON;

	DECLARE @ROWS INT,
		@ERROR INT,
		@DEBUG BIT

	SELECT @DEBUG = 1

	IF @DEBUG = 1
	  PRINT 'IN usp_net_clean_up - SitePK: ' + CAST(@SitePK as varchar)

	BEGIN TRAN

	BEGIN TRY

        UPDATE	NIGHTLY_JOBS
	SET	Pending = 0
	WHERE	SitePK = @SitePK

	-- capture row count and any data errors
	SELECT @ROWS = @@ROWCOUNT, @ERROR = @@ERROR

	IF @ROWS = 0 OR @ERROR != 0
	  BEGIN
		IF @DEBUG = 1
		  PRINT 'No rows updated'

		GOTO HANDLE_ERROR
	  END

	UPDATE	LOCK_FILES
	SET	Locked = 0
	WHERE	Locked = 1
		AND SitePK = @SitePK
		AND DATEDIFF(Day,Stamp,GETDATE())=0

	SELECT @ROWS = @@ROWCOUNT, @ERROR = @@ERROR

	IF @ROWS = 0 OR @ERROR != 0
	  BEGIN
		IF @DEBUG = 1
		  PRINT 'No rows updated'

		GOTO HANDLE_ERROR
	  END

	END TRY
	BEGIN CATCH
		
		IF @DEBUG = 1
		  BEGIN
			PRINT 'IN CATCH ERROR'
			PRINT ERROR_MESSAGE()
			PRINT ERROR_LINE()
		  END

		 -- all ERROR functions will be available inside this proc
		  EXEC dbo.usp_sql_log_error @SitePK

		  -- rollback after anything you want to do such as logging the error 
		  -- to a table as that will get rolled back as well if you don't!
		  IF @@TRANCOUNT > 0
		    ROLLBACK TRAN

		  GOTO HANDLE_ERROR
	END CATCH

	IF @DEBUG = 1
	  PRINT 'End of proc - no errors'

	IF @@TRANCOUNT > 0
	  COMMIT TRAN

	GOTO EXIT_PROC


	HANDLE_ERROR:
	  IF @DEBUG = 1
	    PRINT 'IN HANDLE ERROR'

	  -- make sure any transaction left open by the GOTOs above is rolled back
	  IF @@TRANCOUNT > 0
	    ROLLBACK TRAN

	  RETURN 0 -- failure

	EXIT_PROC:
	  IF @DEBUG = 1
	    PRINT 'IN EXIT_PROC'

	  RETURN 1 -- success (I use 1 for success despite SQL recommendations!)
END

However, another way to output debug messages is with the RAISERROR function and the use of placeholders for values, a bit like the sprintf function in PHP.

To be honest I previously only used the function to raise custom errors, but you can easily use it for debugging as well.

This is an example of a super fast way to insert 100,000 rows into a table (using the TEMPDB), and using RAISERROR to output debug messages related to the time it takes plus some examples of the placeholder types which are listed at the bottom of the page.

SET NOCOUNT ON
SET DATEFORMAT YMD

DECLARE @StartStamp VARCHAR(20),
	@StopStamp VARCHAR(20),
	@ROWS INT
SELECT @StartStamp = CONVERT(varchar, GETDATE(),13)

RAISERROR('Start insert at %s',0,1,@StartStamp) WITH NOWAIT;

-- ensure our table doesn't already exist
IF OBJECT_ID('tempdb.dbo.random_data','U') IS NOT NULL
  DROP TABLE tempdb.dbo.random_data;

RAISERROR('Start insert of data',0,1)
-- super fast insert of 100,000 rows into a table
SELECT TOP (100000)
        RowNo   = ISNULL(CAST( ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS INT),0),        
        RandomID = NEWID() 
INTO	tempdb.dbo.random_data
FROM	master.sys.all_columns ac1
CROSS JOIN master.sys.all_columns ac2
CROSS JOIN master.sys.all_columns ac3

SELECT @ROWS = @@ROWCOUNT, @StopStamp = CONVERT(varchar, GETDATE(),13)

RAISERROR('There are %d rows in the table. Error Code was %u. Insert completed at %s',0,1,@ROWS,@@ERROR,@StopStamp) WITH NOWAIT;
-- output results
SELECT * FROM tempdb.dbo.random_data
-- drop our table
DROP TABLE tempdb.dbo.random_data;
-- ensure it is dropped
IF OBJECT_ID('tempdb.dbo.random_data','U') IS NULL
  RAISERROR('Dropped table %s',0,1,'tempdb.dbo.random_data')
ELSE
  RAISERROR('Could not drop table %s',0,1,'tempdb.dbo.random_data')

When I run this code the debug in the messages tab of SQL Query Analyser are:

09 Jul 2015 11:37:18
Start insert at 09 Jul 2015 11:37:18
Start insert of data
There are 100000 rows in the table. Error Code was 0. Insert completed at 09 Jul 2015 11:37:18
Dropped table tempdb.dbo.random_data

As you can see you can enter multiple parameters into the RAISERROR function.

The only thing you must remember is that you cannot use functions such as GETDATE(), CONVERT() or CAST() as substitution values. They must be constants or variables, which is why I am converting the time stamps into strings first.

I hope you also notice how fast the insert of 100,000 rows all with unique values in the 2nd column is.

It takes less than one second!

This method is much faster than any WHILE loop or CURSOR method to insert records and you should add it to your box of tricks for future use.

Some people use number tables for insert jobs like this but there is no need when you have system tables with thousands of rows within them.

The trick is to use two CROSS JOINs against the system table master.sys.all_columns to create the necessary rows for the insert.

CROSS JOINS are hardly used by many people but they are very useful when you need to do insert jobs like this quickly.

Note how each row is ordered sequentially from 1 to 100,000 and the 2nd column has a unique GUID inside it. This is created from the NEWID() function.

So these are just two methods for adding debug to your TSQL code and it is up to you which method you find the most useful for the job at hand.

RAISERROR function parameter substitution values

%d or %i = signed integer

%o = unsigned octal

%p = pointer

%s = string

%u = unsigned integer

%x or %X = unsigned hexadecimal

By Strictly Software 

© 2015 Strictly Software

Sunday, 14 June 2015

The Wordpress Survival Guide - Part 2 - Performance

Surviving WordPress - Performance and Site Optimization


UPDATED - 14th Jun 2015

I have updated this to include a way to handle MySQL errors, a BASH script to tune Apache and an improved function to check your server's load, including on Windows servers.

Plus code to disable the new WordPress HeartBeat functionality which can be a CPU / Bandwidth killer and a way to add CRON jobs to automate plugin functions without console access.

This is the second part of my guide to surviving WordPress and as promised it looks at performance tweaks and tips which I have gathered on my way.

It has been quite a while since the first instalment and the main reason for this was that I was suffering my own performance killer which I wanted to solve first before writing this article. Luckily this has now been solved with the help of Robert from Tiger Tech blog who helped me get to the bottom of the issue so here it is.

My own personal journey into WordPress performance tuning started when I began to experience PHP out-of-memory errors when manually rebuilding my Google sitemap.

I started to play around with different plugins and then delve into the code, which is when I started to realise the damage that WordPress plugins can do to a site when the user doesn't realise what's going on behind the scenes.

You can check out a detailed examination here but in my case it was using a Google Sitemap plugin that was set to rebuild when a new post was saved. Combining that with WP-O-Matic which imports articles at scheduled intervals and a TwitterBot such as my own which can send Tweets to multiple accounts whenever new content is added all added up to a performance killer!

If you have a similar setup it's worth running TOP, MyTOP and checking your access logs to see how it affects your own system but what was happening on my own setup was:

  • WP-O-Matic starts to import a feed's worth of articles (a maximum of 10).
  • For each article that is saved, numerous procedures hooked into the SavePost or PublishPost actions run. In my case these were:
  1. My Strictly AutoTags plugin runs which analyses the article and adds relevant tags, depending on the admin settings, the number of tags and the length of the article this could be quick or slow.
  2. The Google Sitemap plugin then ran which runs a lot of SQL queries and creates a new file as well as pinging multiple SERPs with HTTP requests.
  3. My Strictly Tweetbot Plugin also runs which posts a tweet to multiple accounts. This caused a Twitter Rush as 50+ BOTS all hammered my site at the same time due to the new link appearing on Twitter. 
  4. Any other plugin using the Save hooks runs such as caching tools which create static files.
  • As soon as the Tweets arrive on Twitter a multitude of Bots, 50 on my last test, will visit the site to index the link that has just been posted OR try and scrape, hack or insert spam comments into the post.
  • If the link was posted to multiple accounts you will find that the same bots will visit for each account you posted to. Some bots like Yahoo seem to be particularly bad and visit the article multiple times anyway. So if I posted to 5 twitter accounts that's 250 visits in the space of a few seconds from BOTS scanning for new tweet links to visit!
  • All these visits create new Apache processes and depending on the amount of memory that each Apache process uses you could find that your server starts swapping memory to disk to handle the increase and in my case my server load would quickly jump from 0.15 to 50+.

The more articles you import the more iterations of this chain of performance killing events occurs. I found that these events would sometimes pass off without any noticeable problems but other times the server load would get so high that I would have to reboot my machine.

The highest value I recorded was 174 on a 1GB RAM Linux server!

In fact on some days I would have to reboot 3-5 times which is not good at all.

Getting to the bottom of the problem

A common solution to any performance related problem is to throw more resources at it. Many message boards recommended increasing the maximum memory limit to get round the Out of Memory errors the Google Sitemap was throwing up but that just masks the issue and doesn't actually solve it.

As a by product of my system tuning I ended up creating my own Google Sitemap Plugin to overcome limitations of the others.

Not only could it be easily set to rebuild at scheduled intervals instead of only when new posts were added which helps reduce unnecessary rebuilds, but it used far less memory and made a tiny number of database queries in comparison to the other market leaders.

I also created a System Reporting plugin so that I could be kept informed when my site was playing up, and I found this invaluable in keeping my site running during this performance nightmare. If you are not on your site 24/7 and cannot afford professional monitoring services it is great to get an email telling you if your site is down, taking ages to respond, has a very high server load or is running too many SQL queries.

One of the first ideas to reduce the amount of times I was rebooting was to try and prevent any performance intensive tasks from running if the server load was already high.

I did this by adding in some checks to all my major plugins that made a call to the following function before running anything. If the load was above 1.0 I just exited immediately. You can read more about this method in this article: Testing Server Load.

function GetServerLoad(){

 $os = strtolower(PHP_OS); 
 
 // handle non windows machines
 if(substr(PHP_OS, 0, 3) !== 'WIN'){
  if(file_exists("/proc/loadavg")) {    
   $load = file_get_contents("/proc/loadavg"); 
   $load = explode(' ', $load);     
   return $load[0]; 
  }elseif(function_exists("shell_exec")) {     
   $load = @shell_exec("uptime");
   $load = explode(' ', $load);        
   return $load[count($load)-3]; 
  }else { 
   return false; 
  } 
 // handle windows servers
 }else{ 
  if(class_exists("COM")) {     
   $wmi  = new COM("WinMgmts:\\\\."); 
   if(is_object($wmi)){
    $cpus  = $wmi->InstancesOf("Win32_Processor"); 
    $cpuload = 0; 
    $i   = 0;   
    // Old PHP
    if(version_compare('4.50.0', PHP_VERSION) == 1) { 
     // PHP 4      
     while ($cpu = $cpus->Next()) { 
      $cpuload += $cpu->LoadPercentage; 
      $i++; 
     } 
    } else { 
     // PHP 5      
     foreach($cpus as $cpu) { 
      $cpuload += $cpu->LoadPercentage; 
      $i++; 
     } 
    } 
    $cpuload = round($cpuload / $i, 2); 
    return "$cpuload%"; 
   }
  } 
  return false;     
 } 
}
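
To give an idea of how it gets used, here is a minimal sketch of the kind of guard I mean at the top of a plugin task. The function name and the 1.0 threshold are just placeholders for whatever your own plugin does:

<?php
// bail out of an expensive plugin task if the server is already busy
function strictly_run_expensive_task(){

	$load = GetServerLoad();

	// GetServerLoad() returns the 1 minute load average on Linux;
	// 1.0 is a reasonable ceiling for a small single CPU box, Windows returns a
	// percentage so pick a different threshold there
	if ($load !== false && floatval($load) > 1.0){
		return; // too busy - skip this run and let the next scheduled run try again
	}

	// ... carry on with the sitemap rebuild, tagging, tweeting etc ...
}
?>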


Apache Configuration

I finally got to the bottom of the problem I was suffering, with the help of Tiger Tech, after examining the output of ps auxwwH --sort rss during a period of high load. This listed all the currently running processes ordered by the amount of memory they were consuming.

At the time of running this my average load was 50, which meant there was a big queue of processes waiting to be run. It included over 70 Apache processes, each using between 8MB and 30MB, and this alone was easily using up my 1GB of RAM.

This high number of Apache processes meant that my server was busily swapping from real memory to disk based virtual memory which was causing high I/O (clearly seen from the output of iostat) and slowing down the response times of each Apache process.

As each process got slower to respond new processes were spawned using up even more virtual memory adding to the problem. This spiral of death was only resolved if for some reason the traffic suddenly screeched to a halt (not likely during an article import that delivers hundreds of bots from Twitter on top of normal traffic) OR I killed Apache or the server.

The solution to this problem was to reduce the number of simultaneous Apache processes that could be run at one time by reducing the MaxClients setting in the Apache config file.

My existing setting of 256 was far too high for my 1GB RAM server. The way to calculate a more appropriate setting is to take the average size of an Apache process and then divide the total available memory by that number leaving room for other processes such as MySQL. In my case I was advised to set MaxClients to a value of 20 which seems small in comparison to the original value but makes more sense when you do the maths.

I have actually created a BASH script which you can run on your own server which will test the available space, average Apache process size, and then calculate the values for your MaxClients, MinSpareServers and MaxSpareServers which you can read here: BASH MaxClients Tuning Script.
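
If you just want a feel for the maths before running the full script, below is a much simplified sketch of the same calculation (not the linked script itself). It assumes a Debian-style "apache2" process name (use "httpd" on Red Hat style systems) and a fixed allowance for MySQL and the OS, so treat the numbers as rough guesses to tune:

#!/bin/bash
# rough MaxClients estimate - illustration only, not the full tuning script
TOTAL_MEM=$(free -m | awk '/^Mem:/ {print $2}')   # total RAM in MB
APACHE_AVG=$(ps -C apache2 -o rss= | awk '{sum+=$1; n++} END {if (n) print int(sum/n/1024)}')   # average Apache process size in MB
RESERVED=256   # MB left over for MySQL, cron jobs and the OS - tune to taste
echo "Suggested MaxClients: $(( (TOTAL_MEM - RESERVED) / APACHE_AVG ))"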

Reducing my MaxClients setting to a much smaller value meant that the memory allocation for my system would never reach such unmanageable amounts again. If my server is swamped by traffic then instead of 256 Apache processes being spawned all trying to claim 20MB or more for themselves they will be queued up in an orderly fashion.

It might slow down some requests as they wait to be dealt with but that is far better than the whole server freezing up which was occurring regularly.

Two other settings I changed in the Apache conf file were the Timeout value, which I reduced from 300 to 30, and HostnameLookups, which I turned off. You can read more about these settings at the Apache performance tuning site.

Another recent issue I have just had was eerily the opposite of the above. I would get periods of very low server load (0.00 - 0.02) and there would be no Apache or MySQL processes running. The websites couldn't be accessed and only a restart of Apache would fix it.

At first I was checking the Apache error logs and seeing lots of "MySQL Server has gone away" errors. I found that this was a common issue in WordPress and created a custom wp-db.php file which would re-connect to the server if a query ran and met that error. You can read more about that script here: Fixing the MySQL Server Has Gone Away Error.
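
For anyone curious, the idea behind that custom file boils down to something like the sketch below. This is not the actual patch, just an illustration of the approach; the class name is made up, and you would normally load a wpdb subclass like this through a wp-content/db.php drop-in:

<?php
// sketch only - retry a query once if the MySQL connection has been dropped
class reconnecting_wpdb extends wpdb {

	function query( $query ) {
		$result = parent::query( $query );

		// wpdb keeps the last MySQL error message in $this->last_error
		if ( $this->last_error && false !== stripos( $this->last_error, 'server has gone away' ) ) {
			$this->db_connect();               // re-open the dropped connection
			$result = parent::query( $query ); // and retry the query once
		}

		return $result;
	}
}
?>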

However this just got rid of the error messages; it didn't really fix any problems.

After a lot of reading and tuning I eventually found what "seems" to be a fix for this issue, which may be caused by Apache processes hanging around for too long, consuming memory but not doing anything. I have edited the Apache conf file and changed the KeepAliveTimeout value down from 30 to 2 seconds.

I am debating on whether to turn it off altogether and then increase the MaxRequestsPerChild option. This website has some information about KeepAlive and whether you should turn it on or off.

Common WordPress Performance Tuning Tips

There are a number of common tips for performance tuning WordPress which you can read about in detail at other sites but I will quickly cover them here:

1. Install APC or another PHP caching system such as XCache or eAccelerator as these Opcode systems improve performance by saving and re-using compiled PHP which speeds up the execution of server side code.

2. Install a WordPress caching plugin such as WP Super Cache or W3 Total Cache. There is a debate over which one is best, and whilst W3 Total Cache does offer more features, such as minification and browser cache options, the main issue you want to resolve with WordPress is the huge number of database queries and the amount of code run on each page load. The aim is to do expensive tasks once and then re-use the results as many times as possible. Caching the results of database queries so that they don't have to be run every time the page loads is a great idea, especially if the results hardly change. W3 offers database query result caching as well as caching of the generated HTML output, whereas Super Cache will only cache the generated output.

What is the difference? Well if you cached database query results then during the building of cached files the results of queries that are used to create category lists or tag clouds can be shared across builds rather than being recalculated for every page being cached that uses them. How much difference this makes when you take all MySQL's own internal query caching into consideration is debatable. However both plugins offer the major way to improve fast page loads which is disk based caching of the generated output incorporating GZIP compression.

If you do install W3 Total Cache and you have APC or another PHP accelerator installed, make sure that you enable the Disk Based Cache option for Page Caching and not the Opcode cache, which will be selected by default if APC or XCache is installed.

3. If bandwidth is a problem then serving up minified and compressed HTML, CSS and JavaScript will help but you don't want to be repeatedly compressing files as they load. Some cache plugins will do this minification on the fly which hurts CPU whereas you really want it done once. There is nothing stopping you combining, compressing and minifying your files by hand. Then you will benefit from small files, fewer HTTP requests and less bandwidth whether or not you make use of a cache plugin.

4. Reduce 404 errors and ensure WordPress doesn't handle them, as it will cane performance unnecessarily. Create a static 404 error page or ensure your cache system is set up to handle 404s. Also make sure that common files that cause 404s, such as iPhone icons, crossdomain.xml and favicons, exist even if they are empty files, as shown in the snippet below.
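
For example, from your web root a quick way to create the usual suspects is shown below; add any other file names your own access logs show 404ing:

# create empty versions of the files that commonly cause 404 errors (run from the web root)
touch favicon.ico apple-touch-icon.png apple-touch-icon-precomposed.png crossdomain.xml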

5. If you're not planning on using a caching system then you should tune your .htaccess file manually to ensure that browsers cache your files for specified periods of time rather than downloading them each time they visit your site. You can also set your server to serve up compressed gzip files rather than letting a plugin do it for you.

You can do this by setting the future expire headers on your static content such as JS, CSS, images and so on like so:

<FilesMatch "(?i)^.*\.(ico|flv|ogg|swf|jpg|jpeg|png|gif|js|css)$">
ExpiresActive On
ExpiresDefault "access plus 1 weeks"
Header unset Last-Modified
Header set Cache-Control "public, no-transform"
SetOutputFilter DEFLATE
</FilesMatch>


6. Tune your MySQL database by ensuring that it is set to cache query results and has enough memory allocated to do so. Ensure options you don't use or require are disabled, and make sure you regularly maintain your tables and indexes by keeping fragmentation to a minimum.
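
As a starting point, these are the sort of my.cnf settings involved for the query cache on the MySQL 5.x versions most WordPress installs run on. The sizes below are purely illustrative and should be checked against your own load with the tuning scripts mentioned next:

# my.cnf - illustrative query cache values only
query_cache_type  = 1     # cache SELECT results
query_cache_size  = 32M   # total memory set aside for cached result sets
query_cache_limit = 1M    # don't cache any single result bigger than this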

There are a couple of well known tuning scripts which can be used to aid in the setting of your MySQL configuration settings and which use your current database load and settings as a guide to offer recommendations.

http://github.com/rackerhacker/MySQLTuner-perl
http://hackmysql.com/mysqlreport
Uninstall Performance Hogging Plugins

There are lots of plugins available for WordPress and it can be like a kid let loose in a candy shop, as there seem to be at least 10 plugins for everything. However, having too many plugins installed is definitely a bad thing in terms of performance, and unless you know what the code is doing you could be shooting yourself in the foot by installing the next greatest plugin onto your site without thoroughly checking the source code for yourself first.

The problem is that literally anyone can write and then publish a plugin on WordPress and many of these authors are not programmers by trade or have performance in the forefront of their minds as they develop the code that you might use.

Even plugins that are targeted as performance saving tools are not always beneficial and I have seen plugins that are designed to reduce bandwidth by returning 304 Not Modified headers or 403 Forbidden status codes but have to make numerous database queries, DNS lookups and carry out multiple regular expressions to do so. If Bandwidth is a problem then this might be worth the extra load but if it isn't then you are just swapping a small gain in one area for extra work somewhere else.

If you are going to use a plugin then take a look over the source code to see if you can help improve the performance by adding any missing indexes to any new tables the plugin might have added to your WordPress database. Many plugins do add tables especially if they need to store lots of data and many authors don't include the SQL statements to add appropriate indexes which could end up slowing down lookups down the road as the amount of data within the tables grows.

The following list are extra indexes I have added to tables within the WordPress database for both Plugins I installed and core WordPress tables that were missing indexes for certain queries. Remember WordPress is mainly a READ based system so the extra expense of adding indexes when data is inserted is usually worth it.


Plugin                    Table                   Index Name                    Columns                            Index Type
-                         wp_posts                status_password_id            post_status, post_password, ID     Normal
-                         wp_posts                post_date                     post_date, ID                      Unique
fuzzySEOBooster           wp_seoqueries_terms     term_value_stid               term_value, stid                   Unique
fuzzySEOBooster           wp_seoqueries_data      stid_pageid_pagetype_founded  stid, page_id, page_type, founded  Unique
WP-O-Matic                wp_wpo_campaign_post    campaignid_feedid_hash        campaign_id, feed_id, hash         Normal
Yet Another Related Post  wp_yarpp_related_cache  reference_id                  reference_ID, ID                   Normal

Ensure that you regularly check the MySQL slow query log, especially if you have just installed a new plugin, as this will help you find queries that need optimising and potential bottlenecks caused by poorly thought-out SQL.

On my own site I started off using a well known Related Posts plugin but I found out from the Slow log that the queries it ran to create the lists were killing performance due to their design.

They were taking 9-12 seconds to run and were scanning up to 25 million records at a time as well as carrying out unnecessary UNION statements which doubled the records it needed to look at. I ended up replacing it with a different plugin called LinkWithin which not only looked great due to the images it used but was perfect for performance because it was a JavaScript widget and all the work was carried out on their own server rather than mine.

This might not be the solution for you as obviously JavaScript is disabled by 10% of all visitors and bots won't be able to see the links.

If SEO is a concern, and it should be, then you need to make sure that SERP crawlers find all your content easily, and having a server-side created list of related articles is a good idea for this reason alone. Therefore you can always create your own Related Posts section very easily with a function, placed at the bottom of your articles, that uses the categories assigned to the post to find other posts with the same category.

The following example shows one way in which this can be done. It makes use of a nice ORDER BY RAND() trick to ensure different articles and categories appear each time the SQL is run, and it uses WordPress's inbuilt cache to store the results to prevent the query being executed too many times.

<?php
function get_my_related_posts($id, $limit){

// enable access to the WordPress DB object
global $wpdb;

// define SQL
$sql = "SELECT  CONCAT('http://www.mysite.com/',year(p.post_date),'/',RIGHT(concat('0' ,month(p.post_date)),2),'/',post_name,'/') as permalink,
p.post_title as title
FROM (
SELECT p.ID, p.post_name, p.post_title, p.post_date, terms.slug as category
FROM  wp_posts p,  wp_term_relationships tr,  wp_term_taxonomy tt,  wp_terms as terms
WHERE p.ID               != $id                 AND
p.post_type         = 'post'              AND
p.post_status       = 'publish'           AND
p.ID                = tr.object_id        AND
tr.term_taxonomy_id = tt.term_taxonomy_id AND
tt.taxonomy         in ( 'category')      AND
tt.term_id          = terms.term_id
GROUP BY  p.ID, p.post_title, p.post_name, p.post_date
ORDER BY terms.term_id
) as p,
(
SELECT distinct terms.slug
FROM wp_term_relationships tr, wp_term_taxonomy tt, wp_terms as terms
WHERE tr.object_id        = $id     AND
tr.term_taxonomy_id = tt.term_taxonomy_id AND
tt.taxonomy in ( 'category')    AND
tt.term_id          = terms.term_id
ORDER BY RAND() LIMIT 1
) as t
WHERE p.category = t.slug
ORDER BY  RAND()
LIMIT $limit";

// see if we have a cached recordset
$cache_name = "get_my_related_posts_" . $id;

$result = wp_cache_get( $cache_name );
if ( false == $result ) {

// get results and then cache for later use
$result = $wpdb->get_results( $sql );
wp_cache_set( $cache_name, $result );
}

// return result set as object
return $result;
}
?>
<div id="StrictlyRelatedPosts">
<h3>Related posts</h3>
<ul>
<?php
// fetch 5 related posts
$related_posts = get_my_related_posts($post->ID, 5);
// open loop
foreach ($related_posts as $related_post) {
$permalink = $related_post->permalink;
$title     = $related_post->title;
print "<li><a title=\"$title\" href=\"$permalink\">$title</a></li>\n";
} ?>
</ul>
</div>



Identifying Bottlenecks in Wordpress

One good plugin which I use for identifying potential problematic queries is the Debug Queries plugin which allows administrators to see all the queries that have run on each page. One extra tweak you should add is to put the following line in at the bottom of the get_fbDebugQueries function (around line 98)


$debugQueries .= ' ' . sprintf(__('» Memory Used %s'), $this->ConvertFromBytes($this->GetMemoryUsage(true))) . ' '. "\n";


Then add these two functions underneath that function (around line 106) which get the memory usage and format the value nicely.


// format size from bytes
function ConvertFromBytes($size){

 $unit=array('B','KB','MB','GB','TB','PB');

 return @round($size/pow(1024,($i=floor(log($size,1024)))),2).$unit[$i];
}

// get PHP memory usage
function GetMemoryUsage(){

 if(function_exists("memory_get_peak_usage")) {
  return memory_get_peak_usage(true);
 }elseif(function_exists("memory_get_usage")) {
  return  memory_get_usage(true);
 }
}


This will help you see just how many database queries a standard Wordpress page makes (88 on my homepage!) and if you haven't done any performance tuning then you may suddenly feel the urge before you suffer similar problems to those I experienced.

Remember, a high-performing site is one which attracts visitors and one which SERP bots are now paying more attention to when indexing. Therefore you should always aim to get the best performance out of your system as is feasibly possible, and as I have shown, that doesn't mean spending a fortune on hardware.




Turning off WordPress features


If you ever look at your site's log file you might see a lot of requests to a page called wp-cron.php.

This page handles WordPress's internal scheduling, and many plugins hook into it to schedule tasks. This is useful for people who don't have access to their web server's control panel, as they can still set up "cron" jobs of a sort.

The difference is that these cron jobs only fire when a page on the site is loaded, so on a very quiet site a job you want to run once every 5 minutes simply won't, because you aren't getting traffic every minute of the day.


POST /wp-cron.php?doing_wp_cron=1331142791

Sometimes you will even see multiple requests spawned (by your own server's IP) within the same second, e.g.

123.123.XXX.XX - - [07/Mar/2012:18:03:57 +0000] "POST /wp-cron.php?doing_wp_cron=1331143104 HTTP/1.0" 200 - "-" "WordPress/3.3.1; http://www.mysite.com"
123.123.XXX.XX - - [07/Mar/2012:18:03:57 +0000] "POST /wp-cron.php?doing_wp_cron=1331143109 HTTP/1.0" 200 - "-" "WordPress/3.3.1; http://www.mysite.com"
123.123.XXX.XX - - [07/Mar/2012:18:03:57 +0000] "POST /wp-cron.php?doing_wp_cron=1331143128 HTTP/1.0" 200 - "-" "WordPress/3.3.1; http://www.mysite.com"

To me this seems like overkill.

Yes, the wp-cron job is needed to run internal WordPress tasks such as publishing scheduled posts or firing jobs that have been set up to use the internal cron system, but having multiple requests fire at the same time seems unnecessary at best.

Why is this bad? Well, as this blog post from boltwebhosting.com says:

Wp-cron.php is called every time a page is loaded. That means if you are getting 50 visitors to your site every hour, and each of them reads 2-3 pages, then wp-cron.php is being called:
50 x 2.5 = 125 times per hour
125 x 24 = 3,000 times per day
3,000 x 30 = 90,000 times per month!
It does not just stop there, because unlike other features in WordPress, the wp-cron.php is spawned as an independent process which can sometimes take several minutes to complete its operations. So an active WordPress site with the traffic volume listed above is spawning 3,000 processes every day which do not really do anything.

Therefore on a very busy site you will be firing this page a lot, and this may cause severe performance issues on its own.

The solution, where possible, is to disable this behaviour and replace it with a proper CRON job.

To do this you need access to your server's control panel or console, but don't worry if you don't have access, as you can still use a web-based service like www.easycron.com.

As many hosting packages don't provide adequate cron functionality for their users, this web-based method is a great way of automating tasks without fiddling with your server.

If you do have the ability to set up a CRON task that fires the page once an hour, or at an interval more appropriate to your needs, then great. If you don't use the internal cron job for anything then the longer the gap the better, but be careful, as plugins such as database backup or sitemap generator plugins may use it without your knowledge. I set my CRON job to run the WP-CRON task every 10 minutes and this seems to be fine for my needs.

This is the format to use:

wget -U StrictlyCron -q -O /dev/null http://www.mysite.com/wp-cron.php?doing_wp_cron


You will notice that I am setting the -U parameter (user-agent) to StrictlyCron. This is because I block all blank user-agent requests to my site with .htaccess rules (see the security article), and it also helps me identify my own requests in the log file.
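
For reference, this is a sketch of the sort of crontab entry I mean, assuming a 10 minute interval and the same wget command as above (the schedule and URL are examples, adjust them to your own needs):

# run the WordPress cron page every 10 minutes rather than on every page load
*/10 * * * * wget -U StrictlyCron -q -O /dev/null "http://www.mysite.com/wp-cron.php?doing_wp_cron" > /dev/null 2>&1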

Once you have done this you need to edit your site's wp-config.php file, which will be in the root of your site's setup, and add this line of code to the top of it.


/* disable WP-CRON from running all the time on every page load! */
define('DISABLE_WP_CRON', true);


As the comment states, this stops WordPress firing its own internal CRON job on every page load, and as we have replaced it with a real CRON job that runs at a fixed interval it should reduce our traffic and server load considerably.



Turning Off WordPress HeartBeat

The WordPress HeartBeat functionality was introduced in WP 3.6 to allow interaction between the server and browser using AJAX. However, like AutoSave and WP-CRON, it can cause a lot of unnecessary HTTP requests as it defaults to one request every 15 seconds.

The WordPress Heartbeat API allows WordPress to communicate between the web-browser and the server. It also improves session management, revision tracking, and auto saving. The WordPress Heartbeat API uses /wp-admin/admin-ajax.php, which allows WordPress to keep track of what's going on in the dashboard.

Unfortunately, this can also cause excessive requests to admin-ajax.php, leading to high CPU / Bandwidth usage. Whenever a web-browser is left open on a page using the Heartbeat API, this could potentially be an issue.

I once accidentally left a post I was editing open in a Chrome browser (which always re-opens the pages you had open when you closed it) for a week, and my bandwidth costs jumped by a good $30.

I scanned my log files and saw /wp-admin/admin-ajax.php being called every 15 seconds for the post page (seen in the Referer section of the log file).

Therefore I shut the page down ASAP and added the following code to the functions.php file in my theme, so that the Heartbeat only runs on the post edit pages, where it is needed to delete custom fields, show tags and provide the other features that make editing / adding posts easy.

To turn off the HeartBeat functionality go to your theme's functions.php file and put the following code at the top of it.

If you don't want to turn it off but just want to change the timing from 15 seconds to a minute or something else, you can, but it relies on you editing a core compressed JavaScript WordPress file. You can read about how to do this here.

// stop heartbeat code unless we are on the post edit / new post pages
add_action( 'init', 'stop_heartbeat', 1 );

function stop_heartbeat() {
    global $pagenow;

    if ( $pagenow != 'post.php' && $pagenow != 'post-new.php' ) {
        wp_deregister_script('heartbeat');
    }
}
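
If you would rather slow the Heartbeat down than remove it completely, an alternative worth knowing about (assuming your WordPress version supports the heartbeat_settings filter that ships with the Heartbeat API) is to increase the interval from your theme's functions.php instead. A minimal sketch, with slow_heartbeat being just an example callback name:

// slow the Heartbeat down rather than deregistering it - WordPress clamps
// the interval to a sensible range so 60 seconds is a safe example value
add_filter( 'heartbeat_settings', 'slow_heartbeat' );

function slow_heartbeat( $settings ) {
    $settings['interval'] = 60; // seconds between admin-ajax.php calls
    return $settings;
}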


Using WordPress Performance Enhanced Plugins

Now this isn't a sales pitch for my own plugins, but you should try to avoid performance-hogging plugins and use those with performance features built in.

For instance, if your caching plugin has a "purge/delete cache" option then make sure it has a decent wait time in between each iteration of the delete loop, otherwise it will consume all your CPU and memory when you try deleting the cached files. Ask the plugin author after reading their guide.

Also, if you are using a Google sitemap on a busy site, don't set it to build a whole new sitemap after every post. The server load may already be high and doing this will send it higher. It is far better to just let the SERP crawlers crawl and find new content anyway.

However, if you do want to use a sitemap then use a plugin that lets you build it at staged intervals through CRON or web cron jobs.

My old, defunct Strictly Google Sitemap plugin, which I no longer support but still use on all my sites because of its unique features, offers:

  • A low number of SQL queries run compared to other sitemap plugins,
  • Fewer WordPress functions run than other plugins,
  • The ability for the memory applied to the build process to automatically increase as required, so that you don't get out-of-memory errors as I used to with other well-known plugins.


Even though some features are defunct it is still a great plugin to use for big sites needing sitemaps generated quickly.

With my plugin you can create sitemaps at set times and you can do all the stuff normal sitemap plugins do. The only bits that have stopped working are the SEO parts due to how Twitter, Google, BING and the others all work.

Also, my Strictly TweetBOT PRO plugin, which allows you to post Tweets to as many accounts as you want (or the same account multiple times with different content), has delay functionality you might be interested in.

It has a delay option where you can set, in seconds, how long to wait after sending an HTTP GET request to your new post (to get it into the cache) before tweeting.

It also has an option to set a delay in seconds before each Tweet is sent out to the account. This allows for enough time for any previous Twitter Rushes to die down before creating a new one.

It also staggers the Tweets out so they don't all look like they are coming from the same place.

Buy Strictly TweetBOT PRO now


WordPress Performance Summary


  • Ensure Apache is configured correctly and don't leave the default values as they are. Make sure MaxClients is set correctly by dividing your available RAM by the average Apache process size, leaving room for MySQL and anything else you might be running (see the worked example after this list).

  • Tune your MySQL database by configuring correctly and maintaining regularly. Use one of the many free tuning scripts to help set your configuration up correctly but ensure you read up about the various settings and what they do first.


  • Install a Caching plugin that creates hard copies of commonly requested files. Static HTML is fast to load. PHP is costly to compile. Use a PHP accelerator and ensure database query results are cached.


  • Reduce bandwidth by combining, compressing and minifying your CSS, JS and HTML. If your caching plugin doesn't do this once (rather than on the fly) then do it by hand. Remember the key is to do expensive operations once and then re-use the results as many times as possible.


  • Set your .htaccess file up correctly. Ban bad bots to reduce traffic, set far future expiry headers on your static files and use static files to handle 404, 403, 503 errors etc.


  • Reduce the number of plugins and ensure any that you use are not hurting performance. Make sure any tables they use are covered by indexes and use the slow query log to identify problems.



  • Disable WordPress's internal CRON job and replace it with a real CRON job that runs once every hour or 30 minutes rather than on every page load.



  • Disable WordPress HeartBeat functionality or only allow it on post edits to prevent repeated HTTP calls if a page is left open in a browser. You can change the timings from 15 seconds to whatever you want but this means editing a compressed WordPress core JS file. 
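
To give a rough idea of the MaxClients calculation mentioned in the first bullet, here is a hypothetical worked example (the figures are illustrative only, not measurements from my own server):

# hypothetical 2 GB VPS running Apache (prefork MPM) and MySQL
# reserve roughly 512 MB for MySQL and the OS  -> about 1536 MB left for Apache
# average Apache process size of roughly 30 MB -> 1536 / 30 = approx 50
MaxClients 50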




Read Part 1 - An Overview
Read Part 3 - Security




Thursday, 28 May 2015

Twitter Rush - The Rush just gets bigger and bigger!

Twitter Rush - The Rush Just Gets Bigger And Bigger!

By Strictly-Software

The number of BOTs, social media sites and scrapers that hit your site after you post a Tweet containing a link to it just gets bigger and bigger. When I first started recording the BOTS that hit my site after a post to Twitter it was about 15; now it has grown to over 100!

You can read my previous analyses of Twitter Rushes here and here; however, today I am posting the findings from a recent blog post that was Tweeted using my Strictly TweetBOT WordPress plugin, and the 108 HTTP requests that followed in the minutes after posting.

If you are not careful these Twitter Rushes can consume your web server's CPU and memory, as well as creating a daisy chain of processes waiting to complete, which can cause high server loads and long connection / wait times for pages to load.

You will notice that the first item in the list is a POST to the article.

That is because in the PRO version of my Strictly TweetBOT there is an option to send an HTTP request to the page before Tweeting, and then wait a few seconds (a setting you control) before any Tweets are sent out, to ensure the caching system has enough time to cache the page.

This is so that if you have a caching plugin installed (e.g. WP Super Cache on WordPress) or another system, the page is hopefully cached into memory or written out as a static HTML file to prevent any overload when the Twitter Rush comes.

It is always quicker to deliver a static HTML file to users than a dynamic PHP/.NET file that needs DB access etc.

So here are the results of today's test.

Notice how I return 403 status codes to many of the requests. 

This is because I block any bandwidth wasters that bring no benefit at all to my site.

The latest batch of these bandwidth wasters seems to be social media and brand awareness BOTS that want to see if their brand or site is mentioned in the article.

They are of no benefit to you at all and you should block them either using your firewall or with a 403 status code from your .htaccess file, as in the sketch below.
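
As a rough sketch of the .htaccess approach, using mod_setenvif and Apache 2.2 style access control (the user-agents are examples picked from the log below, add your own as you spot them):

# flag the bandwidth wasting BOTS (and blank user-agents) then deny them
SetEnvIfNoCase User-Agent "ShowyouBot"  bad_bot
SetEnvIfNoCase User-Agent "OpenHoseBot" bad_bot
SetEnvIfNoCase User-Agent "^$"          bad_bot
<Limit GET POST HEAD>
  Order Allow,Deny
  Allow from all
  Deny from env=bad_bot
</Limit>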

Please also note the number of duplicate requests made to the page from either the same IP address or the same company, e.g. TwitterBOT or Facebook. Why they do this I do not know!

The Recent Twitter Rush Test - 28-MAY-2015

XXX.XXX.XXX.XXX - - [28/May/2015:17:08:17 +0100] "POST /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/?r=12 HTTP/1.1" 200 22265 "-" "Mozilla/5.0 (http://www.strictly-software.com) Strictly TweetBot/1.1.2" 1/1582929
184.173.106.130 - - [28/May/2015:17:08:22 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "ShowyouBot (http://showyou.com/crawler)" 0/3372
199.16.156.124 - - [28/May/2015:17:08:21 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22377 "-" "Twitterbot/1.0" 1/1301263
199.16.156.125 - - [28/May/2015:17:08:21 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22375 "-" "Twitterbot/1.0" 1/1441183
185.20.4.220 - - [28/May/2015:17:08:21 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22377 "-" "Mozilla/5.0 (TweetmemeBot/4.0; +http://datasift.com/bot.html) Gecko/20100101 Firefox/31.0" 1/1224266
17.142.151.49 - - [28/May/2015:17:08:21 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22375 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +http://www.apple.com/go/applebot)" 1/1250324
151.252.28.203 - - [28/May/2015:17:08:22 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22374 "http://bit.ly/1eA4GYZ" "Go 1.1 package http" 1/1118106
46.236.26.102 - - [28/May/2015:17:08:23 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22376 "-" "Mozilla/5.0 (TweetmemeBot/4.0; +http://datasift.com/bot.html) Gecko/20100101 Firefox/31.0" 0/833367
199.16.156.124 - - [28/May/2015:17:08:23 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22376 "-" "Twitterbot/1.0" 0/935200
142.4.216.19 - - [28/May/2015:17:08:24 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 - "-" "Mozilla/5.0 (compatible; OpenHoseBot/2.1; +http://www.openhose.org/bot.html)" 0/1964
17.142.152.131 - - [28/May/2015:17:08:24 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22375 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +http://www.apple.com/go/applebot)" 0/875740
52.5.154.238 - - [28/May/2015:17:08:25 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22376 "-" "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31" 1/1029660
4.71.170.35 - - [28/May/2015:17:08:26 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate HTTP/1.1" 403 251 "-" "grokkit-crawler (pdsupport@purediscovery.com)" 0/1883
192.99.19.38 - - [28/May/2015:17:08:26 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 - "-" "Mozilla/5.0 (compatible; OpenHoseBot/2.1; +http://www.openhose.org/bot.html)" 0/1927
141.223.91.115 - - [28/May/2015:17:08:28 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 - "_bit=55673d4a-0024b-030a1-261cf10a;domain=.bit.ly;expires=Tue Nov 24 16:07:38 2015;path=/; HttpOnly" "-" 1/1592735
17.142.151.101 - - [28/May/2015:17:08:32 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22260 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +http://www.apple.com/go/applebot)" 17/17210294
184.173.106.130 - - [28/May/2015:17:08:49 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "ShowyouBot (http://showyou.com/crawler)" 0/1870
142.4.216.19 - - [28/May/2015:17:08:49 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "Mozilla/5.0 (compatible; OpenHoseBot/2.1; +http://www.openhose.org/bot.html)" 0/1601
52.6.187.68 - - [28/May/2015:17:08:28 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22262 "-" "Typhoeus - https://github.com/typhoeus/typhoeus" 20/20260090
45.33.89.102 - - [28/May/2015:17:08:28 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22262 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" 20/20370939
134.225.2.7 - - [28/May/2015:17:08:26 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22260 "-" "Mozilla/5.0 (TweetmemeBot/4.0; +http://datasift.com/bot.html) Gecko/20100101 Firefox/31.0" 22/22337338
2a03:2880:1010:3ff4:face:b00c:0:8000 - - [28/May/2015:17:08:25 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22261 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" 23/23973749
134.225.2.7 - - [28/May/2015:17:08:27 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22259 "-" "Mozilla/5.0 (TweetmemeBot/4.0; +http://datasift.com/bot.html) Gecko/20100101 Firefox/31.0" 21/21602431
54.167.214.223 - - [28/May/2015:17:08:25 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22259 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36" 24/24164062
4.71.170.35 - - [28/May/2015:17:08:51 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "grokkit-crawler (pdsupport@purediscovery.com)" 0/1688
192.99.19.38 - - [28/May/2015:17:08:51 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "Mozilla/5.0 (compatible; OpenHoseBot/2.1; +http://www.openhose.org/bot.html)" 0/1594
54.246.137.243 - - [28/May/2015:17:08:51 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 - "-" "python-requests/1.2.3 CPython/2.7.6 Linux/3.13.0-44-generic" 0/1736
92.246.16.201 - - [28/May/2015:17:08:51 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 - "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0" 0/725424
4.71.170.35 - - [28/May/2015:17:08:55 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "grokkit-crawler (pdsupport@purediscovery.com)" 0/1808
2a03:2880:2130:9ff3:face:b00c:0:1 - - [28/May/2015:17:08:57 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate HTTP/1.1" 301 144 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" 0/657830
54.198.122.232 - - [28/May/2015:17:08:51 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22258 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_6) AppleWebKit/534.24 (KHTML, like Gecko) (Contact: backend@getprismatic.com)" 7/7227418
2a03:2880:1010:3ff7:face:b00c:0:8000 - - [28/May/2015:17:08:51 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22255 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" 7/7169003
54.198.122.232 - - [28/May/2015:17:08:51 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22257 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_6) AppleWebKit/534.24 (KHTML, like Gecko) (Contact: backend@getprismatic.com)" 7/7185701
2607:5300:60:3b37:: - - [28/May/2015:17:08:53 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22258 "-" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0" 5/5298648
2a03:2880:2130:9ff7:face:b00c:0:1 - - [28/May/2015:17:08:56 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22267 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" 1/1999466
178.32.216.193 - - [28/May/2015:17:08:49 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22258 "http://bit.ly/1eA4GYZ" "LivelapBot/0.2 (http://site.livelap.com/crawler)" 9/9518327
199.59.148.209 - - [28/May/2015:17:08:58 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22260 "-" "Twitterbot/1.0" 1/1680322
54.178.210.226 - - [28/May/2015:17:08:58 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22257 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)" 1/1842148
54.198.122.232 - - [28/May/2015:17:08:58 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22258 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_6) AppleWebKit/534.24 (KHTML, like Gecko) (Contact: backend@getprismatic.com)" 1/1903731
54.198.122.232 - - [28/May/2015:17:09:00 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22259 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_6) AppleWebKit/534.24 (KHTML, like Gecko) (Contact: backend@getprismatic.com)" 1/1131792
2607:5300:60:3b37:: - - [28/May/2015:17:09:00 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22260 "-" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0" 1/1048667
199.59.148.209 - - [28/May/2015:17:09:02 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22258 "-" "Twitterbot/1.0" 1/1024583
54.178.210.226 - - [28/May/2015:17:09:02 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22260 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)" 1/1251088
65.52.240.20 - - [28/May/2015:17:09:03 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22259 "-" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)" 0/814087
54.92.69.38 - - [28/May/2015:17:09:04 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22258 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)" 0/925457
54.92.69.38 - - [28/May/2015:17:09:05 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22266 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)" 0/932984
54.178.210.226 - - [28/May/2015:17:09:06 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22259 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)" 0/927202
54.178.210.226 - - [28/May/2015:17:09:08 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22260 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)" 0/717344
54.167.123.237 - - [28/May/2015:17:09:09 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)" 0/2286
54.178.210.226 - - [28/May/2015:17:09:12 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22254 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)" 0/971022
37.187.165.195 - - [28/May/2015:17:09:52 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22258 "-" "Mozilla/5.0 (compatible; PaperLiBot/2.1; http://support.paper.li/entries/20023257-what-is-paper-li)" 0/688208
74.112.131.244 - - [28/May/2015:17:10:24 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22257 "-" "Mozilla/5.0 ()" 3/3572262
52.68.118.157 - - [28/May/2015:17:11:35 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22259 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)" 0/688056
52.68.118.157 - - [28/May/2015:17:11:35 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22259 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)" 0/719851
52.68.118.157 - - [28/May/2015:17:11:37 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22256 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)" 0/739706
52.68.118.157 - - [28/May/2015:17:11:38 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22258 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)" 0/760912
74.6.254.121 - - [28/May/2015:17:12:05 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 - "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)" 0/248578
66.249.67.148 - - [28/May/2015:17:12:38 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22259 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0/493468
54.145.93.204 - - [28/May/2015:17:13:25 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "jack" 0/1495
54.145.93.204 - - [28/May/2015:17:13:26 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22257 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.8) Gecko/2008091620 Firefox/3.0.2" 0/597310
178.33.236.214 - - [28/May/2015:17:13:41 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "Mozilla/5.0 (compatible; Kraken/0.1; http://linkfluence.net/; bot@linkfluence.net)" 0/2065
173.203.107.206 - - [28/May/2015:17:13:50 +0100] "POST /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/?r=12 HTTP/1.1" 200 22194 "-" "Mozilla/5.0 (http://www.strictly-software.com) Strictly TweetBot/1.1.2" 4/4801717
184.173.106.130 - - [28/May/2015:17:13:58 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "ShowyouBot (http://showyou.com/crawler)" 0/96829
178.32.216.193 - - [28/May/2015:17:13:57 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22303 "http://bit.ly/1eA4GYZ" "LivelapBot/0.2 (http://site.livelap.com/crawler)" 1/1032211
192.99.1.145 - - [28/May/2015:17:13:59 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22303 "http://bit.ly/1eA4GYZ" "LivelapBot/0.2 (http://site.livelap.com/crawler)" 1/1535270
184.173.106.130 - - [28/May/2015:17:14:02 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "ShowyouBot (http://showyou.com/crawler)" 0/1764
52.6.187.68 - - [28/May/2015:17:14:01 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22293 "-" "Typhoeus - https://github.com/typhoeus/typhoeus" 66/66512611
146.148.22.255 - - [28/May/2015:17:15:10 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22292 "-" "Mozilla/5.0 (compatible; Climatebot/1.0; +http://climate.k39.us/bot.html)" 0/885387
74.6.254.121 - - [28/May/2015:17:15:11 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 - "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)" 1/1789256
146.148.22.255 - - [28/May/2015:17:15:17 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22290 "-" "Mozilla/5.0 (compatible; Climatebot/1.0; +http://climate.k39.us/bot.html)" 1/1275245
54.162.7.197 - - [28/May/2015:17:15:18 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22291 "http://bit.ly/1eA4GYZ" "Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.107 Safari/534.13 v1432829642.1352" 0/711142
146.148.22.255 - - [28/May/2015:17:15:24 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22293 "-" "Mozilla/5.0 (compatible; Climatebot/1.0; +http://climate.k39.us/bot.html)" 0/742404
54.162.7.197 - - [28/May/2015:17:15:32 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22289 "-" "msnbot/2.0b v1432829684.8617" 0/717679
23.96.208.137 - - [28/May/2015:17:16:05 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22294 "-" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)" 0/560954
69.164.211.40 - - [28/May/2015:17:17:38 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22293 "-" "Mozilla/5.0 (compatible; EveryoneSocialBot/1.0; support@everyonesocial.com http://everyonesocial.com/)" 0/516967
96.126.110.221 - - [28/May/2015:17:18:24 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22300 "-" "Mozilla/5.0 (compatible; EveryoneSocialBot/1.0; support@everyonesocial.com http://everyonesocial.com/)" 0/464585
69.164.217.210 - - [28/May/2015:17:18:42 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22288 "-" "Mozilla/5.0 (compatible; EveryoneSocialBot/1.0; support@everyonesocial.com http://everyonesocial.com/)" 0/482230
173.255.232.252 - - [28/May/2015:17:19:03 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22293 "-" "Mozilla/5.0 (compatible; EveryoneSocialBot/1.0; support@everyonesocial.com http://everyonesocial.com/)" 0/514587
173.255.232.252 - - [28/May/2015:17:19:12 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22288 "-" "Mozilla/5.0 (compatible; EveryoneSocialBot/1.0; support@everyonesocial.com http://everyonesocial.com/)" 0/858459
96.126.110.222 - - [28/May/2015:17:19:26 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22288 "-" "Mozilla/5.0 (compatible; EveryoneSocialBot/1.0; support@everyonesocial.com http://everyonesocial.com/)" 0/469048
92.222.100.96 - - [28/May/2015:17:19:28 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22291 "-" "sfFeedReader/0.9" 0/574409
54.176.17.88 - - [28/May/2015:17:20:20 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+DarkPolitricks+%28Dark+Politricks%29 HTTP/1.1" 200 - "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36" 1/1112283
184.106.123.180 - - [28/May/2015:17:20:56 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22285 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" 0/522039
74.6.254.121 - - [28/May/2015:17:22:35 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 - "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)" 0/260972
54.163.57.132 - - [28/May/2015:17:23:05 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "Ruby" 0/2749
54.163.57.132 - - [28/May/2015:17:23:07 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "Ruby" 0/1647
54.163.57.132 - - [28/May/2015:17:23:09 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "Ruby" 0/1487
178.33.236.214 - - [28/May/2015:17:23:14 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+DarkPolitricks+%28Dark+Politricks%29 HTTP/1.1" 403 252 "-" "Mozilla/5.0 (compatible; Kraken/0.1; http://linkfluence.net/; bot@linkfluence.net)" 0/1996
168.63.10.14 - - [28/May/2015:17:23:23 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 - "-" "Apache-HttpClient/4.1.2 (java 1.5)" 0/1602
168.63.10.14 - - [28/May/2015:17:23:23 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "Apache-HttpClient/4.1.2 (java 1.5)" 0/1486
74.6.254.121 - - [28/May/2015:17:24:05 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 - "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)" 0/260635
45.33.35.236 - - [28/May/2015:17:24:59 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22284 "-" "Mozilla/5.0 ( compatible ; Veooz/1.0 ; +http://www.veooz.com/veoozbot.html )" 0/618370
74.6.254.121 - - [28/May/2015:17:25:35 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 - "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)" 0/255700
70.39.246.37 - - [28/May/2015:17:26:10 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22283 "-" "Mozilla/5.0 Moreover/5.1 (+http://www.moreover.com; webmaster@moreover.com)" 0/469127
82.25.13.46 - - [28/May/2015:17:28:55 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22285 "https://www.facebook.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36" 0/568199
157.55.39.84 - - [28/May/2015:17:29:17 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22284 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" 0/478186
188.138.124.201 - - [28/May/2015:17:30:10 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22284 "-" "ADmantX Platform Semantic Analyzer - ADmantX Inc. - www.admantx.com - support@admantx.com" 2/2500606
54.204.149.66 - - [28/May/2015:17:30:30 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22285 "-" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:36.0) Gecko/20100101 Firefox/36.0 (NetShelter ContentScan, contact abuse@inpwrd.com for information)" 0/680643
54.198.122.232 - - [28/May/2015:17:30:31 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+DarkPolitricks+%28Dark+Politricks%29 HTTP/1.1" 200 22220 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_6) AppleWebKit/534.24 (KHTML, like Gecko) (Contact: backend@getprismatic.com)" 0/650482
64.49.241.208 - - [28/May/2015:17:30:32 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22284 "-" "ScooperBot www.customscoop.com" 0/658243
50.16.81.18 - - [28/May/2015:17:30:46 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22283 "-" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:36.0) Gecko/20100101 Firefox/36.0 (NetShelter ContentScan, contact abuse@inpwrd.com for information)" 0/673211
54.166.112.98 - - [28/May/2015:17:30:47 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A%20DarkPolitricks%20%28Dark%20Politricks%29 HTTP/1.1" 403 252 "-" "Recorded Future" 0/1645
54.80.130.191 - - [28/May/2015:17:30:59 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "Recorded Future" 0/2777
74.6.254.121 - - [28/May/2015:17:31:35 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22282 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)" 0/606530
74.6.254.121 - - [28/May/2015:17:33:05 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22283 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)" 0/490218
54.208.89.59 - - [28/May/2015:17:34:31 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 - "-" "-" 0/309191
54.208.89.59 - - [28/May/2015:17:34:32 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22281 "-" "-" 0/647914
74.6.254.121 - - [28/May/2015:17:34:35 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22284 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)" 0/555798

So the moral of the story is this:

  • Be careful when you post to Twitter as you will get a rush of traffic to your site in the following minutes which could cause your site problems.
  • Try to block social media / brand awareness / spam BOTS if you can so they don't consume your bandwidth, CPU and memory.
  • Use either your server's firewall or your .htaccess file to block BOTS you consider a waste of your money. Remember, when you are on a VPS any HTTP request to your site costs you money. Why waste it on BOTS that provide no benefit to you?
  • Try to mitigate the rush by using the Crawl-delay robots.txt directive to stop the big SERP BOTS from hammering you straight away (see the example below).
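
For example, a robots.txt entry along these lines; note that Crawl-delay is honoured by BOTS such as BingBot and Yahoo's Slurp but ignored by GoogleBot, so it only softens part of the rush:

User-agent: bingbot
Crawl-delay: 10

User-agent: Slurp
Crawl-delay: 10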

I am sure I will post another Twitter Rush analysis in the coming months, and by then the number of BOTS will probably have grown from the 15 or so when I first tested it to 200+!