
Sunday, 22 January 2023

TSQL Batch Updates SQL 2005 - 2008

Updating tables in Batches to prevent locking in SQL

There are times when you may need to carry out UPDATES on large tables that are in use and constantly having rows inserted, deleted, or updated.

If you carry out a large UPDATE that affects all of the rows in the table (in my case millions of rows) then the table will be locked for the duration of the update, and any other processes that need to carry out DML statements will be BLOCKED until it finishes.

For example, you may experience long delays caused by locking when trying to return data from this table on a website or API service, or even worse, deadlocks. You will most definitely experience performance issues, and any SELECT statements that access the data without a WITH (NOLOCK) hint will have to wait in line for the UPDATE to finish.
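For reference, this is what such a hint looks like; the table and column names here are just placeholders, and the hint means you may read rows that another transaction is halfway through changing:

-- read without taking shared locks; may return uncommitted (dirty) data
SELECT MyColumn
FROM   MyTable WITH (NOLOCK)
WHERE  SomeDate > '2023-01-01'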

Obviously wrapping WITH (NOLOCK) onto every SELECT statement is not a good solution unless you know what you are doing, as it allows dirty reads and you may end up giving your users old or incorrect data.

This might be fine for some scenarios but in critical applications where data integrity is key then you need another solution that provides data integrity and allows you to UPDATE the table without performance problems.

When I find myself needing to UPDATE every record in a large table, I use a BATCH UPDATE process which cuts the large UPDATE statement down into lots of small UPDATES that affect only a few rows at a time.

By doing this the UPDATE rattles through all the rows of even large tables very quickly, as long as the batch size of records updated in each loop iteration is small enough not to cause locking that may affect front-end processes.

For example, instead of the whole table being locked for an hour with lots of blocked processes queuing up behind waiting for it to finish, it is only locked for lots of little periods lasting seconds or less.

These smaller locking periods allow other processes in to do their work and if the batch size is small enough and you have appropriate indexes you might find that you won't experience a full table lock anyway.

There are various methods for carrying out this approach and you should tailor your BATCH SIZE to your own requirements. Before SQL 2005 you could use the SET ROWCOUNT 50 command to set the size of the batch, but in SQL 2005 and beyond you can use a variable directly with an UPDATE TOP (@VAR) command.
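As a quick side-by-side of the two approaches (MyTable and Flag are just placeholder names):

-- SQL 2000 and earlier: SET ROWCOUNT limits the rows affected by the next statement
SET ROWCOUNT 50
UPDATE MyTable SET Flag = 1 WHERE Flag = 0
SET ROWCOUNT 0 -- always reset it afterwards

-- SQL 2005 onwards: use TOP with a variable instead
DECLARE @BATCHSIZE INT
SET @BATCHSIZE = 50
UPDATE TOP (@BATCHSIZE) MyTable SET Flag = 1 WHERE Flag = 0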

This is an example of a BATCH UPDATE that uses a column in the table called LastUpdated, which gets updated every time the row is, either through the stored procedures that update the table or through insert and update triggers. Because each loop sets this column to the current time, the next loop won't touch the same records, as the WHERE clause skips anything updated within the last 20 minutes.

Obviously, this must be tailored to your own system. An alternative is to create an "updated" flag column that defaults to 0, set it to 1 in the BATCH UPDATE, and have the WHERE statement that selects the TOP X records ignore any that have been set to 1.

You definitely need something to change on each UPDATE, otherwise you will find this process going on forever: as there is no way to order the records for the UPDATE statement, it could keep getting the same TOP(X) records on each batch, letting the process rattle on forever with you scratching your head wondering why.

If you can wangle in an ORDER BY with some convoluted statement then that might work; however, having a simple date or flag that is updated within the batch, and also checked on each loop so that the same records are not looked at over and over again, is the easy answer to this issue.

SET NOCOUNT ON
SET DATEFORMAT YMD

DECLARE @ROWS INT, @TOTALROWS INT, @BATCHSIZE INT

SELECT @ROWS = 1,
@TOTALROWS = 0,
@BATCHSIZE = 50

-- As we start @ROWS at 1 and we know there are thousands of records to update,
-- when the UPDATE eventually returns an @@ROWCOUNT of 0 we have met the
-- exit criteria of the loop; the sanity check below ensures this
WHILE @ROWS > 0
BEGIN
     
     -- Show the time this batch started as it might take hours
     -- (CONVERT style 113 = dd mon yyyy hh:mi:ss:mmm; FORMAT() needs SQL 2012+)
     PRINT 'Job started at ' + CONVERT(varchar(24), GETDATE(), 113)

     -- We must have a way that we don't keep updating the same records over and over again,
     -- so I use the LastUpdated date which gets updated on each batch update then checked
     -- in the where clause to ensure the date is at least 20 minutes in the future

     -- Update data in the table in batches to prevent blocking if we have to do this
     -- whilst people are using the site and accessing the table at the same time
     UPDATE  TOP(@BATCHSIZE) MyTable
     SET     MyColumn = dbo.udf_SOME_FUNCTION(MyPK),
             Flag = 1,
             LastUpdated = GETDATE()
     WHERE   SomeDate > '2023-JAN-01'
             AND Flag = 0 -- only touch rows not yet updated by this job
             -- the date check below could be cut out if you rely on the flag
             AND DATEDIFF(MINUTE,LastUpdated,GETDATE())>20

     SELECT @ROWS = @@ROWCOUNT, @TOTALROWS = @TOTALROWS + @ROWS

    
     PRINT 'Updated ' + CAST(@ROWS as varchar) + ' in batch'

     -- As this UPDATE job may take hours we want other processes
     -- to be able to access the tables in case of locks therefore
     -- we wait for 2 seconds between each BATCH to allow
     -- time for these processes to acquire locks and do their job
     WAITFOR DELAY '00:00:02'

     -- sanity check
     IF @ROWS = 0
       BREAK
 
 END

PRINT 'Updated ' + CAST(@TOTALROWS as varchar) + ' total rows'

I am currently using this process to update a table with over a million rows in it that is constantly being accessed by a very busy API system, and by using BATCH UPDATES it isn't causing any BLOCKING, LOCKING or performance issues at all.

If you really wanted to give the SELECT statements looking at this table some breathing room as you UPDATE it in batches, then you could add a DELAY within each loop, e.g. a 2-second DELAY placed after the UPDATE statement and the SELECT @ROWS = ..... line that collects stats for you to look at after the process has finished. It would just be something like this:
WAITFOR DELAY '00:00:02'
So hopefully this might help you out if you are having to UPDATE large tables that are also in use at the same time by websites, APIs, services, or 3rd parties.

© 2023 - By Strictly-Software

Saturday, 18 June 2016

Why just grabbing code from the web can lead to major problems down the line


By Strictly-Software.com

I have written many articles over the years about server, system, website and PC performance, and it seems that the more versions of Firefox and Chrome that come out, the slower they get. I don't think I have ever used IE 11 as much as I have in the last 3 months, mostly just to get Facebook, Radio 1 or Google+ to load within a minute, which FF and Chrome seem to have issues with for some reason.

Some add-ons like uBlock Origin prevent 3rd party domain code from being loaded up on the site, as well as large image or video/flash objects. It also stops pop-up windows and the loading of remote CSS fonts, which is all the rage now.

What the developers of these websites don't seem to realise is that when they are loading in code from all over the web just to make a page display or run, it causes a lot of network traffic. It also introduces the possibility that the code at the remote source has been tampered with, and therefore you could be loading in Cross Site Scripting hacks or ways for people to exploit your site if that certain script exists in the DOM.

Also, a less likely scenario but a more common issue is that the more domains your site has to access to get all its code, the greater the chance the page doesn't load as you want it to, or even not at all.

If script A relies on Script B but Script B doesn't load for a long time then the code in Script A that was going to open a popup window on DOM Load, or play a video just isn't going to work.

I recently overrode the window.onerror event and logged the Message, URL and Line No with an AJAX call to a log file, before either throwing the error for modern sites or hiding it for older ones.
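A minimal sketch of that kind of handler is below; the /logjserror.php endpoint is hypothetical, just any server side page that appends to a log file:

// log client side errors back to the server, then decide whether to surface them
window.onerror = function (message, url, lineNo) {
    try {
        var xhr = new XMLHttpRequest();
        xhr.open("POST", "/logjserror.php", true); // hypothetical logging endpoint
        xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
        xhr.send("msg=" + encodeURIComponent(message) +
                 "&url=" + encodeURIComponent(url) +
                 "&line=" + encodeURIComponent(lineNo));
    } catch (e) {
        // never let the logger itself throw an error
    }
    // return false to let the error surface (modern sites), true to hide it (older ones)
    return false;
};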

When I started looking through these files, the number of Google AdSense and tracker scripts not loading due to timeouts was incredible. There are also issues with bugs in the scripts, or with objects not being available for other scripts that rely on them because of slow loading. An example of just one error is:

24/04/2016 09:54:33 : 8X.XXX.XXX.161 'document.body' is null or not an object in http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js on line 19

People relying on Google for stats shouldn't, for a number of reasons. Not only do the scripts not always load and record the visit, but they also rely on 3rd party cookies and JavaScript being enabled. A log parser or DB is a much better way to log every single visitor, BOT or human.

For example, if you have a main jQuery script you are loading from a CDN or a site you don't control, and that domain is having network problems, then any other code on the site reliant on it won't work until that issue is resolved. This happens a lot, judging from the messages in my JavaScript error log file.

Due to this a lot of people just grab the code off the net and load it from a local server to get round network delays.

However, by doing this they are stuck at a point in time (the date and version they copied the file at). I hate this, as instead of actually learning JavaScript so they know what they are doing, they are relying on some other bloke's framework to solve their problems, e.g. have a look at whose code most of you are building your site with. If there is a bug in jQuery you either have to fix it yourself or wait for John Resig to fix it. If it's your own code, at least you can rely on your own skills and you know how the code works.

The other day I had to solve a jQuery problem where the page in question was using an old version of jQuery and another 3rd party script built around jQuery (but not by John), called reveal.js.

As the front end developers wanted to move to the latest version of jQuery they suddenly found that the reveal.js code no longer worked.

After debugging it was clear that the $().live(function) method had been removed, and as the code that did the popup relied on reveal.js, which was built in 2011 with no recent updates, the whole revealing and hiding of modal boxes stopped as soon as a modern version of jQuery was loaded in for the site.

I had to waste time reading up on jQuery and then hardcoding the version of reveal.js, as we had to use the new .on() function so that the new jQuery libraries would work with the old code taken from a library developed in 2011.
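The change itself is small once you find it; roughly speaking the old and new bindings look like this (the selector and handler names are illustrative):

// jQuery 1.8 and earlier: .live() bound a handler to current AND future elements
// $('a.reveal').live('click', openModal);

// jQuery 1.7+ equivalent (and the only option from 1.9): delegate via .on()
$(document).on('click', 'a.reveal', openModal);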

This is one thing I hate about front end developers who just pick 'n' choose libraries off the web, despite them all doing the same things, like event binding and removal, in multiple ways.

If they are relying on a 3rd party library from 2011 that in turn relies on a constantly updated framework like jQuery, which is always dropping and adding methods, then how can people expect sites to keep working when a method those libraries rely on is removed?

If they cannot write some basic notes to say that this page relies on this script, e.g. reveal.js, which came with jQuery 1.4.5, then it makes people like me, who hate debugging other people's frameworks, hate 3rd party code even more.

Not only do I have my own Getme.js framework, which is simple, uses CSS selectors and linked methods where the array of objects is passed down from function to function, but now that most browsers support the single line of code that lets selectors find objects there is no need to add Sizzle.js to it any more. Unless you really want to support old IE versions, you can just use this single line.

// where query is the CSS selector
document.querySelectorAll( query ); 

For example, in my Getme.js code the following line will loop through all anchor nodes with a class of menu inside the DIV with the ID Main. I then just alert out each element's ID.

G('DIV#Main > A.menu').each(function(){
   alert(this.id);
})

Obviously, whether you do all your styling in CSS or inline JS, you have options for how to style a series of objects; for example, with the .setAtts method you can pass in any element attributes and their values.

This provides a mixture of a class and inline styles to the paragraphs inside DIV tags. It also uses chaining, where the array of objects is passed from one function to the next, just like other frameworks.

The first example just looks for DIV tags with P's inside and sets the class to "warningRed" and the style of the font to bold and red. The class can do most of the styling or ALL of it.

It's just an example, as is the 2nd one, which targets all P tags with a SPAN with the class "info" inside. It sets a warning message with the .setHTML method, and then the .setStyle method colours the text.


G('DIV > P').setAtts({"class":"warningRed", style:"color:red; font-weight:bold"});

G('P > SPAN.info').setHTML('Click for help.').setStyle({color:"red", fontSize:"8px"});


I used a G instead of $ just to distinguish it from all the other frameworks and because it's called Getme.js.

If you want to learn to write your own chainable framework then have a read of this article of mine. I've kept Getme.js simple, as I hate people who just copy code from the web, especially when it goes wrong.

At least this way I have a wrapper object that allows for chaining, the setting of multiple attributes at once, and the use of selectors. However, I still like to use pure JavaScript inside my functions so people down the line can get their heads around it.
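If you are curious, the core of a chainable wrapper like this is only a handful of lines. Here is a minimal sketch; the method names are borrowed from the examples above, the rest is illustrative:

// minimal chainable wrapper: G(selector) returns an object whose methods
// operate on the matched nodes and return the wrapper again for chaining
function G(query) {
    var nodes = document.querySelectorAll(query);
    var wrapper = {
        each: function (fn) {
            for (var i = 0; i < nodes.length; i++) {
                fn.call(nodes[i]); // "this" inside the callback is the element
            }
            return wrapper;
        },
        setAtts: function (atts) {
            return wrapper.each(function () {
                for (var name in atts) { this.setAttribute(name, atts[name]); }
            });
        }
    };
    return wrapper;
}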

So next time I get a jQuery problem because John Resig has decided to remove a core function from his framework, causing a chain reaction through all the other frameworks built around that version of jQuery, I can at least (hopefully) use my own simple framework to apply the CSS the designers need, rather than spend a day hunting around for fixes to other people's code.

That is something I really hate doing.



By Strictly-Software.com 

© 2016 Strictly-Software.com

Don't Be Fooled By "Turbo Boost" and Windows Performance / Cleaner Applications



By Strictly-Software.com

I bet if you have been online more than a few times you will undoubtedly have seen adverts for tools and applications that will "Speed up your computer" or "Tune it up", "remove unnecessary files" and even remove malware.

Most of these apps are con tricks, in that they will run, show you a really high number of problems to do with security, privacy or performance, and when you go to fix them you are told you must pay a fee of £29.99 to get the full version.

Scam code I call it.

Mainly because people don't know what half the items recorded as security holes or performance issues are. For example, to get a nice big list of privacy concerns, about 20,000 say, they might list every single cookie you have from every browser.

If you don't know what a cookie is, it's a harmless small text file that holds very small pieces of information about your visit to the site, e.g. linking your username to a member ID so that the next time you visit the site you don't have to keep re-typing your username into the login box.

For example, if you install the Web Developer Toolbar in Firefox you can view all the cookies on a site or domain, including sessions. Viewing the cookies for this site I see one that gives me this really important information....

Name: SNID
Value: 72=i-mBmgOp22ixVNh68LucZ_88i1MnYk0FkV2k8k3s=uNr4G5YjLe6X9iAQ
Host: .google.com
Path: /verify
Expires: Mon, 11 Apr 2016 16:43:43 GMT
Secure: No
HttpOnly: Yes

I have no idea what the cookie value for SNID means, and most people apart from the web developers won't, so when people try and scare you with "cookies are dangerous" (something I have heard from my parents many times) just ignore their ignorance of web development.

They just need to realise that unless your password is stored in a plain text cookie (which never happens) then you don't have much to fear from cookies at all. They just fill up your local data directories the more sites you visit.

The one thing you may not like is tracking cookies, e.g. Google's, which try to track you from site to site to see what kind of information you are interested in so that they can show you relevant adverts.

Turning off 3rd party cookies in Chrome or the browser of your choice and setting DNT (Do Not Track) to YES/ON is worth doing even if some browsers don't support the DNT header.

Turbo Mode

Turbo mode is one of those cool sounding options that seem to signal that just by pressing the Turbo ON button your whole machine will speed up. In reality it does a few things, many of which might not even be happening at the time you press it.

These include:

- Stopping a scheduled defragmentation of your hard disk: something that is rarely needed anyway, but does consume memory and CPU if running.
- Stopping any scheduled tasks from running. These could be updates, downloads of applications that require updates, and the automatic creation of system backup and restore points.
- Postponing the automatic download and installation of important application and Windows updates.

You will be informed about the postponing of downloads and automatic updates such as Windows Updates if enabled.

In reality it doesn't do much, but it sounds and looks good when it says it has boosted your system's performance by 25% etc. Just beware that there is no way it can really know how much it has helped, and it is probably negligible anyway.

If you really want to speed up your PC, open the task manager, enable the show all processes option and then order the results by CPU or Memory. The programs at the top using over 1GB should certainly be looked at and may have memory leaks.

A shut down of those applications and then re-opening of them might help you out a lot. I find some apps like MS SQL 2015 really drain my memory if I leave them on for days and a reboot now and then is the best remedy for most problems.

It may be a joke from the IT Crowd to "Turn it off and on again", but in reality that does solve a hell of a lot of problems with computers running high memory or CPU.

Always try to install Windows updates regularly so you are not waiting around for hours for those 64 updates to install, like I have a number of times after repeatedly hitting the "Remind me in 15 minutes" button. A reboot with the most up to date software is the best thing you can do for your PC, as well as removing applications and browser plugins that you never use.

The more unnecessary applications you have on your system the more apps you will find in your Windows Start Up options running just to monitor for updates. Google does it, iTunes does it, and many other programs do as well. The more you can trim your system down so it only uses what you want it to use the better.

Plugins on browsers that were only needed once should be removed afterwards. Regularly check whether you are actually using all your browser plugins, as when they are updated the old versions are hardly ever removed.

Applications you downloaded to do one task should also be uninstalled before you forget about them.

The leaner the machine the quicker the machine. I have a 16GB RAM 64-bit Windows box at work and I regularly hit 12/13GB of memory usage. I usually know this is happening because the radio cuts out. However, as I hate closing everything down, waiting for the installations and then trying to remember what I had open at the time, I tend to let the memory rise and rise and then get frustrated as everything slows down.

If someone could invent a program that would remember what was open and then after rebooting re-open every app, file (with text), and program that was running before would make a mint. If something like this already exist PLEASE TELL ME WHERE I CAN FIND IT!

Clean your PC manually

This part of the article shows you that the myriad of application cleaner tools which trick you into paying money to speed up your PC are basically useless. Tests have proved that running the system tools built into Windows 8+ can be just as effective.

Use the built in Disk Cleanup tool included with Windows. It’s focused on freeing up space on your hard drive, but it will also delete old temporary files and other useless things. Just tap the Windows key, type Disk Cleanup, and press Enter to launch it. You can even schedule a Disk Cleanup to clean your computer automatically.
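If you prefer the command line, Disk Cleanup can also be driven from a command prompt with its documented /sageset and /sagerun switches; the 100 here is just an arbitrary task number:

REM choose which categories to clean and save the choices as task 100
cleanmgr /sageset:100

REM run the saved clean-up later, e.g. from a scheduled task
cleanmgr /sagerun:100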

When the tool pops up it will list a number of folders and system folders containing files that build up over time the more you use your PC.

Whilst this might be good as regards the browser cache when you are constantly going to the same sites over and over again, as it means the photos and other files are stored locally on your computer, preventing a network lookup to download them again, most of these are files you use once and forget about. This causes the folder size to rise and rise, slowing down access. If you don't visit sites often enough for a browser cache to be useful then clean it out. A tool like CCleaner can let you decide which sites get cleaned and which don't.

Remember to regularly clean the following:
  • Your downloads folder: apps, videos and other files that you have since installed or watched and no longer need.
  • Device Driver Downloads after installation.
  • Empty the Recycle Bin
  • Clean the System Error and Memory Dump Files
  • Delete Temporary Files 
  • Delete User File History

There are tools that are free that help you do all this, backing up your PC before the deletions in case something goes wrong. We will look at CCleaner in a bit.

So if you don't want to rely on costly tools that try and trick you into paying money to make you feel safe there are plenty of ways around it.

Don't be tricked by the salesperson at PC World who promises you McAfee Anti-Virus software is the best way to protect your PC. It's insurance, and they get the money: a bonus for the salesperson, so to speak.

There is no need to waste money on a tool that will kill your CPU by constantly scanning every single file your computer accesses (which is a lot) when there are free tools like Malwarebytes Anti-Malware which can be downloaded online. There is a premium version if you do require constant analysis of every file your PC comes into contact with, but I haven't found it to be needed.

Just run a scan once a week and make sure to never open .ZIP, .EXE, .DOCX or .PDF files in emails especially when you are not expecting them and they are from people you don't know.

Also please remember that it is VERY EASY to fake the "FROM" address in an email (1 line of code), so if you're a member of a site and someone sends you a flashy looking email that seems to be from PayPal, Facebook or your bank with the address admin@facebook.com, do at least a few things before opening the file.

1. Open the full email headers so that you can see the original sender of the email. Is it from Facebook or your bank?

2. If you are not sure, because the sender is an IP address, e.g. 134.1.34.248, then run that in a command prompt with the line >> nslookup 134.1.34.248 and make sure it returns a known address. If it comes back empty or with an unknown name, e.g. RuskiHCKER.com, use an online WHOIS tool (there are lots online), or if you have installed WhoisCL on your Windows computer type whoisCL RuskiHCKER.com and see what the WHOIS details return about the owner of the address. It should tell you what country it's from and an email address to complain to if you are being spammed by it.

3. If the HTML email looks fancy like your bank or Facebook or some other site. Move your mouse over some of the bottom links in the footer or side bar. Most site strippers will only bother putting code behind the main buttons so they can log your typing e.g Login, Password, Forgot Password etc. If you roll your mouse over the "About" or "Help" links and all you see is a # instead of a proper URL then that is suspicious. Delete the email ASAP!

Remember, banks never ask you for your PIN code, so never trust a site asking for that. Also, if it asks you for information used to verify you, like your mother's maiden name, first pet, first school or favourite colour, you should shut it down ASAP.

4. If the headers look okay it could still be a hacked mailserver or a man-in-the-middle attack, so right click the file, and if you installed Malwarebytes properly you should be able to run a virus scan over the file with one click before saving or opening it. If you can't, then save it to your computer and run a virus check on the file before opening it. Never just open the file, whoever you may think it's from.

Regularly clear your browser history, or set your browser to automatically clear its history when you close it if you don't want to store one, or even better just use the browser's private browsing options, e.g. Chrome's is called Incognito and allows you to surf the web without leaving a history or storing cookies on your machine.

Also clear your browser cache every now and then. Whilst a cache is good for quick loading of images and files (JS, CSS, JPEGs) that are used often, once it becomes too large it gets slower and slower to find the files you need, which negates its usefulness due to its size.

Run the Disk Defragmenter included with Windows. This isn't necessary if you use an SSD or solid-state drive.

Don’t bother with a registry cleaner or other performance tool if you have to pay for it. If you want an application to help you then CCleaner is that tool.

You can download it from here: CCleaner. The good thing about it is that it's the best-tested registry cleaner out there.

I always run a registry clean after removing applications from my computer to ensure any registry keys and file extensions left over are also removed. CCleaner will also delete your browser cache for all the browsers you use, as well as cookies, saved passwords, web history and temporary files for other programs.

You have the choice to tick what you want to clean and what not to clean but the free tool CCleaner does a lot more than many of these PC cleaning apps do. A test performed in 2011 by Windows Secrets found that the Disk Cleanup tool included with Windows was just as good as paid PC cleaning apps.

Note that this is true even though PC cleaning apps fix "registry errors" while the Disk Cleanup app doesn't, which shows just how unnecessary registry cleaners are. So don't waste money being "blackmailed" into buying the premium version of these clean up tools.

So yes, it’s been tested, PC cleaning apps are worthless. Tune your PC yourself and you will get better results.

If you want to download CCleaner which is the recommended tool that professionals use then you can get it from here www.piriform.com/ccleaner/download.

By Strictly-Software.com 

© 2016 Strictly-Software.com

Sunday, 14 June 2015

The Wordpress Survival Guide - Part 2 - Performance

Surviving WordPress - Performance and Site Optimization


UPDATED - 14th Jun 2015

I have updated this to include a way to handle MySQL errors, a BASH script to tune Apache and an improved function to check your servers load and handle Windows errors. 

Plus code to disable the new WordPress HeartBeat functionality which can be a CPU / Bandwidth killer and a way to add CRON jobs to automate plugin functions without console access.

This is the second part of my guide to surviving WordPress and as promised it looks at performance tweaks and tips which I have gathered on my way.

It has been quite a while since the first instalment, and the main reason for this was that I was suffering my own performance killer, which I wanted to solve before writing this article. Luckily it has now been solved with the help of Robert from the Tiger Tech blog, who helped me get to the bottom of the issue, so here it is.

My own personal journey into WordPress performance tuning started off when I began to experience PHP out-of-memory errors when manually rebuilding my Google sitemap.

I started to play around with different plugins and then delve into the code, which is when I started to realise the damage that WordPress plugins can do to a site when the user doesn't realise what's going on behind the scenes.

You can check out a detailed examination here, but in my case it was a Google Sitemap plugin that was set to rebuild whenever a new post was saved. Combining that with WP-O-Matic, which imports articles at scheduled intervals, and a TwitterBot such as my own, which can send Tweets to multiple accounts whenever new content is added, all added up to a performance killer!

If you have a similar setup it's worth running TOP, MyTOP and checking your access logs to see how it affects your own system but what was happening on my own setup was:

  • WP-O-Matic starts to import a feed's worth of articles (max of 10).
  • For each article that is saved, numerous procedures hooked into the SavePost or PublishPost action run. In my case it was:
  1. My Strictly AutoTags plugin runs which analyses the article and adds relevant tags, depending on the admin settings, the number of tags and the length of the article this could be quick or slow.
  2. The Google Sitemap plugin then ran which runs a lot of SQL queries and creates a new file as well as pinging multiple SERPs with HTTP requests.
  3. My Strictly Tweetbot Plugin also runs which posts a tweet to multiple accounts. This caused a Twitter Rush as 50+ BOTS all hammered my site at the same time due to the new link appearing on Twitter. 
  4. Any other plugin using the Save hooks runs such as caching tools which create static files.
  • As soon as the Tweets arrive on Twitter a multitude of Bots, 50 on my last test, will visit the site to index the link that has just been posted OR try and scrape, hack or insert spam comments into the post.
  • If the link was posted to multiple accounts you will find that the same bots will visit for each account you posted to. Some bots like Yahoo seem to be particularly bad and visit the article multiple times anyway. So if I posted to 5 twitter accounts that's 250 visits in the space of a few seconds from BOTS scanning for new tweet links to visit!
  • All these visits create new Apache processes and depending on the amount of memory that each Apache process uses you could find that your server starts swapping memory to disk to handle the increase and in my case my server load would quickly jump from 0.15 to 50+.

The more articles you import the more iterations of this chain of performance killing events occurs. I found that these events would sometimes pass off without any noticeable problems but other times the server load would get so high that I would have to reboot my machine.

The highest value I recorded was 174 on a 1GB RAM Linux server!

In fact on some days I would have to reboot 3-5 times which is not good at all.

Getting to the bottom of the problem

A common solution to any performance related problem is to throw more resources at it. Many message boards recommended increasing the maximum memory limit to get round the Out of Memory errors the Google Sitemap was throwing up but that just masks the issue and doesn't actually solve it.

As a by product of my system tuning I ended up creating my own Google Sitemap Plugin to overcome limitations of the others.

Not only could it be easily set to rebuild at scheduled intervals instead of only when new posts were added which helps reduce unnecessary rebuilds, but it used far less memory and made a tiny number of database queries in comparison to the other market leaders.

I also created a System Reporting plugin so that I could be kept informed when my site was playing up, and I found this invaluable in keeping my site running during this performance nightmare. If you are not on your site 24/7 and cannot afford professional monitoring services, it is great to get an email telling you if your site is down, taking ages to respond, has a very high server load or is running too many SQL queries.

One of the first ideas to reduce the amount of times I was rebooting was to try and prevent any performance intensive tasks from running if the server load was already high.

I did this by adding in some checks to all my major plugins that made a call to the following function before running anything. If the load was above 1.0 I just exited immediately. You can read more about this method in this article: Testing Server Load.

function GetServerLoad(){

 // handle non windows machines
 if(substr(PHP_OS, 0, 3) !== 'WIN'){
  if(file_exists("/proc/loadavg")) {    
   $load = file_get_contents("/proc/loadavg"); 
   $load = explode(' ', $load);     
   return $load[0]; 
  }elseif(function_exists("shell_exec")) {     
   $load = @shell_exec("uptime");
   $load = explode(' ', $load);        
   return $load[count($load)-3]; 
  }else { 
   return false; 
  } 
 // handle windows servers
 }else{ 
  if(class_exists("COM")) {     
   $wmi  = new COM("WinMgmts:\\\\."); 
   if(is_object($wmi)){
    $cpus  = $wmi->InstancesOf("Win32_Processor"); 
    $cpuload = 0; 
    $i   = 0;   
    // Old PHP
    if(version_compare('4.50.0', PHP_VERSION) == 1) { 
     // PHP 4      
     while ($cpu = $cpus->Next()) { 
      $cpuload += $cpu->LoadPercentage; 
      $i++; 
     } 
    } else { 
     // PHP 5      
     foreach($cpus as $cpu) { 
      $cpuload += $cpu->LoadPercentage; 
      $i++; 
     } 
    } 
    $cpuload = round($cpuload / $i, 2); 
    return "$cpuload%"; 
   }
  } 
  return false;     
 } 
}
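A guard at the top of each expensive plugin function is then only a line or two. A minimal sketch is below; note the Windows branch above returns a CPU percentage string rather than a *nix load average, so the threshold would need to differ there:

// bail out of expensive work if the 1 minute load average is already above 1.0
$load = GetServerLoad();
if($load !== false && floatval($load) > 1.0){
 return; // skip this run; the next scheduled run will try again
}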


Apache Configuration

I finally got to the bottom of the problem I was suffering with the help of Tiger Tech after examining the output of ps auxwwH --sort rss during a period of high load. This listed all the currently running processes ordered by the amount of memory they were consuming.

At the time of running this my average load was 50, which meant there was a big queue of processes waiting to be run. This included over 70 Apache processes, each using between 8MB and 30MB, and this alone was easily using up my 1GB of RAM.

This high number of Apache processes meant that my server was busily swapping from real memory to disk based virtual memory which was causing high I/O (clearly seen from the output of iostat) and slowing down the response times of each Apache process.

As each process got slower to respond, new processes were spawned, using up even more virtual memory and adding to the problem. This spiral of death was only resolved if for some reason the traffic suddenly screeched to a halt (not likely during an article import that delivers hundreds of bots from Twitter on top of normal traffic) OR I killed Apache or rebooted the server.

The solution to this problem was to reduce the number of simultaneous Apache processes that could be run at one time by reducing the MaxClients setting in the Apache config file.

My existing setting of 256 was far too high for my 1GB RAM server. The way to calculate a more appropriate setting is to take the average size of an Apache process and divide the total available memory by that number, leaving room for other processes such as MySQL. In my case I was advised to set MaxClients to a value of 20, which seems small in comparison to the original value but makes more sense when you do the maths.
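As a worked example with illustrative numbers: on a 1GB server, reserving say 300MB for MySQL, the OS and everything else leaves roughly 700MB for Apache, and at an average process size of 25MB that gives 700 / 25 = 28, so a MaxClients somewhere in the 20-30 range is about right, which is why 20 was a sensible choice here.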

I have actually created a BASH script which you can run on your own server which will test the available space, average Apache process size, and then calculate the values for your MaxClients, MinSpareServers and MaxSpareServers which you can read here: BASH MaxClients Tuning Script.

Reducing my MaxClients setting to a much smaller value meant that the memory allocation for my system would never reach such unmanageable amounts again. If my server is swamped by traffic then instead of 256 Apache processes being spawned all trying to claim 20MB or more for themselves they will be queued up in an orderly fashion.

It might slow down some requests as they wait to be dealt with but that is far better than the whole server freezing up which was occurring regularly.

Two other settings I changed in the Apache conf file were the Timeout value, down from 300 to 30, and HostnameLookups, which was turned off. You can read more about these settings at the Apache performance tuning site.

Another recent issue I have just had was eerily the opposite of the above. I would get periods of very low server load (0.00 - 0.02) and there would be no Apache or MySQL processes running. The websites couldn't be accessed and only a restart of Apache would fix it.

At first I was checking the Apache error logs and seeing lots of "MySQL Server has gone away" errors. I found that this was a common issue in WordPress and created a custom wp-db.php file which would re-connect to the server if a query ran and met that error. You can read more about that script here: Fixing the MySQL Server Has Gone Away Error.

However this just got rid of the error messages; it didn't really fix any problems.

After a lot of reading and tuning I eventually found what "seems" to be a fix for this issue, which may be caused by Apache processes hanging around for too long, consuming memory but not doing anything. I have edited the Apache conf file and changed the KeepAliveTimeout value down from its previous setting of 30 to 2 seconds.

I am debating whether to turn KeepAlive off altogether and then increase the MaxRequestsPerChild option. This website has some information about KeepAlive and whether you should turn it on or off.
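For reference, the relevant lines in the Apache conf file end up looking something like this; 2 seconds as discussed, and the MaxRequestsPerChild figure is just an example value if you do decide to recycle processes more aggressively:

# keep connections alive, but only briefly, so idle processes are freed quickly
KeepAlive On
KeepAliveTimeout 2

# recycle each child after a set number of requests to claw back leaked memory
MaxRequestsPerChild 1000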

Common WordPress Performance Tuning Tips

There are a number of common tips for performance tuning WordPress which you can read about in detail at other sites but I will quickly cover them here:

1. Install APC or another PHP caching system such as XCache or eAccelerator, as these opcode caching systems improve performance by saving and re-using compiled PHP, which speeds up the execution of server side code.

2. Install a WordPress caching plugin such as WP Super Cache or W3 Total Cache. There is a debate over which one is best, and whilst W3 Total Cache does offer more features, such as minification and browser cache options, the main issue you want to resolve with WordPress is the huge number of database queries and the amount of code run on each page load. The aim is to do expensive tasks once and then re-use the results as many times as possible. Caching the results of database queries so that they don't have to be run every time the page loads is a great idea, especially if the results hardly change. W3 offers database query result caching as well as caching the generated HTML output, whereas Super Cache will only cache the generated output.

What is the difference? Well, if you cache database query results, then during the building of cached files the results of queries used to create category lists or tag clouds can be shared across builds rather than being recalculated for every page being cached that uses them. How much difference this makes once you take MySQL's own internal query caching into consideration is debatable. However, both plugins offer the major win for fast page loads, which is disk based caching of the generated output incorporating GZIP compression.

If you do install W3 Total Cache and you have APC or another PHP accelerator installed, make sure that you enable the Disk Based Cache option for Page Caching and not Opcode, which will be selected by default if APC or XCache is installed.

3. If bandwidth is a problem then serving up minified and compressed HTML, CSS and JavaScript will help, but you don't want to be repeatedly compressing files as they load. Some cache plugins do this minification on the fly, which hurts CPU, whereas you really want it done once. There is nothing stopping you combining, compressing and minifying your files by hand. Then you will benefit from smaller files, fewer HTTP requests and less bandwidth whether or not you make use of a cache plugin.

4. Reduce 404 errors and ensure WordPress doesn't handle them, as it will cane performance unnecessarily. Create a static 404 error page or ensure your cache system is set up to handle 404s (see the .htaccess snippet after the example below). Also make sure that files that commonly cause 404s, such as iPhone icons, crossdomain.xml and favicons, exist even if they are empty files.

5. If you're not planning on using a caching system then you should tune your .htaccess file manually to ensure that browsers cache your files for specified periods of time rather than downloading them on every visit. You can also set your server to serve up compressed gzip files rather than letting a plugin do it for you.

You can do this by setting the future expire headers on your static content such as JS, CSS, images and so on like so:

<FilesMatch "(?i)^.*\.(ico|flv|ogg|swf|jpg|jpeg|png|gif|js|css)$">
ExpiresActive On
ExpiresDefault "access plus 1 weeks"
Header unset Last-Modified
Header set Cache-Control "public, no-transform"
SetOutputFilter DEFLATE
</FilesMatch>
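And for the static error pages mentioned in point 4, a few ErrorDocument lines keep WordPress (and PHP) out of the loop entirely; the file names are whatever flat HTML files you create:

# serve flat HTML for errors so WordPress is never invoked
ErrorDocument 404 /404.html
ErrorDocument 403 /403.html
ErrorDocument 503 /503.html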


6. Tune your MySQL database by ensuring that it is set to cache query results and has enough space to do so. Ensure options you don't use or require are disabled, and make sure you regularly maintain your tables and indexes by keeping fragmentation to a minimum.

There are a couple of well known tuning scripts which can be used to aid in the setting of your MySQL configuration settings and which use your current database load and settings as a guide to offer recommendations.

http://github.com/rackerhacker/MySQLTuner-perl
http://hackmysql.com/mysqlreport
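For reference, the query cache settings those scripts report on live in your my.cnf and look something like this; the sizes are illustrative starting points, not recommendations (and note the query cache was removed entirely in MySQL 8.0):

[mysqld]
# cache SELECT results and serve repeat queries straight from memory
query_cache_type  = 1
query_cache_size  = 32M
query_cache_limit = 1M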











Uninstall Performance Hogging Plugins

There are lots of plugins available for WordPress and it can be like a kid let loose in a candy shop, as there seem to be at least 10 plugins for everything. However, having too many plugins installed is definitely a bad thing in terms of performance, and unless you know what the code is doing you could be shooting yourself in the foot by installing the next greatest plugin onto your site without thoroughly checking the source code for yourself first.

The problem is that literally anyone can write and then publish a plugin on WordPress, and many of these authors are not programmers by trade, nor do they have performance at the forefront of their minds as they develop the code that you might use.

Even plugins that are marketed as performance saving tools are not always beneficial. I have seen plugins designed to reduce bandwidth by returning 304 Not Modified headers or 403 Forbidden status codes that have to make numerous database queries, DNS lookups and multiple regular expression checks to do so. If bandwidth is a problem then this might be worth the extra load, but if it isn't then you are just swapping a small gain in one area for extra work somewhere else.

If you are going to use a plugin then take a look over the source code to see if you can improve performance by adding any missing indexes to the new tables the plugin might have added to your WordPress database. Many plugins do add tables, especially if they need to store lots of data, and many authors don't include the SQL statements to add appropriate indexes, which could end up slowing lookups down the road as the amount of data in the tables grows.

The following list shows extra indexes I have added to tables within the WordPress database, both for plugins I installed and for core WordPress tables that were missing indexes for certain queries. Remember WordPress is mainly a READ based system, so the extra expense of adding indexes when data is inserted is usually worth it.


Plugin                     Table                    Index Name                     Columns                             Index Type
-                          wp_posts                 status_password_id             post_status, post_password, ID      Normal
-                          wp_posts                 post_date                      post_date, ID                       Unique
fuzzySEOBooster            wp_seoqueries_terms      term_value_stid                term_value, stid                    Unique
fuzzySEOBooster            wp_seoqueries_data       stid_pageid_pagetype_founded   stid, page_id, page_type, founded   Unique
WP-O-Matic                 wp_wpo_campaign_post     campaignid_feedid_hash         campaign_id, feed_id, hash          Normal
Yet Another Related Posts  wp_yarpp_related_cache   reference_id                   reference_ID, ID                    Normal
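Each of these is a normal MySQL composite index; the first row, for example, translates to:

-- composite index covering the columns WordPress filters and sorts on together
ALTER TABLE wp_posts ADD INDEX status_password_id (post_status, post_password, ID);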

Ensure that you regularly check the MySQL slow query log, especially if you have just installed a new plugin, as this will help you find queries that need optimising and potential bottlenecks caused by poorly thought out SQL.
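If the slow log isn't already enabled it only takes a couple of lines in my.cnf; the 2 second threshold is just a reasonable starting point, and MySQL 5.0 and earlier used the log-slow-queries setting instead:

[mysqld]
# log any query that takes longer than 2 seconds to run
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time     = 2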

On my own site I started off using a well known Related Posts plugin but I found out from the Slow log that the queries it ran to create the lists were killing performance due to their design.

They were taking 9-12 seconds to run and scanning up to 25 million records at a time, as well as carrying out unnecessary UNION statements which doubled the number of records they needed to look at. I ended up replacing it with a different plugin called LinkWithin, which not only looked great due to the images it used but was perfect for performance, because it is a JavaScript widget and all the work is carried out on their servers rather than mine.

This might not be the solution for you as obviously JavaScript is disabled by 10% of all visitors and bots won't be able to see the links.

If SEO is a concern, and it should be, then you need to make sure that SERP crawlers find all your content easily, and having a server side created list of related articles is a good idea for this reason alone. You can always create your own Related Posts section very easily with a function, placed at the bottom of your articles, that uses the categories assigned to the post to find other posts with the same categories.

The following example shows one way in which this can be done. It makes use of a nice ORDER BY RAND() trick to ensure different articles and categories appear each time the SQL is run, and it uses WordPress's built-in cache to store the results to prevent the query being executed too many times.

<?php
function get_my_related_posts($id, $limit){

// enable access to the WordPress DB object
global $wpdb;

// define SQL
$sql = "SELECT  CONCAT('http://www.mysite.com/',year(p.post_date),'/',RIGHT(concat('0' ,month(p.post_date)),2),'/',post_name,'/') as permalink,
p.post_title as title
FROM (
SELECT p.ID, p.post_name, p.post_title, p.post_date, terms.slug as category
FROM  wp_posts p,  wp_term_relationships tr,  wp_term_taxonomy tt,  wp_terms as terms
WHERE p.ID               != $id                 AND
p.post_type         = 'post'              AND
p.post_status       = 'publish'           AND
p.ID                = tr.object_id        AND
tr.term_taxonomy_id = tt.term_taxonomy_id AND
tt.taxonomy         in ( 'category')      AND
tt.term_id          = terms.term_id
GROUP BY  p.ID, p.post_title, p.post_name, p.post_date
ORDER BY terms.term_id
) as p,
(
SELECT distinct terms.slug
FROM wp_term_relationships tr, wp_term_taxonomy tt, wp_terms as terms
WHERE tr.object_id        = $id     AND
tr.term_taxonomy_id = tt.term_taxonomy_id AND
tt.taxonomy in ( 'category')    AND
tt.term_id          = terms.term_id
ORDER BY RAND() LIMIT 1
) as t
WHERE p.category = t.slug
ORDER BY  RAND()
LIMIT $limit";

// see if we have a cached recordset
$cache_name = "get_my_related_posts_" . $id;

$result = wp_cache_get( $cache_name );
if ( false == $result ) {

// get results and then cache for later use
$result = $wpdb->get_results( $sql );
wp_cache_set( $cache_name, $result );
}

// return result set as object
return $result;
}
?>
<div id="StrictlyRelatedPosts">
<h3>Related posts</h3>
<ul>
<?php
// fetch 5 related posts
$related_posts = get_my_related_posts($post->ID, 5);
// open loop
foreach ($related_posts as $related_post) {
$permalink = $related_post->permalink;
$title     = $related_post->title;
print "<li><a title=\"$title\" href=\"$permalink\">$title</a></li>\n";
} ?>
</ul>
</div>



Identifying Bottlenecks in Wordpress

One good plugin which I use for identifying potential problematic queries is the Debug Queries plugin which allows administrators to see all the queries that have run on each page. One extra tweak you should add is to put the following line in at the bottom of the get_fbDebugQueries function (around line 98)


$debugQueries .= ' ' . sprintf(__('» Memory Used %s'), $this->ConvertFromBytes($this->GetMemoryUsage(true))) . ' '. "\n";


Then add these two functions underneath that function (around line 106) which get the memory usage and format the value nicely.


// format size from bytes
function ConvertFromBytes($size){

 $unit=array('B','KB','MB','GB','TB','PB');

 return @round($size/pow(1024,($i=floor(log($size,1024)))),2).$unit[$i];
}

// get PHP memory usage
function GetMemoryUsage(){

 if(function_exists("memory_get_peak_usage")) {
  return memory_get_peak_usage(true);
 }elseif(function_exists("memory_get_usage")) {
  return  memory_get_usage(true);
 }
}


This will help you see just how many database queries a standard WordPress page makes (88 on my homepage!), and if you haven't done any performance tuning you may suddenly feel the urge before you suffer problems similar to those I experienced.

Remember, a high performing site is one which attracts visitors and one which SERP bots now pay more attention to when indexing. Therefore you should always aim to get the best performance out of your system as is feasibly possible, and as I have shown, that doesn't mean spending a fortune on hardware.




Turning off WordPress features


If you ever look at your site's log file you might see a lot of requests to a page called wp-cron.php.

This is a page that handles internal scheduling for WordPress, and many plugins hook into it to schedule tasks, which is useful for people who don't have access to their webserver's control panel as they can still set up "cron" jobs of a sort.

The only difference is that these cron jobs are fired when a page on the site is loaded, so if you have a very quiet site, a job you want to run once every 5 minutes just won't run on time unless you get traffic every few minutes of the day.


POST /wp-cron.php?doing_wp_cron=1331142791

Sometimes you will even see multiple requests spawned (by your own server's IP) within the same second, e.g.

123.123.XXX.XX - - [07/Mar/2012:18:03:57 +0000] "POST /wp-cron.php?doing_wp_cron=1331143104 HTTP/1.0" 200 - "-" "WordPress/3.3.1; http://www.mysite.com"
123.123.XXX.XX - - [07/Mar/2012:18:03:57 +0000] "POST /wp-cron.php?doing_wp_cron=1331143109 HTTP/1.0" 200 - "-" "WordPress/3.3.1; http://www.mysite.com"
123.123.XXX.XX - - [07/Mar/2012:18:03:57 +0000] "POST /wp-cron.php?doing_wp_cron=1331143128 HTTP/1.0" 200 - "-" "WordPress/3.3.1; http://www.mysite.com"

To me this seems like overkill.

Yes, the wp-cron job is needed to run internal WordPress tasks such as posting scheduled posts or firing jobs that have been set up to use the internal cron system, but having multiple requests fire at the same time seems unnecessary at best.

Why is this bad? Well, as this blog post about it from boltwebhosting.com says:

Wp-cron.php is called every time a page is loaded. That means if you are getting 50 visitors to your site every hour, and each of them reads 2-3 pages, then wp-cron.php is being called:
50 x 2.5 = 125 times per hour
125 x 24 = 3,000 times per day
3,000 x 30 = 90,000 times per month!
It does not just stop there, because unlike other features in WordPress, the wp-cron.php is spawned as an independent process which can sometimes take several minutes to complete its operations. So an active WordPress site with the traffic volume listed above is spawning 3,000 processes every day which do not really do anything.

Therefore on a very busy site you will be firing this page a lot of times, and this may cause severe performance issues on its own.

The solution is to replace this CRON job with a proper CRON job if possible.

To do this you either need access to your server's control panel or console, but don't worry if you don't have access, as you can still use a web based service like www.easycron.com.

As many hosts don't provide adequate cron functions for their users, this web based method is a great way of automating tasks without fiddling with your server.

If you do have the ability to set up a CRON task that fires the page once an hour, or at an interval more appropriate to your needs, then great. If you don't use the internal cron job for anything then the longer the gap the better, but be careful, as plugins may use it without your knowledge, such as database backup plugins or sitemap generator plugins. I set my CRON job to run the WP-CRON task every 10 minutes and this seems to be fine for my needs.

This is the format to use:

wget -U StrictlyCron -q -O /dev/null http://www.mysite.com/wp-cron.php?doing_wp_cron
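In a crontab, the every-10-minutes schedule mentioned above would look like this (one line; adjust the schedule and URL to suit):

# hit the WordPress cron page every 10 minutes
*/10 * * * * wget -U StrictlyCron -q -O /dev/null http://www.mysite.com/wp-cron.php?doing_wp_cron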


You will notice that I am setting the -U parameter (user-agent) to StrictlyCron. This is because I block all blank user-agent requests to my site with .htaccess rules (see the security article), and it also helps me identify my own requests in the log file.

Once you have done this you need to edit your site's wp-config.php file, which will be in the root of your site's setup, and add this line of code to the top of it.


/* disable WP-CRON from running all the time on every page load! */
define('DISABLE_WP_CRON', true);


As the comment states, this stops WordPress from firing its own internal CRON job on every page load, and as we have replaced it with a real CRON job that runs on a fixed schedule, it should reduce our traffic and server load considerably.



Turning Off WordPress HeartBeat

The WordPress HeartBeat functionality was introduced in WP 3.6 to allow interaction between the server and browser using AJAX. However, like AutoSave and WP_CRON, it can cause a lot of unnecessary HTTP requests, as it defaults to one request every 15 seconds.

The WordPress Heartbeat API allows WordPress to communicate between the web-browser and the server. It also improves session management, revision tracking, and auto saving. The WordPress Heartbeat API uses /wp-admin/admin-ajax.php, which allows WordPress to keep track of what's going on in the dashboard.

Unfortunately, this can also cause excessive requests to admin-ajax.php, leading to high CPU / Bandwidth usage. Whenever a web-browser is left open on a page using the Heartbeat API, this could potentially be an issue.

I once accidentally left a post I was editing open in a Chrome browser (which always re-opens the pages you had open when you closed it) for a week, and my bandwidth costs jumped by a good $30.

I scanned my log files and saw /wp-admin/admin-ajax.php being called every 15 seconds for the post page (seen in the Referer section of the log file).

Therefore I shut down the page ASAP and added the following code to the functions.php file in my theme, so the HeartBeat only runs on the post pages, where it's needed to delete custom fields, show tags and other features that make editing and adding posts easy.

To turn off the HeartBeat functionality elsewhere, go to your theme's functions.php file and put the following code at the top of it.

If you don't want to turn it off but just want to change the timing from 15 seconds to a minute or something else, you can, but it relies on you editing a compressed core WordPress JavaScript file. You can read about how to do this here.

// stop heartbeat code everywhere except the post edit pages
add_action( 'init', 'stop_heartbeat', 1 );

function stop_heartbeat() {
 global $pagenow;

 if ( $pagenow != 'post.php' && $pagenow != 'post-new.php' )
 {
  wp_deregister_script('heartbeat');
 }
}


Using WordPress Performance Enhanced Plugins

Now this isn't a sales pitch for my own plugins but you should try and avoid performance hogging plugins and use those with performance features built in.

For instance, if your caching plugin has a "purge/delete cache" option then make sure it has a decent wait time in between each iteration of the delete loop, otherwise it will consume all your CPU and memory when you try deleting the files. Ask the plugin author after reading their guide.

Also, if you are using a Google sitemap on a busy site, don't set it to build a whole new sitemap after every post. The server load may already be high and doing this will send it higher. It is also far better to just let the SERP crawlers crawl and find new content anyway.

However, if you do want to use a sitemap then use one that lets you build it at staged intervals through CRON or web cron jobs (see the sketch below).
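For example, a plugin could hook its build routine to a scheduled event rather than to every post save. A rough sketch using WordPress's scheduling API, where my_build_sitemap() is a hypothetical build function:

// rebuild the sitemap twice a day instead of on every new post
if ( ! wp_next_scheduled( 'my_sitemap_build_event' ) ) {
    wp_schedule_event( time(), 'twicedaily', 'my_sitemap_build_event' );
}

add_action( 'my_sitemap_build_event', 'my_build_sitemap' );

function my_build_sitemap() {
    // hypothetical function that builds and writes the sitemap XML to disk
}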

My old, defunct Strictly Google Sitemap plugin, which I no longer support but still use on all my sites because of its unique features, offers:

  • A low number of SQL queries run compared to other sitemap plugins, 
  • Fewer WordPress functions run than other plugins, 
  • The ability for the memory applied to the build process to automatically increase as required, so that you don't get out-of-memory errors as I used to with other well-known plugins.


Even though some features are defunct it is still a great plugin to use for big sites needing sitemaps generated quickly.

With my plugin you can create sitemaps at set times and you can do all the stuff normal sitemap plugins do. The only bits that have stopped working are the SEO parts due to how Twitter, Google, BING and the others all work.

Also, my Strictly TweetBOT PRO plugin, which allows you to post Tweets to as many accounts as you want (or to the same account multiple times with different content), has some delay functionality you might be interested in.

It has a delay option where you can set, in seconds, how long to wait after sending an HTTP GET request to your new post (to get it into the cache) before tweeting.

It also has an option to set a delay in seconds before each Tweet is sent out to an account. This allows enough time for any previous Twitter Rush to die down before creating a new one.

It also staggers the Tweets out so they don't all look like they are coming from the same place. A rough sketch of the idea is shown below.
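The logic behind those delays is simple enough to sketch — assuming hypothetical stand-ins for the plugin's internals ($twitter_accounts, send_tweet() and the delay settings):

// prime the cache first, then stagger the Tweets out
wp_remote_get( $post_url ); // request the new post so it gets cached
sleep( $cache_delay );      // seconds to wait for caching to finish

foreach ( $twitter_accounts as $account ) {
    send_tweet( $account, $post_url ); // hypothetical send function
    sleep( $tweet_delay );             // let any rush die down before the next Tweet
}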

Buy Strictly TweetBOT PRO now


WordPress Performance Summary


  • Ensure Apache is configured correctly and don't leave the default values as they are. Make sure MaxClients is set correctly by dividing your available RAM by the average Apache process size, leaving room for MySQL and anything else you might be running (see the worked example after this list).

  • Tune your MySQL database by configuring it correctly and maintaining it regularly. Use one of the many free tuning scripts to help set your configuration up, but ensure you read up about the various settings and what they do first.

  • Install a caching plugin that creates hard copies of commonly requested files. Static HTML is fast to load; PHP is costly to compile. Use a PHP accelerator and ensure database query results are cached.

  • Reduce bandwidth by combining, compressing and minifying your CSS, JS and HTML. If your caching plugin doesn't do it once rather than on the fly, do it by hand. Remember the key is to do expensive operations once and then re-use the results as many times as possible.

  • Set your .htaccess file up correctly. Ban bad bots to reduce traffic, set far-future expiry headers on your static files and use static files to handle 404, 403, 503 errors etc.

  • Reduce the number of plugins you use and ensure any that you do use are not hurting performance. Make sure any tables they use are covered by indexes and use the slow query log to identify problems.

  • Disable WordPress's internal CRON job and replace it with a real CRON job that runs once every hour or 30 minutes rather than on every page load.

  • Disable the WordPress HeartBeat functionality, or only allow it on post edits, to prevent repeated HTTP calls if a page is left open in a browser. You can change the timing from 15 seconds to whatever you want, but this means editing a compressed WordPress core JS file.
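To make the MaxClients calculation concrete, here is a worked example with purely illustrative numbers: on a 2 GB server where MySQL and the OS need roughly 768 MB and the average Apache process is about 25 MB, MaxClients would be (2048 - 768) / 25, which is around 51, so a setting of about 50 would be sensible.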




Read Part 1 - An Overview
Read Part 3 - Security



Thursday, 28 May 2015

Twitter Rush - The Rush Just Gets Bigger And Bigger!

By Strictly-Software

The number of BOTs, social media sites and scrapers that hit your site after you post a Tweet containing a link just gets bigger and bigger. When I first started recording the BOTs that hit my site after a post to Twitter it was about 15; now it has grown to over 100!

You can read about my previous analysis of Twitter Rushes here and here. However, today I am posting the findings from a recent blog post, Tweeted using my Strictly TweetBOT WordPress plugin, and the 108 HTTP requests that followed in the minutes after posting.

If you are not careful these Twitter Rushes can consume your web server's CPU and memory, as well as creating a daisy chain of processes waiting to complete, which can cause high server loads and long connection / wait times for pages to load.

You will notice that the first item in the list is a POST to the article.

That is because in the PRO version of my Strictly TweetBOT I have an option to send an HTTP request to the page before Tweeting. You can then wait a few seconds (a setting you control) before any Tweets are sent out, to ensure the server has enough time to cache the page.

This is so that if you have a caching plugin installed (e.g. WP Super Cache on WordPress) or another system, the page is hopefully cached into memory or written out as a static HTML file to prevent any overload when the Twitter Rush comes.

It is always quicker to deliver a static HTML file to users than a dynamic PHP/.NET file that needs DB access etc.

So here are the results of today's test.

Notice how I return 403 status codes to many of the requests. 

This is because I block any bandwidth wasters that bring no benefit at all to my site.

The latest batch of these bandwidth wasters seem to be social media and brand awareness BOTS that want to see if their brand or site is mentioned in the article.

They are of no benefit to you at all and you should block them, either using your firewall or with a 403 status code via your .htaccess file.
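As a sketch, the .htaccess approach looks something like this — the user-agents listed are just examples taken from the log below, so adjust the list to taste:

# return 403 Forbidden to bandwidth wasting BOTs by user-agent
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (ShowyouBot|OpenHoseBot|grokkit-crawler) [NC]
RewriteRule .* - [F,L]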

Please also note the number of duplicate requests made to the page from either the same IP address or the same company, e.g. Twitterbot or Facebook. Why they do this I do not know!

The Recent Twitter Rush Test - 28-MAY-2015

XXX.XXX.XXX.XXX - - [28/May/2015:17:08:17 +0100] "POST /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/?r=12 HTTP/1.1" 200 22265 "-" "Mozilla/5.0 (http://www.strictly-software.com) Strictly TweetBot/1.1.2" 1/1582929
184.173.106.130 - - [28/May/2015:17:08:22 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "ShowyouBot (http://showyou.com/crawler)" 0/3372
199.16.156.124 - - [28/May/2015:17:08:21 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22377 "-" "Twitterbot/1.0" 1/1301263
199.16.156.125 - - [28/May/2015:17:08:21 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22375 "-" "Twitterbot/1.0" 1/1441183
185.20.4.220 - - [28/May/2015:17:08:21 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22377 "-" "Mozilla/5.0 (TweetmemeBot/4.0; +http://datasift.com/bot.html) Gecko/20100101 Firefox/31.0" 1/1224266
17.142.151.49 - - [28/May/2015:17:08:21 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22375 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +http://www.apple.com/go/applebot)" 1/1250324
151.252.28.203 - - [28/May/2015:17:08:22 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22374 "http://bit.ly/1eA4GYZ" "Go 1.1 package http" 1/1118106
46.236.26.102 - - [28/May/2015:17:08:23 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22376 "-" "Mozilla/5.0 (TweetmemeBot/4.0; +http://datasift.com/bot.html) Gecko/20100101 Firefox/31.0" 0/833367
199.16.156.124 - - [28/May/2015:17:08:23 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22376 "-" "Twitterbot/1.0" 0/935200
142.4.216.19 - - [28/May/2015:17:08:24 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 - "-" "Mozilla/5.0 (compatible; OpenHoseBot/2.1; +http://www.openhose.org/bot.html)" 0/1964
17.142.152.131 - - [28/May/2015:17:08:24 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22375 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +http://www.apple.com/go/applebot)" 0/875740
52.5.154.238 - - [28/May/2015:17:08:25 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22376 "-" "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31" 1/1029660
4.71.170.35 - - [28/May/2015:17:08:26 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate HTTP/1.1" 403 251 "-" "grokkit-crawler (pdsupport@purediscovery.com)" 0/1883
192.99.19.38 - - [28/May/2015:17:08:26 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 - "-" "Mozilla/5.0 (compatible; OpenHoseBot/2.1; +http://www.openhose.org/bot.html)" 0/1927
141.223.91.115 - - [28/May/2015:17:08:28 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 - "_bit=55673d4a-0024b-030a1-261cf10a;domain=.bit.ly;expires=Tue Nov 24 16:07:38 2015;path=/; HttpOnly" "-" 1/1592735
17.142.151.101 - - [28/May/2015:17:08:32 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22260 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +http://www.apple.com/go/applebot)" 17/17210294
184.173.106.130 - - [28/May/2015:17:08:49 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "ShowyouBot (http://showyou.com/crawler)" 0/1870
142.4.216.19 - - [28/May/2015:17:08:49 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "Mozilla/5.0 (compatible; OpenHoseBot/2.1; +http://www.openhose.org/bot.html)" 0/1601
52.6.187.68 - - [28/May/2015:17:08:28 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22262 "-" "Typhoeus - https://github.com/typhoeus/typhoeus" 20/20260090
45.33.89.102 - - [28/May/2015:17:08:28 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22262 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" 20/20370939
134.225.2.7 - - [28/May/2015:17:08:26 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22260 "-" "Mozilla/5.0 (TweetmemeBot/4.0; +http://datasift.com/bot.html) Gecko/20100101 Firefox/31.0" 22/22337338
2a03:2880:1010:3ff4:face:b00c:0:8000 - - [28/May/2015:17:08:25 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22261 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" 23/23973749
134.225.2.7 - - [28/May/2015:17:08:27 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22259 "-" "Mozilla/5.0 (TweetmemeBot/4.0; +http://datasift.com/bot.html) Gecko/20100101 Firefox/31.0" 21/21602431
54.167.214.223 - - [28/May/2015:17:08:25 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22259 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36" 24/24164062
4.71.170.35 - - [28/May/2015:17:08:51 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "grokkit-crawler (pdsupport@purediscovery.com)" 0/1688
192.99.19.38 - - [28/May/2015:17:08:51 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "Mozilla/5.0 (compatible; OpenHoseBot/2.1; +http://www.openhose.org/bot.html)" 0/1594
54.246.137.243 - - [28/May/2015:17:08:51 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 - "-" "python-requests/1.2.3 CPython/2.7.6 Linux/3.13.0-44-generic" 0/1736
92.246.16.201 - - [28/May/2015:17:08:51 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 - "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0" 0/725424
4.71.170.35 - - [28/May/2015:17:08:55 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "grokkit-crawler (pdsupport@purediscovery.com)" 0/1808
2a03:2880:2130:9ff3:face:b00c:0:1 - - [28/May/2015:17:08:57 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate HTTP/1.1" 301 144 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" 0/657830
54.198.122.232 - - [28/May/2015:17:08:51 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22258 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_6) AppleWebKit/534.24 (KHTML, like Gecko) (Contact: backend@getprismatic.com)" 7/7227418
2a03:2880:1010:3ff7:face:b00c:0:8000 - - [28/May/2015:17:08:51 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22255 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" 7/7169003
54.198.122.232 - - [28/May/2015:17:08:51 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22257 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_6) AppleWebKit/534.24 (KHTML, like Gecko) (Contact: backend@getprismatic.com)" 7/7185701
2607:5300:60:3b37:: - - [28/May/2015:17:08:53 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22258 "-" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0" 5/5298648
2a03:2880:2130:9ff7:face:b00c:0:1 - - [28/May/2015:17:08:56 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22267 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" 1/1999466
178.32.216.193 - - [28/May/2015:17:08:49 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22258 "http://bit.ly/1eA4GYZ" "LivelapBot/0.2 (http://site.livelap.com/crawler)" 9/9518327
199.59.148.209 - - [28/May/2015:17:08:58 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22260 "-" "Twitterbot/1.0" 1/1680322
54.178.210.226 - - [28/May/2015:17:08:58 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22257 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)" 1/1842148
54.198.122.232 - - [28/May/2015:17:08:58 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22258 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_6) AppleWebKit/534.24 (KHTML, like Gecko) (Contact: backend@getprismatic.com)" 1/1903731
54.198.122.232 - - [28/May/2015:17:09:00 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22259 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_6) AppleWebKit/534.24 (KHTML, like Gecko) (Contact: backend@getprismatic.com)" 1/1131792
2607:5300:60:3b37:: - - [28/May/2015:17:09:00 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22260 "-" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0" 1/1048667
199.59.148.209 - - [28/May/2015:17:09:02 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22258 "-" "Twitterbot/1.0" 1/1024583
54.178.210.226 - - [28/May/2015:17:09:02 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22260 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)" 1/1251088
65.52.240.20 - - [28/May/2015:17:09:03 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22259 "-" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)" 0/814087
54.92.69.38 - - [28/May/2015:17:09:04 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22258 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)" 0/925457
54.92.69.38 - - [28/May/2015:17:09:05 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22266 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)" 0/932984
54.178.210.226 - - [28/May/2015:17:09:06 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22259 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)" 0/927202
54.178.210.226 - - [28/May/2015:17:09:08 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22260 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)" 0/717344
54.167.123.237 - - [28/May/2015:17:09:09 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)" 0/2286
54.178.210.226 - - [28/May/2015:17:09:12 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22254 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)" 0/971022
37.187.165.195 - - [28/May/2015:17:09:52 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22258 "-" "Mozilla/5.0 (compatible; PaperLiBot/2.1; http://support.paper.li/entries/20023257-what-is-paper-li)" 0/688208
74.112.131.244 - - [28/May/2015:17:10:24 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22257 "-" "Mozilla/5.0 ()" 3/3572262
52.68.118.157 - - [28/May/2015:17:11:35 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22259 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)" 0/688056
52.68.118.157 - - [28/May/2015:17:11:35 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22259 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)" 0/719851
52.68.118.157 - - [28/May/2015:17:11:37 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22256 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)" 0/739706
52.68.118.157 - - [28/May/2015:17:11:38 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22258 "-" "Crowsnest/0.5 (+http://www.crowsnest.tv/)" 0/760912
74.6.254.121 - - [28/May/2015:17:12:05 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 - "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)" 0/248578
66.249.67.148 - - [28/May/2015:17:12:38 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22259 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0/493468
54.145.93.204 - - [28/May/2015:17:13:25 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "jack" 0/1495
54.145.93.204 - - [28/May/2015:17:13:26 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22257 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.8) Gecko/2008091620 Firefox/3.0.2" 0/597310
178.33.236.214 - - [28/May/2015:17:13:41 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "Mozilla/5.0 (compatible; Kraken/0.1; http://linkfluence.net/; bot@linkfluence.net)" 0/2065
173.203.107.206 - - [28/May/2015:17:13:50 +0100] "POST /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/?r=12 HTTP/1.1" 200 22194 "-" "Mozilla/5.0 (http://www.strictly-software.com) Strictly TweetBot/1.1.2" 4/4801717
184.173.106.130 - - [28/May/2015:17:13:58 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "ShowyouBot (http://showyou.com/crawler)" 0/96829
178.32.216.193 - - [28/May/2015:17:13:57 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22303 "http://bit.ly/1eA4GYZ" "LivelapBot/0.2 (http://site.livelap.com/crawler)" 1/1032211
192.99.1.145 - - [28/May/2015:17:13:59 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22303 "http://bit.ly/1eA4GYZ" "LivelapBot/0.2 (http://site.livelap.com/crawler)" 1/1535270
184.173.106.130 - - [28/May/2015:17:14:02 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "ShowyouBot (http://showyou.com/crawler)" 0/1764
52.6.187.68 - - [28/May/2015:17:14:01 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22293 "-" "Typhoeus - https://github.com/typhoeus/typhoeus" 66/66512611
146.148.22.255 - - [28/May/2015:17:15:10 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22292 "-" "Mozilla/5.0 (compatible; Climatebot/1.0; +http://climate.k39.us/bot.html)" 0/885387
74.6.254.121 - - [28/May/2015:17:15:11 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 - "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)" 1/1789256
146.148.22.255 - - [28/May/2015:17:15:17 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22290 "-" "Mozilla/5.0 (compatible; Climatebot/1.0; +http://climate.k39.us/bot.html)" 1/1275245
54.162.7.197 - - [28/May/2015:17:15:18 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22291 "http://bit.ly/1eA4GYZ" "Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.107 Safari/534.13 v1432829642.1352" 0/711142
146.148.22.255 - - [28/May/2015:17:15:24 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22293 "-" "Mozilla/5.0 (compatible; Climatebot/1.0; +http://climate.k39.us/bot.html)" 0/742404
54.162.7.197 - - [28/May/2015:17:15:32 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22289 "-" "msnbot/2.0b v1432829684.8617" 0/717679
23.96.208.137 - - [28/May/2015:17:16:05 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22294 "-" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)" 0/560954
69.164.211.40 - - [28/May/2015:17:17:38 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22293 "-" "Mozilla/5.0 (compatible; EveryoneSocialBot/1.0; support@everyonesocial.com http://everyonesocial.com/)" 0/516967
96.126.110.221 - - [28/May/2015:17:18:24 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22300 "-" "Mozilla/5.0 (compatible; EveryoneSocialBot/1.0; support@everyonesocial.com http://everyonesocial.com/)" 0/464585
69.164.217.210 - - [28/May/2015:17:18:42 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22288 "-" "Mozilla/5.0 (compatible; EveryoneSocialBot/1.0; support@everyonesocial.com http://everyonesocial.com/)" 0/482230
173.255.232.252 - - [28/May/2015:17:19:03 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22293 "-" "Mozilla/5.0 (compatible; EveryoneSocialBot/1.0; support@everyonesocial.com http://everyonesocial.com/)" 0/514587
173.255.232.252 - - [28/May/2015:17:19:12 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22288 "-" "Mozilla/5.0 (compatible; EveryoneSocialBot/1.0; support@everyonesocial.com http://everyonesocial.com/)" 0/858459
96.126.110.222 - - [28/May/2015:17:19:26 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22288 "-" "Mozilla/5.0 (compatible; EveryoneSocialBot/1.0; support@everyonesocial.com http://everyonesocial.com/)" 0/469048
92.222.100.96 - - [28/May/2015:17:19:28 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22291 "-" "sfFeedReader/0.9" 0/574409
54.176.17.88 - - [28/May/2015:17:20:20 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+DarkPolitricks+%28Dark+Politricks%29 HTTP/1.1" 200 - "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36" 1/1112283
184.106.123.180 - - [28/May/2015:17:20:56 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22285 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14" 0/522039
74.6.254.121 - - [28/May/2015:17:22:35 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 - "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)" 0/260972
54.163.57.132 - - [28/May/2015:17:23:05 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "Ruby" 0/2749
54.163.57.132 - - [28/May/2015:17:23:07 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "Ruby" 0/1647
54.163.57.132 - - [28/May/2015:17:23:09 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "Ruby" 0/1487
178.33.236.214 - - [28/May/2015:17:23:14 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+DarkPolitricks+%28Dark+Politricks%29 HTTP/1.1" 403 252 "-" "Mozilla/5.0 (compatible; Kraken/0.1; http://linkfluence.net/; bot@linkfluence.net)" 0/1996
168.63.10.14 - - [28/May/2015:17:23:23 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 - "-" "Apache-HttpClient/4.1.2 (java 1.5)" 0/1602
168.63.10.14 - - [28/May/2015:17:23:23 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "Apache-HttpClient/4.1.2 (java 1.5)" 0/1486
74.6.254.121 - - [28/May/2015:17:24:05 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 - "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)" 0/260635
45.33.35.236 - - [28/May/2015:17:24:59 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22284 "-" "Mozilla/5.0 ( compatible ; Veooz/1.0 ; +http://www.veooz.com/veoozbot.html )" 0/618370
74.6.254.121 - - [28/May/2015:17:25:35 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 - "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)" 0/255700
70.39.246.37 - - [28/May/2015:17:26:10 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22283 "-" "Mozilla/5.0 Moreover/5.1 (+http://www.moreover.com; webmaster@moreover.com)" 0/469127
82.25.13.46 - - [28/May/2015:17:28:55 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22285 "https://www.facebook.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36" 0/568199
157.55.39.84 - - [28/May/2015:17:29:17 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22284 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" 0/478186
188.138.124.201 - - [28/May/2015:17:30:10 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22284 "-" "ADmantX Platform Semantic Analyzer - ADmantX Inc. - www.admantx.com - support@admantx.com" 2/2500606
54.204.149.66 - - [28/May/2015:17:30:30 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22285 "-" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:36.0) Gecko/20100101 Firefox/36.0 (NetShelter ContentScan, contact abuse@inpwrd.com for information)" 0/680643
54.198.122.232 - - [28/May/2015:17:30:31 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+DarkPolitricks+%28Dark+Politricks%29 HTTP/1.1" 200 22220 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_6) AppleWebKit/534.24 (KHTML, like Gecko) (Contact: backend@getprismatic.com)" 0/650482
64.49.241.208 - - [28/May/2015:17:30:32 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22284 "-" "ScooperBot www.customscoop.com" 0/658243
50.16.81.18 - - [28/May/2015:17:30:46 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22283 "-" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:36.0) Gecko/20100101 Firefox/36.0 (NetShelter ContentScan, contact abuse@inpwrd.com for information)" 0/673211
54.166.112.98 - - [28/May/2015:17:30:47 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A%20DarkPolitricks%20%28Dark%20Politricks%29 HTTP/1.1" 403 252 "-" "Recorded Future" 0/1645
54.80.130.191 - - [28/May/2015:17:30:59 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 403 252 "-" "Recorded Future" 0/2777
74.6.254.121 - - [28/May/2015:17:31:35 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22282 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)" 0/606530
74.6.254.121 - - [28/May/2015:17:33:05 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22283 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)" 0/490218
54.208.89.59 - - [28/May/2015:17:34:31 +0100] "HEAD /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 - "-" "-" 0/309191
54.208.89.59 - - [28/May/2015:17:34:32 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22281 "-" "-" 0/647914
74.6.254.121 - - [28/May/2015:17:34:35 +0100] "GET /2015/05/study-finds-severe-cold-snap-during-the-geological-age-known-for-its-extreme-greenhouse-climate/ HTTP/1.1" 200 22284 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)" 0/555798

So the moral of the story is this:

  • Be careful when you post to Twitter as you will get a rush of traffic to your site in the following minutes which could cause your site problems.
  • Try and block social media / brand awareness / spam BOTs if you can so they don't consume your bandwidth and CPU/memory.
  • Use either your server's firewall or your .htaccess file to block BOTs you consider a waste of your money. Remember, any HTTP request to your site when you are using a VPS costs you money, so why waste it on BOTs that provide no benefit to you?
  • Try and mitigate the rush by using the Crawl-delay robots.txt directive to stop the big SERP BOTs from hammering you straight away (see the example below).
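A Crawl-delay rule is just a couple of lines in your robots.txt. For example — though note that not every crawler honours it, and Googlebot in particular ignores Crawl-delay:

User-agent: bingbot
Crawl-delay: 10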

I am sure I will post another Twitter Rush analysis in the coming months, and the number of BOTs will have grown from the initial 15 or so when I first tested it to 200+!