Wednesday 12 November 2014

Twitter Rush caused by Tweet BOTs visiting your site

Twitter Traffic can cause a major hit on your server

If you are using Twitter to post tweets to whenever you blog or post an article you should know that a large number of BOTS will immediately hit your site as soon as they see the link.

This is what I call a Twitter Rush as you are causing a rush of traffic from posting a link on Twitter.

I did a test some months back and I like to regularly test how many hits I get whenever I post a link so that I can weed out the chaff from the wheat and set up rules to ban any BOTS I think are just wasting me money.

Most of these BOTS are also stupid.

If you post the same link to multiple Twitter accounts e.g by using my Wordpress plugin - Strictly Tweetbot then the same BOT will come to the same page multiple times.

Why? Who wrote such crap code and why don't they check before hitting a site that they haven't just crawled that link. I cannot believe the developers at Yahoo cannot write a BOT that works out they have just crawled a page before doing it two more times.

Some BOTS are obviously needed such as the major SERP search engines e.g Google or Bing but many are start up "social media" search engines and other such content scrapers, spammers. hackbots and bandwidth wasters.

Because of this I now 403 a lot of these BOTS or even send them back to the IP address they came from with an ISAPI rewrite rule as they don't provide me with any benefit and just steal bandwidth and cost me money.

RewriteCond %{HTTP_USER_AGENT} (?:Spider|MJ12bot|seomax|atomic|collect|e?mail|magnet|reaper|tools\.ua\.random|siphon|sweeper|harvest|(?:microsoft\surl\scontrol)|wolf) [NC]
RewriteRule .* http://%{REMOTE_ADDR} [L,R=301]

However if you are using my Strictly Tweet BOT plugin which can post multiple tweets to the same or multiple accounts then the new version allows you to pre-post a page which hopefully gets cached by the caching plugin you should be using (WP SUPER CACHE or W3 TOTAL CACHE etc) before the article is made public and the BOT looking at Twitter for URL's to scrape can get to it.

The aim is to get the page cached BEFORE multiple occurrences of BOTS hit the page. If the page is already cached then the load on your server should be a lot less than if every BOT's loading of the page was trying to cache the page at the same time (due to the quickness of their visit). 

However if you are auto blogging and using my TweetBOT you might be interested in Strictly TweetBOT PRO as it had extra features for people who are tweeting to multiple accounts or multiple tweet in different formats to the same account. These new features are all designed to reduce the hit from a Twitter Rush.

The paid for version allows you to do the following:

  • Make an HTTP request to the new post before Tweeting anything. If you have a caching plugin on your site then this should put the new post into the cache so that when the Twitter Rush comes they all hit a cached page and not a dynamically created one.
  • Add a query-string to the URL of the new post when making an HTTP request to aid caching. Some plugins like WP Super Cache allow you to force an uncached page to be loaded with a querystring. So this will enable the new page to be loaded and re-cached.
  • Delay tweeting for N seconds after making the HTTP request to cache your post. This will help you ensure that the post is in the cache before the Twitter Rush.
  • Add a delay between each Tweet that is sent out. If you are tweeting to multiple accounts you will cause multiple Twitter Rushes. Therefore staggering the hits aids performance.


Buy Now


I did a test this morning to see how much traffic was generated by a test post. I got almost 50 responses within 2 seconds!

50.18.132.28 - - [30/Nov/2011:07:04:41 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 200 471 "-" "bitlybot"
50.57.137.74 - - [30/Nov/2011:07:04:43 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "EventMachine HttpClient"
50.57.137.74 - - [30/Nov/2011:07:04:43 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "EventMachine HttpClient"
184.72.47.46 - - [30/Nov/2011:07:04:43 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "UnwindFetchor/1.0 (+http://www.gnip.com/)"
204.236.150.14 - - [30/Nov/2011:07:04:44 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 403 471 "-" "JS-Kit URL Resolver, http://js-kit.com/"
50.18.121.55 - - [30/Nov/2011:07:04:45 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "UnwindFetchor/1.0 (+http://www.gnip.com/)"
184.72.47.71 - - [30/Nov/2011:07:04:47 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "UnwindFetchor/1.0 (+http://www.gnip.com/)"
50.18.121.55 - - [30/Nov/2011:07:04:48 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "UnwindFetchor/1.0 (+http://www.gnip.com/)"
184.72.47.71 - - [30/Nov/2011:07:05:11 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "UnwindFetchor/1.0 (+http://www.gnip.com/)"
199.59.149.31 - - [30/Nov/2011:07:04:43 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 200 - "-" "Twitterbot/0.1"
107.20.160.159 - - [30/Nov/2011:07:04:43 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 200 - "-" "http://unshort.me/about.html"
46.20.47.43 - - [30/Nov/2011:07:05:11 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 403 369 "-" "Mozilla/5.0 (compatible"
199.59.149.165 - - [30/Nov/2011:07:04:43 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 200 28862 "-" "Twitterbot/1.0"
173.192.79.101 - - [30/Nov/2011:07:05:11 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 403 471 "-" "-"
50.18.121.55 - - [30/Nov/2011:07:05:11 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "UnwindFetchor/1.0 (+http://www.gnip.com/)"
46.20.47.43 - - [30/Nov/2011:07:05:11 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 403 369 "-" "Mozilla/5.0 (compatible"
199.59.149.31 - - [30/Nov/2011:07:05:11 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 200 - "-" "Twitterbot/0.1"
65.52.0.229 - - [30/Nov/2011:07:05:11 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 200 28863 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
66.228.54.132 - - [30/Nov/2011:07:04:43 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 200 106183 "-" "InAGist URL Resolver (http://inagist.com)"
199.59.149.165 - - [30/Nov/2011:07:05:11 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 200 28863 "-" "Twitterbot/1.0"
107.20.42.241 - - [30/Nov/2011:07:05:11 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 200 - "-" "PostRank/2.0 (postrank.com)"
65.52.0.229 - - [30/Nov/2011:07:05:11 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 200 28862 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
65.52.0.229 - - [30/Nov/2011:07:05:11 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 200 28862 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
110.169.128.180 - - [30/Nov/2011:07:05:11 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 200 28862 "http://twitter.com/" "User-Agent:Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-us) AppleWebKit/533.18.1"
74.112.131.128 - - [30/Nov/2011:07:05:15 +0000] "GET /some-test-url-I-posted/ HTTP/1.0" 200 106203 "-" "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly/) Gecko/2009032608 Firefox/3.0.8"
74.112.131.131 - - [30/Nov/2011:07:05:16 +0000] "GET /some-test-url-I-posted/ HTTP/1.0" 200 106203 "-" "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly/) Gecko/2009032608 Firefox/3.0.8"
74.112.131.127 - - [30/Nov/2011:07:05:16 +0000] "GET /some-test-url-I-posted/ HTTP/1.0" 200 106203 "-" "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly/) Gecko/2009032608 Firefox/3.0.8"
74.112.131.128 - - [30/Nov/2011:07:05:16 +0000] "GET /some-test-url-I-posted/ HTTP/1.0" 200 106203 "-" "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly/) Gecko/2009032608 Firefox/3.0.8"
74.112.131.131 - - [30/Nov/2011:07:05:17 +0000] "GET /some-test-url-I-posted/ HTTP/1.0" 200 106203 "-" "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly/) Gecko/2009032608 Firefox/3.0.8"
74.112.131.128 - - [30/Nov/2011:07:05:17 +0000] "GET /some-test-url-I-posted/ HTTP/1.0" 200 106203 "-" "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly/) Gecko/2009032608 Firefox/3.0.8"
107.20.78.114 - - [30/Nov/2011:07:05:18 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "MetaURI API/2.0 +metauri.com"
65.52.54.253 - - [30/Nov/2011:07:05:19 +0000] "GET /2011/11/hypocrisy-rush-drug-test-welfare-benefit-recipients HTTP/1.1" 403 470 "-" "-"
107.20.78.114 - - [30/Nov/2011:07:05:28 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "MetaURI API/2.0 +metauri.com"
65.52.62.87 - - [30/Nov/2011:07:05:30 +0000] "GET /2011/11/hypocrisy-rush-drug-test-welfare-benefit-recipients HTTP/1.1" 403 470 "-" "-"
107.20.78.114 - - [30/Nov/2011:07:05:43 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "MetaURI API/2.0 +metauri.com"
107.20.78.114 - - [30/Nov/2011:07:06:15 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "MetaURI API/2.0 +metauri.com"
199.59.149.165 - - [30/Nov/2011:07:04:45 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 200 28860 "-" "Twitterbot/1.0"
107.20.160.159 - - [30/Nov/2011:07:04:59 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 200 - "-" "http://unshort.me/about.html"
107.20.78.114 - - [30/Nov/2011:07:06:15 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "MetaURI API/2.0 +metauri.com"
199.59.149.31 - - [30/Nov/2011:07:04:46 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 200 - "-" "Twitterbot/0.1"
107.20.42.241 - - [30/Nov/2011:07:05:01 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 200 - "-" "PostRank/2.0 (postrank.com)"
107.20.42.241 - - [30/Nov/2011:07:05:07 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 200 - "-" "PostRank/2.0 (postrank.com)"
107.20.78.114 - - [30/Nov/2011:07:06:15 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "MetaURI API/2.0 +metauri.com"
107.20.78.114 - - [30/Nov/2011:07:06:17 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "MetaURI API/2.0 +metauri.com"
107.20.78.114 - - [30/Nov/2011:07:06:17 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 403 - "-" "MetaURI API/2.0 +metauri.com"
50.16.51.20 - - [30/Nov/2011:07:06:21 +0000] "HEAD /some-test-url-I-posted/ HTTP/1.1" 200 - "-" "Summify (Summify/1.0.1; +http://summify.com)"
74.97.60.113 - - [30/Nov/2011:07:07:11 +0000] "GET /some-test-url-I-posted/ HTTP/1.1" 200 28889 "-" "Mozilla/5.0 (Windows NT 5.1; rv:8.0) Gecko/20100101 Firefox/8.0"



Buy Now

Tuesday 11 November 2014

Turn off WordPress HeartBeat to reduce bandwidth and CPU

Turn off WordPress HeartBeat to reduce bandwidth and CPU

By Strictly-Software

I recently noticed a spike in bandwidth and costs on my Rackspace server. The cost had jumped up a good $30 from normal months.

Now I am still in the process of finding out why this has happened but one thing I did come across was a lot of calls to a script called /wp-admin/admin-ajax.php which was happening every 15 seconds.

Now this is the sign of WordPress's HeartBeat functionality which allows the server and browser to communicate and I quote from the inmotionhosting.com website, HeartBeat
allows WordPress to communicate between the web-browser and the server. It allows for improved user session management, revision tracking, and auto saving.
The WordPress Heartbeat API uses /wp-admin/admin-ajax.php to run AJAX calls from the web-browser. Which in theory sounds awesome, as WordPress can keep track of what's going on in the dashboard.
However this can also start sending excessive requests to admin-ajax.php which can lead to high CPU usage. Anytime a web-browser is left open on a page using the Heartbeat API, this could potentially be an issue.
Therefore I scanned my log files and found that my own server IP was making calls to a page every 15 seconds e.g

62.21.14.247 - - [11/Nov/2014:15:00:20 +0000] "POST /wp-admin/admin-ajax.php HTTP/1.1" 200 98 "http://www.mysite.com/wp-admin/post.php?post=28968&action=edit&message=1" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36" 0/799585
62.21.14.247 - - [11/Nov/2014:15:00:35 +0000] "POST /wp-admin/admin-ajax.php HTTP/1.1" 200 98 "http://www.mysite.com/wp-admin/post.php?post=28968&action=edit&message=1" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36" 25/25540888

I checked my browser (Chrome) which I do leave lots of windows open for ages, multi tasking :) , and I found that I had left open a post edit window in WordPress. This was causing the HeartBeat to call the script every 15 seconds.

Now I don't know that this is the ONLY reason for my increase in Bandwidth and obviously CPU due to all the HTTP requests but I am guessing it made up a big part of it.

Therefore I decided to turn off the HeartBeat functionality.

I have already disabled auto saving and revisions as I don't need that functionality so I am going to see what happens - hopefully my costs will go down!

Turning Off WordPress HeartBeat

To turn off the HeartBeat functionality go to your themes functions.php file and put the following code at the top of it.


// stop heartbeat code
add_action( 'init', 'stop_heartbeat', 1 );

function stop_heartbeat() {
        wp_deregister_script('heartbeat');

}


So I will see what happens with this turned off. So far not a lot.

However if you do notice a spike in your Bandwidth or CPU and you use WordPress check you haven't left a page open in your browser that would be causing the HeartBeat function to call the /wp-admin/admin-ajax.php every 15 seconds!