Wednesday 29 September 2010

Analysing Bot Traffic from a Twitter Rush

Twitter Rush - Bot Traffic from Twitter

I blogged the other day about a link I found that listed the traffic that visits a site whenever a link to that site is posted upon twitter.

It seems that if you post a Tweet that contains a link a Twitter Rush is caused due to numerous BOTS, Social Media sites, SERPS and other sites noticing the new link and then all visiting your site at the same time.

This is one reason I created Strictly TweetBOT PRO which differs from my free version of Strictly TweetBOT as it allows you to do the following:

  • Make an HTTP request to the new post before Tweeting anything. If you have a caching plugin on your site then this should put the new post into the cache so that when the Twitter Rush comes they all hit a cached page and not a dynamically created one.
  • Add a query-string to the URL of the new post when making an HTTP request to aid caching. Some plugins like WP Super Cache allow you to force an un-cached page to be loaded with a query-string. So this will enable the new page to be loaded and re-cached.
  • Delay tweeting for N seconds after making the HTTP request to cache your post. This will help you ensure that the post is in the cache before the Twitter Rush.
  • Add a delay between each Tweet that is sent out. If you are tweeting to multiple accounts you will cause multiple Twitter Rushes. Therefore staggering the hits aids performance.

Buy Now
I have been carrying out performance tests on one of my LAMP sites and have been analysing this sort of data in some depth. I thought I would post an update with the actual traffic my own site receives when a link is Tweeted which is below.

A few interesting points:

1. This traffic is instantaneous so that the first item in the log file visiting the site has exactly the same time stamp as the WordPress URL that submitted the tweets to my Twitter account.

2. Yahoo seems to duplicate requests. This one tweet to a single post resulted in 3 requests for Yahoo's Slurp BOT but they originated from two different IP addresses.

3. These bots are not very clever and don't seem to log the URL's they visit to prevent duplicate requests. Not only does Yahoo have issues with the same account but if you post the same link to multiple Twitter accounts you will get all this traffic for each account.

For example when I posted the same link to 3 different Twitter accounts I received 57 requests (19 * 3). You would think maybe these Bots would be clever enough to realise that they only need to visit a link once every so often no matter which account posted it.

It just serves to prove my theory that most Twitter traffic is BOT related. 

BOTS following BOTS and Re-Tweeting and following traffic generated by other BOTS.

  • - - [29/Sep/2010:21:06:45 +0000] "HEAD /2010/09/capital-punishment-and-law-and-order/ HTTP/1.1" 200 - "-" "Twitterbot/0.1"
  • - - [29/Sep/2010:21:06:47 +0000] "GET /2010/09/capital-punishment-and-law-and-order/ HTTP/1.0" 200 26644 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20091221 Firefox/3.5.7 OneRiot/1.0 ("
  • - - [29/Sep/2010:21:06:48 +0000] "HEAD /2010/09/capital-punishment-and-law-and-order/ HTTP/1.1" 200 - "-" "PostRank/2.0 ("
  • - - [29/Sep/2010:21:06:46 +0000] "GET /2010/09/capital-punishment-and-law-and-order/ HTTP/1.0" 200 100253 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp;"
  • - - [29/Sep/2010:21:06:47 +0000] "GET /2010/09/capital-punishment-and-law-and-order/ HTTP/1.0" 200 100253 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp;"
  • - - [29/Sep/2010:21:06:49 +0000] "GET /2010/09/capital-punishment-and-law-and-order/ HTTP/1.1" 200 26643 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
  • - - [29/Sep/2010:21:06:49 +0000] "GET /2010/09/capital-punishment-and-law-and-order/ HTTP/1.1" 200 26634 "-" "Mozilla/5.0 (compatible; Windows NT 6.0) Gecko/20090624 Firefox/3.5 NjuiceBot"
  • - - [29/Sep/2010:21:06:49 +0000] "GET /2010/09/capital-punishment-and-law-and-order/ HTTP/1.0" 200 100253 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp;"
  • - - [29/Sep/2010:21:06:49 +0000] "GET /2010/09/capital-punishment-and-law-and-order/ HTTP/1.1" 200 100253 "-" "Mozilla/5.0 (compatible; MSIE 6.0b; Windows NT 5.0) Gecko/2009011913 Firefox/3.0.6 TweetmemeBot"
  • - - [29/Sep/2010:21:06:54 +0000] "GET /2010/09/capital-punishment-and-law-and-order/ HTTP/1.1" 200 26634 "-" "kame-rt ("
  • - - [29/Sep/2010:21:06:57 +0000] "GET /2010/09/capital-punishment-and-law-and-order/ HTTP/1.1" 200 100253 "-" "Voyager/1.0"
  • - - [29/Sep/2010:21:07:03 +0000] "GET /2010/09/capital-punishment-and-law-and-order/ HTTP/1.0" 200 100253 "-" "Mozilla/5.0 (compatible; Butterfly/1.0; + Gecko/2009032608 Firefox/3.0.8"
  • - - [29/Sep/2010:21:07:10 +0000] "GET /2010/09/capital-punishment-and-law-and-order/ HTTP/1.1" 200 26640 "-" "AppEngine-Google; (+; appid: mapthislink)"
  • - - [29/Sep/2010:21:07:17 +0000] "HEAD /2010/09/capital-punishment-and-law-and-order/ HTTP/1.1" 200 - "" "LongURL API"
  • - - [29/Sep/2010:21:07:17 +0000] "GET /2010/09/capital-punishment-and-law-and-order/ HTTP/1.1" 200 100253 "" "LongURL API"
  • - - [29/Sep/2010:21:07:25 +0000] "GET /2010/09/capital-punishment-and-law-and-order/ HTTP/1.1" 200 26653 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +"
  • - - [29/Sep/2010:21:07:32 +0000] "GET /2010/09/capital-punishment-and-law-and-order/ HTTP/1.1" 403 455 "-" "Jakarta Commons-HttpClient/3.1"
  • - - [29/Sep/2010:21:08:55 +0000] "HEAD /2010/09/capital-punishment-and-law-and-order/ HTTP/1.1" 403 - "-" "Firefox"
  • - - [29/Sep/2010:17:01:06 +0000] "GET /2010/09/capital-punishment-and-law-and-order/ HTTP/1.1" 403 473 "-" "Mozilla/5.0 (compatible; mxbot/1.0; +"

Buy Now

1 comment:

Nathaniel said...

Great post. I had a feeling it was bots when I set up a second site that automatically tweets blog posts and other events. Tonight after posting a blog entry and one other event I instantly had 39 visitors and the twitter account doesn't even have 20 followers yet.