I came across an interesting article the other day over at cloudtesting.com that listed the user-agents of the bots that visit a link as soon as it is posted on Twitter. I have copied the list below, and it's quite amazing to think that as soon as you post a link to your site or blog on Twitter, it will suddenly get hammered by a whole crowd of bots.
I can definitely vouch for this behaviour, as I am experiencing a similar problem with one of my LAMP WordPress blogs. Whenever an article is published, my new Strictly Tweetbot WordPress plugin automatically posts tweets to two (sometimes three, depending on relevance) Twitter accounts.
Therefore, when I import content at scheduled intervals throughout the day, each batch of tweets can bring a sudden rush of bot traffic to my site. That rush spikes my server load, often to levels the server cannot recover from.
- Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
- Mozilla/5.0 (compatible; abby/1.0; +http://www.ellerdale.com/crawler.html)
- Mozilla/5.0 (compatible; MSIE 6.0b; Windows NT 5.0) Gecko/2009011913 Firefox/3.0.6 TweetmemeBot
- Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
- Mozilla/5.0 (compatible; Feedtrace-bot/0.2; email@example.com)
- Mozilla/5.0 (compatible; mxbot/1.0; +http://www.chainn.com/mxbot.html)
- Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7 (.NET CLR 3.5.30729)
- Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0 OneRiot/1.0 (http://www.oneriot.com)
- PostRank/2.0 (postrank.com)
- Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0 Me.dium/1.0 (http://me.dium.com)
- Mozilla/5.0 (compatible; VideoSurf_bot +http://www.videosurf.com/bot.html)
- Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3
- Mozilla/5.0 (compatible; page-store) [email:paul at page-store.com]
Personally, I think this list may be out of date. From what I have seen, quite a few more agents could be added to it, including bots from Tweetme and Bit.ly.
Currently, if I decide a bot provides no benefit in terms of traffic and is only stealing my bandwidth and killing my server, I serve it a 403 using .htaccess rules.
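If you want to take the same approach, a small script can generate the rewrite rules for you. The sketch below is in Python and assumes Apache with mod_rewrite enabled. The `BLOCKED_AGENTS` tokens are a hypothetical selection from the list above, not a recommended ban list. The helper name `htaccess_block` is also made up for illustration.

```python
import re

# Hypothetical selection of user-agent tokens taken from the list above;
# replace with whichever bots your own logs show are worth banning.
BLOCKED_AGENTS = ["TweetmemeBot", "mxbot", "Feedtrace-bot", "PostRank", "Me.dium"]


def htaccess_block(agents):
    """Build an Apache mod_rewrite block that answers matching user-agents with 403."""
    if not agents:
        raise ValueError("need at least one user-agent token")
    lines = ["RewriteEngine On"]
    for i, token in enumerate(agents):
        if any(ch.isspace() for ch in token):
            # Unquoted whitespace would split the RewriteCond arguments.
            raise ValueError(f"token contains whitespace: {token!r}")
        # [NC] = case-insensitive match; every condition except the last is OR-ed.
        flags = "[NC]" if i == len(agents) - 1 else "[NC,OR]"
        # re.escape turns regex metacharacters such as "." into literals.
        lines.append(f"RewriteCond %{{HTTP_USER_AGENT}} {re.escape(token)} {flags}")
    # [F] answers with 403 Forbidden (and implies [L], so no further rules run).
    lines.append("RewriteRule .* - [F]")
    return "\n".join(lines)


if __name__ == "__main__":
    print(htaccess_block(BLOCKED_AGENTS))
```

You would paste the printed block into the site's .htaccess file. The rules match on a short, distinctive token rather than the full user-agent string, so they keep working when a bot bumps its version number.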
Before banning an agent, check your log files or stats to see whether it is actually referring any traffic to you. If you want the benefit without the constant hammering, try contacting the company behind the bot to see whether they would change its behaviour. You never know: they may be relying on your content and be willing to tweak their code. We can all live in hope.
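One way to do that log check is to compare two counts. The first is how often the bot itself hits the site. The second is how many visits arrive with a referrer from the service behind it. The following is a minimal Python sketch that assumes Apache's "combined" log format. The function name `bot_vs_referrals` and the `SAMPLE` log lines are hypothetical, and matching mxbot to chainn.com simply follows the URL in its user-agent string.

```python
import re

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referrer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def bot_vs_referrals(lines, agent_token, referrer_domain):
    """Return (bot_hits, referred_visits) for one bot and the site behind it."""
    token, domain = agent_token.lower(), referrer_domain.lower()
    bot_hits = referred = 0
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        if token in m.group("agent").lower():
            bot_hits += 1  # the crawler itself (checked first: its UA may contain the domain)
        elif domain in m.group("referrer").lower():
            referred += 1  # a visitor who clicked through from that service
    return bot_hits, referred


# Hypothetical sample lines; in practice, iterate over your own access log file.
SAMPLE = [
    '1.2.3.4 - - [10/Oct/2010:13:55:36 +0100] "GET /post HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; mxbot/1.0; +http://www.chainn.com/mxbot.html)"',
    '1.2.3.4 - - [10/Oct/2010:13:55:40 +0100] "GET /post HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; mxbot/1.0; +http://www.chainn.com/mxbot.html)"',
    '5.6.7.8 - - [10/Oct/2010:13:56:01 +0100] "GET /post HTTP/1.1" 200 5120 '
    '"http://www.chainn.com/some-page" "Mozilla/5.0 (Windows NT 6.1)"',
    "not a log line",
]

if __name__ == "__main__":
    hits, referred = bot_vs_referrals(SAMPLE, "mxbot", "chainn.com")
    print(f"mxbot hits: {hits}, visits referred from chainn.com: {referred}")
```

A bot with plenty of hits and no referred visits is the kind of agent worth a 403. One that sends real visitors might be worth that email to its owners instead.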