Wednesday 9 June 2010

SQL Injection attack from Googlebot

SQL Injection Hack By Googlebot Proxy

Earlier today, on entering work, I was faced with worried colleagues and angry customers who were complaining about Googlebot being banned from their site. I was tasked with finding out why.

First off, all my large systems run with a custom-built logger database that I created to help track visitors, page requests, traffic trends etc.

It also has a number of security features that constantly analyse recent traffic looking for signs of malicious intent such as spammers, scrapers and hackers.

If my system identifies a hacker it logs the details and bans the user. If a visitor comes to my site and is already in my banned table, they are met with a 403 error.
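To make the ban check concrete, here is a minimal sketch of the idea, not my actual logger code. It assumes an in-memory SQLite table called banned, and the function names make_ban_store, ban and status_for are made up for illustration.

```python
import sqlite3


def make_ban_store():
    """Create a throwaway in-memory store with a hypothetical 'banned' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE banned (ip TEXT PRIMARY KEY, reason TEXT)")
    return conn


def ban(conn, ip, reason):
    """Record an IP as banned; a repeat ban of the same IP is ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO banned (ip, reason) VALUES (?, ?)",
        (ip, reason),
    )


def status_for(conn, ip):
    """Return the HTTP status to send: 403 if the IP is banned, else 200."""
    row = conn.execute("SELECT 1 FROM banned WHERE ip = ?", (ip,)).fetchone()
    return 403 if row else 200
```

Note that the lookups use parameterised queries, so the ban table itself cannot be attacked with the same injection strings it is recording.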

Today I found out that Googlebot had been hacking my site using known SQL Injection techniques.

The IP address was a legitimate Google IP coming from the 66.249 subnet and there were 20 or so records from one site in which SQL injection attack vectors had been passed in the querystring.

Why this has happened I do not know, as an examination of the page in question found no trace of the logged links. However, I can think of a theoretical example which may explain it.

1. A malicious user has either created a page containing links to my site that contain SQL Injection attack vectors or has added content through a blog, message board or other form of user generated CMS that has not sanitised the input correctly.

2. This content has then been indexed by Google or even just appeared in a sitemap somewhere.

3. Googlebot has visited this content and crawled it, following the links containing the attack vectors, which have then been logged by my site.

This "attack by SERP proxy" has left no trace of the actual attacker, and the trail only leads back to Google, who I cannot believe tried to hack me on purpose.

This makes it a very clever little trick, as websites are rarely inclined to block the world's foremost search engine from their site.

I was therefore faced with a difficult choice: either add this IP to my exception list of users never to block under any circumstances, or block it from my site.

Obviously my site's database is secure, and its security policy is such that even if a hackbot found an exploitable hole, updates couldn't be carried out through the website's login. However, this does not mean that an XSS attack vector couldn't be created and exploited in future.

Do I risk the wrath of customers and let my security system carry on doing its job, blocking anyone trying to do my site harm even if it's a Google-by-proxy attack? Or do I risk a potential future attack by ignoring attacks coming from supposedly safe IP addresses?


The answer to the problem came from the now standard way of testing to see if a BOT really is a BOT. You can read about this on my page 4 Simple Rules Robots Won't Follow. It basically means doing a two-step verification process (a reverse DNS lookup on the IP address, then a forward DNS lookup on the hostname that comes back) to ensure the IP address that the BOT is crawling from belongs to the actual crawler and not someone else.

This method is also great if you have a table of IP/UserAgents that you whitelist but the BOT suddenly starts crawling from a new IP range. Without updating your table you need to make sure the BOT is really who they say they are.
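The forward/reverse DNS test described above can be sketched as follows. This is an illustration rather than the code my logger runs. The hostname suffixes are an assumption based on the domains Google has said its crawlers resolve to, and the function names and the injectable reverse/forward parameters are hypothetical, added so the logic can be exercised without a network connection.

```python
import socket

# Assumption: genuine Googlebot hosts resolve to names under these domains.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip):
    """Step 1: reverse DNS (PTR) lookup. Returns a hostname or None."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def forward_lookup(host):
    """Step 2: forward DNS lookup. Returns the list of IPv4 addresses."""
    try:
        return socket.gethostbyname_ex(host)[2]
    except OSError:
        return []


def is_genuine_crawler(ip, suffixes=GOOGLE_SUFFIXES,
                       reverse=reverse_lookup, forward=forward_lookup):
    """True only if the PTR name is in an expected domain AND that
    name resolves back to the same IP address."""
    host = reverse(ip)
    if not host or not host.lower().rstrip(".").endswith(suffixes):
        return False
    # Anyone who controls their own reverse DNS can fake the PTR record,
    # so the forward lookup is what actually proves ownership.
    return ip in forward(host)
```

Because the check relies on DNS rather than a stored IP list, it keeps working when the crawler moves to a new IP range. It only proves that the request really came from the crawler, though, not that the link it followed was harmless.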

Obviously it would be nice if Googlebot analysed all links before crawling them to ensure it is not hacking by proxy, but I cannot wait for them to do that.

I would be interested to know what other people think about this.


Matt B said...

thanks for mentioning this. we're seeing this occasionally as well.

Anonymous said...

Thank you for your analysis on this issue. We too have experienced Googlebot trying some quite nasty injection attempts, designed to overload the database more than anything. I too concluded that one possibility was that these were being indexed somewhere. Fortunately I log each attempt and was able to build up a redirect based on data I collected, and it intelligently evolved from there. Someone using Google as a proxy did come to mind, but that is pretty hardcore.

Rob Reid said...


Could you explain to me how you built up the redirect database to find out where the links were coming from?

Obviously referers are only passed through browsers, so you would only get that value if a user in a browser clicked the link containing the SQL injection. As Googlebot was crawling the links there wouldn't have been a referer value passed to your site, so I am interested in knowing how you solved the issue of where the attacks originated from.

I had to solve the problem by creating a "Ban Waiting" table in my logging DB in which all potential hackers reside until a job runs to analyse the data.

The job runs the forward/reverse DNS test (which more people are now using) to ensure that it wasn't just someone with a user-agent switcher. It also validates the IP against known lists of IPs that I wouldn't want to ban under any circumstances, e.g. Google, Yahoo, Bing, important clients etc.

Only if these tests fail do I actually ban the IP address. If the attacker is a genuine bot like Googlebot or Bing, I just return a 404 code. It has prevented me from banning Googlebot many a time now.
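As a rough sketch of how such a job could sort the "Ban Waiting" entries (this is not the actual job, just the shape of it), assume the pending IPs arrive as a list, the never-ban list is a set of CIDR ranges, and is_verified_bot is whatever forward/reverse DNS check you already have. All of the names here are hypothetical.

```python
import ipaddress


def review_pending(pending, never_ban_networks, is_verified_bot):
    """Split 'ban waiting' IPs into those to ban and those to spare.

    pending            -- iterable of IP address strings awaiting review
    never_ban_networks -- CIDR strings that must never be banned
    is_verified_bot    -- callable(ip) -> bool, e.g. a DNS-based check
    """
    networks = [ipaddress.ip_network(n) for n in never_ban_networks]
    to_ban, spared = [], []
    for ip in pending:
        addr = ipaddress.ip_address(ip)
        protected = any(addr in net for net in networks)
        if protected or is_verified_bot(ip):
            # Genuine crawler or important client: do not ban.
            # The bad request itself can still be answered with a 404.
            spared.append(ip)
        else:
            # Both tests failed, so this IP moves to the banned table.
            to_ban.append(ip)
    return to_ban, spared
```

Holding suspects in a waiting table means the slower DNS lookups happen in the background job rather than on every page request.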

I do know Google used to run proxies that people could use, so someone could have used Googlebot's user-agent and those proxies.

Another scenario would be if someone (maybe a disgruntled employee) has left the company but still has access to the Google Webmaster Tools account for the verified site.

If they did have access they could use the "fetch as GoogleBot" tool to fetch the page along with a hack vector.

This would be seen as an attack from GoogleBot with their useragent/IP.

Only a theory but possible.

Thanks for commenting.