Monday, 24 November 2008

Trying to detect spoofed user-agents

User Agent Spoofing

A lot of traffic comes from browsers that mask their real identity, either by sending the user agent of a different browser or by sending a random string that relates to no known browser. The reasons for doing this are manifold: malicious users may want to hide who they are or get round code that bans on user-agents, while others want to get round client-side code that blocks certain browsers from using certain functionality. All of this is a good reason for using object detection rather than browser sniffing when deciding which code branch to run.

However, as you will probably know if you have tried anything beyond simple JavaScript coding, there are still times when you need to know the browser because object detection just isn't feasible. In my opinion, trying to use object detection to work out the browser is just as bad as using browser sniffing to work out the object.

Therefore, when someone is using an agent switcher to mask the browser's agent and you come across one of these moments, it may cause you to run code that will raise errors. There is no foolproof way to spot whether an agent is spoofed, but if you do require this information, one thing you can do is compare the agent with known objects that should be supported by that browser. If they don't match then you can confirm the spoof.

This form of spoof detection will only work if it's only the user agent string that has been changed. Some examples of the checks you can do are as follows.

For agents that say they are Opera, make sure the browser supports window.opera as well as both event models (document.addEventListener && document.attachEvent); as far as I know it's the only browser that supports both.

For IE, you shouldn't check document.all by itself, as you will actually find Firefox will return true for this. Instead you can check for window.ActiveXObject and the non-existence of addEventListener, and use conditional compilation to test for JScript. Firefox should obviously not support JScript as it uses JavaScript.

Those are just a few checks you could do; you are basically using object detection and agent sniffing together to make sure they match. They may not tell you the real browser being masked, but they can tell you what it's not.
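The checks described above can be sketched roughly as follows. This is only an illustration, not code from the original article: the function name looksSpoofed is made up, the window and document objects are passed in as arguments so the logic can be exercised with stand-in objects, and the Firefox rule is an assumption based on the point that only IE should expose ActiveX.

```javascript
// Sketch only: looksSpoofed is a hypothetical name, not an existing API.
// `win` and `doc` are the browser's window and document (or stand-ins).
function looksSpoofed(userAgent, win, doc) {
  var ua = String(userAgent || '');
  var hasAddEventListener = typeof doc.addEventListener !== 'undefined';
  var hasAttachEvent = typeof doc.attachEvent !== 'undefined';
  var hasActiveX = typeof win.ActiveXObject !== 'undefined';

  // Opera is tested first because older Opera agents could also contain "MSIE".
  if (/Opera/i.test(ua)) {
    // Claimed Opera: expect window.opera plus both event models.
    return !(win.opera && hasAddEventListener && hasAttachEvent);
  }
  if (/MSIE/i.test(ua)) {
    // Claimed IE (the pre-IE 9 rule from the article):
    // ActiveXObject present and no addEventListener.
    return !(hasActiveX && !hasAddEventListener);
  }
  if (/Firefox/i.test(ua)) {
    // Assumption: a real Firefox has neither ActiveXObject nor window.opera.
    return hasActiveX || Boolean(win.opera);
  }
  // No rule for this agent, so nothing can be concluded.
  return false;
}
```

In a page this would be called as looksSpoofed(navigator.userAgent, window, document). A result of true only says the agent and the objects disagree; it does not say which browser is really in use.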

The idea of this is to make sure that in those cases where you have to branch on browser rather than object (see this previous article) that you make the right choice and don't cause errors. Obviously you may decide that if the user is going to spoof the agent then leave them to suffer any errors that may come their way.

If you do require a lightweight browser detector that checks for user agent spoofing amongst the main browsers as well as support for advanced CSS, Flash and other properties then see this article.

5 comments:

Anonymous said...

So it will not work if the user "alters" all parts of the useragent string?
Would you be able to detect user switching user agents with, for instance, the user agent switcher plug-in for FF using a user string like this:
Description: Iphone 3.0
User Agent: Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_0 like Mac OS X; en-us) AppleWebKit/528.18 (KHTML, like Gecko) Version/4.0 Mobile/7A341 Safari/528.16
App version: 5.0 (iPhone; U; CPU iPhone OS 3_0 like Mac OS X; en-us) AppleWebKit/528.18 (KHTML, like Gecko) Version/4.0 Mobile/7A341 Safari/528.16
Platform: Iphone
Vendor: Apple Computer, Inc.

Thanks

R Reid said...

The aim is not to rely on the useragent string but to detect which objects the browser (the one the useragent string says, or pretends, it is) should OR should NOT support.

E.g. IE supports window.attachEvent whereas Firefox does not; it supports window.addEventListener.

Opera supports both.

So does IE 9.

But Opera has the window.opera object and IE 9 has a document.documentMode value of 9.

Firefox uses JavaScript, not JScript, whereas IE uses JScript.

IE supports ActiveX but others don't.

As new versions of browsers come out things change, and it makes accurate detection harder to accomplish.
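As a rough sketch of how the differences listed above could be used for a process of elimination: the helper below is hypothetical, it only encodes the rules mentioned in this comment, and it assumes IE 9 is running in its standards mode (where documentMode is 9).

```javascript
// Hypothetical helper: returns the browsers the current environment can NOT be,
// based only on the object differences listed in this comment.
function ruledOutBrowsers(win, doc) {
  function has(obj, name) {
    return typeof obj[name] !== 'undefined';
  }
  var env = {
    add: has(win, 'addEventListener'),
    attach: has(win, 'attachEvent'),
    activeX: has(win, 'ActiveXObject'),
    opera: has(win, 'opera'),
    ie9Mode: doc.documentMode === 9 // assumes IE 9 standards mode
  };
  // What each browser is expected to look like, per the comment above.
  var expectations = {
    'Opera': function (e) { return e.opera && e.add && e.attach; },
    'Firefox': function (e) { return e.add && !e.attach && !e.activeX && !e.opera; },
    'IE 9': function (e) { return e.add && e.attach && e.activeX && e.ie9Mode && !e.opera; },
    'IE 8 or earlier': function (e) { return !e.add && e.attach && e.activeX; }
  };
  return Object.keys(expectations).filter(function (name) {
    return !expectations[name](env);
  });
}
```

If the browser named in the useragent string appears in the returned list, the string and the objects disagree. Rules like these go stale as new browser versions ship, so they would need regular review.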

If a user can in any way "inject" JavaScript into your page (which of course they shouldn't be able to), they could "create" the necessary objects to go along with the browser they are spoofing, as JS lets you overwrite objects.

But it is probably impossible to come up with a totally foolproof solution for ALL cases. You might be able to detect certain spoofs, but if the user turns JavaScript off then you're stuck, as all you have is the useragent to go on.

I did write this article a long time ago, and now that IE 9 supports standard JS objects like the DOM 2 event model (addEventListener) it becomes a lot harder. In your example of wanting to detect whether the user is really WebKit/iPhone, you would need to look for objects that only that browser supports and others don't.

One way, if you cannot find any objects (and I cannot think of any off the top of my head), is to use certain CSS styles to find out if the useragent is "real" or not.

For example, with WebKit browsers (Chrome/Safari/iPhone) you could add a DIV into the DOM (using JS) and then try applying WebKit styles to the DIV, i.e. anything with -webkit in front of it, e.g. -webkit-border-radius. You could then use JS to read the current style of that DIV to see whether the style had actually been applied.

If it has then you know it really is a WebKit browser, and if it hasn't then it's a spoofer. BUT it could still be another WebKit browser doing the spoofing, e.g. Chrome spoofing Safari.
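A minimal sketch of that idea, with some assumptions: the function name is made up, the probe DIV is never attached to the page, and the check relies on the browser's CSS parser discarding declarations it does not recognise, so that cssText comes back empty when the -webkit style was not accepted.

```javascript
// Hypothetical helper: `doc` is the page's document (or a stand-in object).
function acceptsWebkitStyles(doc) {
  var probe = doc.createElement('div');
  // Unrecognised declarations are dropped when the style text is parsed,
  // so a browser without -webkit support should leave cssText empty.
  probe.style.cssText = '-webkit-border-radius: 5px';
  return probe.style.cssText.length > 0;
}
```

In a page this would be called as acceptsWebkitStyles(document). Treat the result as a hint rather than proof: newer non-WebKit browsers are known to accept some -webkit- prefixed properties for compatibility, so a positive result does not guarantee a WebKit engine.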

Telling the difference between Chrome spoofing Safari or vice versa is a lot harder, as you would need to find JS objects OR CSS styles that are in one WebKit-based browser and not the other. As they are almost identical I doubt you would find any (if you do please let me know).

That's just an idea, but I have no code for it, and as browsers become more standardised (all supporting the same CSS and JS) it will become harder to detect differences.

I vaguely remember some other article on the web somewhere that used clever techniques to find out the "real" browser (or use a process of elimination to rule out browsers) but I don't know where it is.

This link will show you a way to detect the IE version but again with IE 9 it becomes quite hard to do 100% accurately.

http://blog.strictly-software.com/2009/03/detecting-ie-8-compatibility-modes-with.html

and I did write it a while back, so it might not even work!

Hopefully this info helps you.

Cristian said...

Detect all possible browsers like: proxies, desktops, tablets, mobiles and applications (java, symbian, android). This is the best code ever, enjoy ;) http://code.google.com/p/detect-real-user-agent/

R Reid said...

Well, I am on an iPhone so that may be the reason, but I couldn't find any source code in the link you sent me.

Without seeing the code (and I will check when I'm next on a PC) I can only imagine you are either trying to emulate the browscap.ini system used by PHP & ASP, which only shows useragents and the features they are SUPPOSED to support and is easily defeated by an agent switcher, or it's a server-side version of a massive IF statement checking all known possible agents.
As I can easily set up a transparent anonymous proxy server, I can see no way in the world of detecting this kind of proxy, PLUS it has nothing to do with useragents.
You would need to maintain a massive list of known proxy IP addresses to detect proxies, and as most are servers being unwittingly used as a proxy until the owner finds out and turns on a firewall, an IP can be a proxy one day and not the next.
Therefore I would be interested in seeing some code, as I don't believe it is possible to do.
Thanks for commenting

R Reid said...

I checked the code on my PC (downloaded the .zip) and, as expected, all you have done is write a massive IF statement trying to accommodate all known browsers at this point in time.

Reasons this won't work.

1. What if I put this in my user-agent switcher tool as the user-agent

Robs-Robob0T1

Answer - Your code wouldn't match it as it's not in there. New bots come out every day so you would have to keep this file updated every day.

2. Spoofers, spammers, hackers etc like to use random letters and numbers (gibberish) as user-agents e.g
??
__main__/0.1
+http://robot.vedens.de VEDENSBOT
['dsin Bot']
.NET Framework Test Client
*/Nutch-1.0

Those are just a few of the 1,046,291 useragents my own logger system has collected over the last 3 years.

Your IF statement would fail on 99% of these and even if you attempted to run a regex on all 1 million+ agents you would make the page so slow it would be unusable.

3. You are relying on what the user has put in the header as the useragent. I can change the request headers (User-Agent, X-Forwarded-For, Content-Type etc) to whatever I want when I make a request. Therefore I can be using IE 9 and change the useragent to Firefox 11, and your code would report Firefox 11, which is obviously wrong.

4. You are missing all the default HTTP libraries that a lot of script kiddies use when scraping, e.g. cURL, WinHTTP, LWP and ColdFusion, whose default agent is sent when the code making the HTTP request doesn't set one.

These are all reasons why just looking at the useragent string is not a feasible way of detecting the "true" useragent of the user.

The article (which is old) was trying to use known JavaScript differences between browsers to find out a) what the agent is NOT and b) what the agent COULD be.

There is no 100% foolproof way of detecting agents and, as I said, you cannot detect proxies from a useragent string, so I don't understand why you even mentioned that.

Any questions, ask me, but I wouldn't run 200+ regular expressions to try and find out a useragent as:

a) it won't work accurately (for anyone who is spoofing)

b) it will slow down your server: the more regular expressions you use, the slower it will get.

If you are trying to find out agents for banning purposes then

a) I would ban IE 6 as most hackers on our system seem to use this.

b) I would put the rules into your .htaccess file and combine them see >> http://blog.strictly-software.com/2010/04/banning-bad-bots-with-mod-rewrite.html

c) I would ban blank and very short agents (gibberish)

d) I would use JS to find the agent - if JS cannot be run there is a good chance the user is a BOT so you can rule out humans or show a CAPTCHA / BotTrap etc.
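As a rough sketch of point c), combined with the earlier point about default HTTP library agents: the function name, the length threshold and the library patterns below are all assumptions made for illustration, so check them against your own logs before relying on them.

```javascript
// Assumed patterns: substrings these HTTP libraries are commonly seen sending
// when the calling code does not set its own agent. Not an exhaustive list.
var DEFAULT_LIBRARY_AGENTS = [/^curl\//i, /libwww-perl/i, /WinHttp/i, /ColdFusion/i];
var MIN_AGENT_LENGTH = 10; // assumed cut-off for "very short"

// Hypothetical helper: gives a coarse label for a useragent header value.
function classifyAgent(userAgent) {
  var ua = String(userAgent || '').trim();
  if (ua === '') {
    return 'blank';
  }
  var isLibrary = DEFAULT_LIBRARY_AGENTS.some(function (pattern) {
    return pattern.test(ua);
  });
  if (isLibrary) {
    return 'http-library';
  }
  if (ua.length < MIN_AGENT_LENGTH) {
    return 'too-short';
  }
  // Anything else cannot be judged from the string alone.
  return 'unclassified';
}
```

Note that a made-up agent such as Robs-Robob0T1 comes back as 'unclassified', which is exactly the limitation of string checks: they catch the lazy cases, not a determined spoofer.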

But as I said, there is no foolproof way of detecting the "real" useragent of any user, especially by sniffing the useragent header string.

Thanks for commenting though.
These ar