Tuesday, 29 September 2015

IE Bug with Non-Protocol-Specific URLs


By Strictly-Software

I have recently come across a problem that seems to affect only IE and non-protocol-specific URLs.

These are becoming more and more common as they prevent warnings about insecure content and ensure that when you follow a link or load a resource you use the same protocol as the page you are on.

This can obviously be an issue if the site you are on uses SSL but the site you want to go to doesn't have an HTTPS version, or vice-versa. However, most big sites serve both and will handle the transfer by redirecting the user, and any posted data, from one protocol to the other.
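A common example of this (the jQuery CDN reference below is just for illustration, it isn't taken from my site) is loading a shared script with a protocol-relative src, so an HTTP page fetches it over HTTP and an HTTPS page fetches it over HTTPS:

<script src="//ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>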

Another example would be a PayPal button whose form posts to

action="//www.paypal.com/"

In the old days, if you had a page on an SSL, e.g. https://www.mysite.com, and it linked to an image or script on a non-secure domain, e.g. http://www.somethirdpartysite.com/images/myimg.jpg, you would get pop-ups or warning messages in the browser about "non-secure content" and would have to confirm that you wanted to load it.

Nowadays, whilst you don't see these pop-ups in modern browsers, if you check the console (F12 in most browsers) you will still see JavaScript or network errors when you try to load content cross-domain and cross-protocol.

S2 Membership Plugin

I came across this problem when a customer trying to join one of my WordPress subscription sites (which uses the great free S2 Membership Plugin) complained that when he clicked one of the payment buttons I have on many pages, he was taken to PayPal's homepage rather than the standard payment page that details the type of purchase, the price and the options for payment.

It was working for him in Chrome and Firefox but not in IE.

I tested this myself on IE 11 on both Windows 7 and Windows 8 and found that this was indeed true.

After hunting through the network steps in the developer toolbar (F12) and comparing them to Chrome, I found that the problem seemed to be IE doing a 301 redirect from PayPal's HTTP domain to their HTTPS one.

After analysing the request and response headers, I suspect it is something to do with the non-UTF-8 response that PayPal was returning to IE, probably because Internet Explorer wasn't requesting it as such in the first place.

Debugging The Problem


For the techies, this is a breakdown of the problem, with the network steps from both Chrome and IE and the relevant headers.

First, the PayPal button code, which is encrypted by S2Member on the page. You are given the button content as standard square-bracket shortcodes which get converted into HTML on output. Looking at the source for one button in both browsers I could see the following.

1. Even though the button outputs the form action as https://www.paypal.com, something, either one of my plugins (I suspect a caching plugin but haven't been able to narrow it down) or WordPress itself, is removing any protocol conflicts by rewriting the links as non-protocol-specific URLs.

So, as my site doesn't have an SSL, any HREF, SRC or ACTION that points to an HTTPS URL was having its protocol replaced with //, e.g. https://www.paypal.com on my page http://www.mysite.com/join was becoming //www.paypal.com in both the source and the generated source.

2. Examining the HTML of one of the buttons, you can see this in any browser. I have cut short the encrypted button code as it's pointless outputting it all.


<form action="//www.paypal.com/cgi-bin/webscr" method="post">
<input type="hidden" name="cmd" value="_s-xclick">
<input name="encrypted" type="hidden" value="-----BEGIN PKCS7-----MIILQQYJKoZIhvcNAQcEoIILMjCCCy4CAQExgg...">
</form>


3. Outputting a test HTML page on my local computer and running it in IE 11 WORKED. This was probably because I explicitly set the URL to https://www.paypal.com so no redirect was needed.

4. Therefore logically the problem was due to the lack of an HTTPS in the URL.

5. Comparing the network jumps.

1. Chrome

Name   - Method - Status - Type                  - Initiator
webscr - POST   - 307    - x-www-form-urlencoded - Other
webscr - POST   - 200    - document              - https://www.paypal.com/cgi-bin/webscr

2. IE

URL                         - Protocol - Method - Result - Type      - Initiator
/cgi-bin/webscr             - HTTP     - POST   - 301    -           - click
/cgi-bin/webscr             - HTTPS    - POST   - 302    - text/html - click
https://www.paypal.com/home - HTTPS    - POST   - 200    - text/html - click

Although the column titles are slightly different, they are just different words for the same thing, e.g. Status in Chrome and Result in IE both relate to the HTTP status code the response returned.

As you can see, Chrome also had to redirect from HTTP to HTTPS, but it did so with a single 307 (the HTTP 1.1 successor to the 302 temporary redirect, which unlike a 302 is not allowed to turn the POST into a GET) and ended up on the correct page. In IE, however, when I first clicked the button it posted to the payment page over HTTP, then got a 301 (permanent) redirect to it over HTTPS and then a 302 (temporary) redirect to their home page.

If you want to know more about these three redirect status codes there are plenty of good explanations of them around the web.

The question was why couldn't IE take me to the correct payment page?

Well when I looked at the actual POST data that was being passed along to PayPal from IE on the first network hop I could see the following problem.

cmd=_s-xclick&encrypted=-----BEGIN----MIILQQYJKoZIhvcNAQcEoIILMjCCCy4CAQExgg...

Notice that after the BEGIN, where it should say PKCS7, IE had sent a mangled Chinese character instead.

In Chrome, however, this data was exactly the same as in the form, e.g.

cmd:_s-xclick
encrypted:-----BEGIN PKCS7-----MIILQQYJKoZIhvcNAQcEoIILMjCCCy4CAQExgg...

So it looked like the posted data was being misinterpreted by IE, whereas in Chrome it was not. Therefore I needed to check what character sets were being sent and returned in the request and response.

Examining Request and Response Headers

When looking at the HTTP request headers on the first POST to PayPal in IE, I could see that the Accept-Language header was only asking for en-GB, i.e. a basic ASCII character set. IE also sent noticeably fewer request headers than Chrome did. I have just copied the relevant ones that can be compared between browsers.

IE Request Headers

Content-Type: application/x-www-form-urlencoded
Accept-Language: en-GB
Accept-Encoding: gzip, deflate
Referer: http://www.mysite.com/join-now/
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko
Request: POST /cgi-bin/webscr HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Host: www.paypal.com

Chrome Request Headers

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6
Cache-Control: max-age=0
Connection: keep-alive
Content-Length: 4080
Content-Type: application/x-www-form-urlencoded


And the Content-Type headers from the responses, which I think are the key difference.

IE

Content-Type: text/html

Chrome

Content-Type: text/html; charset=UTF-8


So whilst Chrome says it will accept a wider range of languages and gets back a charset of UTF-8, IE says it will accept only en-GB and gets back just text/html with no charset specified.

I even tried seeing if I could add UTF-8 as a language to accept in IE, but there was no option to, so I tried adding Chinese instead, which obviously uses extended character sets and matched the problematic character. However this made no difference, even though the Accept-Language header was now:

Accept-Language: en-GB,zh-Hans;q=0.9,zh-CN;q=0.8,zh-SG;q=0.7,zh-Hant;q=0.6,zh-HK;q=0.4,zh-MO;q=0.3,zh-TW;q=0.2,zh;q=0.1

Conclusion


Therefore I came to the conclusion that I could not force IE to change its behaviour, and I doubt any phone calls to IE HQ or even PayPal would solve the issue. So, to allow IE users to still be able to pay on my site, I needed a workaround.

1. I added a test for IE, and a warning about possible problems, to the technical test page that I make all users check before paying (a rough sketch of such a check is shown after the form-fixing function below). This went alongside tests to ensure JavaScript and cookies were enabled, as both are needed for any modern JavaScript site.

2. I added some JavaScript code to my footer that runs on DOM load, loops through all FORM elements and checks the ACTION attribute. Even though, when examining the actions in the console, they showed http://www.paypal.com rather than the //www.paypal.com I saw in the source, I added code to ensure they always use HTTPS.

The function, if you are interested, is below, and it seems to have fixed the problem in IE. If I view the generated source now I can see all the PayPal form actions use the HTTPS protocol.



// on DOM load loop through all FORM elements on the page
jQuery(document).ready(function () {	
	// get all form elements
	var o, e=document.getElementsByTagName("FORM");
	for(var i=0,l=e.length;i<l;i++)
	{
		// get the action attribute
		o = e[i].action;

		// only carry on if the current action isn't blank
		if(o && o!="")
		{
			// if the start of the action is http (as non-protocol-specific domains show up as http)
			// then replace the http with https
			if( /^http:\/\/www\.paypal\.com/.test(o) )
			{
				e[i].action = o.replace("http:","https:");
			}		
		
		}
	}	
});
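
As for the warning mentioned in point 1, here is a minimal sketch of the sort of browser check I mean. It just sniffs the user agent for the MSIE or Trident tokens that Internet Explorer sends (the #ie-warning element is a made-up example, not part of S2Member or my actual page):

// rough sketch only - show a hidden warning element to Internet Explorer users
jQuery(document).ready(function () {
	// IE 10 and below send "MSIE" in the user agent, IE 11 sends a "Trident/" token instead
	var isIE = /MSIE |Trident\//.test(navigator.userAgent);
	if (isIE) {
		// the #ie-warning element is hypothetical - it holds the text warning IE users about payment problems
		jQuery("#ie-warning").show();
	}
});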


So whilst this is just a workaround for the IE bug, it does solve the issue until Internet Explorer sorts itself out. Why IE has this problem I have no idea.

I am outputting all my content as a UTF-8 charset and Chrome is obviously handling it correctly (along with Firefox and Safari).

So I can only presume it's an IE bug, which isn't helped by an as yet unidentified plugin (or WordPress itself) changing cross-protocol URLs to the now standard //www.mysite.com format.

Therefore if you come across similar problems with redirects taking you to the wrong place, check your headers, compare browsers and if you spot something strange going on, try a JavaScript workaround to modify the DOM on page load.


© 2015 Strictly-Software

Friday, 18 September 2015

What is the point of client side security

Is hacking the DOM really hacking?

By Strictly-Software

The nature of the modern web browser is that it's a client side tool.

Web pages stored on web servers are, when viewed in Chrome or Firefox, downloaded file by file (CSS, JavaScript, HTML, images etc) and stored temporarily on your computer whilst your browser puts them together so you can view the webpage.

This is where your "browser cache" comes from. It is good to have commonly downloaded files such as the jQuery script or common images from frequently visited pages in your cache but when this folder gets too big it can become slow to traverse and load from. This is why a regular clean out is recommended by a lot of computer performance tools.

Because of this, putting any kind of security on the client side is pointless, as anyone with a small working knowledge of Internet technology can bypass it. I don't want to link to one site in particular, but a Google advert appeared on my site the other day claiming to protect your whole website from theft, including your HTML source code.

However, if you have a spare 30 minutes on your hands, have Firebug installed (or any modern browser that allows you to inspect and edit the DOM) and do a search for "code to protect HTML", you will be able to bypass the majority of that site's wonderful security claims with ease.

Examples of such attempts to use client side code to protect code or content include:

1. Trying to protect the HTML source code from being viewed or stolen. 

This will include the original right mouse click event blocker.

This was used in the old days in the vain hope that people didn't realise that they could just go to Tools > View Source instead of using the context menu which is opened with a right click on your mouse.
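
For reference, the classic blocker was usually little more than a one-liner along these lines (a sketch, there were plenty of variations):

// cancel the right click context menu for the whole document
document.oncontextmenu = function () {
	return false;
};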

The other option was just to save the whole web page from the File menu. 

However you can now just view the whole generated source with most developer tools, e.g. Firebug, or by hitting F12 in Chrome.

Some sites will also generate their whole HTML source code with JavaScript in the first place. Not only is this really bad for SEO but it is easily bypassed.

A lot of these tools pack, encode and obfuscate the code on the way. The code is then run through a function that evaluates it and writes it to the DOM.

It's such a shame that this can all be viewed without much effort once the page has loaded into the DOM. Just open your browser's developer toolbar and view the generated source and hey presto, the outputted HTML is there.

Plus there are many tools that let you run your own scripts on any page. For example, someone at work the other day didn't like the way news sites like the BBC always show large monetary numbers as £10BN, so he added a regular expression to one of these tools to automatically change all occurrences to £10,000,000,000, as he thought the number looked bigger and more correct. A stupid example I know, but it shows that with tools like Fiddler etc you can control the browser output.
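
As a rough illustration of that kind of user script (I don't know the exact expression my colleague used, so the regular expression below is only a guess at it), something like this run against the page would do the job:

// expand abbreviated billions such as "£10BN" into the full figure, e.g. "£10,000,000,000"
// (sketch only - it assumes whole billions and ignores values like £1.5BN)
document.body.innerHTML = document.body.innerHTML.replace(
	/£(\d+)\s?BN\b/gi,
	function (match, num) {
		return "£" + num + ",000,000,000";
	}
);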

2. Using special classes to prevent users from selecting content

This is commonly used on music lyric sites to prevent people copying and pasting the lyrics straight off the page by selecting the content and using the copy button.

Shame that, if you can modify the DOM on the fly, you can just find the class in question with the inspect tool, blank it out and negate its effect.
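
As an example of how trivial that is, a couple of lines in the console will strip such a class from every element that has it (the class name "noselect" is an assumption for illustration, different sites use different names):

// remove an assumed "noselect" class so the lyrics can be selected and copied again
var blocked = document.querySelectorAll(".noselect");
for (var i = 0; i < blocked.length; i++) {
	blocked[i].className = blocked[i].className.replace(/\bnoselect\b/g, "").trim();
}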

3. Multimedia sites that show content from TV shows that will remain unnamed but only allow users from the USA to view them. 

Using a proxy server sometimes works but for those flash loaded videos that don't play through a proxy you can use YSlow to find the base URI that the movie is loaded from and just load that up directly.

To be honest I think these companies have got wise to the fact that people will try this as they now insert location specific adverts into the movies which they never used to do. However it's still better than moving to the states!

4. Sites that pack and obfuscate their Javascript in the hope of preventing users from stealing their code. 

Obviously minification is good practice for reducing file size, but if you want to unpack some JavaScript then you have a couple of options, and there may be valid reasons for doing so other than just wanting to see the code being run, e.g. preventing XSS attacks.

Option 1 is to use my script unpacker form, which lets you paste the packed code into a textarea, hit a button and then view the unpacked version in another textarea for you to copy out and use. It will also decode any encoded characters, as well as formatting the code and handling code that has been packed multiple times.

If you don't want to use my wonderful form, and I have no idea why you wouldn't, then Firefox comes to the rescue again. Copy the packed code, open the JavaScript error console and paste the code into the input box at the top with the following added to the start of it:

//add to the beginning eval=alert;
eval=alert;eval(function(p,a,c,k,e,r){e=String;if(!''.replace(/^/,String)){while(c--)r[c]=k[c]||c;k=[function(e){return r[e]}];e=function(){return'\\w+'};c=1};while(c--)if(k[c])p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c]);return p}('3(0,1){4(0===1){2("5 6")}7{2("8 9")}',10,10,'myvar1|myvar2|alert|function|if|well|done|else|not|bad'.split('|'),0,{}))

// unpacked returns
function(myvar1,myvar2){if(myvar1===myvar2){alert("well done")}else{alert("not bad")}



Then hit evaluate and the unpacked code will open in an alert box which you can then copy from.

What the code is doing is changing the meaning of the function eval to alert, so that when the packed code runs its eval statement, instead of executing the evaluated code it shows it in the alert message box.

There are many more techniques which I won't go into, but the question then is: why do people do it?

Well, the main reason is that people spend a lot of time creating websites and they don't want some clever script kiddie or professional site ripper to come along, steal their content and use it without permission.

People will also include whole sites nowadays within frames on their own sites, or just rip the whole thing, CSS, images, scripts and everything else, with a click of a button. There are too many available tools to count, and a lot of phishing sites will copy a bank's layout but then change the functionality so that it records your account login details.

I have personally seen two sites now, which I either worked on or know the person who did the work, appear on the net under a different URL with the same design, images and JS code, all the same apart from the wording, which was in Chinese.

The problem is that every modern browser now has a developer tool set, like Firebug, Chrome's developer tools or Internet Explorer's developer toolbar. There is also Opera's Dragonfly, and even Firebug Lite, which replicates Firebug functionality for those of you wanting to use it on older browsers like IE 6.

Therefore with all these built in tools to override client side security techniques it seems pretty pointless trying to put any sort of security into your site on the client side.

Even if you don't want to be malicious and steal or inject anything, you can still modify the DOM, run your own JavaScript, change the CSS and remove x, y and z.

All security measures related to user input should be handled on the server to prevent SQL injection and XSS hacks but that's not to say that duplicating validation checks on the client isn't a good idea.

For one thing it saves time if you can inform a user that they have inputted something incorrectly before the page is submitted.

No one likes to fill in a long form, submit it and wait whilst a slow network connection and a bogged-down server take too long to respond, only to be shown another page that says one of the following:
  • That user name is already in use please choose another one.
  • Your email confirmation does not match.
  • Your password is too short.
  • You did not complete blah or blah.
Things like this should be done client side if possible, using Ajax for checks that need database look-ups, such as user name availability tests. Using JavaScript itself to test whether the user has JavaScript enabled (if the script runs, it is enabled) is a good technique for deciding whether to rely purely on server-side validation or to load in functions that provide client-side validation as well.
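
As a rough sketch of what I mean (the form and field IDs and the /check-username URL are made-up examples, not a real API), the cheap checks run instantly on the client and only the user name availability test goes back to the server via Ajax:

// minimal client side validation sketch - IDs and the Ajax URL are hypothetical
jQuery(document).ready(function () {

	// cheap checks that need no server round trip
	jQuery("#signup-form").on("submit", function (e) {
		var errors = [];

		if (jQuery("#password").val().length < 8) {
			errors.push("Your password is too short.");
		}
		if (jQuery("#email").val() !== jQuery("#email-confirm").val()) {
			errors.push("Your email confirmation does not match.");
		}

		// stop the form submitting and tell the user straight away
		if (errors.length) {
			e.preventDefault();
			alert(errors.join("\n"));
		}
	});

	// user name availability needs a database look-up so check it with Ajax when the field loses focus
	jQuery("#username").on("blur", function () {
		jQuery.get("/check-username", { name: jQuery(this).val() }, function (available) {
			if (!available) {
				alert("That user name is already in use please choose another one.");
			}
		});
	});

});

Of course if JavaScript is disabled none of this runs, which is exactly why the same checks must still exist on the server.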

However client side code that is purely there to prevent content from being accessed without consent seems pointless in the age of any modern browser.

Obviously there is a large percentage of web users out there who wouldn't know the first thing about bypassing client side security code, and to them the blocking of the right click context menu would seem like black magic.

Unfortunately for those still wanting to protect their client side code, the people who do want to steal the content will have the skills to bypass all your client side cleverness.

It may impress your boss and seem worth the $50 for about 10 minutes until someone shows you how you can add your own Javascript to a page to override any functions already there for blocking events and checking for iframe positioning.

My only question would be: is it really hacking to modify the DOM to access or bypass certain features meant to keep the content on that page?

I don't know what other people think about this but I would say no, it's not.

The HTML, images, JavaScript and CSS are ALL on my computer at the point of me viewing them in whatever browser I am using. Therefore, unless I am trying to change or inject something into the web or database server to affect future site visitors, or trying to bypass challenge responses, I am not really hacking the site, just modifying the DOM.

I'd be interested to know what others think about that question.

By Strictly-Software

© 2015 Strictly-Software