Saturday, 18 June 2016

Why just grabbing code from the web can lead to major problems down the line

Why just grabbing code from the web can lead to major problems down the line


I have wrote many articles over the years about server, system, website and PC performance, and it seems that the more versions of FireFox and Chrome that come out, the slower they get. I don't think I have ever used IE 11 as much as I have in the last 3 months. Mostly just to get Facebook, Radio 1 or Google+ to load within a minute which FF and Chrome seem to have issues with for some reason.

Some add-ons like uBlock Origin prevent 3rd party domain code from being loaded up on the site as well as large image or video/flash objects. It also stops pop-up windows and the loading of remote CSS fonts which is all the craze now.

What the developers of these websites don't seem to realise is that when they are loading in code from all over the web just to make a page display or run it causes a lot of network traffic. It also introduces the possibility that the code at the end source has been tampered with and therefore you could be loading in Cross Site Scripting hacks or ways for people to exploit your site if that certain script exists in the DOM.

Also a less likely scenario but a more common issue is that the more domains your site has to access to get all it's code onto the site, it can mean the page doesn't load as you may want it to, or even not at all.

If script A relies on Script B but Script B doesn't load for a long time then the code in Script A that was going to open a popup window on DOM Load, or play a video just isn't going to work.

I recently overrode the Window.OnError event and logged the Message, URL and Line No with an AJAX call to a log file before either throwing the error for modern sites or hiding it for older ones.

When I started looking through these files the amount of Google AdSense and Tracker scripts not loading due to timeouts is incredible. Also there are issues with bugs in the scripts or due to their slow loading objects not being available for other scripts relying on them to use. An example of just one error is:

24/04/2016 09:54:33 : 8X.XXX.XXX.161 'document.body' is null or not an object in on line 19

People relying on Google for stats shouldn't for a number of reasons. Not only do they not always load and record the visit, but they also rely on 3rd party cookies being enabled and JavaScript being enabled. A Log parser or DB is a much better way to log every single visitor BOT or Human.

For example if you have a main jQuery script you are loading in from a CDN or from a site you don't control, if that domain is having network problems then that means any other code on the site reliant on it won't be able to work until that issue is resolved. This happens a lot from viewing the messages in my JavaScript error log file.

Due to this a lot of  people just grab the code off the net and load it in from a local server to get round network delays.

However by doing this they are stuck in a point of time (the date and the version they copied the file at). I hate this, as instead of actually learning JavaScript so they know what they are doing they are relying on some other blokes framework to solve their problems e.g have a look at whose code most of you are building your site with. If there is a bug in jQuery you either have to fix it yourself or wait for John to fix it. If it's your own code at least you can rely on your own skills and know how the code works.

The other day I had to solve a jQuery problem where the page in question was using an old version of jQuery and another 3rd party script built around jQuery (but not by John), called reveal.js.

As the front end developers wanted to move to the latest version of jQuery they suddenly found that the reveal.js code no longer worked.

After debugging it was clear that the $().live(function) had been removed and as the code that did the popup relied on reveal.js and it was built in 2011 with no recent updates. The whole revealing and hiding of modal boxes stopped as soon as a modern version of jQuery was loaded in for the site.

I had to waste time reading up on jQuery and then hardcoding the version of reveal.js as we had to use the new .on() function so that the new jQuery libraries would work with the old code that was taken from a library developed in 2011.

This is one thing I hate about front end developers who just pick n choose libraries off the web despite them all doing the same thing like event binding and removal multiple times in multiple ways.

If they are relying on a 3rd party library they took from 2011 that also relies on a constantly updated framework like jQuery that is always dropping and adding new methods, then how are people to expect sites to work when a method these libraries rely on are removed?

If they cannot write some basic notes to say that this page relies on this script e.g reveal.js, which came with jQuery 1.4.5 then it makes people like me who hate debugging other peoples frameworks hate 3rd party code even more.

Not only do I have my own getme.js framework which is simple, uses CSS selectors, linked methods where the array of objects is passed down from function to function, but now that most browsers support the simple one line of code that allows for selectors to find objects there is no need to add Sizzle.js to it any-more. Not unless you really want to support old IE versions you can just use this single line.

// where query is the CSS selector
document.querySelectorAll( query ); 

For example in my Getme.js code this following line of code will loop through all Anchor nodes with a class of menu on them inside the DIV with the ID MAIN. I just then alert out the elements ID.

G('DIV#Main >').each(function(){

Obviously if you do all your styling in CSS or inline JS you have the option of how to style a series of objects for example with the .setAtts method you can pass in any element attribute and their values.

This is providing a mixture of a class and inline styles to the Paragraphs inside DIV tags. It also uses chaining where the array of object are passed from one function to the next just like other frameworks.

The first example just looks for DIV tags with P's inside and sets the class to "warningRed" and the style of the font to bold and red. The class can do most of the styling or ALL of it.

It's just an example, so is the 2nd one that shows all P tags with a SPAN with the class "info". Inside it gets a warning message with the .setHTML method and then the .setStyle method colours the text.

G('DIV > P').setAtts({class:"warningRed", style:"color:red; font-weight:bold"});

G('P >').setHTML('CLick for help.').setStyle({color:red, fontSize:8px});

I used a G instead of $ just to distinguish it from all the other frameworks and because it's called Getme.js.

If you want to know how to learn to write your own chainable framework then have a read of this article of mine. I've kept Getme.js simple as I hate people who just copy code from the web especially when it goes wrong.

At least this way I have a wrapper object that allows for chaining and the setting of multiple attributes at once and the use of selectors. However I still like to use pure JavaScript inside my functions so people down the line can get their heads around it.

So next time I get a jQuery problem because John Resig has decided to remove a core function from his framework which then causes a chain re-action due to all the other frameworks that were built around that version of jQuery, I can at least (hopefully) use my simple framework to apply the CSS that the designers need to rather than spend a day hunting around for fixes to other people's code.

That, is something I really hate doing.


© 2016

Don't Be Fooled By "Turbo Boost" and Windows Performance / Cleaner Applications

Don't Be Fooled By "Turbo Boost" and Windows Performance / Cleaner Applications


I bet if you have been online for a more than a few times you will have undoubtedly seen adverts for tools and applications that will "Speed up your computer" or "Tune it up", "remove unnecessary files" and even malware.

Most of these apps are con tricks in that they will run, show you a really high number of problems either to do with security, privacy or performance and when you go to fix them you are told you must pay a fee of £29.99 to get the full version.

Scam code I call it.

Mainly because people don't know what half the items that are recorded as security holes or performance issues are. For example to get a nice big list of privacy concerns about 20,000 they might list every single cookie you have from every browser.

If you don't know what a cookie is it it's a harmless small text file that holds very small information about your visit to the site e.g by linking your username to a member ID so that the next time you visit the site you don't have to keep re-typing your username in the login box.

For example if you install the Web Developer Toolbar on FireFox you can view all the cookies on a site, domain including sessions. Viewing the cookies for this site I see one that gives me this really important information....

Name: SNID
Value: 72=i-mBmgOp22ixVNh68LucZ_88i1MnYk0FkV2k8k3s=uNr4G5YjLe6X9iAQ
Path: /verify
Expires: Mon, 11 Apr 2016 16:43:43
GMT Secure: No
HttpOnly: Yes

I have no idea what the cookie value for SNID means and most people apart from the web developers won't so when people try and scare you with "cookies are dangerous" - something I have heard from my parents many times - just ignore their ignorance of web development.

They just need to realise that unless your password is stored in a plain text cookie (which never happens) then you don't have much to fear from cookies at all. They just fill up your local data directories the more sites you visit.

The one thing you may not like are tracking cookies e.g Google who try and track you from site to site to see what kind of information you are interested in so that they can show you relevant adverts.

Turning off 3rd party cookies in Chrome or the browser of your choice and setting DNT (Do Not Track) to YES/ON is worth doing even if some browsers don't support the DNT header.

Turbo Mode

Turbo mode is one of those cool sounding options that seem to signal that just by pressing the Turbo ON button your whole machine will speed up. In reality it does a few things, many of which might not even be happening at the time you press it.

These include:

-Stopping a scheduled de-fragmentation of your hard disk. Something that is rarely needed or used anyway but does consume memory and CPU if running.
-Stopping any scheduled tasks from running. These could be updates, downloads of applications that require updates and the automatic creation of system backup and restore points.
-Postpone the automatic download and installation of important application and Windows updates.

You will be informed about the postponing of downloads and automatic updates such as Windows Updates if enabled.

In reality it doesn't do much but sounds and looks good when it says it has boosted your systems performance by 25% etc. Just beware that there is no way of it really knowing how much it has helped and it is probably negligible anyway.

If you really want to speed up your PC, open the task manager, enable the show all processes option and then order the results by CPU or Memory. The programs at the top using over 1GB should certainly be looked at and may have memory leaks.

A shut down of those applications and then re-opening of them might help you out a lot. I find some apps like MS SQL 2015 really drain my memory if I leave them on for days and a reboot now and then is the best remedy for most problems.

It may be a joke from the IT Crowd to "Turn it on and off again", but in reality that does solve a hell of a lot of problems with computers running high memory or CPU.

Always try and install Windows updates regularly so you are not waiting around hours for those 64 updates to install like I have a number of times due to keep hitting the "Remind me in 15 minutes" button. A reboot with the most up to date software is the best thing you can do for your PC as well as removing applications and plugins for browsers that you never use.

The more unnecessary applications you have on your system the more apps you will find in your Windows Start Up options running just to monitor for updates. Google does it, iTunes does it, and many other programs do as well. The more you can trim your system down so it only uses what you want it to use the better.

Plugins on browsers that were only used once should be removed afterwards.Regularly check if you are actually using all the browser plugins as when they are updated the old versions are hardly ever removed.

Applications you downloaded to do one task should also be uninstalled before you forget about them.

The leaner the machine the quicker the machine. I have a 16GB RAM 64GB Windows box at work and I regularly hit 12/13GB of memory. I usually know this is happening because the radio cuts out. However as I hate closing everything down, waiting for the installations and then trying to remember what I had open at the time I tend to let the memory rise and rise and then get frustrated as everything slows down.

If someone could invent a program that would remember what was open and then after rebooting re-open every app, file (with text), and program that was running before would make a mint. If something like this already exist PLEASE TELL ME WHERE I CAN FIND IT!

Clean your PC manually

This part of the article shows you how these myriad of application cleaner tools which trick you into paying money to speed up your PC are basically useless. Even tests have proved that running the following Windows 8+ built system applications can be just as affective.

Use the built in Disk Cleanup tool included with Windows. It’s focused on freeing up space on your hard drive, but it will also delete old temporary files and other useless things. Just tap the Windows key, type Disk Cleanup, and press Enter to launch it. You can even schedule a Disk Cleanup to clean your computer automatically.

When the tool pops up it will list a number of folders and system folders containing files that build up over time the more you use your PC.

Whilst this might be good in regards to browser cache when you are constantly going to the same sites over and over again as it means the photos and other files are locally stored on your computer preventing a network look up to download them again, these are files that you probably use once and forget about. This causes the folder size to rise and rise slowing down access. If you don't go to the sites often enough for a browser cache to be useful then clean it out. A tool like CCleaner can let you decide which sites get cleaned and which others don't.

Remember to regularly clean the following:
  • Your downloaded folder, apps, videos and other files that you have then installed or watched and no longer need.
  • Device Driver Downloads after installation.
  • Empty the Recycle Bin
  • Clean the System Error and Memory Dump Files
  • Delete Temporary Files 
  • Delete User File History

There are tools that are free that help you do all this, backing up your PC before the deletions in case something goes wrong. We will look at CCleaner in a bit.

So if you don't want to rely on costly tools that try and trick you into paying money to make you feel safe there are plenty of ways around it.

1. Don't be tricked by the salesperson at PC World who promises you McAfee Anti Virus software is the best way to protect your PC. It's insurance, and they get the money - a bonus to the sales person so to speak.

There is no need to waste money on a tool that will kill your CPU by constantly scanning every single file your computer accesses (which is a lot), when there are free tools like MalawareBytes Anti-Malware which can be downloaded for free online. There is a premium version if you do require constant analysis of every file your PC comes in contact with but I haven't found it to be needed.

Just run a scan once a week and make sure to never open .ZIP, .EXE, .DOCX or .PDF files in emails especially when you are not expecting them and they are from people you don't know.

Also please remember that is VERY EASY to fake the "FROM" address in an email (1 line of code), so if your a member of a site and someone sends you a flashy looking email that seems to be from PayPal, Facebook or your bank with the address do at least a few things before opening the file.

1. Open the full email headers so that you can see the original sender of the email. Is it from Facebook or your bank?

2. If you are not sure as it's an IP address e.g then run that in a command prompt with the line >> nslookup and make sure it returns a known address. If it comes back empty or with an unknown name e.g use an online Whois tool (there are lots online), or if you have installed WhoisCL on your Windows computer type whoisCL and see what the WHOIS details return about the owner of the address. It should tell you what country it's from and an email address to complain to if you are being spammed by it.

3. If the HTML email looks fancy like your bank or Facebook or some other site. Move your mouse over some of the bottom links in the footer or side bar. Most site strippers will only bother putting code behind the main buttons so they can log your typing e.g Login, Password, Forgot Password etc. If you roll your mouse over the "About" or "Help" links and all you see is a # instead of a proper URL then that is suspicious. Delete the email ASAP!

Remember banks never ask you for your PIN code so never trust a site asking you for that. Also if it asks you for information about your mothers maiden name, first pet, first school, favourite colour and other information used to verify you by sites you should shut it down ASAP.

4. If the headers look okay it could still be a hacked mailserver or a man in the middle attack so right click the file and if you installed Malaware properly you should be able to run a virus scan over the file with one click before saving or opening it. If you can't then save it to your computer and run a virus check on the file before opening it. Never just open the file whoever you may think it's from.

Regularly clear your browser history or even better, set your browser to automatically clear its history when you close it if you don’t want to store a history or even better just use the browsers secret browsing options e.g Chrome's is called Incognito and allows you to surf the web without leaving a history or storing cookies on your machine.

Also clear your browser cache every now and then. Whilst a cache is good for quick loading of images and files (JS, CSS, JPEGs) that are used often. Once it becomes too large then it gets slower and slower to find those files you need so it negates the usefulness of it due to it's size.

Run the Disk Defragmenter included with Windows. This isn't necessary if you use an SSD or solid-state drive.

Don’t bother with a registry cleaner or other performance tool if you have to pay for it. If you want an application to help you then CCleaner is that tool.

You can download from here: CCleaner, The good thing about it, is that it's the best-tested registry cleaner out there.

I always run a registry clean after removing applications from my computer to ensure any registry keys and file extensions left over are also removed. CCleaner will also delete your browser cache for all the browsers you use, as well as cookies, saved passwords, web history and temporary files for other programs.

You have the choice to tick what you want to clean and what not to clean but the free tool CCleaner does a lot more than many of these PC cleaning apps do. A test performed in 2011 by Windows Secrets found that the Disk Cleanup tool included with Windows was just as good as paid PC cleaning apps.

Note that this is true even though PC cleaning apps fix “registry errors” while the Disk Cleanup app doesn't, which just shows just how unnecessary registry cleaners are. So don't waste money being "blackmailed" into buying the premium version of these clean up tools.

So yes, it’s been tested, PC cleaning apps are worthless. Tune your PC yourself and you will get better results.

If you want to download CCleaner which is the recommended tool that professionals use then you can get it from here


© 2016

Tuesday, 17 May 2016

Stopping BOTS - A Multi Layered Approach

Stopping BOTS - A Multi Layered Approach

By Strictly Software

Some people don't mind BOTS of all shapes and form roaming their sites but if you actually look into what they are doing should you be worried about their actions?

Have you examined your log files lately to see what kind of BOTS are visiting and how much bandwidth they are using?

Here are a few of the reasons you might want to care about the type of actions carried out by automated crawlers (BOTS):

1. They eat bandwidth. Social media BOTS especially who jump onto any link you post on Twitter causing Twitter Rushes. This is where 50+ BOTS all hit your site at the same time and if you are not careful could use up all your memory and cause a frozen system if not configured properly. There are plenty of articles about Twitter Rushes on this site if you use the search option down the right hand side to find more details.

2. Bandwidth costs money. If you are a one man band or don't want high server costs then why would you want social media BOTS, many that provide no benefit to you, costing you money just so they can provide their own end users with a service?

3. Content theft. If a user-agent identifying itself as IE6 is hitting a page a second is it really a human using an old IE browser visiting that many pages? Of course not. However for some reason IE 6 is the most popular user-agent used by script kiddies, scrapers and hackers. Probably because they have just downloaded an old crawler script off the web and run it without the knowledge to edit the code and change the agent. Look for user-agents from the same IP hitting lots of pages per minute and ask yourself are they helping your business or just slowing your site down by not obeying your robots.txt crawl-delay command?

4. Hacking. Automated hackbots scan the web looking for sites with old OS systems, old code and potential back doors. They then create a list of sites for their user and come back to penetrate these sites with SQL/XSS injection hacks. Some might show up in GET requests in the log file but if they are tampering with FORM elements then any POSTED data containing hack vectors won't show up. Hiding key response parameters such as your server brand and model and the scripting language you use are good simple measures to prevent your sites name ending up on this list of potential targets to hack and can easily be configured in config files on your system.

Therefore you should have a defence against these type of automated BOTS. Of course you also have the human hacker who might find a sites contact form, view the source, tamper with the HTML and work out a way to modify it so he can send out mass emails from your server with a custom script. Again security measures should be implemented to stop this. I am not going to talk about the basics of security when it comes to preventing XSS/SQL injection but the site has many articles on the topic and basic input sanitation and database login security measures should stop these kinds of hack.

So if you do want to stop automated BOTS from submitting forms, registering to your site, applying for jobs and anything else your site might do the following list might be helpful. It is just an off the head list I recently gave to someone on LinkedIn but could be helpful if expanded to your own requirements.

On my own sites I use a multi pronged approach to stop BAD BOTS as well as bandwidth wasting social media BOTS, hack bots and even manual hackers tampering with the forms. It saves me money as well as increases performance by allowing legit users only to use the site. By banning over 50% of my traffic which is of no benefit to me I can give the 50% of useful traffic a better user experience.

1) We log (using Javascript), whether the user has Javascript enabled e.g an AJAX call on the 1st page they hit that sets a session cookie using Javascript. As most BOTS don't use Javascript we can assume if they have Javascript enabled they are "probably" human.

2) We also use Javascript (or the 1st page HTTP_ALL header in IE) to log whether Flash is enabled and the version. A combo of having Flash running and Javascript isbetter than just Javascript on it's own.

3) I have my own logger DB that records browser fingerprints and IP's, Useragent, Javascript, Flash, HTTP settings, installed apps, browser extensions, Operating System and other features that can almost uniquely identify a user. The problem is of course an IP often changes either through DCHP or the use of proxies, VPN's and hired VPS boxes for an hour or two. However it does help in that I can use this combination data to look up in my historical visitor database to see what rating I gave them before e.g Human, BOT, SERP, Hacker, Spammer, Content Thief and so on. That way if the IP has changed but the majority of the browser finger print hasn't I can make an educated guess. If I am not 100%  sure however I will then go into "unsure mode" where security features such as CAPTCHAS and BOT TRAPS are introduced just in case. I can then use Session variables if cookies are enabled to store the current status of the user (Human, BOT, Unknown etc), or use my visitor table to log the browser footprint and current IP and do lookups on pages where I need to use defensive measures if cookies are not enabled.

4) These Session/DB settings are then used to decide whether to increment banner hit counters, write out emails in images or with Javascript so that only humans can see them (to prevent BOT email scrapers), and other defensive measures. If I know they are 100% human then I may chose not to deploy these measures.

5) On forms like contact forms I often use BOT Traps. These are input elements that are in the flow of the form with names like email_extra that are hidden with CSS only. If the BOT submits a value for this hidden input I don't submit the form, or I do but without carrying out the desired action and not let the BOT know that nothing happened.

6) A lot of forms (especially contact forms) can be submitted by just entering an email address for all fields (name, email, password etc). Therefore I check that the field values are different e.g not the same value for an email AND password field. I also ensure the name matches a name pattern with a regular expression.

7) I have built my own 2 stage CAPTCHA system which can be turned on or off on the fly for forms where I don't know if the user is 100% human OR I can decide to just always have it on. This is based around a maths question, where the numbers are in 3 automatically created images, grey and blurry like normal CAPTCHA's The user has to first extract the right numbers from the images then carry out an automated sum from those numbers e.g add number 1 to number 2 and deduct number 3. This works very well as it requires a human brain to interpret the question and not just use OCR techniques to extract the CAPTCHA image values. There are so many OCR breakers out there that a standard CAPTCHA where you enter the word on the picture can easily be cracked automatically now.

8) If there is textarea on the form, contact, application etc, then I use my RUDE word table which has hundreds of variants of rude words and the regular expression next to it to detect them. This can obviously be updated to include pharmacy pill names, download movies, porn and other spam words.

9) I also have a number of basic regular expressions if the user wants light detection that checks for certain strings such as "download your xxx now", "buy xxx for just $£", and words like MP3s, Films, Porn, Cialis and other common spam words that would have no place on a site not selling such goods.

10) I always log any blocking so I can weed out any false positives and refine the regular expressions etc.

11) I also have an incremental ban time so the 1st time anyone gets banned is for 1 hour, then 2, then 4 then a day etc etc.The more times they come back the longer they get banned.

12) Sometimes I use JavaScript and AJAX to submit the form instead of standard submit buttons. As Javascript is so commonly used now (just look at Google), then most people have it enabled otherwise the majority of sites just wouldn't work or would have minimum features. It would require a human hacker to analyse your page to break it and then write a custom BOT just to hack the form when a technique like this is used. To get round this you can use a rolling random key created server side, inputted into a hidden element with Javascript on page load and then examined on form submission to ensure it is correct. If it's not then the person has tampered with the form by entering an old key not the new key and can be banned or blocked.

13) Another good way to stop automatic hack BOTs (ones that just roam the web looking for forms to try and submit and break out of to send emails etc - contact forms), is to not use FORM tags in your server side code but have compressed and encrypted JavaScript that on page load converts the <div id="form">....</div> into a real FORM with an action, method etc. Anyone viewing the non generated source code like most BOTS, won't see a FORM there to try to hack. Only a generated HTML source view (once the page has loaded), would show them this, which most BOTS would not be able to view.

14) Honeypots and Robots.txt logging is also useful e.g log any hit to the robots.txt file and for any BOTS that don't visit it before crawling your site. You can then make a decision to ban them for breaking your Terms Of Service for BOTS that should state they should obey your Robots.txt rules.

15) As BAD BOTS usually use the links in the DISALLOW section of Robots.txt to crawl anyway. Then putting a fake page in the list of URLs is a good idea. This page should be linked to from your site in a way that humans cannot see the link and accidentally visit it (and if they do it should have a Javascript link on it to enable them to get back to the site). However BAD BOTS will see the link in the source and crawl it. As they have broken your TOS and followed a URL in your DISALLOW list they are being doubly "bad", so you have the right to send them off to a honeypot (many exist on the web that either put emails out for them to extract then wait for an email to be sent to that address to prove they are an email scrapper bot) OR they get sent to an unbreakable maze like system which auto generate pages on the fly so that the BOT just keeps going around in circles crawling page after page and getting nowhere. Basically wasting their own bandwidth.

16) HTACCESS Rules in your .htaccess file should identify known bad bots as well as IE 6, 5 and 5.5 and send them off to a 403 page or a 404 so they don't realise they have been sprung. No-one in their right mind should be using these old IE browsers anymore however most downloadable crawlers used by script kiddies still use IE 6 as a user-agent for some reason. My guess is that they were written so long ago that the code hasn't changed or that people had to support IE 6 due to Intranets being built in that technology e.g using VBScript as the client side scripting language.

By using IE 6 as a UA they get access to all systems due to sites having to support that ancient horrible browser. However I ban blank user-agents, user-agents less than 10 characters long, any that contain known XSS/SQL injection vectors and so on, There is a good PHP Wordpress plugin called Wordpress Firewall that if you turn on all the features and then examine the output in your .htaccess file will show you some useful rules such as banning image hot linking that you can then nick for your own file.

17) Sending bad bots back to their own server is always a good trick so that they get no-where on your own site. Another good trick is to send them to a site that might scare the hell out of them once they realise they have been trying to hack or DDOS it or the METS Cyber Crime department.

These are just a few of the security measures I use to stop BOTS. It is not a comprehensive list but a good starting point and these points can be expanded and automated depending on who you think is visiting your site.

Remember most of these points are backed up with detailed articles on this site so have a search if anything spikes your interest.

Hope this helps.

By Strictly Software

© 2016 Strictly Software