Strictly Software: 2016

Wednesday, 5 October 2016

Disk Full - Linux - Hacked or Full of Log Files?

By Strictly-Software

This morning I woke up to find the symptoms of a hack attempt on my LINUX VPS server.

I had the same symptoms when I was ShockWave hacked a few years ago and some monkey overwrote a config file so that when I rebooted, hoping to fix the server, it would reload it in from a script hidden in a US car site.

They probably had no idea that the script was on their site either, but it was basically a script to enable various hacking methods and the WGet command in the config file ensured that my standard config was constantly overwritten when the server was re-started.

Another symptom was that my whole 80GB of disk space had suddenly filled up.

It was 30GB the night before and now with 30 odd HD movies hidden in a secret folder buried in my hard drive I could not FTP anything up to the site, receive or send emails or manually append content to my .htaccess file to give only my IP full control.

My attempts to clear space by clearing cached files was useless and it was only by burrowing through the hard drive folder by folder all night using the following command to show me the biggest files (visible and hidden) that I found the offending folder and deleted it.


du -hs $(ls -A)

However good this command is for finding files and folders and showing their size in KB, MB or GB, it is a laborious task to manually go from your root directory running the command over and over again until you find the offending folder(s).

So today when I thought I had been hacked I used a different process to find out the issue.

The following BASH script can be run from anywhere on your system in a console window and you can either enter a path if you think you know where the problem lies or just enter / when prompted to scan the whole machine.

It will list first the 20 biggest directories in order of size and then the 20 largest files in order of size.


echo -n "Type Filesystem: ";
read FS;NUMRESULTS=20;
resize;clear;date;df -h $FS;
echo "Largest Directories:"; 
du -x $FS 2>/dev/null| sort -rnk1| head -n $NUMRESULTS| awk '{printf "%d MB %s\n", $1/1024,$2}';
echo "Largest Files:"; 
nice -n 20 find $FS -mount -type f -ls 2>/dev/null| sort -rnk7| head -n $NUMRESULTS|awk '{printf "%d MB\t%s\n", ($7/1024)/1024,$NF}'

After running it I found that the problem was not actually a security breach but rather a plugin folder within a website containing log files. Somehow without me noticing the number of archived log files had crept up so much that it had eaten 50GB of space without my knowledge.

As the folder contained both existing and archived log files I didn't want to just truncate it or delete everything instead I removed all archived log files by using a wildcard search for the word ARCHIVED within the filename.


rm *ARCHIVED*

If you wanted to run a recursive find and delete within a folder then you may want to use something a bit different such as:


ind -type f -name '*ARCHIVED*' -delete

This managed to remove a whole 50GB of files within 10 minutes and just like lightening my sites, email and server started running again as they should have been.

So the moral of the story is that a full disk should be treated first as a symptom of a hacked server, especially if you were not expecting it, and the same methods used to diagnose and fix the problem can be used whether you have been hacked or allowed your server to fill itself up with log files or other content.

Therefore keep an eye on your system so you are not caught out if this does happen to you and if you do suddenly jump from 35GB to 80GB and stop receiving emails or being able to FTP content up (or files being copied up as 0 bytes), then you should immediately put some security measures into place.

My WordPress survival guide on security has some good options to use if you have been hacked but as standard you can do some things to protect yourself such as

Replacing the default BASH language with a more basic, older and secure DASH. You can still run BASH once logged into your console but as default it should not be running and allow hackers to run complex commands on your server.
You should always use SFTP instead of FTP as its more secure and you should change the default SSH port from 22 to another number in the config file so that standard port scanners don't spot that your server is open and vulnerable to attack.
If you are running VirtualMin on your server you should also change the default port for accessing it from 10000 to another number as well. Otherwise attackers will just swap from SSH attacks by console to web attacks where the front end is less protected. Also NEVER store the password in your browser in case you forget to lock your PC one day or your browsers local SQLLite Database is hacked and the passwords compromised.
Ensuring your root password and every other user password is strongly typed. Making passwords by joining up phrases or rememberable sentences where you swap the capitals and non capital letters over is a good idea. And always add a number to the start or end, or both as well as some special characters e.g 1967bESTsAIDfRED*_* would take a dictionary cracker a very long time to break.
Regularly change your root and other user passwords in case a keylogger has been installed on your PC and discovered them.
Also by running DENYHOSTS and Fail2Ban on your server you can ensure anyone who gets the SSH password wrong 3 times in a row is blocked and unable to access your console or SFTP files up to your server. If you forget yourself you can always use the VirtualMin website front end (if installed) to login and remove yourself from the DenyHosts list.
If you are running WordPress there are a number of other security tools such as the WordPress Firewall plugin that you can install which will hide your wp-admin login page away behind another URL and redirect people trying to access it to another page. I like the https://www.fbi.gov/wanted/cyber URL myself. It can also ban people who fail to login after a number of attempts for a set amount of time as well a number of other security features.

Most importantly of all regularly check the amount of free space you have on your server and turn off any logging that is not required if you don't need it.

Getting up at 5.30AM to send an email only to believe your site has been hacked due to a full disk is not a fun way to spend your day!

By Strictly-Software

© 2016 Strictly-Software

A Karmic guide for Scraping without being caught

Quick Regular Expression to clean off any tracking codes on URLS

I have to deal with Scrapers all day long in my day job and I ban them in a multitude of ways from using firewalls, .htaccess rules, my own personal logger system that checks for the duration between page loads, behaviour, and many other techniques.

However I also have to scrape HTML content sometimes for various reasons, such as to find a piece of content related to somebody on another website linked to my own. So I know both methods to use to detect scrapers and stop them.

This is a guide to various methods that scrapers use to prevent being caught and have their IP address added to a blacklist within minutes of starting. Knowing the methods people use to scrape sites will help you when you have to defend your own from scrapers so it's good to know both attack and defense.

Sometimes it is just too easy to spot a script kiddy who has just discovered CURL and thinks it's a good idea to test it out on your site by crawling every single page and link available.

Usually this is because they have downloaded a script from the net, sometimes a very old one, and not bothered to change any of the parameters. Therefore when you see a user-agent in you logfile that is hammering you that just has the user-agent of "CURL" you can block it and know you will be blocking many other script kiddies as well.

I believe that when you are scraping HTML content from a site it always wise to follow some golden rules based on Karma. It is not nice to have your own site hacked or taken down due to a BOT gone wild therefore you shouldn't wish this on other people either.

Behave when you are doing your own scraping and hopefully you won't find your own sites content appearing on a Chinese rip off under a different URL anytime soon.

1. Don't overload the server your are scraping.

This only lets the site admin know they are being scraped as your IP / Useragent will appear in their log files so regularly that you might get confused for trying a DOS attack. You could find yourself added to a block list ASAP if you hammer the site you are scraping.

The best way to get round this is to put a time gap in-between each request you make. If possible follow the sites Robots.txt file if they have one and use any Crawl-Delay parameter they may have specified. This will make you look much more legitimate as you are obeying their rules.

If they don't have a Crawl-Delay value then randomise a wait time in-between HTTP requests, with at least a few seconds wait as the minimum. If you don't hammer their server and slow it down you won't draw attention to yourself.

Also if possible try to always obey the sites Robot.txt file as if you do you will find yourself on the right side of the Karmic law. There are many tricks people use such as dynamic Robots.txt files, and fake URL's placed within them, that are used to trick scrapers who break the rules by following DISALLOWED locations into honeypots, never-ending link mazes or just instant blocks.

An example of a simple C# Robots.txt parser I wrote many years ago that can easily be edited to obtain the Crawl-Delay parameter can be found here: Parsing the Robots.txt file with C-Sharp.

2. Change your user-agent in-between calls.

Many offices share the same IP across their network due to the outbound gateway server they use, also many ISP's use the same IP address for multiple home users e.g DHCP. Therefore there is no easy way until IPv6 is 100% rolled out to guarantee that by banning a user by their IP address alone you will get your target.

Changing your user-agent in-between calls and using a number of random and current user-agents will make this even harder to detect.

Personally I block all access to my sites that use a list of BOTS I know are bad or where it is obvious the person has not edited the user-agent (CURL, Snoopy, WGet etc), plus IE 5, 5.5, 6 (all the way up to 10 if you want).

I have found one of the most common user-agents used by scrapers is IE 6. Whether this is because the person using the script has downloaded an old tool with this as the default user-agent and not bothered to change it or whether it is due to the high number of Intranet sites that were built in IE6 (and use VBScript as their client side language) I don't know.

I just know that by banning IE 6 and below you can stop a LOT of traffic. Therefore never use old IE browser UA's and always change the default UA from CURL to something else such as Chromes latest user-agent.

Using random numbers, dashes, very short user-agents or defaults is a way to get yourself caught out very quickly.

3. Use proxies if you can.

There are basically two types of proxy.

The proxy where the owner of the computer knows it is being used as a proxy server, either generously to allow people in foreign countries such as China or Iran to access outside content or for malicious reasons to capture the requests and details for hacking purposes.

Many legitimate online proxy services such as "Web Proxies" only allow GET requests, float adverts in front of you and prevent you from loading up certain material such as videos, JavaScript loaded content or other media.

A decent proxy is one where you obtain the IP address and port number and then set them up in your browser or BOT to route traffic through. You can find many free lists of proxies and their port numbers online although as they are free you will often find speed is an issue as many people are trying to use them at the same time. A good site to use to obtain proxies by country is http://nntime.com.

Common proxy port numbers are 8000, 8008, 8888, 8080, 3128. When using P2P tools such as uTorrent to download movies it is always good to disguise your traffic as HTTP traffic rather than using the default setting of a random port on each request. It makes it harder but obviously not impossible for snoopers to see you are downloading bit torrents and other content. You can find a list of ports and their common uses here.

The other form of proxy are BOTNET's or computers where PORTS have been left open and people have reversed engineered it so that they can use the computer/server as a proxy without the persons knowledge.

I have also found that many people who try hacking or spamming my own sites are also using insecure servers. A port scan on these people often reveals that their own server can be used as a proxy themselves. If they are going to hammer me - then sod them I say as I watch US TV live on their server.

4. Use a rented VPS

If you are only required to scrape for a day or two then you can hire a VPS and set it up so that you have a safe non-blacklisted IP address to crawl from. With services like AmazonAWS and other rent by the minute servers it is easy to move your BOT from server to server if you need to do some heavy duty crawling.

However on the flipside I often find myself banning the AmazonAWS IP range (which you can obtain here) as I know it is so often used by scrapers and social media BOTS (bandwidth wasters).

5. Confuse the server by adding extra headers

There are many headers that can tell a server whether you are coming through a proxy such as X-FORWARDED-FOR, and there is standard code used by developers to work backwards to obtain the correct original IP address (REMOTE_ADDR) which can allow them to locate you through a Geo-IP lookup.

However not so long ago, and many sites still may use this code, it was very easy to trick sites in one country into believing you were from that country by modifying the X-FORWARDED-FOR header and supplying an IP from the country of your choice.

I remember it was very simple to watch Comedy Central and other US TV shown online just by simply using a FireFox Modify Headers plugin and entering in a US IP address for the X-FORWARDED-FOR header.

Due to the code they were using, they obviously thought that the presence of the header indicated that a proxy had been used and that the original country of origin was the spoofed IP address in this modified header rather than the value in REMOTE_ADDR header.

Whilst this code is not so common anymore it can still be a good idea to "confuse" servers by supplying multiple IP addresses in headers that can be modified to make it look like a more legitimate request.

As the actual REMOTE_ADDR header is set by the outbound server you cannot easily change it. However you can supply a comma delimited list of IP's from various locations in headers such as X-FORWARDED-FOR, HTTP_X_FORWARDED, HTTP_VIA and the many others that proxies, gateways, and different servers use when passing HTTP requests along the way.

Plus you never know, if you are trying to obtain content that is blocked from your country of origin then this old technique may still work. It all depends on the code they use to identify the country of an HTTP requests origin.

6. Follow unconventional redirect methods.

Remember there are many levels of being able to block a scrape so making it look like a real request is the ideal way of getting your content. Some sites will use intermediary pages that have a META Refresh of "0" that then redirect to the real page or use JavaScript to do the redirect such as:


<body onload="window.location.href='http://blah.com'">


<script>
function redirect(){
   document.location.href='http://blah.com';
}
setTimeout(redirect,50);
</script>

Therefore you want a good super scraper tool that can handle this kind of redirect so you don't just return adverts and blank pages. Practice those regular expressions!

7. Act human.

By only making one GET request to the main page and not to any of the images, CSS or JavaScript files that the page loads in you make yourself look like a BOT.

If you look through a log file it is easy to spot Crawlers and BOTs because they don't obtain these extra files and as a log file is mainly sequential you can easily spot the requests made by one IP or User-Agent just by scanning down the file and noticing all the single GET requests from that IP to different URLS.

If you really want to mask yourself as human then use a regular expression or HTML parser to get all the related content as well.

Look for any URLS within SRC and HREF attributes as well as URLS contained within JavaScript that are loaded up with AJAX. It may slow your own code down plus use up more of your own bandwidth as well as the server you are scraping but it will disguise you much better and make it harder for anyone looking at a log file to distinguish you from a BOT with a simple search.

8. Remove tracking codes from your URL's.

This is so that when the SEO "guru" looks at their stats they don't confuse their tiny little minds by not being able to work out why it says 10 referrals from Twitter but only 8 had JavaScript enabled or had the tracking code they were using for a feed. This makes it look like a direct, natural request to the page rather than a redirect from an RSS or XML feed.

Here is an example of a regular expression that removes anything after the query-string including the question mark.

The example uses PHP but the expression itself can be used in any language.


$url = "http://www.somesite.com/myrewrittenpage?utm_source=rss&utm_medium=rss&utm_campaign=mycampaign";

$url = preg_replace("@(^.+)(\?.+$)@","$1",$url);

There are many more rules to scraping without being caught but the main aspect to remember is Karma.

What goes around comes around, therefore if you scrape a site heavily and hammer it so bad that it costs the user so much bandwidth and money that they cannot afford it, do not be surprised if someone comes and does the same to you at some point!

Tuesday, 23 August 2016

The Naming and Shaming of programming tightwads

Let the Shame List begin

Just like the News of the World when they published their list of paedophiles, nonces and kiddy fiddlers I am now creating my own list of shame which will publicly list the many people who have contacted me and done any of the following:

1. Asked for a new feature to be developed for one of my Wordpress plugins that only they required. Then once I have delivered the upgrade they don't even say "Thank You".

In fact 9 out of 10 times I don't even get the smallest of donations even when I have been promised them beforehand. I have lost count of the people who email me promising to donate me money if only I do this or that but when I do it they seem to forget how to click that big yellow DONATE button in the plugin admin page.

Do these people really think I live only to serve their useless coding skills by implementing features they themselves are too unskilled to develop or too tight to pay for? Is this really what people expect from Open Source code? I don't mind if you cannot code and add the feature or fix the bug yourself but if you can't then at least have the decency to donate some money for my time. Is that too much to ask for?

2. The other group of people (and there are many) are those who email me at stupid times throughout the morning 4am sometimes - demanding that I fix their site immediately due to my plugin "not working".

In fact 99 out 100 times it is usually the case that they have either been a numpty and not followed or understood the instructions, deleted all or some of the files or haven't set the relevant permissions up correctly.

Not only do I try and make all my Wordpress plugins easy to use for the non technical to use by outputting detailed error messages that explain what they must to do to fix the problem but most plugins have a "Test Configuration" button on them that will run all the necessary tests and then list any problems as well as fixes for them.

If these people cannot even read and understand error messages such as "Filepath does not exist at this location" because they have been silly enough to delete that file or folder then why should I offer free 24 hour support for them?

Here's an idea. If I email you back with steps to fix your incompetence - donate me some money.

Believe it or not I don't help people out for fun or offer free 24 hour support for FREE products.

You get what you pay for!

If you are too tight to offer to pay me to develop your custom feature or too tight to even donate the smallest amount when demanding (as I have had on numerous occasions) that I do X Y or Z by next Tuesday then why should I bend over to help you?

3. Then there are those companies (even some that have been big multi-nationals) that email me asking for relevant licences to be added to my downloadable scripts so that they can use them in their own projects. Probably projects that they will be making lots of money from by re-selling my code. Yet they refuse to donate even the slightest amount to the cause.

4. Finally and most important are the SEO Scammers, which you can read about more in detail here. They are advertisers who offer you money to post articles on your site yet when you do they then tell you that you will be paid in 20+ days. Why so long for so little money I have no idea. Yet on multiple occasions now I have been SEO Scammed where they fail to pay me my money but despite this, and me taking the article down. They have gained from the link juice passed along in links without rel="nofollow" on them and the site/domain authority.

It is very surprising how long PR Link Juice and authority stays around after the fact. Experiments we did showed that when setting up a pyramid system with one site at the top with zero links, and 100 or so with PR 4-6 all linking to that sites homepage. The top site zoomed up Googles rankings and even when we stopped the experiment the referrals from these sites (despite there being no links), stayed around in Google Webmaster Tools reports for months and months afterwards.

In future I am going to name and shame every person and company who carries out one of these actions on my blog.

There are many other places you can do this on the web, darkweb and even Facebook > https://www.facebook.com/SEOTRICKSANDCONS/. So it is worth checking these places for names, emails and company addresses before doing any work with them.

Also feel free to add your own cons and tricks and any funny emails from spammers and SEO "marketing" companies trying to get you to post articles or part with your money etc..

A basic tech review is also advised to see where the company is based with a WHOIS and DNS search.

It might help those other developers considering open source development to realise that it's a dead end that causes more hassle than it's worth. If you think your going to get rich developing scripts that can easily be stolen, downloaded, re-used and modified then you are living in a fantasy world.

Let the shaming begin.

Just to let you know, since I started this list I have had quite a few people donate money and I have removed their names from the list. I am not heartless and I don't want people searching Google for their own name to find this page first.

Therefore you know what to do, pay me the money you owe me or make a donation.

Sebastian Long - This was an advertiser who offered a measly £60 for putting up an article (not exactly much) on my racing site for the Goodwood Sussex Stakes held in July. I posted the exact article he wanted and even added extra SEO to help him but he didn't want any of that so I took it out. Once he was happy he said I would be paid within 20 days -it would have been nice for him to tell me this before hand but I am too trustworthy, although that is slowly diminishing.. It's so far been over 20 days (and 20 working days), and I have not been paid. I have contacted him multiple times and have now taken the article down. However his name and his company ellipticalcontent.com will remain on the numerous SEO / Advertiser blacklists that I put him on due to his lack of respect in honouring a very simple contract.

Kevin Clark - who did not know how to set up a sitemap - "press the build button" and wanted help "fixing" issues that were not broken which I duly gave out. No donation received.

Raymond Peytors - who asked about the now non supported pings to ASK or Yahoo - these have not been supported for sitemaps for a long time now. No donation received.

Mike Shanley - who did not seem to know how to read the "Read Me" text that comes with Wordpress plugins. On all my plugins I add a "Test Set-up" button which runs through the setup and displays problems and solutions for the user. The Read Me guide also explains how to run the Test when installing the plugin. Donation? Not on your nelly.

Juergen Mueller - For sending me an error message related to another plugin that he thought somehow was related to my plugin. This was due to the memory limit of his server/site being reached by said plugin.

Despite that he had all the details within the error message to fix it he still decided to email me for help. Despite me explaining how to fix the problem and steps he should do in future to fix problems I did not get a donation.

Holder Heiss - who even though he had read my disclaimer that said I don't give away support for free still asked me and received free help. He tried to motivate me to solving his problem with the following sentence

"I understand that you are cautious about giving free support for your free software. Anyway as I like using and would like to continue using the google sitemap plugin, maybe I can motivate you to have a look on this topic reported by several users:"

Even though he had not donated me any money I still checked the system, upgraded my software and looked for a problem that I could not find - probably related to another plugin. You get what you pay for and he got at least an hours worth of support for free!

Pedro Galvez Dextre - Who complained about the error message "Sitemap Build Aborted. The Server Load was 4.76 which is equal to or above your specified threshold of 0.9" and asked what was wrong????

Cindy Livengood - who couldn't be bothered to read the Readme.txt file as they "bored her" even though they contained an example post which would show if the plugin was working or not.

There have been many other people but I only have so much time to go through my inbox.

By the way if your name is on this list or appears on it in future and you would like it removed then you know what to do - a donate button is at the bottom of each plugin, on my website, on my blog and many other places.

Please remember people - I have had a serious illness and I am still in lots of pain. Therefore I have stopped supporting my Wordpress plugins for this reason PLUS the lack of donations I have received.

I work at a company where I am charged out at £700 a day therefore a donation of £10 is not going to make me work for a day or two on a plugin that is open-source and should be taken as such.

You get what you pay for and I wrote these plugins for myself not for anyone else.

I put them up on Wordpress to see if anyone else found them useful. If you do not like them then use another plugin.

If you want professional support then be prepared to pay for it. If not read on and follow these basic debugging steps and use Google. That's how I had to learn LINUX and WordPress!

As stated in my Readme.txt file of my Sitemap plugin: http://wordpress.org/extend/plugins/strictly-google-sitemap/faq/

I have an error - How to debug

If you have any error messages installing the plugin then please try the following to rule out conflicts with other plugins
-Disable all other plugins and then try to re-activate the Strictly Google Sitemap plugin
- some caching plugins can cause issues.
-If that worked, re-enable the plugins one by one to find the plugin causing the problem. Decide which plugin you want to use.
-If that didn't work check you have the latest version of the plugin software (from WordPress) and the latest version of WordPress installed
-Check you have Javascript and Cookies enabled.
-If you can code turn on the DEBUG constant and debug the code to find the problem otherwise contact me and offer me some money to fix the issue :)
-Please remember that you get what you pay for so you cannot expect 24 hour support for a free product. Please bear that in mind if you decide to email me.
A donation button is on my site and in the plugin admin page.
-If you must email me and haven't chosen to donate even the smallest amount of money please read this >> http://blog.strictly-software.com/2011/10/naming-and-shaming-of-programming.html
-If you don't want to pay for support then ask a question on the message board and hope someone else fixes the problem.

But I need this or that and your plugin doesn't do it

Sorry but tough luck.

I wrote this plugin for my own requirements not anyone else and if you have conflicts with other plugins or require extra work then offer to pay me to do the development or do the work yourself.

This is what Open Source programming should be about.

I wrote this plugin as other Sitemap plugins didn't do what I wanted them to and you should follow the same rules.

If you don't like this plugin or require a new feature you must remember that you have already bought a good amount of my time for the princely sum of £0.

Hopefully the starting of the shaming will stop the influx of emails I constantly receive asking for help without donations.

Remember every one of my plugins has a donate button at the bottom of it and you get what you pay for!

Tuesday, 9 August 2016

Fun with Dates! Web Server, SQL Server, IIS and ASP Classic - Problems and Solutions

By Strictly-Software

Dates can be a nightmare especially when moving servers.

A setup that ran perfectly on an old system can crumble up into a nightmare when the code and database is ported to a new server. I recently had this problem moving from a webserver connecting to database server to an all in one server based at the French hosting company OVH.

At first everything seemed okay then I started to notice the errors such as

1. Dates entered on a web page form as dd/mm/yyyy on submission coming back in US format e.g 22/08/2016 would come back as 8/22/2016 why there was no trailing zero I have no idea.

2. Primary Key / Index errors where the date was part of the key and it thought duplicates were being added into the system.

Good Practice

I always thought I ran good practice on my systems by doing the following but despite all these settings I was still getting US dates instead of UK dates shown on the website. However you should still do this as it limits the chances of error.

1. I ensure all Stored Procedures have SET DATEFORMAT YMD at the top and I store all dates as ISO format yyyy-mm-dd in the database.

2. All database logins used to pass information to the database are set to have "British English" as their "Default Language".

3. The database properties under options is set to British English.

4. The server properties under Advanced -> Default Language is set to British English.

5. On the website I always ask users and use client/server side validation to ensure they enter the dates as UK format dd/mm/yyyy.

6. I then convert that with a simple function into ISO format yyyy-mm-dd hh-mm-ss to pass to the stored procedure that saves the date. Having the SET DATEFORMAT YMD at the top of the stored procedure also helps to ensure SQL treats dates in that order Year - Month - Day etc.

I also always have on my site a Kill page which cleans out all session and application data e.g

<%
Application.Contents.RemoveAll()
Session.Contents.RemoveAll()
Session.Abandon()
%>

This is great if I need to wipe all stored info quickly especially as I have another page that gives me loads of Session, Application, Cookie and HTTP information. It also gives me much more including dates and times from the Web Server e.g NOW() and SQL Server e.g GETDATE() so I can check they are in sync.

I also show the default locale and currency formats. A simple version is below.

<%
response.write("<p>")
response.write("Default LCID is: " & Session.LCID & "<br>")
response.write("Date format is: " & date() & "<br>")
response.write("Currency format is: " & FormatCurrency(350))
%>

The LCID stands for the Server Locale and you can set it either in your Global.asa page if you use one or on the login page or in a global include that all pages use. English - United Kingdom has a value of 2057, the USA is 1033. You can read more about the locales here.

The result I got from my test page was

Default LCID is: 2057
Date format is: 09/08/2016
Currency format is: £350.00

This is all correct.

I have never had to resort to setting the Session.LCID on the website before and the Web Servers globalization settings were set the to the UK. This made it all the more stranger that I was getting US dates.

However I followed ALL of the steps below apart from the last step - which really is a last resort and it fixed the issue. It really is a shame that there isn't just one place where you can set your region and date formats that affects SQL and your Website but there just isn't.

Maybe a clever programmer could write something that would burrow away into all the servers settings and give you a report of what needs changing?

I already have a console .NET app that I run from a command prompt on any machine that tells me whether I can connect to the DB with specific ADO connection string params and LOCALHOST. It returns information about SQL logins, security such as access to extended stored procedures and any collation differences if there are any. It also shows me installed memory and disk space, used memory and disk space, all connected drives mapped drives and user logon history. Plus it attempts to send out an email using LOCALHOST as the mail relay using no ports and standard ports.

It also checks that it can access the outside world with a WebHTTP request to obtain the actual IP address of the machine before doing a PING to Google to test for speed. If I had the time I would probably love to delve into a project to solve the date issue as it's one that just keeps cropping up on new servers.

Debugging, Tests and Solutions

Apart from the good practices which I listed above and resorting to setting Session.LCID = 2057 at the top of your ASP pages there are some other things to try.

1. Test that your ISO dates are actually being stored as ISO NOT US e.g 2016-09-08 could be the 8th August in the UK (or ISO), or 9th of July if stored as US format. Do this with a simple SQL statement and login to your database using the connection string properties your website would not as an administrator.

This way you are using the properties of the users login and you can compare the results with you logged in as admin and if they are out of sync you should re-check your login properties again.

SELECT TOP(20) Racedate as ISODate,CONVERT(varchar, Racedate,103) as UKDate,datepart(day,Racedate) as DAYPart,datepart(month,Racedate) as MONTHPart,datepart(year,Racedate) as YEARPart
FROM   RACES
ORDER BY Racedate DESC

This should show you how the date is stored as a string (as all dates are really floating point numbers that would make no sense in a select if you saw them), as well as how the database sees each part of the ISO date.

So the DAYPart column should be the right section of the ISODate and left section of the UKDate and the month should be in the middle.

2. Test that your database is using a British format to return data to the client by running this SQL.


SELECT name, alias, UPPER(dateformat) as DateFormat
FROM syslanguages
WHERE langid =
 (SELECT value FROM master..sysconfigures
 WHERE comment = 'default language')

If everything is setup correctly you should get results like below:

Name - Alias - Dateformat
British - British English - DMY

If you don't get your desired country format back then the issue could be purely on the SQL Server / Database so go back over the good practices to ensure the Server/Database and Logins all have British English (en-gb) as their Default Languages / Locales in any setting you can see.

If you know that dates were entered for today and they look like ISO Date format in the tables then run a simple SELECT with a DATEDIFF(DAY,Stamp,GETDATE())=0 clause to see if they are returned.

If they are then you know the dates are being stored in the DB correctly so the issue is probably due to the web server.

3. Some people say that you should store your dates in 6 columns, Year, Month, Day, Hour, Minute, Second but personally I don't think this level of normalization is necessary OR should be.

However if you only have a couple of places to change then it might offer a solution. However if you already have a big database with lots of dates being shown and entered it would be a lot of work to normalize like this.

4. Go into your web servers Control Panel and select Language. Then ensure all possible date formats, languages and locales are set to use your desired location e.g chose UK instead of US and ensure the date formats are dd/mm/yyyy not mm/dd/yyyy.

There should be an "Advanced" link on the left where you can set the order of preference related to languages there as well. You will need to restart the machine for these to take affect.

5. Follow these steps from your Control Panel so that all users get to use your own regional settings. It's just one more place regions and formats are set which should be looked at if you are having an issue.

Go to Control Panel.
Click Region and you will see a screen with 3 tabs (Formats, Location and Administrative).
Click Formats and choose the settings you prefer.
Click Additional settings.
Click Date tab.
Change Short date to desired format and confirm dialog.
Click Location and choose the settings you prefer.
Click Administrative tab.
For "Welcome screen and new user accounts", click copy settings.
From the new window, click both checkboxes for "welcome screen and system accounts" and "new user accounts" (if you skip this step, you will still see the issue because IIS uses system account).
Approve all changes for Region by clicking OK on all open windows.
Open Command prompt, write iisreset and enter.
If you still don't see the changes try logoff and logon.
Or reboot.

6. Go into IIS and at Server level and Database level go into the .NET Globalization settings and change the options Culture and UI Culture to English (en). Even if you are using ASP classic it is still worth doing this.

7. Compare the webpages showing up the US dates in different browsers, FireFox, Chrome and IE for example. If there is a difference then it could be down to your browsers locale settings. You may have had an anti virus checker run and reset some browser properties without you knowing so it's worth a shot especially if you know that not everyone is seeing the same values as yourself.

8. If it really comes down to it and you cannot resolve the issue then you could wrap all your dates from any SQL returned to your web pages in a function that uses VBScript to split the day, month and year apart before putting them back together in the format you want.

An example is below. You can use the built in functions Day(val), Month(val), or Year(val) or DatePart("d",val) to get the part of the date out. This function also uses a quick easy way to append zeros onto single character numbers 1-9 become 01 or 09.

You will also see by this method whether or not the ASP code manages to select the correct part of the date out of the SQL column returned to you.

For example if you have a date of 12/08/2016 (where 12 is the day), and you use Datepart("d",dtDate), where dtDate is the variable holding 12/08/2016. Then you will see if you get back the correct value of 12 (UK) or 08 (US). If you get an issue like this then check all your web server settings.


Function CorrectDate(dtDate)
 Dim dDay, dMonth, dYear
 dDay = Datepart("d",dtDate)
 dMonth = Datepart("m",dtDate)
 dYear = Datepart("yyyy",dtDate)

 CorrectDate = QuickNoFormat(dDay) & "-" & QuickNoFormat(dMonth) & "-" & dYear
End Function



Function QuickNoFormat(intNo) 
 '* If the number is only 1 character long add a zero to the front e.g 8 becomes 08
 If IsNumeric(intNo) Then
  QuickNoFormat = Right(Cstr(Cint(intNo) + 100),2)
 Else
  QuickNoFormat = intNo
 End If
End Function

Hopefully by following the good practice guide you shouldn't run into problems but if you do the list of solutions should help you out. It did with my latest Windows 2012 server.

Let me know if this is of any help for you.

By Strictly-Software

© 2016 Strictly-Software

Friday, 8 July 2016

ISAPI URL Rewriting for ASP Classic on IIS 8

By Strictly-Software

I recently had to setup a dedicated server for some sites that we had to move from in-house hosting and outsource.

It was a move from Windows 2003 to a Windows 2012 server with IIS 8.

As usual the person setting up the system was as useful as a glass hammer and I had to spend ages learning things outside my job description just to get the system to work.

Not only was the web server side of things a pain but he copies databases with a backup/restore method which means having to re-link all the users and logins, re-create MS Agent jobs, set execute permissions, trustworthy settings and install CLR assemblies and handle collation conflicts etc. All things I could do without!

As everything is so costly for Windows Hosting, licences for everything, moving the Helicon ISAPI .httpd.ini file was a no no due to the fees. Luckily you can install for free the IIS URL Rewrite Module and use that to replicate any rules you may be using.

IIS 8 is a lot different from IIS 6.5 which I was working on before but once you get the IIS URL Rewrite 2.0 component installed from Microsofts website you will see it (after restarting IIS), in the bottom section of each site in your IIS panel.

You can then use the GUI interface to create the rules which is a bit cumbersome when you are used to just knocking out regular expressions in a text file.

However it does make it easier for people not as skilled at writing regular expressions as they can choose the type of expression from a drop down, rewrite or redirect or abort request, but you can use the "Test Pattern" tool to ensure your rule will work.

This article is a great guide for people wanting to set up rules using the interface and it shows you the output which is a web.config file placed in the root of your site. It doesn't matter if your site is .NET or ASP classic the web.config rule will work as long as .NET is installed and enabled in IIS.

This means you can easily open up the file and edit it when adding rules.

A simple example which shows you some of the rules you can do is below. Remember as it's an XML file you need to HTML Encode any characters that may malform the XML such as angled brackets. This is where using the GUI Tool is useful as it will auto encode everything for you and tell you if the XML is valid.

This example starts with a simple rewrite rule for SEO to make /promo go to the page /promo.asp and then it has an SQL injection example and an XSS injection example.

Obviously all input should be sanitised anyway but it doesn't harm to have multiple rings of security. At the end is a list of common HTTP libraries to ban. These are the sort of user-agent that scrapers and script kiddies use. They often download the tools off the web and don't know how to OR forget to change the user-agent.

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <system.webServer>
        <httpErrors errorMode="Detailed" />
        <rewrite>
            <rules>                
  <rule name="Promo SEO to Promo" stopProcessing="false">
                    <match url="^promo$" />
                    <action type="Rewrite" url="/promo.asp" />
                </rule>
  <rule name="Login Reminder SEO to Login" stopProcessing="false">
                    <match url="^loginreminder$" />
                    <action type="Rewrite" url="/logonreminder.asp" />
                </rule>
                <rule name="RequestBlockingRule1 SQL Injection" stopProcessing="true">
                    <match url=".*" />
                    <conditions>
                        <add input="{QUERY_STRING}" pattern=".*?sys\.?(?:objects|columns|tables)" />
                    </conditions>
                    <action type="AbortRequest" />
                </rule>
                <rule name="RequestBlockingRule1 XSS" stopProcessing="true">
                    <match url=".*" />
                    <conditions>
                        <add input="{QUERY_STRING}" pattern=".*?(<svg|alert\(|eval\(|onload=).*" />
                    </conditions>
                    <action type="AbortRequest" />
                </rule>                
                <rule name="RequestBlockingRule2" stopProcessing="true">
                    <match url=".*" />
                    <conditions>
                        <add input="{HTTP_USER_AGENT}" pattern=".*?(?:ColdFusion|libwww\-perl|Nutch|PycURL|Python|Snoopy|urllib|LWP|PECL|POE|WinHttp|curl|Wget).*" />
                    </conditions>
                    <action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Access Denied" />
                </rule>        
            </rules>
        </rewrite>
    </system.webServer>
</configuration>

As you can see I am aborting the requests for hackers and bad BOTs rather than returning a 403 status code in all but the last example, and I am just doing it there to show you how a 403 is carried out.

The syntax is slightly different from normal .htaccess rules due to being inside the XML file and the properties that are specified but in reality if you know regular expressions you won't go wrong.

By Strictly-Software

© 2016 Strictly-Software

Saturday, 18 June 2016

Why just grabbing code from the web can lead to major problems down the line

By Strictly-Software.com

I have wrote many articles over the years about server, system, website and PC performance, and it seems that the more versions of FireFox and Chrome that come out, the slower they get. I don't think I have ever used IE 11 as much as I have in the last 3 months. Mostly just to get Facebook, Radio 1 or Google+ to load within a minute which FF and Chrome seem to have issues with for some reason.

Some add-ons like uBlock Origin prevent 3rd party domain code from being loaded up on the site as well as large image or video/flash objects. It also stops pop-up windows and the loading of remote CSS fonts which is all the craze now.

What the developers of these websites don't seem to realise is that when they are loading in code from all over the web just to make a page display or run it causes a lot of network traffic. It also introduces the possibility that the code at the end source has been tampered with and therefore you could be loading in Cross Site Scripting hacks or ways for people to exploit your site if that certain script exists in the DOM.

Also a less likely scenario but a more common issue is that the more domains your site has to access to get all it's code onto the site, it can mean the page doesn't load as you may want it to, or even not at all.

If script A relies on Script B but Script B doesn't load for a long time then the code in Script A that was going to open a popup window on DOM Load, or play a video just isn't going to work.

I recently overrode the Window.OnError event and logged the Message, URL and Line No with an AJAX call to a log file before either throwing the error for modern sites or hiding it for older ones.

When I started looking through these files the amount of Google AdSense and Tracker scripts not loading due to timeouts is incredible. Also there are issues with bugs in the scripts or due to their slow loading objects not being available for other scripts relying on them to use. An example of just one error is:

24/04/2016 09:54:33 : 8X.XXX.XXX.161 'document.body' is null or not an object in http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js on line 19

People relying on Google for stats shouldn't for a number of reasons. Not only do they not always load and record the visit, but they also rely on 3rd party cookies being enabled and JavaScript being enabled. A Log parser or DB is a much better way to log every single visitor BOT or Human.

For example if you have a main jQuery script you are loading in from a CDN or from a site you don't control, if that domain is having network problems then that means any other code on the site reliant on it won't be able to work until that issue is resolved. This happens a lot from viewing the messages in my JavaScript error log file.

Due to this a lot of people just grab the code off the net and load it in from a local server to get round network delays.

However by doing this they are stuck in a point of time (the date and the version they copied the file at). I hate this, as instead of actually learning JavaScript so they know what they are doing they are relying on some other blokes framework to solve their problems e.g have a look at whose code most of you are building your site with. If there is a bug in jQuery you either have to fix it yourself or wait for John to fix it. If it's your own code at least you can rely on your own skills and know how the code works.

The other day I had to solve a jQuery problem where the page in question was using an old version of jQuery and another 3rd party script built around jQuery (but not by John), called reveal.js.

As the front end developers wanted to move to the latest version of jQuery they suddenly found that the reveal.js code no longer worked.

After debugging it was clear that the $().live(function) had been removed and as the code that did the popup relied on reveal.js and it was built in 2011 with no recent updates. The whole revealing and hiding of modal boxes stopped as soon as a modern version of jQuery was loaded in for the site.

I had to waste time reading up on jQuery and then hardcoding the version of reveal.js as we had to use the new .on() function so that the new jQuery libraries would work with the old code that was taken from a library developed in 2011.

This is one thing I hate about front end developers who just pick n choose libraries off the web despite them all doing the same thing like event binding and removal multiple times in multiple ways.

If they are relying on a 3rd party library they took from 2011 that also relies on a constantly updated framework like jQuery that is always dropping and adding new methods, then how are people to expect sites to work when a method these libraries rely on are removed?

If they cannot write some basic notes to say that this page relies on this script e.g reveal.js, which came with jQuery 1.4.5 then it makes people like me who hate debugging other peoples frameworks hate 3rd party code even more.

Not only do I have my own getme.js framework which is simple, uses CSS selectors, linked methods where the array of objects is passed down from function to function, but now that most browsers support the simple one line of code that allows for selectors to find objects there is no need to add Sizzle.js to it any-more. Not unless you really want to support old IE versions you can just use this single line.


// where query is the CSS selector
document.querySelectorAll( query );

For example in my Getme.js code this following line of code will loop through all Anchor nodes with a class of menu on them inside the DIV with the ID MAIN. I just then alert out the elements ID.


G('DIV#Main > A.menu').each(function(){
   alert(this.id);
})

Obviously if you do all your styling in CSS or inline JS you have the option of how to style a series of objects for example with the .setAtts method you can pass in any element attribute and their values.

This is providing a mixture of a class and inline styles to the Paragraphs inside DIV tags. It also uses chaining where the array of object are passed from one function to the next just like other frameworks.

The first example just looks for DIV tags with P's inside and sets the class to "warningRed" and the style of the font to bold and red. The class can do most of the styling or ALL of it.

It's just an example, so is the 2nd one that shows all P tags with a SPAN with the class "info". Inside it gets a warning message with the .setHTML method and then the .setStyle method colours the text.


G('DIV > P').setAtts({class:"warningRed", style:"color:red; font-weight:bold"});

G('P > SPAN.info').setHTML('CLick for help.').setStyle({color:red, fontSize:8px});

I used a G instead of $ just to distinguish it from all the other frameworks and because it's called Getme.js.

If you want to know how to learn to write your own chainable framework then have a read of this article of mine. I've kept Getme.js simple as I hate people who just copy code from the web especially when it goes wrong.

At least this way I have a wrapper object that allows for chaining and the setting of multiple attributes at once and the use of selectors. However I still like to use pure JavaScript inside my functions so people down the line can get their heads around it.

So next time I get a jQuery problem because John Resig has decided to remove a core function from his framework which then causes a chain re-action due to all the other frameworks that were built around that version of jQuery, I can at least (hopefully) use my simple framework to apply the CSS that the designers need to rather than spend a day hunting around for fixes to other people's code.

That, is something I really hate doing.

By Strictly-Software.com

© 2016 Strictly-Software.com

Don't Be Fooled By "Turbo Boost" and Windows Performance / Cleaner Applications

By Strictly-Software.com

I bet if you have been online for a more than a few times you will have undoubtedly seen adverts for tools and applications that will "Speed up your computer" or "Tune it up", "remove unnecessary files" and even malware.

Most of these apps are con tricks in that they will run, show you a really high number of problems either to do with security, privacy or performance and when you go to fix them you are told you must pay a fee of £29.99 to get the full version.

Scam code I call it.

Mainly because people don't know what half the items that are recorded as security holes or performance issues are. For example to get a nice big list of privacy concerns about 20,000 they might list every single cookie you have from every browser.

If you don't know what a cookie is it it's a harmless small text file that holds very small information about your visit to the site e.g by linking your username to a member ID so that the next time you visit the site you don't have to keep re-typing your username in the login box.

For example if you install the Web Developer Toolbar on FireFox you can view all the cookies on a site, domain including sessions. Viewing the cookies for this site I see one that gives me this really important information....

Name: SNID
Value: 72=i-mBmgOp22ixVNh68LucZ_88i1MnYk0FkV2k8k3s=uNr4G5YjLe6X9iAQ
Host: .google.com
Path: /verify
Expires: Mon, 11 Apr 2016 16:43:43
GMT Secure: No
HttpOnly: Yes

I have no idea what the cookie value for SNID means and most people apart from the web developers won't so when people try and scare you with "cookies are dangerous" - something I have heard from my parents many times - just ignore their ignorance of web development.

They just need to realise that unless your password is stored in a plain text cookie (which never happens) then you don't have much to fear from cookies at all. They just fill up your local data directories the more sites you visit.

The one thing you may not like are tracking cookies e.g Google who try and track you from site to site to see what kind of information you are interested in so that they can show you relevant adverts.

Turning off 3rd party cookies in Chrome or the browser of your choice and setting DNT (Do Not Track) to YES/ON is worth doing even if some browsers don't support the DNT header.

Turbo Mode

Turbo mode is one of those cool sounding options that seem to signal that just by pressing the Turbo ON button your whole machine will speed up. In reality it does a few things, many of which might not even be happening at the time you press it.

These include:

-Stopping a scheduled de-fragmentation of your hard disk. Something that is rarely needed or used anyway but does consume memory and CPU if running.
-Stopping any scheduled tasks from running. These could be updates, downloads of applications that require updates and the automatic creation of system backup and restore points.
-Postpone the automatic download and installation of important application and Windows updates.

You will be informed about the postponing of downloads and automatic updates such as Windows Updates if enabled.

In reality it doesn't do much but sounds and looks good when it says it has boosted your systems performance by 25% etc. Just beware that there is no way of it really knowing how much it has helped and it is probably negligible anyway.

If you really want to speed up your PC, open the task manager, enable the show all processes option and then order the results by CPU or Memory. The programs at the top using over 1GB should certainly be looked at and may have memory leaks.

A shut down of those applications and then re-opening of them might help you out a lot. I find some apps like MS SQL 2015 really drain my memory if I leave them on for days and a reboot now and then is the best remedy for most problems.

It may be a joke from the IT Crowd to "Turn it on and off again", but in reality that does solve a hell of a lot of problems with computers running high memory or CPU.

Always try and install Windows updates regularly so you are not waiting around hours for those 64 updates to install like I have a number of times due to keep hitting the "Remind me in 15 minutes" button. A reboot with the most up to date software is the best thing you can do for your PC as well as removing applications and plugins for browsers that you never use.

The more unnecessary applications you have on your system the more apps you will find in your Windows Start Up options running just to monitor for updates. Google does it, iTunes does it, and many other programs do as well. The more you can trim your system down so it only uses what you want it to use the better.

Plugins on browsers that were only used once should be removed afterwards.Regularly check if you are actually using all the browser plugins as when they are updated the old versions are hardly ever removed.

Applications you downloaded to do one task should also be uninstalled before you forget about them.

The leaner the machine the quicker the machine. I have a 16GB RAM 64GB Windows box at work and I regularly hit 12/13GB of memory. I usually know this is happening because the radio cuts out. However as I hate closing everything down, waiting for the installations and then trying to remember what I had open at the time I tend to let the memory rise and rise and then get frustrated as everything slows down.

If someone could invent a program that would remember what was open and then after rebooting re-open every app, file (with text), and program that was running before would make a mint. If something like this already exist PLEASE TELL ME WHERE I CAN FIND IT!

Clean your PC manually

This part of the article shows you how these myriad of application cleaner tools which trick you into paying money to speed up your PC are basically useless. Even tests have proved that running the following Windows 8+ built system applications can be just as affective.

Use the built in Disk Cleanup tool included with Windows. It’s focused on freeing up space on your hard drive, but it will also delete old temporary files and other useless things. Just tap the Windows key, type Disk Cleanup, and press Enter to launch it. You can even schedule a Disk Cleanup to clean your computer automatically.

When the tool pops up it will list a number of folders and system folders containing files that build up over time the more you use your PC.

Whilst this might be good in regards to browser cache when you are constantly going to the same sites over and over again as it means the photos and other files are locally stored on your computer preventing a network look up to download them again, these are files that you probably use once and forget about. This causes the folder size to rise and rise slowing down access. If you don't go to the sites often enough for a browser cache to be useful then clean it out. A tool like CCleaner can let you decide which sites get cleaned and which others don't.

Remember to regularly clean the following:

Your downloaded folder, apps, videos and other files that you have then installed or watched and no longer need.
Device Driver Downloads after installation.
Empty the Recycle Bin
Clean the System Error and Memory Dump Files
Delete Temporary Files
Delete User File History

There are tools that are free that help you do all this, backing up your PC before the deletions in case something goes wrong. We will look at CCleaner in a bit.

So if you don't want to rely on costly tools that try and trick you into paying money to make you feel safe there are plenty of ways around it.

1. Don't be tricked by the salesperson at PC World who promises you McAfee Anti Virus software is the best way to protect your PC. It's insurance, and they get the money - a bonus to the sales person so to speak.

There is no need to waste money on a tool that will kill your CPU by constantly scanning every single file your computer accesses (which is a lot), when there are free tools like MalawareBytes Anti-Malware which can be downloaded for free online. There is a premium version if you do require constant analysis of every file your PC comes in contact with but I haven't found it to be needed.

Just run a scan once a week and make sure to never open .ZIP, .EXE, .DOCX or .PDF files in emails especially when you are not expecting them and they are from people you don't know.

Also please remember that is VERY EASY to fake the "FROM" address in an email (1 line of code), so if your a member of a site and someone sends you a flashy looking email that seems to be from PayPal, Facebook or your bank with the address admin@facebook.com do at least a few things before opening the file.

1. Open the full email headers so that you can see the original sender of the email. Is it from Facebook or your bank?

2. If you are not sure as it's an IP address e.g 134.1.34.248 then run that in a command prompt with the line >> nslookup 134.1.34.248 and make sure it returns a known address. If it comes back empty or with an unknown name e.g RuskiHCKER.com use an online Whois tool (there are lots online), or if you have installed WhoisCL on your Windows computer type whoisCL RuskiHCKER.com and see what the WHOIS details return about the owner of the address. It should tell you what country it's from and an email address to complain to if you are being spammed by it.

3. If the HTML email looks fancy like your bank or Facebook or some other site. Move your mouse over some of the bottom links in the footer or side bar. Most site strippers will only bother putting code behind the main buttons so they can log your typing e.g Login, Password, Forgot Password etc. If you roll your mouse over the "About" or "Help" links and all you see is a # instead of a proper URL then that is suspicious. Delete the email ASAP!

Remember banks never ask you for your PIN code so never trust a site asking you for that. Also if it asks you for information about your mothers maiden name, first pet, first school, favourite colour and other information used to verify you by sites you should shut it down ASAP.

4. If the headers look okay it could still be a hacked mailserver or a man in the middle attack so right click the file and if you installed Malaware properly you should be able to run a virus scan over the file with one click before saving or opening it. If you can't then save it to your computer and run a virus check on the file before opening it. Never just open the file whoever you may think it's from.

Regularly clear your browser history or even better, set your browser to automatically clear its history when you close it if you don’t want to store a history or even better just use the browsers secret browsing options e.g Chrome's is called Incognito and allows you to surf the web without leaving a history or storing cookies on your machine.

Also clear your browser cache every now and then. Whilst a cache is good for quick loading of images and files (JS, CSS, JPEGs) that are used often. Once it becomes too large then it gets slower and slower to find those files you need so it negates the usefulness of it due to it's size.

Run the Disk Defragmenter included with Windows. This isn't necessary if you use an SSD or solid-state drive.

Don’t bother with a registry cleaner or other performance tool if you have to pay for it. If you want an application to help you then CCleaner is that tool.

You can download from here: CCleaner, The good thing about it, is that it's the best-tested registry cleaner out there.

I always run a registry clean after removing applications from my computer to ensure any registry keys and file extensions left over are also removed. CCleaner will also delete your browser cache for all the browsers you use, as well as cookies, saved passwords, web history and temporary files for other programs.

You have the choice to tick what you want to clean and what not to clean but the free tool CCleaner does a lot more than many of these PC cleaning apps do. A test performed in 2011 by Windows Secrets found that the Disk Cleanup tool included with Windows was just as good as paid PC cleaning apps.

Note that this is true even though PC cleaning apps fix “registry errors” while the Disk Cleanup app doesn't, which just shows just how unnecessary registry cleaners are. So don't waste money being "blackmailed" into buying the premium version of these clean up tools.

So yes, it’s been tested, PC cleaning apps are worthless. Tune your PC yourself and you will get better results.

If you want to download CCleaner which is the recommended tool that professionals use then you can get it from here www.piriform.com/ccleaner/download.

By Strictly-Software.com

© 2016 Strictly-Software.com

Tuesday, 17 May 2016

Stopping BOTS - A Multi Layered Approach

By Strictly Software

Some people don't mind BOTS of all shapes and form roaming their sites but if you actually look into what they are doing should you be worried about their actions?

Have you examined your log files lately to see what kind of BOTS are visiting and how much bandwidth they are using?

Here are a few of the reasons you might want to care about the type of actions carried out by automated crawlers (BOTS):

1. They eat bandwidth. Social media BOTS especially who jump onto any link you post on Twitter causing Twitter Rushes. This is where 50+ BOTS all hit your site at the same time and if you are not careful could use up all your memory and cause a frozen system if not configured properly. There are plenty of articles about Twitter Rushes on this site if you use the search option down the right hand side to find more details.

2. Bandwidth costs money. If you are a one man band or don't want high server costs then why would you want social media BOTS, many that provide no benefit to you, costing you money just so they can provide their own end users with a service?

3. Content theft. If a user-agent identifying itself as IE6 is hitting a page a second is it really a human using an old IE browser visiting that many pages? Of course not. However for some reason IE 6 is the most popular user-agent used by script kiddies, scrapers and hackers. Probably because they have just downloaded an old crawler script off the web and run it without the knowledge to edit the code and change the agent. Look for user-agents from the same IP hitting lots of pages per minute and ask yourself are they helping your business or just slowing your site down by not obeying your robots.txt crawl-delay command?

4. Hacking. Automated hackbots scan the web looking for sites with old OS systems, old code and potential back doors. They then create a list of sites for their user and come back to penetrate these sites with SQL/XSS injection hacks. Some might show up in GET requests in the log file but if they are tampering with FORM elements then any POSTED data containing hack vectors won't show up. Hiding key response parameters such as your server brand and model and the scripting language you use are good simple measures to prevent your sites name ending up on this list of potential targets to hack and can easily be configured in config files on your system.

Therefore you should have a defence against these type of automated BOTS. Of course you also have the human hacker who might find a sites contact form, view the source, tamper with the HTML and work out a way to modify it so he can send out mass emails from your server with a custom script. Again security measures should be implemented to stop this. I am not going to talk about the basics of security when it comes to preventing XSS/SQL injection but the site has many articles on the topic and basic input sanitation and database login security measures should stop these kinds of hack.

So if you do want to stop automated BOTS from submitting forms, registering to your site, applying for jobs and anything else your site might do the following list might be helpful. It is just an off the head list I recently gave to someone on LinkedIn but could be helpful if expanded to your own requirements.

On my own sites I use a multi pronged approach to stop BAD BOTS as well as bandwidth wasting social media BOTS, hack bots and even manual hackers tampering with the forms. It saves me money as well as increases performance by allowing legit users only to use the site. By banning over 50% of my traffic which is of no benefit to me I can give the 50% of useful traffic a better user experience.

1) We log (using Javascript), whether the user has Javascript enabled e.g an AJAX call on the 1st page they hit that sets a session cookie using Javascript. As most BOTS don't use Javascript we can assume if they have Javascript enabled they are "probably" human.

2) We also use Javascript (or the 1st page HTTP_ALL header in IE) to log whether Flash is enabled and the version. A combo of having Flash running and Javascript isbetter than just Javascript on it's own.

3) I have my own logger DB that records browser fingerprints and IP's, Useragent, Javascript, Flash, HTTP settings, installed apps, browser extensions, Operating System and other features that can almost uniquely identify a user. The problem is of course an IP often changes either through DCHP or the use of proxies, VPN's and hired VPS boxes for an hour or two. However it does help in that I can use this combination data to look up in my historical visitor database to see what rating I gave them before e.g Human, BOT, SERP, Hacker, Spammer, Content Thief and so on. That way if the IP has changed but the majority of the browser finger print hasn't I can make an educated guess. If I am not 100% sure however I will then go into "unsure mode" where security features such as CAPTCHAS and BOT TRAPS are introduced just in case. I can then use Session variables if cookies are enabled to store the current status of the user (Human, BOT, Unknown etc), or use my visitor table to log the browser footprint and current IP and do lookups on pages where I need to use defensive measures if cookies are not enabled.

4) These Session/DB settings are then used to decide whether to increment banner hit counters, write out emails in images or with Javascript so that only humans can see them (to prevent BOT email scrapers), and other defensive measures. If I know they are 100% human then I may chose not to deploy these measures.

5) On forms like contact forms I often use BOT Traps. These are input elements that are in the flow of the form with names like email_extra that are hidden with CSS only. If the BOT submits a value for this hidden input I don't submit the form, or I do but without carrying out the desired action and not let the BOT know that nothing happened.

6) A lot of forms (especially contact forms) can be submitted by just entering an email address for all fields (name, email, password etc). Therefore I check that the field values are different e.g not the same value for an email AND password field. I also ensure the name matches a name pattern with a regular expression.

7) I have built my own 2 stage CAPTCHA system which can be turned on or off on the fly for forms where I don't know if the user is 100% human OR I can decide to just always have it on. This is based around a maths question, where the numbers are in 3 automatically created images, grey and blurry like normal CAPTCHA's The user has to first extract the right numbers from the images then carry out an automated sum from those numbers e.g add number 1 to number 2 and deduct number 3. This works very well as it requires a human brain to interpret the question and not just use OCR techniques to extract the CAPTCHA image values. There are so many OCR breakers out there that a standard CAPTCHA where you enter the word on the picture can easily be cracked automatically now.

8) If there is textarea on the form, contact, application etc, then I use my RUDE word table which has hundreds of variants of rude words and the regular expression next to it to detect them. This can obviously be updated to include pharmacy pill names, download movies, porn and other spam words.

9) I also have a number of basic regular expressions if the user wants light detection that checks for certain strings such as "download your xxx now", "buy xxx for just $£", and words like MP3s, Films, Porn, Cialis and other common spam words that would have no place on a site not selling such goods.

10) I always log any blocking so I can weed out any false positives and refine the regular expressions etc.

11) I also have an incremental ban time so the 1st time anyone gets banned is for 1 hour, then 2, then 4 then a day etc etc.The more times they come back the longer they get banned.

12) Sometimes I use JavaScript and AJAX to submit the form instead of standard submit buttons. As Javascript is so commonly used now (just look at Google), then most people have it enabled otherwise the majority of sites just wouldn't work or would have minimum features. It would require a human hacker to analyse your page to break it and then write a custom BOT just to hack the form when a technique like this is used. To get round this you can use a rolling random key created server side, inputted into a hidden element with Javascript on page load and then examined on form submission to ensure it is correct. If it's not then the person has tampered with the form by entering an old key not the new key and can be banned or blocked.

13) Another good way to stop automatic hack BOTs (ones that just roam the web looking for forms to try and submit and break out of to send emails etc - contact forms), is to not use FORM tags in your server side code but have compressed and encrypted JavaScript that on page load converts the <div id="form">....</div> into a real FORM with an action, method etc. Anyone viewing the non generated source code like most BOTS, won't see a FORM there to try to hack. Only a generated HTML source view (once the page has loaded), would show them this, which most BOTS would not be able to view.

14) Honeypots and Robots.txt logging is also useful e.g log any hit to the robots.txt file and for any BOTS that don't visit it before crawling your site. You can then make a decision to ban them for breaking your Terms Of Service for BOTS that should state they should obey your Robots.txt rules.

15) As BAD BOTS usually use the links in the DISALLOW section of Robots.txt to crawl anyway. Then putting a fake page in the list of URLs is a good idea. This page should be linked to from your site in a way that humans cannot see the link and accidentally visit it (and if they do it should have a Javascript link on it to enable them to get back to the site). However BAD BOTS will see the link in the source and crawl it. As they have broken your TOS and followed a URL in your DISALLOW list they are being doubly "bad", so you have the right to send them off to a honeypot (many exist on the web that either put emails out for them to extract then wait for an email to be sent to that address to prove they are an email scrapper bot) OR they get sent to an unbreakable maze like system which auto generate pages on the fly so that the BOT just keeps going around in circles crawling page after page and getting nowhere. Basically wasting their own bandwidth.

16) HTACCESS Rules in your .htaccess file should identify known bad bots as well as IE 6, 5 and 5.5 and send them off to a 403 page or a 404 so they don't realise they have been sprung. No-one in their right mind should be using these old IE browsers anymore however most downloadable crawlers used by script kiddies still use IE 6 as a user-agent for some reason. My guess is that they were written so long ago that the code hasn't changed or that people had to support IE 6 due to Intranets being built in that technology e.g using VBScript as the client side scripting language.

By using IE 6 as a UA they get access to all systems due to sites having to support that ancient horrible browser. However I ban blank user-agents, user-agents less than 10 characters long, any that contain known XSS/SQL injection vectors and so on, There is a good PHP Wordpress plugin called Wordpress Firewall that if you turn on all the features and then examine the output in your .htaccess file will show you some useful rules such as banning image hot linking that you can then nick for your own file.

17) Sending bad bots back to their own server is always a good trick so that they get no-where on your own site. Another good trick is to send them to a site that might scare the hell out of them once they realise they have been trying to hack or DDOS it https://www.fbi.gov/wanted/cyber or the METS Cyber Crime department.

These are just a few of the security measures I use to stop BOTS. It is not a comprehensive list but a good starting point and these points can be expanded and automated depending on who you think is visiting your site.

Remember most of these points are backed up with detailed articles on this site so have a search if anything spikes your interest.

Hope this helps.

By Strictly Software

© 2016 Strictly Software

Friday, 29 April 2016

Chome and FireFox really getting on my tits....

By Strictly-Software.com

Chome and FireFox really getting on my tits....

Chrome was by browser of choice, due to being light weight and fast.

FireFox was in 2nd place due to the range of plugins available.

I had relegated IE into usage only to test code for cross browser compatibility issues.

However I am finding that I am actually using Internet Explorer more and more due to constant issues with both of the latest versions of these browsers.

I am running Chrome: 50.0.2661.75 (64-bit) And FireFox 46.0 buld no: 20160421124000 (64 bit) on all 3 of my machines (Win7 & Win 8.1)

There was a stage when both these honey's were humming like a bees. I even put up some articles on how to improve the speed on both browsers:

Speeding Up Chrome Can Kill It
Speeding up Google Chrome with DNS Pre-Fetching
Performance Tuning FireFox

I also put up a general PC and Browser tune up article with free tools, command line prompts and some basic things to try if you had a slow computer: Speeding up your PC and Internet connection.

However I have even found myself using IE 11 more and more due to constant hanging, pages not loading at all with the "processing request" message in the footer, or waiting for some 3rd party non asynchronous loaded in script, to download and run that blocks the site or page from running.

I think there is a far too much "API JIZZ" in the community at the moment.

What this means is that developers, due to their nature to impress and gold plate code, even when the spec doesn't call for it, are now using so many 3rd party and remotely hosted plugins like jQuery, Google Graphs, tracker code, plus loads of funky looking CPU consuming widgets to make their pages look good.

You only have go into Facebook or G+ and try and write a message. Not only will Google Plus's new post box move around the page before you can start writing, but both websites are constantly analysing your keystrokes to find out if the previous string matches a contact, community or page, in your contact book for them to link to.

The more people and pages you have stored the slower this process becomes. Yes is might be handy but why not just require a symbol like + in Google+ to be put before the person name so that the code only checks that word for a relation.

Imagine having a list of thousands of pages, liked communities/pages and contacts to be constantly checked on every keydown press with AJAX requests. That is overkill. It slows down systems .

I still have two windows from Chrome spinning away for (Google Blogger blogs) at the moment. There is not much 3rd party code on these pages but they are having trouble and showing common "Waiting for Cache" and "Processing Request" messages in the status bar.

This is the same sort of thing I get in FireFox. Although in this browser, what kills me is just the slowness of getting from page to page. On many sites I have to refresh it multiple times before the code all loads and this goes for online banking to online betting sites. Just trying to watch a race on their Flash screens is a nightmare.

I had a bet on a horse the other day on Bet365.com just so I could watch the big race with an unbeaten in 11 straight wins, Douvan, running. However Bet365.com video didn't start and in SkyBet it was stuttery and kept losing picture and sound. I missed the end of one race where a horse I had backed jumped the last fence into the lead but when the picture came back it had finished 3rd!

They keep telling me to clear the cache, reboot the router and do speed tests. Things I have done many times. I have 54Mbps download speed at work and 28Mbps at home. I can stream 4k UHD TV to multiple screens so download speed is not the issue something else is.

Speedof.me is the best online speed testing site I have found as it as it uses no extra files and is ran in pure HTML5 with no Flash, Java or ActiveX type objects requiring to be loaded for it to run.

What is causing the problem I have no idea as my broadband speed seems okay. I suspect it's the large number of reverse proxies being used and the download of shared 3rd party scripts and widgets that can hang due to a large number of HTTP requests.

I tried deleting my userdata file for Google by searching for it in the address bar of Windows Explore with this line: %USERPROFILE%\AppData\Local\Google\Chrome\User Data

I have also tried disabling Flash as so many times I see the "An object has crashed" bar in the header that is related to the Flash Container object failing. Sometimes a reload works other times it doesn't.

However so many sites STILL use Flash it is hard to live without it really. For example the WHOLE of Bet365.com is made in Flash which makes it very user unfriendly and hard to use with sticky scrollbars and issues with selection of items.

If anyone has similar issues or ideas on resolving them let me know, as I never thought I would be going back to IE to use as my main browser!

By Strictly-Software.com

©2016 Strictly-Software.com

Wednesday, 5 October 2016

Disk Full - Linux - Hacked or Full of Log Files?

Disk Full - Linux - Hacked or Full of Log Files?

A Karmic guide for Scraping without being caught

Tuesday, 23 August 2016

The Naming and Shaming of programming tightwads

Tuesday, 9 August 2016

Fun with Dates! Web Server, SQL Server, IIS and ASP Classic - Problems and Solutions

Fun with Dates! Web Server, SQL Server, IIS and ASP Classic - Problems and Solutions

Good Practice

Debugging, Tests and Solutions

Friday, 8 July 2016

ISAPI URL Rewriting for ASP Classic on IIS 8

ISAPI URL Rewriting for ASP Classic on IIS 8

Saturday, 18 June 2016

Why just grabbing code from the web can lead to major problems down the line

Why just grabbing code from the web can lead to major problems down the line

Don't Be Fooled By "Turbo Boost" and Windows Performance / Cleaner Applications

Don't Be Fooled By "Turbo Boost" and Windows Performance / Cleaner Applications

Turbo Mode

Clean your PC manually

Tuesday, 17 May 2016

Stopping BOTS - A Multi Layered Approach

Stopping BOTS - A Multi Layered Approach

Friday, 29 April 2016

Chome and FireFox really getting on my tits....

Chome and FireFox really getting on my tits....

Who is Strictly-Software?

My Stuff

Settings

Sites to Visit

Strictly-Software Tweets

Blog Archive

My Top Articles

Translate My Blog

Search This Blog

Labels

Wednesday, 5 October 2016

Disk Full - Linux - Hacked or Full of Log Files?

Tuesday, 23 August 2016

Tuesday, 9 August 2016

Fun with Dates! Web Server, SQL Server, IIS and ASP Classic - Problems and Solutions

Good Practice

Debugging, Tests and Solutions

Friday, 8 July 2016

ISAPI URL Rewriting for ASP Classic on IIS 8

Saturday, 18 June 2016

Why just grabbing code from the web can lead to major problems down the line

Don't Be Fooled By "Turbo Boost" and Windows Performance / Cleaner Applications

Turbo Mode

Clean your PC manually

Tuesday, 17 May 2016

Stopping BOTS - A Multi Layered Approach

Friday, 29 April 2016

Chome and FireFox really getting on my tits....

Who is Strictly-Software?

My Stuff

Settings

Sites to Visit

Strictly-Software Tweets

Blog Archive

My Top Articles

Translate My Blog

Subscribe to Strictly-Software

Search This Blog

Labels