Saturday, 31 December 2011

Troubleshooting WAMP Server Installation on Windows 7 machines

Troubleshooting WAMP Server installation on Windows 7 computers

I like to code in both PHP and .NET so on my laptop or PC I need to be able to run both PHP and .NET code.

As the two languages use different web servers to run their code, on Windows PCs I have been installing WAMP Server to run and test any PHP code. However, ever since moving to a new laptop (Windows 7 64 bit) I have found that getting WAMP Server to work is a right pain in the arse.

I had managed it on my work PC (also Windows 7 64 bit) without any problems at all, so I don't know why a fresh install gave me issues. Anyhow, this is how I troubleshot the WAMP Server installation on a Windows 7 laptop.

After installing WAMP Server, turn it on and start all services, then try to access it, e.g. http://localhost

You will probably be met with an IIS 7 home screen and not the WAMP Server home screen you might have expected.

This is because many Windows PCs have Microsoft's own IIS web server installed, and it listens on port 80, the same port that WAMP Server's Apache wants to use.

First off, try turning off IIS: go into Administrative Tools, open the IIS Manager and stop the web server.

Or you could go into Administrative Tools, open Services and stop the World Wide Web Publishing Service.

Then restart all the WAMP Server services and try http://localhost or http://127.0.0.1 again

If it works you will be met with the WAMP Server home page, but if you are unlucky like me on Windows 7 you might instead be met with a 404 page that just says

Not Found


HTTP Error 404. The requested resource is not found.

So WAMP is still not serving pages even though the other web server that was on that port is off, which means something else must be listening on it.

You can try to find out what this is by doing the following:

Running Command Prompt (under administrator rights)

Typing in: netstat -b -o

You need the -o flag so that you can see the Process ID column, and you need the -b flag (which is why admin privileges are required) to see which program created each connection, in this case whatever is consuming port 80 on 127.0.0.1 (localhost).

You will get something back like this

C:\windows\system32>netstat -b -o

Active Connections

  Proto  Local Address          Foreign Address        State           PID
  TCP    127.0.0.1:80           my_pc_name:49719       TIME_WAIT       0
  TCP    127.0.0.1:80           my_pc_name:49721       TIME_WAIT       0
  TCP    127.0.0.1:80           my_pc_name:49723       TIME_WAIT       0
  TCP    127.0.0.1:80           my_pc_name:49725       TIME_WAIT       0
  TCP    127.0.0.1:80           my_pc_name:49727       TIME_WAIT       0
  TCP    127.0.0.1:80           my_pc_name:49729       TIME_WAIT       0
  TCP    127.0.0.1:80           my_pc_name:49731       TIME_WAIT       0
  TCP    127.0.0.1:80           my_pc_name:49733       TIME_WAIT       0
  TCP    127.0.0.1:1110         my_pc_name:49718       TIME_WAIT       0
  TCP    127.0.0.1:1110         my_pc_name:49720       TIME_WAIT       0
  TCP    127.0.0.1:1110         my_pc_name:49722       TIME_WAIT       0
  TCP    127.0.0.1:1110         my_pc_name:49724       TIME_WAIT       0
  TCP    127.0.0.1:1110         my_pc_name:49726       TIME_WAIT       0
  TCP    127.0.0.1:1110         my_pc_name:49728       TIME_WAIT       0
  TCP    127.0.0.1:1110         my_pc_name:49730       TIME_WAIT       0
  TCP    127.0.0.1:1110         my_pc_name:49732       TIME_WAIT       0
  TCP    127.0.0.1:5354         my_pc_name:49155       ESTABLISHED     1368
 [mDNSResponder.exe]
  TCP    127.0.0.1:27015        my_pc_name:49441       ESTABLISHED     1128
 [AppleMobileDeviceService.exe]
  TCP    127.0.0.1:49155        my_pc_name:5354        ESTABLISHED     1128
 [AppleMobileDeviceService.exe]
  TCP    127.0.0.1:49441        my_pc_name:27015       ESTABLISHED     4768
 [iTunesHelper.exe]
  TCP    192.168.1.7:49619      ww-in-f125:5222        ESTABLISHED     4884
 [googletalk.exe]

Not very helpful, as all the Process IDs for port 80 on 127.0.0.1 are set to 0. (Connections in the TIME_WAIT state no longer belong to any process, which is why the PID shows as 0.)

If it's not set to 0 you can find out what process is listening on it by going to your Task Manager, ticking the "Show processes from all users" box and then searching for the PID (process ID).

If the PID column is not shown then go to View > Select Columns and tick the PID (Process Identifier)
option.
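
If you prefer the command line you can do the same lookup there; a quick sketch, where the PID 1128 is just an example taken from the output above:

C:\windows\system32>netstat -ano | findstr :80

C:\windows\system32>tasklist /FI "PID eq 1128"

The first command lists everything touching port 80 along with its PID, and the second resolves that PID to a process name.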

So we are no nearer to working out what is using this port: the other web server is disabled and not listening on port 80, yet it seems nothing else is either. A virus scan with a good tool helps here, just in case something is running that shouldn't be.

As I haven't been able to find out what is using the port, I now have to resort to a hack to get WAMP Server running.

This hack is actually useful if you want to be able to run WAMP and IIS on the same machine at the same time.

If you have done a default install then open the httpd.conf file, which should be at:

c:\wamp\bin\apache\Apache2.2.21\conf\httpd.conf

Then find the Listen option which will be under some comments near the top starting with

# Listen: Allows you to bind Apache to specific IP addresses

and change it to a port you want to use instead of 80, one you know you are not using. I chose port 8888, a common HTTP alternative port.

So change the Listen line under the comments to:

Listen 8888

This should get your WAMP Server default page up after a WAMP restart by accessing:

http://localhost:8888

However you should also change another directive in the file, the one which identifies the server's name.

It starts with these comments

# ServerName gives the name and port that the server uses to identify itself.

So replace

ServerName localhost

with

ServerName localhost:8888

or if you chose a different port then use that instead of 8888.
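
Putting the two changes together, the relevant lines in httpd.conf end up looking like this (8888 being my example port):

Listen 8888
ServerName localhost:8888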

You will now find that you can access WAMP Server from your PC okay. If you still want IIS to run alongside it then changing WAMP Server's port like this is a good idea anyway, as it saves you having to keep turning IIS off (or permanently disabling it).

A bit of a pain to get going but it works for my 64 bit Windows 7 PC, and I know someone else who had the same problem with their new PC. On my older XP laptops and PCs I had no problems at all getting WAMP working; it's only since moving to Windows 7 that I have had this issue.

If anyone knows what causes this issue then please let me know, but please don't reply that Skype or some other application could be the problem, as we ruled that out with the netstat -b -o command prompt scan earlier.

Anyway this hack gets WAMP Server running and it is good to know how to debug the problem.

Friday, 30 December 2011

New form of Web Site DOS attack leaves millions vulnerable

A new security vulnerability can cause most popular websites to become susceptible to DOS attacks.

In a talk called "Efficient Denial of Service Attacks on Web Application Platforms", Alexander "alech" Klink and Julian "zeri" Wälde explained in detail how most web programming languages implement hash tables and manage collisions.

We are not talking about cryptographic hashing here but the common sort that allows us to store data in key/value array-like objects. It is a simple mathematical hash, used by most web programming languages like PHP and JavaScript, to speed up storing and retrieving data posted to web pages.

A very simple example, which doesn't cover all the ways this technique is used and can be exploited, is the following. I know this is a client side JavaScript hash, and one that would cause more problems for the browser and the user's computer than the webserver, but as you can run arbitrary JavaScript on a page very easily and XSS hacks are very common nowadays it is still worth showing.

Say I had a client side hash that held some simple values about the system the site was running on. These key/values hold different bits of information relevant to the object I am using, and if a user wants a piece of information they supply the key for the value they want.

System = {
 ProductName: "MySystem",
 Version: 3,
 Build: 3,
 Company: "My Company PLC",
 ServerType: "dev",
 LastBuildDate : "2011-Dec-24 12:45:21"
}


You can get the appropriate value out of the hash table by accessing the appropriate key. In this example if I wanted to find out the current version of the System I would do something like this.

if( System.Version < 3){
 alert("Please upgrade to the latest version");
}

Not a very big hash table but in some cases hash tables can be absolutely huge.

The problem comes if a hacker can flood the hash table with keys that all compute to the same hash value, so that every one of them lands in the same bucket.

Every insert or lookup then has to be resolved by comparing the key against all the others in that bucket, one string comparison at a time, and in the report the authors say:

"An example given showed how submitting approximately two megabytes of values that all compute to the same hash causes the web server to do more than 40 billion string comparisons."

This is obviously a lot of calculation, and for just parsing the data posted to a webpage it is a massive overhead that can basically grind the server to a halt.

You can see from my example that if every key in the System object computed to the same hash value, my lookup for System.Version would involve comparing against every key in the object, and the server (or in this case the browser, as it's client side code) would get into a fix doing all of those comparisons just to return one value.

As the authors say, only 2MB of values can cause a huge number of string comparisons, which would slow the machine down no end.
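
To make that cost concrete, here is a minimal sketch of my own (simplified JavaScript, not the presenters' exploit code) of a chained hash table whose hash function sends every key to the same bucket:

// worst case hash: every key lands in bucket 0
function badHash(key) { return 0; }

var buckets = {};
var comparisons = 0;

function insert(key, value) {
 var bucket = buckets[badHash(key)] || (buckets[badHash(key)] = []);
 for (var i = 0; i < bucket.length; i++) {
  comparisons++; // one string comparison per key already stored
  if (bucket[i].key === key) { bucket[i].value = value; return; }
 }
 bucket.push({ key: key, value: value });
}

// 10,000 colliding keys need roughly n*(n-1)/2 comparisons
for (var n = 0; n < 10000; n++) {
 insert("param" + n, n);
}

alert(comparisons); // around 50 million

Scale those 10,000 keys up to the two megabytes of POST data mentioned above and you get the billions of comparisons the authors quote.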

Apparently Perl dealt with this vulnerability some time back (by randomising its hash function) but no-one else has yet followed suit, and hopefully it won't take a few big sites going down before it is fixed across the board.

Without fixing the hashing functions in the languages themselves there are three possible techniques available to website operators to prevent problems occurring.
  • Reduce the length of parameters that can be posted.
  • Reduce the number of parameters accepted by the web application framework (see the php.ini sketch below).
  • Limit the amount of CPU time that any given thread is allowed to run.
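
As an example of the second point, PHP added the max_input_vars ini setting (in version 5.3.9, in direct response to this attack) to cap how many request parameters it will parse:

; php.ini - cap the number of GET/POST parameters PHP will accept
max_input_vars = 1000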

Microsoft released a fix less than 24 hours after the disclosure. ASP.NET admins can download patch MS11-100 to protect their IIS web assets. You can find out about this patch at blogs.technet.com

Thursday, 29 December 2011

Running Whois commands from a Windows PC

How to run Whois commands from Windows Computers

Unlike LINUX, which has a built-in whois command, this is a little more complicated on Windows and you will need a 3rd party tool like WhoisCL, which you can download from here: http://www.softpedia.com/progDownload/WhoisCL-Download-44833.html

Be careful to click the "External Mirror link" and not the bright flashing "Download Now link" which is just there to trick you into downloading some kind of crapware.

Once downloaded, extract and then open the folder.

Copy the file WhoisCL.exe into your \windows\system32 folder OR edit your PATH environment variable so that Windows knows where the WhoisCL.exe file is located. This saves you having to type the full path into the command prompt.
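
For example, on Vista and Windows 7 you could append the folder to your user PATH from a command prompt like this (C:\Tools\WhoisCL is an assumed location; use wherever you extracted the file):

C:\>setx PATH "%PATH%;C:\Tools\WhoisCL"

You will need to open a new command prompt afterwards for the change to take effect.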

Also beware that many viruses often pretend to be commonly downloaded utilities like WhoisCL.exe so if you find a file with the name WhoisCL.exe in a place that you didn't install it then it "may" be a virus.

Make sure you always virus check every file you download, and if you think you have a trojan use SDFix.exe in safe mode to hunt it down and destroy it. I have found that full scans by virus checkers like McAfee or Kaspersky have still missed infections that SDFix.exe found. Beware though: because of the way the program works, other Anti-Virus tools may falsely identify SDFix itself as a virus.

To run Whois on a Windows machine (XP, Vista, Win7 and above) do the following.

Open up a command prompt and type:

C:\users\me\>whoiscl microsoft.com

You will get back a full Whois report for the domain you are looking at.

Add the flag -r to prevent the top line of the report from coming back e.g

C:\Users\me>whoiscl -r microsoft.com

WHOIS Server: whois.markmonitor.com

Registrant:
        Domain Administrator
        Microsoft Corporation
        One Microsoft Way
         Redmond WA 98052
        US
        domains@microsoft.com +1.4258828080 Fax: +1.4259367329

    Domain Name: microsoft.com

        Registrar Name: Markmonitor.com
        Registrar Whois: whois.markmonitor.com
        Registrar Homepage: http://www.markmonitor.com

    Administrative Contact:
        Domain Administrator
        Microsoft Corporation
        One Microsoft Way
         Redmond WA 98052
        US
        domains@microsoft.com +1.4258828080 Fax: +1.4259367329
    Technical Contact, Zone Contact:
        MSN Hostmaster
        Microsoft Corporation
        One Microsoft Way
         Redmond WA 98052
        US
        msnhst@microsoft.com +1.4258828080 Fax: +1.4259367329

    Created on..............: 1991-05-01.
    Expires on..............: 2021-05-02.
    Record last updated on..: 2011-08-14.

    Domain servers in listed order:

    ns3.msft.net
    ns2.msft.net
    ns1.msft.net
    ns5.msft.net
    ns4.msft.net




MarkMonitor is the Global Leader in Enterprise Brand Protection.

Domain Management
MarkMonitor Brand Protection™
AntiFraud Solutions
Corporate Consulting Services

Visit MarkMonitor at www.markmonitor.com
Contact us at 1 800 745 9229
In Europe, at +44 (0) 20 7840 1300

Tuesday, 20 December 2011

HTML Hack to get Flash Banners to click through to external URL's

Flash Banner Click Hack

This might be old news to some of you but for those of us who spend most of our time developing rather than designing or cutting up sites this is a neat hack for getting flash banners to have click-through events without having to build them into the Flash Object itself.

I was given some banners written in Flash the other day to put on a new site I am working on, the-jag.com.

However these banners did not have any links built into them, and as we all know, unless a flash object has a click event and a URL redirect built in, either hardcoded or passed in as a parameter, you cannot just wrap an <A> tag around it as you would with an animated gif or another image; it just won't work.

As CSS is not my speciality I asked a friend how to hack this, so I wouldn't have to go back to the designer to get such a URL click-through parameter added, and these are the steps he gave me.

So if you ever want to add a click event or <A> around a flash banner then this is what you can do.

1. The original banner code. This was the standard OBJECT / EMBED mix, which although not standards compliant is still used a lot on the web.


<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=9,0,0,0" 
 width="300" height="300" id="banner-300-300" align="middle">
<param name="allowScriptAccess" value="sameDomain" />
<param name="allowFullScreen" value="false" />
<param name="wmode" value="transparent">
<param name="movie" value="banner-300-300.swf" />
<param name="quality" value="high" />
<param name="bgcolor" value="#fff200" /> 
 <embed src="banner-300-300.swf" 
  quality="high" 
  bgcolor="#fff200" 
  width="300" 
  height="300" 
  name="banner-300-300" 
  align="middle" 
  wmode="transparent" 
  allowScriptAccess="sameDomain"
  allowFullScreen="false" 
  type="application/x-shockwave-flash" 
  pluginspage="http://www.macromedia.com/go/getflashplayer" />
</object>



2. First off I added the wmode="transparent" parameter to the object and embed tags to stop the flash object from always being rendered on top of everything else on the page. This then allows you to layer other elements over it.

3. I then wrapped the object in a DIV tag that was positioned relative and had the same width, height and background colour as the banner I was wrapping (300x300 in the snippet below; the final example at the end uses a 468x60 banner).

This was because adding wmode="transparent" removed the background colour from the banner, so it needed to be supplied by the DIV instead.


<div style="position:relative;background:#fff200;width:300px;height:300px;">


4. I then added some styling to the actual object tag to make it absolutely positioned within the DIV and positioned as the first item within it e.g top: 0px; left: 0px;

I also added a z-index of 5 to it. You will see why later.


<object style="position:absolute;top:0px;left:0px;z-index:5;" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" 


5. I then added my <A> tag, with an href of the URL I wanted the user to go to when they clicked the flash banner, after the closing </OBJECT> tag and before the closing </DIV>.


</object>
<a href="http://www.google.com" title="Google Search Engine">Google Search Engine</a>
</div>


6. I then added some styling to the <A> element so that it was also positioned absolutely within the DIV and was the same size as the banner.

I also made sure the text within the anchor element was hidden off screen by adding text-indent:-9000px; to it. This allows search engines to still access the anchor text for Search Engine Optimisation but it doesn't appear on screen which would look silly.

I also made sure the <A> was the top element in the DOM (above the FLASH object) by increasing its z-index to a figure larger than the value of 5 I had set earlier on the OBJECT (see point 4).


</object>
<a style="position:absolute;top:0px;left:0px;z-index:10;width:468px;height:60px;text-indent:-9000px;" href="http://www.google.com" title="Google Search Engine">Google Search Engine</a>
</div>



7. Putting this all together looks like this.


<div style="position:relative;background:#fff200;width:468px;height:60px;">
 <object style="position:absolute;top:0px;left:0px;z-index:5;" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=9,0,0,0" width="468" height="60" id="jag-banner-468-60" align="middle">
 <param name="allowScriptAccess" value="sameDomain" />
 <param name="allowFullScreen" value="false" />
 <param name="wmode" value="transparent">
 <param name="movie" value="banner-468-60.swf" />
 <param name="quality" value="high" />
 <param name="bgcolor" value="#fff200" /> 
  <embed src="jag-banner-468-60.swf" quality="high" bgcolor="#fff200" width="468" height="60" name="banner-468-60" align="middle" wmode="transparent" allowScriptAccess="sameDomain" allowFullScreen="false" type="application/x-shockwave-flash" pluginspage="http://www.macromedia.com/go/getflashplayer" />
 </object>
 <a style="position:absolute;top:0px;left:0px;z-index:10;width:468px;height:60px;text-indent:-9000px;" href="http://www.google.com" title="Google Search Engine">Google Search Engine</a>
</div>


I tested this code in the latest versions of Google Chrome, Firefox and IE, as well as going back to IE 7 in quirks mode, and the banner was shown as it should be and any click on it took the user to the desired anchor location. No text within the anchor was visible on screen.

This is just a hack that non-designers, like myself, might like to know for future reference, as it saves having to ask the developer of the flash file to re-develop it to allow for click-throughs.

Wednesday, 7 December 2011

Speeding up Google Chrome with DNS Pre-Fetching

Why Google Chrome is such a fast browser


I have written a lot about browsers, their intricacies and problems with upgrades and performance, including performance tweaks for Firefox, accessed by typing "about:config" in the address bar, that can increase your browser's performance.

Whilst I use Firefox mainly for development at work I have always used Chrome for browsing due to its speed and I find myself using it more and more for development.

Its built-in developer tools are just as good as the Firebug add-on, which has gone through phases of being slow, buggy and sometimes just plain unusable.

As well as letting you inspect elements, modify the DOM on the fly, check resource load times and use other handy developer tools, Google Chrome has some features that many people probably don't even realise exist.

If you have never done it before, just type "chrome://version/" into your address bar to see details of your current browser e.g

Google Chrome 15.0.874.121 (Official Build 109964) m
OS Windows
WebKit 535.2 (@100034)
JavaScript V8 3.5.10.24
Flash 11,1,102,55
User Agent Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2
Command Line "C:\Users\me\AppData\Local\Google\Chrome\Application\chrome.exe" --flag-switches-begin --enable-print-preview --flag-switches-end
Executable Path C:\Users\me\AppData\Local\Google\Chrome\Application\chrome.exe
Profile Path C:\Users\me\AppData\Local\Google\Chrome\User Data\Default

A nice overview of your browser.

For a list of domains whose DNS records your browser pre-fetches behind the scenes to speed up load times, type "about:dns" into your address bar. You will see a list of popular domains that you visit along with stats about each DNS pre-fetch.

At the top of the report you will see the 10 hostnames that Chrome pre-fetches on startup. It does this to help speed up your browsing experience: it works out which sites you visit the most and keeps their DNS records handy in case you want to visit them again. By pre-fetching them when the browser starts up it makes page load times for these domains look a lot faster than they otherwise would be.

For example my own report shows:

Future startups will prefetch DNS records for 10 hostnames
Host name                               How long ago (HH:MM:SS)   Motivation
http://blog.strictly-software.com/      01:09:02                   n/a
http://connect.facebook.net/            01:09:01                   n/a
http://pagead2.googlesyndication.com/   01:09:01                   n/a
http://platform.twitter.com/            01:09:01                   n/a
http://s7.addthis.com/                  01:09:01                   n/a
http://shots.snap.com/                  01:09:01                   n/a
http://twitter.com/                     01:09:01                   n/a
http://www.blogger.com/                 01:09:01                   n/a
http://www.darkpolitricks.com/          01:09:53                   n/a
http://www.strictly-software.com/       01:09:01                   n/a


All sites I regularly visit, and at the bottom of the report I get the following message:

Preresolution DNS records performed for 328 hostnames
Preresolving DNS records revealed non-existence for 5 hostnames.


Another, more complex, report can be seen with the command "about:histograms/", which will show a number of basic graphs covering DNS checks, lookup speeds and various other stats.


This DNS pre-fetching is meant to speed up your browsing experience; however it can sometimes cause issues, noticeable if you ever experience slow page load times along with the words "Resolving Host" in the status bar.

If you have this issue, try disabling the pre-fetch feature and compare page load times with the option on and off, running an "ipconfig /flushdns" and clearing the browser cache in between each attempt to make the test fair.

You can do this with either a web developer toolbar, choosing "Tools > Clear Browsing Data" or by selecting "Toolbox > Options > Under the bonnet > Clear browsing data".

You will also find that in later versions of Google Chrome they have combined this pre-fetch option with the pre-rendering option under "Predict Network Actions", which is found under "Toolbox > Options > Under The Bonnet > Predict network actions to improve page load performance".

This option not only turns on DNS pre-fetching, it also aims to speed up searches by pre-loading the first 3 results of any Google search. They do this because statistics apparently show that when a user runs a search they usually click on one of the first 3 links.

Obviously if you don't do that then those page loads have been a waste of time and resources, as well as leaving HTTP requests logged on servers for pages you never actually viewed.

However the aim is to speed your browsing up by pre-fetching pages you visit a lot which is a good idea in theory.

You can read more about these two features, pre-fetching and pre-rendering here:

http://blog.chromium.org/2008/09/dns-prefetching-or-pre-resolving.html
http://www.chromium.org/developers/design-documents/dns-prefetching



Monday, 28 November 2011

Automattic crack down on Plugin Developers for Wordpress

Wordpress crack down on plugin developers wanting to Earn Money from their hard work

I have had to remove the PluginSponsors.com Wordpress plugin from all my Wordpress plugins, due to Wordpress threatening the maker of the PluginSponsors.com plugin with removal from the wordpress repository of any plugins that utilise his harmless advert.

It's not as if Wordpress Plugin developers make much money from their hard work in the first place and people who use them seem to expect 24 hour support for free as well.

Occasionally I might get a small donation from one of my plugin users, but it is hardly enough to cover the expense of running a cloud based server, so in my opinion Wordpress is bang out of order.

Not only have I had numerous blogs on wordpress.com removed without explanation (one I have had returned without any reasoning behind the ban in the first place) but when my first wordpress.com blog was banned the other year I found out just how unreasonable the people at Automattic can be from my email discussions with them.

Not only would they refuse to offer proper explanations for the ban, even when I gave them logical counter-arguments to anything they mentioned, but their intransigence reminded me of a stubborn child demanding more sweeties and refusing to back down until they got them.

Then, when logic and reason became too much for them, they just ignored my emails, leaving me with a banned site that had until then been receiving high amounts of traffic.

Today if I was recommending a free hosted blogging tool to anyone I would tell them to go and check out blogspot with Google.

I know this blog is a blogspot blog and doesn't exactly look much but it was my first foray into blogging and I created it a good few years ago.

Nowadays their blogging software is much better and unlike the free wordpress.com blogs they actually allow you to upload images into the header, show adverts and customise the CSS of your blog for free.

After trying to set up 3 wordpress.com blogs that all had the word "horseracing" in the domain which were all instantly banned (without any kind of reason) I set this one up on blogspot: UK Horse Racing Star without any problems at all.

As you can see it's a lot better looking than this old blog plus it took me less than 20 minutes to style it, upload a header image and set the adverts up.

From now on I think I will be sticking to blogspot as Automattic have shown that they don't care about the plugin developers who spend hours even weeks writing all the thousands of plugins that make their system work.

If they did care they would offer a marketplace like Joomla for developers to sell their hard work instead of cracking down on those plugin authors who want some kind of recompense for their development.

You can read the PluginSponsors.com author's thoughts on the matter here: Automattic Bullies.

Speeding up batch SQL Processes that use temporary table variables

Problems with TABLE variables and how to improve performance for SQL Batch Jobs


We recently moved our production system to SQL 2008. Everything seemed okay until, after a fortnight, we realised that a particular SQL Server Agent job had not been completing correctly.

One of the steps within the job had been failing due to a missing SQL login. This job transferred large amounts of data from daily tables to historical ones, and a large backlog, numbering in the tens of millions of rows, had now built up.

Because our batch delete process removed transferred data in small chunks (500 rows) to reduce locking, things had got to the stage where new records were being added almost as fast as we were removing the old ones.

When I looked at the query that was doing the insert and then the delete, I saw that it was using a TABLE variable to hold the data whilst it was being transformed. As this now meant 20 million plus records, that was a problem on its own due to the limitations of TABLE variables.

This is a common SQL Performance bottleneck and one I have come across a few times now.

SQL's TABLE variables are very good for small datasets (a few hundred rows at most) and are very useful for array-like usage within stored procedures. However when you move to large numbers of records they just become a performance nightmare.

Unlike proper temporary tables or fixed permanent tables, you cannot add indexes to TABLE variables, and with large record counts an index is usually a must. The constant table scans the process had to do to find rows must have been a major cause of the slow performance.
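
As a quick sketch of the difference (with made-up table and column names for illustration):

-- TABLE variable: no CREATE INDEX allowed, only inline PRIMARY KEY/UNIQUE constraints
DECLARE @Hits TABLE (JobFK int, Stamp datetime)

-- temporary table: can be indexed as soon as it exists
CREATE TABLE #Hits (JobFK int, Stamp datetime)
CREATE CLUSTERED INDEX clidx_Hits ON #Hits (JobFK, Stamp)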

I rewrote the stored proc to make use of a temporary table, added a clustered index on the columns I was using for my joins to the real table and I used a batch DELETE process using the TOP (@X) statement to remove old records.

This has sped the process up immensely, but creating and dropping tables and indexes requires higher permissions than the website login I use for my site has.

Therefore to ensure everything ran smoothly I had to use the WITH EXECUTE AS 'login' statement to allow the user to impersonate a login with higher privileges.

A cut down example of the proc is below and it shows the following:

  1. Impersonating logins with higher permissions to allow for DDL statements e.g CREATE and DROP.
  2. Checking TEMP DB for existing temporary tables and indexes and dropping them if they exist.
  3. Deleting large numbers of records in batches to prevent blocking.
  4. Ensuring a SARGABLE where clause is used instead of DATEDIFF to get yesterday's data.



CREATE PROCEDURE [dbo].[usp_sql_job_data_transfer] 
 @FromProc BIT = 0, -- set to 1 when called from another proc to skip the result SELECT in the error handler
 @Success BIT = 0 OUTPUT,
 @RowsCopied INT = 0 OUTPUT,
 @RowsDeleted INT = 0 OUTPUT

WITH EXECUTE AS 'admin_user'
AS
  BEGIN

  SET NOCOUNT ON
  SET DATEFORMAT YMD

  DECLARE @Stamp datetime, 
   @InsertDate datetime,
   @TmpRows int,
   @Rows int,  
   @RowsInserted int,
   @error int
   

  -- check for the existence of an old temp table; a local temp table won't persist but we may need to use a global one (example usage)
  IF object_id('tempdb..#JobHits') IS NOT NULL
    BEGIN
    -- drop it
      
    DROP TABLE #JobHits
    END 
 
  -- create temp table   
  CREATE TABLE #JobHits(
   JobFK int, 
   Stamp Datetime, 
   ViewHits int, 
   SearchHits int
  )

  -- job runs after midnight and we want a SARGABLE WHERE clause
  SELECT @Stamp = CONVERT(datetime,CONVERT(varchar,getdate(),23))

   
  SELECT @TmpRows = 0, @RowsInserted = 0, @RowsDeleted = 0  

  -- insert into my temp table the search hits by date (SQL 2005)
  -- if this proves to be still slow or causing blocks then try inserting in batches of 1000 OR 20 min chunks
  INSERT INTO #JobHits
  (JobFK, Stamp, ViewHits, SearchHits)
  SELECT JobFK, CONVERT(datetime,CONVERT(varchar,Stamp,23)),0, count(jobFk) 
  FROM JOBS_DATA with (nolock)
  WHERE HitType = 'S' 
   AND Stamp < @Stamp
  GROUP BY JobFK, CONVERT(datetime,CONVERT(varchar,Stamp,23))

  SELECT @TmpRows = @@ROWCOUNT, @Error = @@ERROR
  
  IF @Error <> 0
    GOTO HANDLE_ERROR

  -- insert into my temp table view hits
  INSERT INTO #JobHits
  (JobFK, Stamp, ViewHits, SearchHits)
  SELECT JobFK, CONVERT(datetime,CONVERT(varchar,Stamp,23)),count(jobFk), 0
  FROM JOBS_DATA with (nolock)
  WHERE HitType = 'V' AND 
   Stamp < @Stamp -- SARGABLE WHERE CLAUSE - note I am not using DATEDIFF
  GROUP BY JobFK, CONVERT(datetime,CONVERT(varchar,Stamp,23))

  SELECT @TmpRows = @TmpRows + @@ROWCOUNT, @Error = @@ERROR


  -- if an error occurred jump to the error handler
  IF @Error <> 0 
    GOTO HANDLE_ERROR
  --If no error but no rows then its just a bad day for the site with no hits
  ELSE IF @TmpRows = 0  
    GOTO EXIT_PROC
  ELSE
    BEGIN
     
    -- add an index to aid lookups and searching 

    -- ensure the index doesn't already exist from a previous run; indexes do not
    -- appear in sys.objects so we check tempdb.sys.indexes instead
    IF EXISTS (SELECT 1 FROM tempdb.sys.indexes
       WHERE [object_id] = OBJECT_ID('tempdb..#JobHits')
       AND [name] = 'clidx_#JobHits')
      BEGIN
      
     DROP INDEX clidx_#JobHits ON #JobHits

      END

       
    -- Add a clustered index to help searching - cover the main join column JobFk and group by column Stamp 
    CREATE CLUSTERED INDEX [clidx_#JobHits] ON #JobHits
    (
     [JobFK] ASC,
     [Stamp] ASC    
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 100) ON [PRIMARY]
    
    
    END
  
  
  -- join my temp table to the main table to get info about the company the job belonged to
  INSERT INTO JOBS_DATA_HISTORY
  (JobFk,Stamp,ViewHits, SearchHits, ClientFK)
  SELECT a.JobFK, a.Stamp, Sum(a.ViewHits), Sum(a.SearchHits), j.ClientFK
  FROM  #JobHits as a
  JOIN  JOBS as j with (nolock)
   ON  a.JobFk = j.JobPK
  GROUP BY a.JobFK, a.Stamp, j.ClientFk

  SELECT @RowsInserted = @@rowcount, @error = @@ERROR
  

  --if we are here then there must be > 0 rows otherwise we would have exited earlier
  IF @RowsInserted=0 or @Error<>0
    GOTO HANDLE_ERROR
 

  -- Now we have copied our data we need to delete it all from the daily table
  -- as this table is being used by the site still we delete in batches to prevent blocking locks
  SELECT @Rows = 1
  
  -- loop until no more rows are left
  WHILE @Rows > 0
    BEGIN 
    -- delete data in table using the TOP command to ensure we only delete on our indexed column in batches
    DELETE TOP(1000)
    FROM JOBS_DATA
    WHERE Stamp < @Stamp -- indexed column in JOBS_DATA
      
    SELECT @Rows = @@ROWCOUNT, @RowsDeleted = @RowsDeleted + @Rows, @Error = @@ERROR

    -- should i wait for a second in between batches? cannot find a definitive answer
    -- if problems occur (which they haven't so far) then try this
    --WAITFOR DELAY '00:00:01' 
    
    IF @Error <> 0 
      GOTO HANDLE_ERROR
    END

  --if no rows were deleted then something went pete tong
  IF @RowsDeleted=0
    GOTO HANDLE_ERROR
  END

-- clean up
CLEAN_UP:
    
   -- drop the clustered index if it still exists (dropping the table removes its
   -- indexes anyway, but being safe)
  IF EXISTS (SELECT 1 FROM tempdb.sys.indexes
     WHERE [object_id] = OBJECT_ID('tempdb..#JobHits')
     AND [name] = 'clidx_#JobHits')
    BEGIN
  -- drop our clustered index
  DROP INDEX clidx_#JobHits ON #JobHits

    END
    
  IF object_id('tempdb..#JobHits') IS NOT NULL
    BEGIN
  -- drop our temp table  
  DROP TABLE #JobHits
    END 
  
  
  -- jump over error handler and exit 
  GOTO EXIT_PROC
  

--handle error
HANDLE_ERROR:

  SELECT @Success = 0

  IF @FromProc = 0
    SELECT @Success as success, COALESCE(@RowsCopied,0) as rowscopied,COALESCE(@RowsDeleted,0) as rowsDeleted

  -- exit
  RETURN 0



--handle a successful exit
EXIT_PROC:
  SELECT @Success = 1, @RowsCopied = @RowsInserted

  -- exit
  RETURN 1

Sunday, 13 November 2011

Performance Tuning Tools for MySQL on LINUX

How to Performance Tune MySQL for Wordpress

Since I have been working a lot with Wordpress, PHP and Apache I have had to get used to the limitations of MySQL and the applications that are available to connect to MySQL databases such as Navicat or PHPMyAdmin.

As well as all the missing DML such as CTEs, I really miss the very useful Dynamic Management Views (DMVs) that SQL Server has, as they make optimising a database very easy, and I have built up a large collection of SQL Server performance reports and tools to help me debug database performance issues.

Since working with Wordpress I have seen a lot of Plugins that use very poor SQL techniques and it seems most plugin authors that create their own tables in the Wordpress database don't even think about indexes that could increase performance of some very badly written SQL.

Having to sift through the slow query log and then run an EXPLAIN on each query is a time consuming job, whereas in MSSQL setting up a scheduled job to monitor missing indexes and list all the suggestions is very easy with the Dynamic Management Views (DMVs) that hold information about missing indexes, high cost queries, cached query plan re-use and much more.
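
For anyone who hasn't used them, this is the sort of check I mean; a minimal sketch against the SQL 2005+ missing index DMVs (simplified, as real monitoring queries usually weight the results by the cost columns in the group stats):

SELECT d.statement AS table_name,
  d.equality_columns,
  d.inequality_columns,
  d.included_columns,
  s.user_seeks,
  s.avg_user_impact
FROM sys.dm_db_missing_index_details as d
JOIN sys.dm_db_missing_index_groups as g
  ON d.index_handle = g.index_handle
JOIN sys.dm_db_missing_index_group_stats as s
  ON g.index_group_handle = s.group_handle
ORDER BY s.avg_user_impact DESC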


MySQL Optimisation for Wordpress and LINUX

There are a number of tools available for performance tuning MySQL and I have listed a few below.


MySQLTuner.pl

MySQLTuner is a Perl script that analyses your MySQL performance and, based on the statistics it gathers from SHOW VARIABLES and SHOW STATUS, recommends settings you can adjust to increase performance.

One of the good things about having root access to your own LINUX server (VPS or dedicated) is the ability to SSH in and then load and install programs remotely with a few lines of code from the command prompt.

To load and run MySQLTuner.pl on your server do the following:

Change the directory to your program folder e.g:

cd /usr/bin

Load the tool in remotely with a WGET command e.g:

wget http://mysqltuner.pl/mysqltuner.pl

Change the permissions to make it executable e.g:


chmod +x mysqltuner.pl

Run the tool e.g

myhost:/usr/bin# /usr/bin/mysqltuner.pl

>>  MySQLTuner 1.2.0 - Major Hayden 
>>  Bug reports, feature requests, and downloads at http://mysqltuner.com/
>>  Run with '--help' for additional options and output filtering
Please enter your MySQL administrative login: root
Please enter your MySQL administrative password:

-------- General Statistics --------------------------------------------------
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.0.51a-24+lenny5-log
[OK] Operating on 64-bit architecture

-------- Storage Engine Statistics -------------------------------------------
[--] Status: +Archive -BDB +Federated -InnoDB -ISAM -NDBCluster
[--] Data in MyISAM tables: 419M (Tables: 98)
[!!] Total fragmented tables: 9

-------- Security Recommendations  -------------------------------------------
[OK] All database users have passwords assigned

-------- Performance Metrics -------------------------------------------------
[--] Up for: 3d 3h 0m 33s (5M q [22.010 qps], 58K conn, TX: 102B, RX: 1B)
[--] Reads / Writes: 87% / 13%
[--] Total buffers: 314.0M global + 2.6M per thread (100 max threads)
[OK] Maximum possible memory usage: 576.5M (55% of installed RAM)
[OK] Slow queries: 1% (70K/5M)
[OK] Highest usage of available connections: 16% (16/100)
[OK] Key buffer size / total MyISAM indexes: 64.0M/196.8M
[OK] Key buffer hit rate: 100.0% (18B cached / 5M reads)
[OK] Query cache efficiency: 83.1% (4M cached / 5M selects)
[!!] Query cache prunes per day: 61605
[OK] Sorts requiring temporary tables: 0% (207 temp sorts / 356K sorts)
[!!] Joins performed without indexes: 35147
[!!] Temporary tables created on disk: 44% (309K on disk / 690K total)
[OK] Thread cache hit rate: 99% (329 created / 58K connections)
[!!] Table cache hit rate: 13% (191 open / 1K opened)
[OK] Open file limit used: 22% (231/1K)
[OK] Table locks acquired immediately: 99% (1M immediate / 1M locks)

-------- Recommendations -----------------------------------------------------
General recommendations:
    Run OPTIMIZE TABLE to defragment tables for better performance
    Adjust your join queries to always utilize indexes
    When making adjustments, make tmp_table_size/max_heap_table_size equal
    Reduce your SELECT DISTINCT queries without LIMIT clauses
    Increase table_cache gradually to avoid file descriptor limits
Variables to adjust:
    query_cache_size (> 40M)
    join_buffer_size (> 128.0K, or always use indexes with joins)
    tmp_table_size (> 200M)
    max_heap_table_size (> 200M)
    table_cache (> 200)



You should carefully read the output, especially the recommendations at the end.

It shows exactly which variables you could adjust in the [mysqld] section of your my.cnf (on Debian and Ubuntu the full path is /etc/mysql/my.cnf), but be careful, as this is only advice and you should find out as much as you can before changing core configuration settings.
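
For example, acting on the recommendations above might produce a [mysqld] section along these lines. These values are just the tool's suggestions applied to my server, not ones to copy blindly:

[mysqld]
query_cache_size = 64M
join_buffer_size = 256K
tmp_table_size = 256M
max_heap_table_size = 256M
table_cache = 400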

Whenever you change your my.cnf file, make sure that you restart MySQL with this command (or use your GUI)

/etc/init.d/mysql restart


You can then run MySQLTuner again to see if it has further recommendations to improve the MySQL performance.

Another tool of a similar nature is the MySQLReport tool which can be found at http://hackmysql.com.

Information about how to read and analyse the report it produces can be found at this link: http://hackmysql.com/mysqlreportguide.

You can download it to your server in a similar way, making use of an HTTP tool like CURL or WGET:


wget hackmysql.com/scripts/mysqlreport

--2011-11-13 02:58:47--  http://hackmysql.com/scripts/mysqlreport
Resolving hackmysql.com... 64.13.232.157
Connecting to hackmysql.com|64.13.232.157|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 38873 (38K) [application/x-perl]
Saving to: `mysqlreport'

100%[======================================>] 38,873      --.-K/s   in 0.1s

2011-11-13 02:58:47 (254 KB/s) - `mysqlreport' saved [38873/38873] 
 


Once downloaded, give the newly installed file execute permission with the following command:

chmod +x mysqlreport

You then call it by passing through the details of the system you want to analyse e.g:

mysqlreport --user root --host localhost --password mypsw100


MySQL 5.0.51a-24+lenny5  uptime 3 3:21:16       Sun Nov 13 03:04:11 2011

__ Key _________________________________________________________________
Buffer used    52.33M of  64.00M  %Used:  81.76
  Current      61.15M            %Usage:  95.55
Write hit      99.96%
Read hit       99.97%

__ Questions ___________________________________________________________
Total           5.96M    22.0/s
  QC Hits       4.60M    17.0/s  %Total:  77.25
  DMS           1.07M     3.9/s           17.97
  Com_        226.52k     0.8/s            3.80
  COM_QUIT     58.22k     0.2/s            0.98
  +Unknown        408     0.0/s            0.01
Slow (2)       70.43k     0.3/s            1.18  %DMS:   6.58  Log:  ON
DMS             1.07M     3.9/s           17.97
  SELECT      935.60k     3.4/s           15.70         87.35
  UPDATE      127.41k     0.5/s            2.14         11.90
  INSERT        7.63k     0.0/s            0.13          0.71
  DELETE          450     0.0/s            0.01          0.04
  REPLACE           0       0/s            0.00          0.00
Com_          226.52k     0.8/s            3.80
  set_option  169.55k     0.6/s            2.84
  change_db    56.71k     0.2/s            0.95
  optimize         91     0.0/s            0.00

__ SELECT and Sort _____________________________________________________
Scan           68.94k     0.3/s %SELECT:   7.37
Range          42.09k     0.2/s            4.50
Full join      35.21k     0.1/s            3.76
Range check         0       0/s            0.00
Full rng join       0       0/s            0.00
Sort scan     300.33k     1.1/s
Sort range     56.97k     0.2/s
Sort mrg pass     207     0.0/s

__ Query Cache _________________________________________________________
Memory usage   36.03M of  40.00M  %Used:  90.07
Block Fragmnt  10.04%
Hits            4.60M    17.0/s
Inserts       842.73k     3.1/s
Insrt:Prune    4.33:1     2.4/s
Hit:Insert     5.46:1

__ Table Locks _________________________________________________________
Waited          1.44k     0.0/s  %Total:   0.08
Immediate       1.77M     6.5/s

__ Tables ______________________________________________________________
Open              200 of  200    %Cache: 100.00
Opened          1.54k     0.0/s

__ Connections _________________________________________________________
Max used           16 of  100      %Max:  16.00
Total          58.52k     0.2/s

__ Created Temp ________________________________________________________
Disk table    310.39k     1.1/s
Table         381.99k     1.4/s    Size: 200.0M
File              431     0.0/s

__ Threads _____________________________________________________________
Running             1 of    1
Cached              7 of    8      %Hit:  99.44
Created           329     0.0/s
Slow                3     0.0/s

__ Aborted _____________________________________________________________
Clients           588     0.0/s
Connects           17     0.0/s

__ Bytes _______________________________________________________________
Sent          102.63G  378.3k/s
Received        1.95G    7.2k/s

__ InnoDB Buffer Pool __________________________________________________
Usage               0 of       0  %Used:   0.00
Read hit        0.00%
Pages
  Free              0            %Total:   0.00
  Data              0                      0.00 %Drty:   0.00
  Misc              0                      0.00
  Latched           0                      0.00
Reads               0       0/s
  From file         0       0/s            0.00
  Ahead Rnd         0       0/s
  Ahead Sql         0       0/s
Writes              0       0/s
Flushes             0       0/s
Wait Free           0       0/s

__ InnoDB Lock _________________________________________________________
Waits               0       0/s
Current             0
Time acquiring
  Total             0 ms
  Average           0 ms
  Max               0 ms

__ InnoDB Data, Pages, Rows ____________________________________________
Data
  Reads             0       0/s
  Writes            0       0/s
  fsync             0       0/s
  Pending
    Reads           0
    Writes          0
    fsync           0

Pages
  Created           0       0/s
  Read              0       0/s
  Written           0       0/s

Rows
  Deleted           0       0/s
  Inserted          0       0/s
  Read              0       0/s
  Updated           0       0/s


In a similar way you will need to know how to analyse each section to make the necessary changes, and you can find out what each calculation result means here: http://hackmysql.com/mysqlreportdoc

Another good tool for finding out what is going on in your database is MyTop. This is like the top application that shows the current processes running on a server, but for a MySQL database rather than the whole server. It does this by analysing the same data that SHOW PROCESSLIST would output.

You can obtain the code from http://jeremy.zawodny.com/mysql/mytop/

Once loaded, you call it from your command prompt like the other commands, passing in the connection details (user, password, database), as well as a refresh rate if you require it. I think it defaults to 5 seconds but I like to use 1 second, which you can set via the --s parameter e.g:

mytop --user root --password mypsw --db myDB --s 1

The results look like this, with a header that shows how many queries have run per second as well as slow queries from the slow query log.

MySQL on localhost (5.0.51a-24+lenny5-log)                                 up 3+03:48:10 [03:31:05]
 Queries: 5.7M   qps:   22 Slow:     0.0         Se/In/Up/De(%):    77/00/00/00
             qps now:    2 Slow qps: 0.0  Threads:    3 (   3/   5) 00/00/00/00
 Key Efficiency: 100.0%  Bps in/out:   0.0/  0.6   Now in/out:  41.2/10.0k

      Id      User         Host/IP         DB      Time    Cmd Query or State
      --      ----         -------         --      ----    --- ----------
   58766      root       localhost strictly         0  Query show full processlist
   58772 darkpolit       localhost strictly         1  Query select CONCAT('http://www.strictly         
   58771 strictly        localhost strictly         4  Query select CONCAT('http://www.strictly         

Another method for those who don't have direct access into their LINUX server but only a control panel like CPANEL and use Wordpress for their website is my own Wordpress Plugin - the Strictly System Checker Plugin.

The Strictly System Checker is a Wordpress plugin that is designed to allow webmasters to monitor their site at regular intervals throughout the day and to be notified if the site goes down or experiences database problems or high server loads.

This plugin was not designed to be a replacement for professional server monitoring tools however it is a nice easy to use system that can aid webmasters in monitoring their Wordpress site as well as notifying the right person whenever the site is down or running into performance problems.

How it works

  • A CRON / WebCron job initiates an HTTP request to check whether the site can be accessed.
  • The system will check for the Error establishing a database connection error message as well as searching for an optional piece of text which can help indicate whether the page has loaded correctly.
  • If the error is found or the text cannot be found then a connection to the database is attempted.
  • If successful a CHECK and REPAIR is carried out on any tables that may be corrupted.
  • An option exists to also check for fragmented tables and an OPTIMIZE command is carried out to fix any found.
  • An SQL report is carried out to report on some key performance indicators such as the number of connections, queries, reads, writes and more.
  • A report is carried out on the webserver to look at the current server load average and if it's above a specified threshold a report to the site administrator can be triggered.
  • A similar check is carried out on the database to ensure that there are no slow running queries and that the connection limit hasn't been reached.
  • If problems are found an email is then sent to the site administrator with details of the report.
An example of a report that can be emailed to system administrators or viewed from the admin part of the website is below.

System Report: 2011-11-13 04:07:07

Initiating System Report...
Initiating an HTTP request to http://www.strictly-software.com
The HTTP request to http://www.strictly-software.com took 0 second(s) to respond and returned a status code of 200
The specified search text [read more] was found within the HTTP response
The server load is currently 0.48
The server load is okay
MySQL has been running for: 26 days 4 hours 18 mins 48 secs
Total Connections: 467253 - Aborted: 0 - Connections Per Hour 743
Total Queries: 21592142 - Queries / Per Hour 34365
Joins without indexes: 0 - Joins without indexes Per Hour 0
Total Reads: 4819507 (88%) - Total Writes 630029 (12%)
The system is currently configured to accept a maximum of 100 database connections
At the time of reporting the database was running 3 query
The current database load is 3%
The database load is okay
Initiating a check for fragmented tables and indexes
Optimized table: wp_options
Optimized table: wp_postmeta
Completed check for fragmented tables and indexes
The system report has completed all its tests successfully.


Report Completed At 2011-11-13 04:07:07

Strictly Software Plugins for Wordpress


So whilst I feel the tools available for MySQL performance monitoring are lacking when compared with those available with SQL 2008, there are some that will enable you to get the job done. Hopefully this article has been helpful, and if you have any tools of your own please add them to the comments section.

Thursday, 10 November 2011

Error copying tables importing or exporting the Geography or Geometry data type in SQL 2008

Error importing and exporting the Geometry and Geography data types in SQL 2008

Today I had to make some work live that involved copying a database containing UK Postcodes and their related geo-location data to another SQL 2008 server.

The table contained a list of all UK postcodes as well as their longitude and latitude, and a GeoLocation column based on the new SQL 2008 geography data type.

However when I tried to use the Import/Export wizard to copy the data I got to the Review Data Type Mapping page and was met with the following error message.


Found 1 unknown column type conversion(s) You are only allowed to save the package.


Import Geography Error Screenshot

Viewing the full error details revealed that the problem was down to SQL not understanding the Geography data type in the table I wanted to export.


The full error message details are below:



[Source Information]
Source Location : (local)
Source Provider : SQLNCLI10
Table: [dbo].[Country]
Column: CountryGeography
Column Type: geography
SSIS Type: (Type unknown ...)
Mapping file (to SSIS type): C:\Program Files\Microsoft SQL Server\100\DTS\MappingFiles\MSSQLToSSIS10.XML

[Destination Information]
Destination Location : (local)
Destination Provider : SQLNCLI10
Table: [dbo].[Country]
Column: CountryGeography
Column Type: geography
SSIS Type: (Type unknown ...)
Mapping file (to SSIS type): C:\Program Files\Microsoft SQL Server\100\DTS\MappingFiles\MSSQLToSSIS10.XML

[Conversion Steps]
Conversion unknown ...
SSIS conversion file: C:\Program Files\Microsoft SQL Server\100\DTS\binn\DtwTypeConversion.xml


First of all I checked that the database compatibility level was set to 100 (SQL 2008) and not 90 (SQL 2005) and, once I had confirmed both databases were at the correct level, I checked the mapping conversion XML file on the server I was running the import from.
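
A quick way to check the compatibility level is to query sys.databases ('MyDB' below is a placeholder for your own database name):

SELECT name, compatibility_level -- 100 = SQL 2008, 90 = SQL 2005
FROM sys.databases
WHERE name = 'MyDB'

Both databases came back at the correct level, so the mapping file was the next suspect.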

This XML mapping file is located at the following path C:\Program Files\Microsoft SQL Server\100\DTS\MappingFiles\MSSQLToSSIS10.XML.

On opening the file I could see that there was no mention of the geography OR geometry data types which explained why the DTS package wizard could not carry out the operation.

To fix this I copied one of the other similar data types (varbinary) and re-inserted it into the file twice before changing the names to Geography and Geometry.

You can just copy and paste the following XML into the file.


  <!-- geography -->
  <dtm:DataTypeMapping >
    <dtm:SourceDataType>
      <dtm:DataTypeName>geography</dtm:DataTypeName>
    </dtm:SourceDataType>
    <dtm:DestinationDataType>
      <dtm:SimpleType>
        <dtm:DataTypeName>DT_IMAGE</dtm:DataTypeName>
      </dtm:SimpleType>
    </dtm:DestinationDataType>
  </dtm:DataTypeMapping>




  <!-- geometry -->
  <dtm:DataTypeMapping >
    <dtm:SourceDataType>
      <dtm:DataTypeName>geometry</dtm:DataTypeName>
    </dtm:SourceDataType>
    <dtm:DestinationDataType>
      <dtm:SimpleType>
        <dtm:DataTypeName>DT_IMAGE</dtm:DataTypeName>
      </dtm:SimpleType>
    </dtm:DestinationDataType>
  </dtm:DataTypeMapping>


Save the MSSQLToSSIS10.XML file and you should now be able to import or export a table that contains the datatypes geography and geometry.

If for some reason after editing the file it still doesn't work, try the following:

  • restarting the SQL Server service on the server you are running the export/import wizard from.
  • restarting your own SQL Server Client Tools.
  • If you are running a 64bit version of Windows 7 like I am then you might need to also edit the same file in the 32 bit Program folder e.g C:\Program Files (x86)\Microsoft SQL Server\100\DTS\MappingFiles\MSSQLToSSIS10.XML.

Thursday, 27 October 2011

Proving the benefits of SARGABLE Clauses and converting DATEDIFF to BETWEEN in SQL

Increasing performance with date range searches in SQL

I often find myself wanting to retrieve data from a table for the current day only.

There are many ways to do this, but the one I often find people using is a non SARGABLE WHERE clause that relies on the DATEDIFF function to only return records for the current date, e.g.


SELECT *
FROM BLAH
WHERE DATEDIFF(DAY,Stamp,GETDATE())=0

The problem with this is performance: whilst you might see negligible effects on small datasets, when you are dealing with datasets containing millions of rows you want the best performance possible.

If you don't know what a SARGABLE clause is, here is the definition:

In relational databases, a condition (or predicate) in a query is said to be sargable if the DBMS engine can take advantage of an index to speed up the execution of the query (using index seeks, not covering indexes). The term is derived from a contraction of Search ARGument Able.

Therefore, because we have wrapped the Stamp (date) column in the DATEDIFF function, the clause becomes NON SARGABLE and an INDEX SCAN will be carried out instead, losing the performance benefit that an INDEX SEEK would provide.

Obviously if you don't have an index on the date column in question it won't make the slightest bit of difference, but let's test this theory.

First we want some code that will give us the start and end date of the current day.

I am using code that works on SQL 2005 and below here, as it is only in SQL 2008 that specific DATE datatypes without time parts exist. Running a simple:

SELECT GETDATE()

-- returns
2011-10-27 15:01:39.000

As we want to replace the DATEDIFF with a BETWEEN @STARTDATE AND @ENDDATE we need to create the minimum date for the day and the maximum date for the day.

We can do this using some basic CAST conversions that first create the date for today at midnight e.g 2011-10-27 00:00:00:000 and then once we have that we can add the appropriate number of milliseconds to the date to give us the maximum end date e.g 2011-10-27 23:59:59.997.

Calculating the number of milliseconds we need to add is simple: we multiply 60 seconds * 60 minutes * 24 hours * 1000 milliseconds and then deduct 3 (as .997 is the largest millisecond value a DATETIME can hold, DATETIME only being accurate to roughly 3 milliseconds) e.g:


SELECT (60 * 60 * 24 * 1000)-3

The code to get the start and end date for the current day in TSQL is below.


DECLARE @StartDate DATETIME,
 @EndDate DATETIME
  
-- quickly convert our current datetime into the current date at midnight e.g 2011-10-27 00:00:00.000
-- by casting the date as a varchar(11), which keeps just the 'Mon DD YYYY' date part, then casting it back to a datetime
SELECT @StartDate = CAST(CAST(GETDATE() as varchar(11)) as datetime)

-- add 86399997 milliseconds to get a nice 23:59:59:997 datetime
SELECT @EndDate = DATEADD(MS,86399997,@StartDate)    

-- run to test
SELECT  @StartDate, @EndDate 

And you will get 2011-10-27 00:00:00.000 and 2011-10-27 23:59:59.997.
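
As an aside (these alternatives are standard T-SQL idioms rather than part of the original test), you can also get midnight for the current day without the VARCHAR round trip:


-- whole days since day zero, added back to day zero = today at midnight
SELECT DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0)

-- SQL 2008 and above only: cast through the new DATE type
SELECT CAST(CAST(GETDATE() AS DATE) AS DATETIME)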

Now to prove that doing a search this way is better for performance we pick a large table, ensure there is an index on the date column we are using and write a test harness.

Remember to turn the "Include Actual Execution Plan" option on, and make sure that we clean out all the cache buffers before running our test.

Using some simple datediff calculations to time each SELECT we can run the following code.


SET DATEFORMAT YMD
SET NOCOUNT ON

DECLARE @START DATETIME, 
 @END DATETIME,   
 @DUR INT
  
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE

SELECT @START = GETDATE()

SELECT Racedatetime
FROM RACES
WHERE DATEDIFF(DAY,Racedatetime,GETDATE())=0 -- only get data for current day

SELECT @END = GETDATE(),
 @DUR = DATEDIFF(MS,@START,@END)

SELECT  'Query 1 with non SARGABLE DATEDIFF function took ' + CAST(@DUR as varchar) + ' milliseconds to run'


SELECT @START = GETDATE()

-- hardcoding values for an example but put our code from above to get the start and end of the current day here
SELECT Racedatetime
FROM RACES
WHERE Racedatetime BETWEEN '2011-10-27 00:00:00:000' AND '2011-10-27 23:59:59:997'

SELECT @END = GETDATE(),
 @DUR = DATEDIFF(MS,@START,@END)

SELECT  'Query 2 with SARGABLE WHERE Clause took ' + CAST(@DUR as varchar) + ' milliseconds to run'

Viewing the execution plan for the query you can see the difference in execution plans between the two queries.

The Non SARGABLE Query that used the DATEDIFF made use of an INDEX SCAN to obtain the result as it had to loop through the dataset running the DATEDIFF function on each Racedatetime column to see if the result was 0 (today).

The SARGABLE Query didn't have to do this and made use of the correct INDEX and no function needed to be applied to each column.

You can see the benefit in the query cost between the two SELECT statements in the following screenshot of the execution plan.

Excution Plan Cost of Query

Statement one, which used a NON-SARGABLE DATEDIFF clause, took 673 milliseconds to run (on a table with 567,031 records) and had a query cost of 96%. The statement that used a SARGABLE BETWEEN clause took 30 milliseconds to run and had a cost of just 4%.

Results of Query

Hopefully you can see from this example why it pays to make your queries as optimal as possible.

For more SQL Tips you can read an old Top SQL Performance Tips article I wrote some time back which still has lots of useful tips on performance tuning your queries.


And for analysing bottlenecks and problematic queries you should view my SQL performance tuning script which will identify 14+ different areas that could be improved, from query plan re-use, fragmented indexes, high CPU and missing indexes, plus lots more.

Monday, 17 October 2011

Finding Deprecated SQL functions in SQL 2008

Deprecated SQL 2008 features

I have just come across this useful piece of SQL that shows you how many times you are using deprecated features in your SQL 2008 database.

Run the SQL on your SQL 2008 database in a new query window to see how many times you are using the deprecated features.



SELECT *
FROM sys.dm_os_performance_counters
WHERE object_name = 'SQLServer:Deprecated Features'
 AND cntr_value > 0; 


Whilst these deprecated SQL features might still work in this version of SQL they might not in future releases, so fixing them sooner rather than later is a good idea.


Features that have been deprecated include the often-used SET ROWCOUNT N.

For example instead of using SET ROWCOUNT 100 to limit an UPDATE or DELETE statement you would use the new UPDATE TOP(100) or DELETE TOP(100) statements instead.
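
To make the change concrete, here is a minimal before and after sketch (dbo.MyTable and SomeFlag are hypothetical names used purely for illustration):


-- deprecated approach
SET ROWCOUNT 100
DELETE FROM dbo.MyTable WHERE SomeFlag = 1
SET ROWCOUNT 0 -- must be reset or it silently limits later statements

-- replacement approach
DELETE TOP (100) FROM dbo.MyTable WHERE SomeFlag = 1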


Whilst the SQL statement for finding the list of deprecated features is useful, it doesn't actually show you where each of the deprecated features is being used in your code.

I don't know if this DMV only looks at stored procs or at all queries passed through the system, but a find function like the following piece of SQL might be useful for hunting down deprecated code within the SQL database.


DECLARE @FindStr AS VARCHAR(500)

-- lower case our find string as we lower case the comments
SELECT @FindStr = LOWER('USER_ID')                                                                                                                      

SET NOCOUNT ON

SELECT DISTINCT NAME AS [NAME], 
  CASE  WHEN TYPE ='P' THEN 'PROCEDURE'
        WHEN TYPE IN('FN', 'IF','TF') THEN 'FUNCTION'
    END AS OBJECTTYPE
FROM SYSCOMMENTS as comm
JOIN SYSOBJECTS as obj 
   ON  comm.id = obj.id and obj.type IN ('P', 'FN', 'IF', 'TF')
WHERE LOWER(TEXT) LIKE '%' + LTRIM(RTRIM(@FindStr)) + '%'
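
As an aside, on SQL 2005 and above you could also search sys.sql_modules rather than the old SYSCOMMENTS view, which avoids long object definitions being split across multiple SYSCOMMENTS rows. A minimal sketch hunting for SET ROWCOUNT:


SELECT OBJECT_NAME(object_id) AS [NAME]
FROM sys.sql_modules
WHERE LOWER(definition) LIKE '%set rowcount%'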



You can find a list of all the deprecated features in SQL 2008 and ways to fix them on the Technet Microsoft SQL website.

If anyone knows of any other ways to hunt down the deprecated features within a SQL 2008 database please let me know.

Thursday, 6 October 2011

Twitter Hash Tag Scanner SEO Application

Introducing the first version of my Twitter Hash Tag Scanner Application

The Strictly HashTag Hunter is a Windows form application that allows you to find the most relevant HashTags and Twitter accounts for a variety of specified search terms and keywords.

This application is ideal for people who have just created a Twitter account and want to analyse their own site specific keywords to find #HashTags they should be following or using themselves.

For instance you might have a blog or site that uses an AutoBlogging tool like my Strictly TweetBot Wordpress Plugin and you might want to set up various AutoTweets with relevant HashTags that are related to certain keywords and content snippets.

This tool also helps you find the most important Twitter accounts you should be following. It analyses the people using your keywords or sentences on Twitter at that point in time to find the most popular HashTags related to those words, as well as the accounts that use them the most.

Obviously the time of day you run your scan will affect the results as different people Tweet at different times of the day, but you will see from the results which Twitter accounts have the most followers and are therefore worth following from your own account.

The primary aim of this tool is to help you save time trying to work out which #HashTags to use for your own Tweets as well as working out which @accounts to follow for your own Twitter account.

The Strictly Twitter Hash Tag Hunter is built as a Windows application that runs on your own desktop and it hooks into Twitter's API to obtain the results. It is perfect for SEO and Social Media analysts as well as people with a new Twitter account who don't know which hash tags and accounts they should be following to make an impact on the social scene.



Screen 1 shows how you enter one or more search terms that you want to find information for. These terms can be anything but if you are looking to utilise this tool with my Strictly TweetBot Wordpress Plugin then you should be looking to find the #HashTags and @Accounts to follow on Twitter related to the key terms your website is based on.

For example if you were running a site about Horse Racing and wanted to find out which Twitter @Accounts to follow and which #HashTags to use in your Tweets you would enter a number of search terms like so:

Horse Racing
Kempton Park
fromthestables.com
Free Racing Tips 
Twitter HashTag Hunter Start up screen
Enter each keyword or search term on its own line.

Once you have entered each term on its own line you click the "Search" button and the Scanner gets to work analysing your keywords and finding related Twitter information.

For each search term and keyword it will scan the Twitter API for those words looking for the most popular #hashtags that are related to those keywords.

It will also find the Twitter accounts that make the most use of these terms before ordering the accounts by the number of followers each account has and the hash tags by the number of times they are referenced by those accounts.

On completing the Scan

Screen 2 shows the most popular hash tags found for the search terms that were entered.
Twitter HashTag Completion Screen
The most popular hash tags found for the entered search terms and keywords.

Screen 3 shows the most followed Twitter accounts that used the terms you searched for.
Twitter HashTag Completion Screen
The most followed Twitter accounts for the entered search terms and keywords.

Following Accounts or Hash Tags

Once the Twitter Scan has completed and you have looked at the results you can simply click on the Account or Twitter Link column value to open up the desired URL in your default browser.

Screen 4 shows you selecting the desired Account you want to examine on Twitter.

Selecting a Twitter Account
Selecting an account to examine


Screen 5 shows the http://twitter.com page opening in your browser where you can decide whether or not the account or hash tag is worth following.

Following the selected Twitter Account
Viewing the account in Twitter and following them


If you are already logged into Twitter at the time then it's just a simple matter of clicking the "Follow" button in the top right of the screen and your own Twitter account will now be following the account you opened.


About the Twitter Hash Tag Scanner Application

The application is a multi-threaded standalone executable Windows application. It has been built with users and Twitter in mind so that the Twitter API is not overloaded or abused and so that you can continue to get all the information you need from their service.

A progress bar keeps you updated with the amount of scans it has carried out as well as the number of accounts and hashtags it has already found matching your search terms.

If for whatever reason Twitter blocks your requests (for example if you were hammering their API with dozens of search terms in one scan) then the application will slow down the amount of the requests it makes and increase the delays between requests. It also has some built in methods for bypassing certain blocking methods as well as the ability to access the information from other sources.

I am hoping to expand this tool over the years and I have had great feedback from both novice users who have found it very useful in deciding who to follow when they first start to use Twitter as well as SEO experts who utilise social media and Twitter all the time for marketing purposes.

As an introductory offer I am selling this application for the price of a small donation of only £10.00 and you can buy it from my application order page.

Tuesday, 4 October 2011

Find position of a string within a paragraph using a Regular Expression

How to find a string within another string using a Regular Expression instead of strpos or stripos

I was writing a piece of code in PHP the other day where I had to find a snippet of text within another longer piece of text (e.g an article) that contained a word. I then wanted to take X number of characters from that first point and return a snippet that didn't cut off the last word in the sentence.

At first I was using the PHP functions strpos and stripos but these don't allow you to use Regular Expressions as the search term (the needle in the haystack as PHP.net calls the parameters), which meant I was returning mismatches due to the search term being contained within other words.

E.g if I was looking for the word wool it would also match woollen.

Therefore the answer was to use a custom function that made use of preg_match and a non greedy capture group at the beginning of a pattern that could be passed to the function (without delimiters).

The function is below



/**
 * Function to find the first occurrence of a regular expression pattern within a string
 *
 * @param string $regex pattern without delimiters e.g \bwool\b
 * @param string $str the string (haystack) to search within
 * @param bool $ignorecase whether to make the match case-insensitive
 * @return int|false position of the first match, or false if no match was found
 */
function preg_pos( $regex, $str, $ignorecase = false )
{
 // build up the RegEx wrapping it in @ delimiters (so the pattern itself must not contain @)
 // the s flag lets the non-greedy (.*?) capture group cross newlines in multi-line strings
 $pattern = "@^(.*?)" . $regex . "@s" . ($ignorecase===true ? "i" : "");

 if( preg_match( $pattern, $str, $matches ) ) {
  // the length of everything captured before the match is the position of the match
  return strlen( $matches[ 1 ] );
 }

 return false;
}


As you can see the pattern needs to be passed in without delimiters e.g instead of /\bwool\b/ or @\bwool\b@ just pass in \bwool\b.

I then add a capture group to the beginning that is non greedy so that it finds the first match from the start of the input string ^(.*?) and then if the pattern is found I can do a strlen on the matching group to get the starting position of the pattern.

If you want the match to ignore case then just pass in TRUE as the extra parameter and the i (ignore case) flag will be added to the end of the pattern; pass FALSE for a case-sensitive match.
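
A quick illustrative call (the strings here are made-up test values):


// woollen at position 5 is skipped thanks to the \b word boundaries
$pos = preg_pos( "\bwool\b", "Fine woollen and Wool jumpers", true );
// $pos is 17, the start of "Wool" (matched case-insensitively)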

An example of this code being used is below. The code is looping through an array of words looking for the first match within a longer string (some HTML) and then taking 250 characters of text from the starting point, ensuring the last word is a whole word match.


// find first occurrence of any of the terms I am looking for and then take 250 characters from the first word
// ensuring I get a whole word at the end

$a = explode(" ",$terms);
foreach($a as $w){

 // skip empty or small terms
 if(!empty($w) && strlen($w) > 2){

  // get the position of the word ensuring it's not within another word - using \b word boundary - notice no RegEx delimiters @regex@ or /regex/
  // also escape any special characters within the word to prevent a mismatch - pass @ to preg_quote as that is the delimiter preg_pos uses
  $pos = preg_pos( "\b" . preg_quote($w,"@") . "\b", $html, true );

  // strict !== check because a match at position 0 would be treated as false by a loose check
  if($pos !== false){

   // found the word take 250 chars from the first occurrence
   $text = substr($html, $pos, 250);

   // roll back to last space before our last word to ensure we don't get partial words
   $text = substr($text, 0, strrpos($text," "));

   // now we have found a term exit
   break;
  }
 }
}


Also remember to wrap your word in preg_quote so that any special characters used by the Regular Expression engine e.g ? . + * [ ] ( ) { } are all escaped properly.
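
For example, a quick illustrative snippet (not from the original code):


// escape a term containing RegEx metacharacters before using it in a pattern
echo preg_quote("C++ (1+1)?");
// outputs: C\+\+ \(1\+1\)\?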

I found this function quite useful.

Monday, 26 September 2011

Regular Expression functions for reformatting content for Wordpress

Using RegEx to reformat Wordpress content

If you read my blog you will know I am "Mr Automate" and I specialise in scraping, reformatting, spinning and automatically posting content to Wordpress, Twitter and other sites.

I use Wordpress for a couple of my blogs and because the well known plugin WP-o-Matic has stopped being supported and the only replacements are paid-for plugins like WP Robot, I have basically written my own custom version to handle my automated content imports.

A couple of useful Regular Expressions which I find particularly helpful are below.

I am using PHP as that is the language Wordpress is written in but the core Regular Expressions should be easily transferable between languages.


1. Removing IFRAMES

As YouTube has moved from its old OBJECT / EMBED nested tag soup to an IFRAME, many feeds containing videos will now contain IFRAMEs which need to be handled correctly, as we all know that viruses can be transferred through rogue IFRAME content.

If you are just importing content from a feed without parsing and checking it then just imagine if, for whatever reason, the feed you are using one day contains an IFRAME with a SRC pointing to a dodgy virus-infected URL. If you don't sanitize the content you will be inserting this dangerous content into your own blog for your visitors to become infected by.

Although most modern browsers and virus checkers are good at picking up on potential malicious URL access it is always wise to run your own checks using a white list of "allowed" domains rather than a "black list" of banned ones.

This code is a basic example of looping through all the IFRAMES on a page, checking their SRC tag for a known white list of acceptable domains and then replacing any IFRAME's that don't match.

// match all XHTML IFRAME tags where $content holds your HTML
$videocount = preg_match_all("@(<iframe[\s\S]+?<\/iframe>)@",$content,$videos,PREG_SET_ORDER);

foreach($videos as $video){

 $object = $video[1];

 // only allow certain domains e.g youtube, dailymotion, vimeo, cnn, fox, msnbc and the bbc
 if(!preg_match("@src=['\"]http:\/\/(www\.)?(?:youtube|dailymotion|vimeo|cnn|fox|msnbc|bbc)\.@i",$object)){

  // replace the IFRAME as we don't know where it is pointing to - pass @ to preg_quote so our delimiter gets escaped as well
  $content = preg_replace("@" . preg_quote($object,"@") . "@","",$content);
 }

}


2. Re-sizing images

I have already written an article about how you can use regular expressions and a callback function to reformat videos or images so that they fit your template and this follows that theme.

PHP, like JavaScript and other decent languages, offers the ability to create anonymous (unnamed) functions on the fly to be used as callbacks in regular expression replace functions.

An anonymous function passed as a parameter to another function like this is often called a lambda function, and it is very useful in certain situations such as parsing HTML where you want to remove illegal tags and attributes.

This example is pretty niche to my own needs but the idea is useful and can easily be converted to your own reformatting needs.

Basically I show little Google Adsense box adverts at the top right of all my articles and as these are floated to the right I don't want any images within my articles that are less than the size of my content area (minus the size of the advert) to also be aligned to the right as this causes a horrible looking mess.

Therefore I use this code to ensure that the Wordpress class "alignleft" is always inserted on the first image in any article but only if the width of the image is less than 350px. It doesn't matter if the image isn't right at the top of the article (although they usually are) but it helps to ensure a smooth flow and layout.

The preg_replace_callback function in PHP is nice in that you can limit the number of replacements if you so wish, and in this case I only want to reformat the first image. You will see that the lambda function I use also contains multiple other regular expressions; there is nothing stopping you from writing complex code within these anonymous functions.

$content = preg_replace_callback("@(<img[^>]+?class=['\"])([\s\S]*?)(['\"][\s\S]*?>)@i",
 create_function(
  '$matches',
  '// look for a width attribute on the matched image tag
  preg_match("@width=[\'\"](\d+)[\'\"]@",$matches[0],$widthmatch);
  $chk = true;
  if($widthmatch){
   if(is_numeric($widthmatch[1])){
    if($widthmatch[1] >= 350){
     $chk = false;
    }
   }
  }
  // add alignleft (and strip any alignright) on images under 350px wide
  if($chk && !preg_match("@alignleft@",$matches[2])){
   $res = $matches[1] . trim( preg_replace("@alignright@","",$matches[2]) . " alignleft") . $matches[3];
  }else{
   $res = $matches[1] . preg_replace("@alignright@","",$matches[2]) . $matches[3];
  }
  return $res;'),
 $content, 1);



As you can see the lambda anonymous function carries out the following logic:

  • It looks for the first image that has a class attribute on it and passes the match collection into the anonymous function.
  • It then checks the image for a width attribute using another regular expression.
  • If found it then checks that the value of the width attribute is numeric and if so it tests to see if the value is over my allowed limit of 350px setting a flag if so.
  • I then use another regular expression to test for the existence of the standard Wordpress class "alignleft" and if it isn't found then I add it in making sure to replace any "alignright" class if it exists.
  • The lambda function then returns the new reformatted content of the image which is then used as the replacement value in the Preg_Replace function.

Just two useful examples which can be expanded upon by anyone interested in Regular Expressions and AutoBlogging.
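
As an aside (my suggestion rather than part of the original plugin code): if you are running PHP 5.3 or later you can swap create_function for a true anonymous function, which avoids all the awkward string escaping. A minimal sketch of the same logic:


$content = preg_replace_callback("@(<img[^>]+?class=['\"])([\s\S]*?)(['\"][\s\S]*?>)@i",
 function($matches){
  // flag images 350px or wider so they keep their existing alignment
  $chk = true;
  if(preg_match("@width=['\"](\d+)['\"]@", $matches[0], $widthmatch) && $widthmatch[1] >= 350){
   $chk = false;
  }
  // always strip alignright, then add alignleft to narrow images that lack it
  $class = preg_replace("@alignright@", "", $matches[2]);
  if($chk && !preg_match("@alignleft@", $class)){
   $class = trim($class . " alignleft");
  }
  return $matches[1] . $class . $matches[3];
 },
 $content, 1);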

Remember if you require any custom plugin work for Wordpress, including special reformatting functions, scraping code or content automation then contact me for a quote.

Wednesday, 31 August 2011

Would you switch back to IE9 now that they have finally made a good browser?

Now that IE9 is standards compliant and actually good will you switch back to using it?

A few months ago I wrote an article about why I hated debugging in IE8 which revolved around their single threaded event model (window.event) and the delay it took before debug messages were outputted to the console.

A member of the Microsoft team commented that he had run the test case I had created in IE9 and that it had run perfectly. No high CPU, no frozen console, no high memory usage or other performance problems at all.

As you cannot install IE9 on WinXP, which is what I use on my home laptop (due to Vista being shite), I hadn't had the pleasure of using Internet Explorer 9 much until I installed Windows 7 on my work PC.

I have to say that Windows 7 is actually a great operating system and I especially love the way it has incorporated many of the features of clean up and performance tuning tools like Tune Up Utilities and CCleaner into the main OS.

I also have to say that Internet Explorer 9 is the first IE browser I actually like.

Not only is it blindingly fast, they have finally listened to their customers who have been complaining for years and made their JavaScript engine standards-compliant.

Obviously this makes the Browser / Document mode spaghetti even harder to untangle and I haven't yet been able to find a way to detect IE 9 browser mode running as IE 8 or 7 on 32-bit machines, but that is not a major issue at all.

What I am wondering, though, is now that IE9 is actually a good browser, with a good inbuilt development console and element inspection tools, how many developers will actually return to using it, either as their browser of choice for surfing the web or for their primary development?

My browser development survey, which I held before IE9 was released, showed that developers would rather use Chrome or Firefox for surfing and would always choose the development tools that those browsers bring rather than use IE 7 or 8.

I know that I changed from using IE to Firefox as my browser of choice some eons back and I changed from Firefox to Chrome for both surfing (speed is key) and developing (no need for Firebug or any other plugin) the other year when Firefox started to suffer major performance problems.

These performance problems are usually down to having far too many plugins installed, or to setting the browser up to check for new versions of itself and any installed plugins on start-up, which causes long load times, plus Firebug issues and errors in the Chrome source. This is on top of all the constant problems that Flash seems to bring to any browser.

I use Chrome for surfing due to its speed, but if I leave a few tabs open all night that contain pages running Flash videos then by morning my CPU has flat-lined and memory has leaked like the Titanic.

This is more a problem with Flash than Chrome and I try to keep this browser for surfing as clean and as fast as possible by not installing plugins. Then if I need to hack around or do some hardcore web development I will use Firefox and all the great plugins when I need to.

I don't think I could really get hacking with Chrome alone as when I am really getting stuck into a site I need my key plugins e.g the web developer toolbar, hackbar, header modifier, Request and Response inspectors e.g HTTP Fox, Firebug, ColorZilla, YSlow and all the Proxy plugins that are available.

However for basic development, piping out messages to the console, inspecting the DOM and checking loaded elements and their responses then Chrome has everything Firebug has without all the JavaScript errors.

In fact it's surprising how many developers don't even realise how much is possible with Chrome's inbuilt developer tools, speed tests, header inspectors, element inspectors and DOM modifiers and other great debugging tools that used to be the preserve of Firebug alone.

Which leads me back to IE9.

Yes it's fast and its developer tools are okay, with options to clear cookies, disable JavaScript, inspect the DOM and view the styles, but it's no Chrome or Firebug yet.

Therefore what I want to know is how many people (and by that I mean developers) will switch back to using Internet Explorer 9 for pure web surfing?

For sure it's fast, supports the latest CSS standards and all the great HTML 5 features that we have all seen with floating fish tanks and the like. But is this enough for another browser switch?

I have all the major browsers installed at work on Win7 including IE9, Firefox, Chrome, Opera, Safari, SeaMonkey and a number of others I cannot even remember the names of they are that obscure.

Before upgrading I even had Lynx and Netscape Navigator 4 installed as I used to be so anal that any code such as floating DIV's had to work in the earliest browsers possible. However I only use these browsers for testing and I stick to Chrome (for surfing) and Firefox (for hacking) which leaves little room for IE9 at the moment.

I know it's a good browser. It's standards-compliant and it no longer crashes when I try to output anything to the console, so why shouldn't I try it again?

Maybe one of my readers can help me decide on the browser of choice for surfing and development and how IE9 fits into that toolset.

Maybe I am missing some really cool features that I would use all the time but to be honest I am not into throwing sheep at randoms or constantly voting sites up and down and writing comments on them. Therefore for me to switch back from Chrome to another browser such as IE9 there has to be a really good reason for me to do so.

So does anyone have any reason why I should re-consider using IE9 at work for my web surfing or development?