
Wednesday, 28 January 2015

NOFOLLOW DOES NOT MEAN DO NOT CRAWL!


By Strictly-Software

I have heard it said by "SEO Experts" and other people that to prevent excess crawling of a site you can add rel="nofollow" to your links and this will stop GoogleBOT from crawling those links.

Whilst on the surface this does seem to make logical sense (the attribute value does say "nofollow", not "follow if you want"), it isn't true. BOTs will ignore the nofollow and still crawl the links if they want to.

The nofollow attribute value is not meant for blocking access to pages or preventing your content from being indexed or viewed by search engines. Instead, the nofollow attribute is used to stop search engine BOTs like GoogleBOT from letting any "link juice" from the main page leak out to the pages it links to.

As you should know, Google still uses PageRank, even though it carries far less weight than in years gone by. In the old days it was their primary way of calculating where a page was displayed in their index and how one page related to another in terms of site authority.

The original algorithm for Page Rank and how it is calculated is below.

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))


An explanation of it can be found here: Page Rank Algorithm Explained.
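To make the formula concrete, here is a minimal Javascript sketch of a single PageRank update, assuming the commonly quoted damping factor of d = 0.85 (the function name and data shape are mine, purely for illustration):

```javascript
// One PageRank update for a page, following PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).
// inbound is an array of linking pages: pr is that page's PageRank,
// links is C(Ti), the total number of outbound links on that page.
function pageRankUpdate(inbound, d) {
    var sum = 0;
    for (var i = 0; i < inbound.length; i++) {
        sum += inbound[i].pr / inbound[i].links;
    }
    return (1 - d) + d * sum;
}

// A single inbound link from a PR 10 page that carries only that one link:
var pr = pageRankUpdate([{ pr: 10, links: 1 }], 0.85);
// pr is (1 - 0.85) + 0.85 * (10 / 1) = 8.65
```

Notice how increasing the number of links on the linking page (a bigger C(Ti)) dilutes the value passed to each target, which is exactly the "leak" that nofollow was designed to plug.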

The perfect but totally unrealistic scenario is to have another site with a very high Page Rank value, e.g. 10 (the public toolbar range goes from 0 to 10), and to have that site's high PR page (e.g. their homepage) contain a single link that goes to your site, without a nofollow value in the rel attribute of the link.

This tells the search engine, e.g. GoogleBOT, that this high ranking site THINKS your site is more important than it is in the great scheme of the World Wide Web.

Think of a pyramid with your site/page ideally at the top and lots of high PR pages and sites all pointing to it, passing their link juice upwards to your site. If your page then doesn't have any links on it at all, none of the link juice you have obtained from inbound links will be "leaked out".

The more links there are on a page the less PR value is given to each link and the less "worthy" your site becomes in theory.

So it should be noted that the nofollow attribute value isn't meant for blocking access to content or preventing content from being indexed by GoogleBOT and other search engines.



Instead, the nofollow attribute is used by sites to stop search engine BOTs like GoogleBOT from passing "authority" and PR value to the page being linked to.

Therefore GoogleBOT and others could still crawl any link with rel="nofollow" on it.

It just means no Page Rank value is passed to the page being linked to.
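As a practical aside, if your aim is simply to stop PageRank leaking to external sites, a script can stamp rel="nofollow" onto outbound links at render time. The helper below sketches the decision step; the function name and the policy of nofollowing everything off-domain are my own assumptions, not a rule:

```javascript
// Returns true when a link points outside our own host and so, under a
// "nofollow everything external" policy, should carry rel="nofollow".
// Relative links are always internal, so they return false.
function needsNoFollow(href, ourHost) {
    var m = /^https?:\/\/([^\/]+)/i.exec(href);
    if (!m) return false; // relative link, leave it as a followed link
    return m[1].toLowerCase() !== ourHost.toLowerCase();
}

var external = needsNoFollow("http://other-site.com/page", "www.mysite.com"); // true
var internal = needsNoFollow("/about.html", "www.mysite.com");                // false
```

In a browser you would loop the document's A collection and call setAttribute("rel", "nofollow") whenever this returns true, remembering that this only stops PR being passed, not crawling.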

Monday, 29 March 2010

My Hundredth Article

An overview of the last 102 articles

I really can't believe that I have managed to write 102 articles for this blog in the last year and a bit. When I first started the blog I only imagined writing the odd bit here and there and saw the site purely as a place to make public some of my more useful coding tips. I never imagined that I could output this amount of content by myself.

A hundred articles have come and gone pretty fast, and as with all magazines, TV shows and bloggers stuck for an idea, I thought I would celebrate my 102nd article by reviewing my work so far.

Recovering from an SQL Injection Attack

This was the article that started it all and it's one that still gets read quite a bit. It's a very detailed look at how to recover an infected system from an SQL Injection Attack and includes numerous ways of avoiding future attacks as well as quick sticking plasters, security tips and methods for cleaning up an infected database.

Linked to this article is one of my most downloaded SQL scripts which helps identify injected strings inside a database as well as removing them. This article was written after a large site at work was hacked and I was tasked with cleaning up the mess so it all comes from experience.

Performance Tuning Tips

I have written quite a few articles on performance tuning systems, both client and server side, and some of my earliest articles were on top tips for tuning SQL databases and ASP Classic sites. As well as general tips which can be applied to any system, I have also delved into more detail regarding specific SQL queries for tuning SQL 2005 databases.

Regarding network issues, I also wrote an extensive how-to guide on troubleshooting your PC and Internet connection which covered everything from TCP/IP settings to tips on the best tools for cleaning up your system and diagnosing issues. On top of that I collated a number of tweaks and configuration options which can speed up FireFox.


Dealing with Hackers, Spammers and Bad Bots

My job means that I have to deal with users trying to bring my systems down constantly, and I have spent considerable time developing custom solutions to log, identify and automatically ban users who try to cause harm to my sites. Over the last year I have written about SQL Denial of Service attacks, which involve users making use of web-based search forms and long running queries to bring a database driven system to a halt. I have also investigated new hacking techniques such as the two stage injection technique and the case insensitive technique, looked at methods of client-side security and why it's almost pointless, and detailed bad bots such as Job Rapists along with the 4 rules I employ when dealing with them.

I have also detailed the various methods of using CAPTCHAs, as well as ways to prevent bots from stealing your content and bandwidth through hot linking by using ISAPI rewriting rules.

Issues with Browsers and Add-Ons

I have also tried to bring up-to-date information on the latest issues with browsers and new version releases, and have covered problems and bugs related to major upgrades of Firefox, Chrome, Opera and IE. When IE 8 was released I was one of the first bloggers to detail the various browser and document modes, as well as techniques for identifying them through Javascript.

I have also reported on current browser usage by revealing statistics taken from my network of 200+ large systems with regular updates every few months. This culminated in my Browser survey which I carried out over Christmas which looked at the browsers and add-ons that web developers themselves used.


Scripts, Tools, Downloads and Free Code

I have created a number of online tools, add-ons and scripts for download over the last year that range from C# to PHP and Javascript.

Downloadable Scripts Include:

SQL Scripts include:

Search Engine Optimisation

As well as writing about coding, I also run a number of my own sites and have had to learn SEO the hard way. I have written about my experiences and the successful techniques I have found that worked in a couple of articles published on the blog:
So there you go, an overview of the last year or so of Strictly-Software's technical blog. Hopefully you have found the site a good resource and maybe even used one or two of the scripts I have posted. Let me know whether you have enjoyed the blog or not.

Monday, 30 November 2009

Changing all links and source attributes in the DOM

Working with hosted merchant payment solutions

If you have ever worked with hosted payment solutions such as SecPay (now PayPoint) and WorldPay, you will have dealt with callback pages: pages containing server-side code, e.g. .NET, ASP or PHP, that are located on your webserver but are loaded up and displayed within the payment gateway's secure domain.

This means that any relative links on images, stylesheets, scripts and anchors will be relative to the payment gateway's domain and not your webserver. Therefore, if you don't apply some code to correct these links, the styles won't load and the links won't go anywhere apart from 404 error pages.

You could ensure that all your links are absolute, in which case you won't have a problem, but often this isn't possible for numerous reasons. Therefore, if you don't want to create a very basic minimal template page for your callback page to get round this issue, you can use some client-side Javascript to loop through all the relevant collections and change the links to reflect the true location of the files.

The following function is one that I use on my own system. It is called once the page loads and loops through the A, LINK, SCRIPT and IMG collections, checking the current src or href attributes and making sure any relative links are changed into absolute ones pointing to the true base URL (e.g. your site) and not the payment gateway. For absolute links that have already been resolved incorrectly, it replaces the payment gateway's domain with the true domain. This ensures that all links point to absolute URIs that reference your site and not the payment gateway which has loaded the content to display on its own system.

If you are using server-side code in your callback page, then you can replace the top two properties, makeAbs.domain and makeAbs.directory, which refer to the base URL and the virtual directory containing the callback page on your webserver, with code that dynamically populates those values. The full function code is below.
makeAbs = {

    // the domain we want to reference
    domain : "http://www.mysite.com",

    // the virtual directory containing the file that will be referenced
    directory : "/somedomain/subdomain/",

    // function to modify the DOM; call once the page has loaded
    ModifyDOM : function(){

        // change Anchors
        this.ChangeLocation("A","href");

        // change CSS Links
        this.ChangeLocation("LINK","href");

        // change SCRIPTs
        this.ChangeLocation("SCRIPT","src");

        // change IMGs
        this.ChangeLocation("IMG","src");

    },

    ChangeLocation : function(tag,att){

        var o,n,e=document.getElementsByTagName(tag);
        for(var i=0,l=e.length;i<l;i++){
            o = (att=="href") ? e[i].href : e[i].src;

            // if the current href/src is blank then skip it
            if(o && o!=""){

                // if it's a relative link
                if(!/^https?:\/\//.test(o)){

                    // a root-relative link needs just our domain; anything else,
                    // e.g. a plain filename, needs the domain plus the virtual directory
                    n = ((o.substring(0,1)=="/") ? this.domain : this.domain + this.directory) + o;

                }else{

                    // it's an absolute URL, so make sure the payment server's domain is
                    // replaced with our own in case relative links have already been
                    // resolved against the wrong location
                    n = o.replace(document.location.protocol + "//" + document.domain, this.domain);
                }

                // now reset the attribute with our new value
                if(att=="href"){
                    e[i].href = n;
                }else{
                    e[i].src = n;
                }
            }
        }

    }
}


The code can be downloaded as a file from the following location: makeAbsolute.js
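If you want to unit test the rewriting rule outside a browser, the core of ChangeLocation can be extracted into a pure function like the sketch below. The name resolveToSite and the gatewayOrigin parameter are my own; the original script derives the gateway origin from document.location at run time.

```javascript
// Pure version of the rewriting rule in makeAbs.ChangeLocation:
// relative links are rebuilt against our domain (plus the virtual
// directory when they are not root-relative), and absolute links have
// the payment gateway's origin swapped for our own domain.
function resolveToSite(url, domain, directory, gatewayOrigin) {
    if (!/^https?:\/\//.test(url)) {
        return ((url.substring(0, 1) == "/") ? domain : domain + directory) + url;
    }
    return url.replace(gatewayOrigin, domain);
}

var css = resolveToSite("style.css", "http://www.mysite.com", "/shop/", "https://secure.gateway.com");
// css is "http://www.mysite.com/shop/style.css"
var img = resolveToSite("/img/logo.gif", "http://www.mysite.com", "/shop/", "https://secure.gateway.com");
// img is "http://www.mysite.com/img/logo.gif"
var js = resolveToSite("https://secure.gateway.com/shop/checkout.js", "http://www.mysite.com", "/shop/", "https://secure.gateway.com");
// js is "http://www.mysite.com/shop/checkout.js"
```

Keeping the URL logic pure like this means the tricky part can be verified with simple assertions, while the DOM loop stays a thin wrapper around it.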