
Wednesday, 28 January 2015

NOFOLLOW DOES NOT MEAN DO NOT CRAWL!

By Strictly-Software

I have heard it said by "SEO Experts" and other people that to prevent excess crawling of a site you can add rel="nofollow" to your links and this will stop GoogleBOT from crawling those links.

Whilst on the surface this does seem to make logical sense (the attribute value does say "nofollow", not "follow if you want"), it isn't true. BOTS can ignore the nofollow and still crawl those links if they want to.

The nofollow attribute value is not meant for blocking access to pages or preventing your content from being indexed or viewed by search engines. Instead, the nofollow attribute is used to stop crawlers like GoogleBOT from passing any "link juice" from the main page to the pages it links to.

As you should know, Google still uses PageRank, even though it counts for far less than in years gone by. In the old days it was their primary way of calculating where a page was displayed in their index and how one page related to another in terms of site authority.

The original algorithm for Page Rank and how it is calculated is below.

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))


An explanation of it can be found here: Page Rank Algorithm Explained.
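As an illustration only, the formula can be computed iteratively in a few lines of Python. The three-page link graph below is made up for this post; this is a sketch of the classic algorithm, not Google's actual implementation:

```python
# Iterative PageRank using the classic formula:
# PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
# where T1..Tn are the pages linking to A and C(T) is the number
# of outbound links on page T. The link graph is hypothetical.

def pagerank(links, d=0.85, iterations=50):
    """links maps each page to the list of pages it links out to."""
    pages = list(links)
    pr = {p: 1.0 for p in pages}  # the classic formulation starts every page at 1
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Sum PR(T)/C(T) over every page T that links to this page
            inbound = sum(
                pr[t] / len(links[t])
                for t in pages
                if page in links[t]
            )
            new_pr[page] = (1 - d) + d * inbound
        pr = new_pr
    return pr

links = {
    "A": ["B", "C"],  # A links out to B and C
    "B": ["C"],
    "C": ["A"],
}
ranks = pagerank(links)
```

Note how C, which receives all of B's link juice plus half of A's, ends up with the highest score, while B, which only receives half of A's, ends up with the lowest.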

The perfect but totally unrealistic scenario is to have another site with a very high PageRank value, e.g. 10 (the range goes from 0 to 10), and to have that site's high-PR page (e.g. their homepage) contain a single link that goes to your site, without a nofollow value in the rel attribute of the link.

This tells the search engine, e.g. GoogleBOT, that this high-ranking site THINKS your site is more important than it in the great scheme of the World Wide Web.

Think of a pyramid with your site/page ideally at the top with lots of high PR pages and sites all pointing to it, passing their link juice upwards to your site. If your page then doesn't have any links on it at all then no link juice you have obtained from inbound links will be "leaked out".

The more links there are on a page the less PR value is given to each link and the less "worthy" your site becomes in theory.

So it should be noted that the nofollow attribute value isn't meant for blocking access to content or preventing content from being indexed by GoogleBOT and other search engines.



Instead, the nofollow attribute is used by sites to stop crawlers like GoogleBOT from passing "authority" and PR value to the page being linked to.

Therefore GoogleBOT and others could still crawl any link with rel="nofollow" on it.

It just means no Page Rank value is passed to the page being linked to.
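In markup terms the difference is just the rel attribute on the link; the URLs below are placeholders for illustration:

```html
<!-- Link juice is passed to the target page as normal -->
<a href="http://example.com/page">Followed link</a>

<!-- No PageRank is passed, but the crawler may still fetch the URL -->
<a href="http://example.com/page" rel="nofollow">Nofollow link</a>
```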

Sunday, 25 January 2015

Returning BAD BOTS to where they came from

By Strictly-Software

Recently, in some articles, I mentioned some .htaccess rules for returning "BAD BOTS", e.g. crawlers or user-agents you don't like, such as IE 6, which no one should be using anymore, back to where they came from.

Now the rule I was using was suggested by a commenter on a previous article, and it used the REMOTE_ADDR server variable (the visitor's IP address) to do this.

For example, in a previous article (which I have now changed) about banning IE 5, 5.5 and IE 6, I originally suggested using this rule to ban all user-agents that were IE 5, 5.5 or 6.

RewriteCond %{HTTP_USER_AGENT} (MSIE\s6\.0|MSIE\s5\.0|MSIE\s5\.5) [NC]
RewriteRule .* http://%{REMOTE_ADDR} [L,R=301]

Now this rewrite rule uses the Apache server variable {REMOTE_ADDR}, which holds the originating IP address of the HTTP request, to send anyone with IE 6 or below back to it.

It is the IP address you would normally see in your servers access logs when someone visits.

Problems with this rule

Now when I changed the rules on one of my own sites to this rule and then started testing it at work, using a user-agent switcher add-on for Chrome, I ran into the problem that every time I went to my own site I was sent back to my company's gateway router page.

I had turned the switcher off, but for some reason, either a bug in the plugin, a cookie or a session variable, my own site still believed I was on IE 6 and not the latest Chrome version. So every time I went to my site with this rule I was kicked back to my company's gateway router page.

Therefore, after a clean-up, a think and a talk with my server techie guy, he told me I should be using localhost instead of the REMOTE_ADDR IP address. The reason was that a lot of the redirected traffic, hackers, HACKBOTS, spammers and so on, would end up hitting the gateway page of their own ISP, which could be seen as a hacking attempt.

These ISPs might get a bit pissed off with your website sending their gateway router pages swathes of traffic that could potentially harm them.

Therefore, to prevent getting letters in the post saying that you are sending swathes of hackers to people's home or phone ISP gateways (a lot of phones and tablets use proxies for their browsers anyway), send them back to their own localhost, 127.0.0.1.

Also, instead of using a 301 permanent redirect rule you should use a 302 temporary redirect rule, as that is the more appropriate code to use.

Use this rule instead

Therefore the rule I now recommend for anyone wanting to ban all IE 5, 5.5 and 6 traffic is below.

RewriteCond %{HTTP_USER_AGENT} (MSIE\s6\.0|MSIE\s5\.0|MSIE\s5\.5) [NC]
RewriteRule .* http://127.0.0.1 [L,R=302]

This rewrite rule bans IE 5, 5.5 and 6.0 and sends the client back to the localhost on the user's own machine with a 302 redirect. You can obviously add other rules in for BOTS and SQL/XSS injection hacks as well.
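Put together in context, a minimal .htaccess sketch might look like the following. This assumes mod_rewrite is enabled on your Apache server; check your own hosting setup before relying on it:

```apache
# Enable the rewrite engine (needed once per .htaccess file)
RewriteEngine On

# Match IE 5, 5.5 and 6 user-agents, case-insensitively ([NC])
RewriteCond %{HTTP_USER_AGENT} (MSIE\s6\.0|MSIE\s5\.0|MSIE\s5\.5) [NC]
# Send the request back to the visitor's own machine with a temporary redirect
RewriteRule .* http://127.0.0.1 [L,R=302]
```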

This is also the more valid rule, as a 301 permanent redirect is for cases such as a page changing its name. Here it is an invalid parameter or value in the HTTP request (the user-agent) that is causing the user to be redirected to the new destination.

If the user changed its user-agent or parameters then it would get to the site and not be redirected with a 301 or a 302 status code, but would instead get a 200 OK status code.

So remember, whilst an idea might seem good at first until you fully test it and ensure it doesn't cause problems it might not be all that it seems.

Wednesday, 14 January 2015

ETSY SHOP OPEN FOR BUSINESS!

By Strictly-Software

My Etsy shop is OPEN again - if you run a WordPress site and want some tools to automate your system then check out: https://www.etsy.com/uk/shop/StrictlySoftware

I didn't know the items in my shop EXPIRED after a certain time, so the shop appeared empty to viewers for the last month and a bit, but now you can buy your tools from Etsy or my own site for plugins: http://www.strictly-software.com/plugins (please click on some adverts and help me raise some cash)

Also my facebook page: https://www.facebook.com/strictlysoftware has information about these tools that you should read if you have purchased any of them.

It has help articles, guides on support, possible issues and fixes and much more - feel free to comment and like the page!

The basic idea behind these plugins is:

Run a site all year, 24/7, without having to do anything apart from some regular maintenance, like cleaning tags that are not used very much and OPTIMIZING your database tables.

So an RSS/XML feed contains your content (e.g. news about something) and this goes into WordPress at scheduled times (Cron or WebCron jobs) using WordPress plugins like RSSFeeder or WP-O-Matic. Then, as the articles are saved, Strictly AutoTags adds the most relevant tags to each one by using simple pattern matching, e.g. finding the most frequently used "important" words in the article, such as words in the Title, Headers, Strong tags or just Capitalised Words, such as names like John Smith.

This means if John Smith became famous overnight you wouldn't have to add a manual tag in for him or wait for a 3rd party plugin to add the word to their own database so that it could be used.
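The sort of pattern matching described above can be sketched in a few lines of Python. This is only an illustration of the idea (capitalised word sequences ranked by frequency), not the actual Strictly AutoTags code:

```python
import re
from collections import Counter

def find_candidate_tags(text, top_n=5):
    """Find capitalised word sequences (e.g. names like John Smith) and
    rank them by how often they appear in the article."""
    # Match runs of one or more capitalised words, e.g. "John Smith", "ISIS"
    pattern = re.compile(
        r"\b(?:[A-Z][A-Za-z]+|[A-Z]{2,})(?:\s+(?:[A-Z][A-Za-z]+|[A-Z]{2,}))*\b"
    )
    candidates = pattern.findall(text)
    # Ignore very short matches that are probably noise
    counts = Counter(c for c in candidates if len(c) > 3)
    return [tag for tag, _ in counts.most_common(top_n)]

article = ("John Smith spoke about the war in Syria. "
           "John Smith said ISIS had been driven back. "
           "Reports from Syria could not be verified.")
tags = find_candidate_tags(article)
```

A real implementation would also weight words found in the title, headers and strong tags more heavily, as the plugin description above explains.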

Then, once your article is tagged, you can choose to have the most popular tags converted into links to tag pages (pages containing other articles with the same tag), or just bold them for SEO, or do nothing.

You can set certain tags to be "TOP TAGS" which will rank them higher than all other tags. These should be tags related to your site e.g a bit like the old META Keywords.

You can also clean up old HTML, convert text tags to real clickable ones and set up a system where if a tag such as ISIS is found the tag Middle East is used instead. This is all explained on the Strictly AutoTags page on my site.

Then if you also purchase Strictly TweetBOT PRO as well, you can use those new post tags as #hashtags in your tweets, and you can set your system up to either tweet to multiple Twitter accounts with different formats and tags, or tweet to the same account with different wording dependent on the wording in your article.



E.g. if your article was about the Middle East wars, you could say only post the Tweet if the article contains the word "Middle East" OR "Syria", or you could say only post if it contains the words "ISIS" AND "War".
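Those AND/OR conditions boil down to simple case-insensitive substring checks. A hypothetical sketch of the idea, not the plugin's actual code:

```python
def should_tweet(article, all_of=(), any_of=()):
    """Return True if the article contains every word in all_of
    and, when any_of is given, at least one word from any_of."""
    text = article.lower()
    # AND condition: every word must be present
    if not all(word.lower() in text for word in all_of):
        return False
    # OR condition: at least one word must be present (if any were given)
    if any_of and not any(word.lower() in text for word in any_of):
        return False
    return True

article = "Fresh fighting reported in Syria as the war escalates."

# OR condition: tweet if the article mentions "Middle East" OR "Syria"
ok_or = should_tweet(article, any_of=("Middle East", "Syria"))   # True

# AND condition: tweet only if it contains both "ISIS" AND "War"
ok_and = should_tweet(article, all_of=("ISIS", "War"))           # False
```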

The TweetBOT then lets you ensure the post is cached (if you are using a WordPress caching system) by making it live first and making an HTTP request to it so that it gets cached. Then it waits a custom-defined number of seconds before any Tweets are sent out.

You can then specify a number of seconds between each Tweet that is sent out to prevent Twitter rushes, e.g. where 50 BOTS all hit your site at the same time.

You can ensure no Tweets are sent out if they contain certain words, and add tracking to the link, e.g. Google tracking parameters, before the link is minimised by Bit.ly.

A simple PIN number process lets you connect your Twitter Account to your TweetBOT Account.

A dashboard keeps you informed of recent Tweets sent out, any errors from Twitter like "duplicate tweet", or if your Bit.ly account isn't working.

Plus a test button lets you test the system without sending a Tweet by taking the last post, running your settings through it such as shortening the link and post and checking all Twitter accounts are working and connected properly.

If you then link your Twitter account up to your Facebook page, like I have with my Horse Racing site http://www.ukhorseracingtipster.com/, my Twitter account @ukhorseracetips and my Facebook page facebook.com/Ukhorseracingtipster, you get social media and SEO impact for free!





Check out the new live shop on Etsy for plugins, and coupons if you need me to set the plugin up for your site: https://www.etsy.com/uk/shop/StrictlySoftware

You may need help due to your site's special settings or requirements, so a coupon will let me help you set it up correctly for you.