Category Archives: Search Engine Optimization

How to control or stop search engines from crawling your website using robots.txt


Website owners can instruct search engines which pages to crawl and index by using a robots.txt file.

Before a search engine robot visits a website URL, say http://www.domainname.com/index.html (the directory index), it first checks http://www.domainname.com/robots.txt to see whether there are specific directives to follow. Suppose it finds the following code in robots.txt:

User-agent: *
Disallow: /

 

The “User-agent: *” line means this directive applies to all robots; the * is a wildcard meaning all.
The “Disallow: /” line tells the robot that it should not visit any page on the site.

 

Important considerations when using a robots.txt file:

1) Robots that choose to follow the instructions look for this file and read the directives before visiting the website. If this file doesn’t exist, web robots assume that the site owner wishes to give no specific instructions.

2) A robots.txt file on a website functions as a request that the specified robots ignore the specified files or directories during a crawl.

3) For websites with multiple subdomains, each subdomain must have its own robots.txt file. If domainname.com has a robots.txt file but sub.domainname.com does not, the rules for domainname.com will not apply to sub.domainname.com.

4) The robots.txt file is publicly viewable, so anyone can see which sections of your server you don’t want robots to visit.

5) Robots can simply ignore your /robots.txt; it is a convention, not an enforcement mechanism.

6) Your robots.txt file should be in the root of your domain. In our server configuration this is the public_html folder in your account. If your domain is “domainname.com”, bots will look for the file at http://domainname.com/robots.txt. If you have add-on domains and want to use a robots.txt file for those as well, place a robots.txt file in the folder you specified as the root of the add-on domain.

 

Some examples:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/

 

In this example, the site owner tells ALL robots (remember, * means all) not to crawl four directories on the site (cgi-bin, images, tmp, private). Anything you do not explicitly exclude, the bot is understood to have permission to crawl.

 

To exclude ALL bots from crawling the whole server.
User-agent: *
Disallow: /

 

To allow ALL bots to crawl the whole server.
User-agent: *
Disallow:

 

To exclude A SINGLE bot from crawling the whole server.
User-agent: BadBot
Disallow: /

 

To allow A SINGLE bot to crawl the whole server.
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /

 

To exclude ALL bots from crawling the ENTIRE server except for one file.
🙂 Tricky, since the original robots.txt standard has no ‘allow’ directive. What you have to do is place all the files you do not want crawled into one folder, and then leave the file to be crawled above it. So if we placed all the files we didn’t want crawled in a folder called SCT, we’d write the robots.txt rule like this (see also the Allow-based sketch after these examples):

 

User-agent: *
Disallow: /SCT

 

Or you can do each individual page like this.
User-agent: *
Disallow: /SCT/home.html
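
Note: although the original robots.txt standard has no Allow directive, major crawlers such as Googlebot and Bingbot do honor the non-standard Allow, which handles the “everything except one file” case directly. A minimal sketch (home.html is just a placeholder):

User-agent: *
Allow: /home.html
Disallow: /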

 

To create a crawl delay (in seconds) for the whole server. Note that not every crawler honors Crawl-delay; Googlebot, for example, ignores it.
User-agent: *
Crawl-delay: 10

 

If you wish to block a single page from being indexed, you can add a robots <meta> tag to that page’s <head>:
<meta name="robots" content="noindex, nofollow" />

You can learn more about the robots.txt file at http://www.robotstxt.org/

 

Force your browser to open Google.com instead of Google’s localized domain

How do you use Google with .com instead of the country extension, i.e. force your browser to open google.com?

 

You may have observed that whenever you open http://www.google.com in your browser’s address bar, you are redirected to the local Google domain for your country (http://www.google.co.in for India, .pk for Pakistan, .lk for Sri Lanka, etc.).

 

Google web search is customized for all countries and regions across the world.
You can view all the country domains, or view Google Search in different languages, via the Google Language Tool.

 

How can you force the browser to open google.com?

This is really annoying if you are traveling or staying in a country that uses a language other than English. If you are in Japan, for example, typing http://www.google.com will redirect you to http://www.google.co.jp, with the full interface in Japanese 🙁

 

Another important point is that Google’s search results depend on your country (Google tries to provide the most relevant results). So if you are using Google from India (http://www.google.co.in), you will see slightly different results than from Google for the US (http://www.google.com). The AdWords ads are completely different on different countries’ search pages.

 

So, if for some reason you want to use http://www.google.com instead of your local Google, you can do it in one simple step. Just enter the URL http://www.google.com/ncr in your browser’s address bar. Here ncr stands for No Country Redirect. Once you open this URL, Google sets a cookie in your web browser that stops the redirection from Google.com to the local Google domain.

 

To start the redirection again according to your country, just clear your browser cookies.

 

Configure apache server to parse HTML file as PHP using .htaccess

How do you force Apache to parse HTML files as PHP?

 

Below are some reasons why people like to parse HTML files as PHP.

  1. Search engines seem to favor web pages with a .html extension over dynamic .php extensions.
  2. You are converting an old static website into a dynamic one and worry about losing page rank.
  3. Security reasons: you don’t want visitors to know which scripting language you are using.
 
 
Parsing HTML files as PHP is easily accomplished using the .htaccess file. Webhosts generally run two versions of PHP (PHP4 and PHP5) and usually have a PHP5 handler.
 
 
The code below will parse all .html and .htm files as PHP:
 
AddHandler application/x-httpd-php5 .html .htm
 
 
On most other webhosts, use:
 

AddType application/x-httpd-php .html .htm

 

or, if the AddType directive does not work, you can use the AddHandler directive as follows:

 

AddHandler application/x-httpd-php .html .htm

or

AddHandler x-httpd-php .html .htm

 

Some webhosts require both directives, as below:

 

AddType application/x-httpd-php .htm .html
AddHandler x-httpd-php .htm .html

 

You can also try this multi-line approach, which uses the SetHandler directive:

 

<FilesMatch "\.(html|htm|php)$">
SetHandler application/x-httpd-php
</FilesMatch>

 

or:

 

<FilesMatch "\.(html|htm|php)$">
SetHandler application/x-httpd-php5
</FilesMatch>

 

Drawback
Websites using this approach are slower than websites that simply use the .php extension, so it is a great solution only for a small website 🙂
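
Whichever directive your host needs, you can verify it with a small test page, say a hypothetical test.html in your web root; if the date prints, Apache is parsing HTML as PHP:

<html>
<body>
<?php echo "Parsed as PHP on " . date('Y-m-d'); ?>
</body>
</html>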

Meta refresh redirect (tag) and Search Engines

The meta refresh tag, or meta redirect, is a tool for reloading and redirecting web pages. The meta refresh tag is easy to use, but most don’t know that even innocent use of the tag may significantly lower your page rank or even get your pages banned by some search engines.

 

The meta tag belongs within the <head> of your HTML document. When used to refresh the current page, the syntax looks like this:

 

<meta http-equiv="refresh" content="600">

 

<meta> – This is the HTML tag. It belongs in the <head> of your HTML document.

 

http-equiv="refresh" – This attribute tells the browser that this meta tag is sending an HTTP command rather than a standard meta tag. Refresh is an actual HTTP header that the web server can also send; it tells the browser that the page is going to be reloaded or redirected somewhere else.

 

content="600" – This is the amount of time, in seconds, until the browser should reload the current page.
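
To redirect to a different page instead of reloading the current one, add a url after the delay. A minimal sketch (the 5-second delay and the target address are just placeholders):

<meta http-equiv="refresh" content="5; url=http://www.domainname.com/newpage.html">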

 

However, when using this HTML redirect code, please ensure that you don’t use it to trick the Search Engines, as this could get your website banned. It is always best to work hard and learn quality ways in which to drive traffic to your web site.

 

Meta refresh tags have some drawbacks:

 

  • Meta refresh redirects have been used by spammers to fool search engines, so search engines remove such sites from their databases. If you use a lot of meta refresh tags to redirect pages, the search engines may decide your site is spam and delete it from their index. It’s better to use a 301 server redirect instead (see the .htaccess sketch after this list).
  • If the redirect happens quickly (less than 2-3 seconds), readers with older browsers can’t hit the “Back” button. This is a usability problem.
  • If the redirect happens quickly and goes to a non-existent page, your readers won’t be able to hit the “Back” button. This is a usability problem that will cause people to completely leave your site.
  • Refreshing the current page can confuse people. If they didn’t request the reload, some people can get concerned about security.
Alternatives to META refresh, or the best use of the meta refresh tag
  • Since search engines constantly change their algorithms and spam policies, a tag that may be fine one week could drop you to the bottom of the rankings the next. It’s best to not use the META refresh attribute on pages you want indexed, but if you do, set it to at least 10 seconds.
  • Server-side redirection is a better way to ensure that visitors can still find your web pages after you make changes, because there are no spamming penalties associated with it. The most common use of server-side redirects is to send visitors to a custom error document when they enter an invalid URL.
  • Although it’s a safer, more elegant solution, server-side redirection is more technically demanding than using META tag or JavaScript redirects. But it won’t get you banned either! You’ll need to edit the .htaccess file on your server.
  • If you’re using a web host instead of running your own server, then the server administrator will probably have to make the change for you. Contact your Web host to see if they offer that service.
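
If your host runs Apache and allows .htaccess overrides, a 301 redirect can be a single line like the sketch below (both paths are just placeholders):

Redirect 301 /oldpage.html http://www.domainname.com/newpage.html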

 

Use non-standard font in web pages


 

Normally your website visitors can only see the fonts that are already installed on their system.

So if you use a non-standard font that is not installed on a visitor’s computer, their browser will show some other font that is on their computer. That’s why, when you define a font for an element (such as <span> or <div>), you often specify multiple fonts, so that if your preferred font is not available, the CSS (Cascading Style Sheets) falls back to the available alternatives.

 

font-family: verdana, sans-serif;
Here verdana is the preferred font and sans-serif is the alternative.

 

This can be really annoying if you want to use a nice font in your website.

 

The conventional way of using custom fonts for headings, logos, etc. is to create the text in a graphics editor and then use the image file. From an SEO perspective this is not an appropriate solution; you should use real text as much as possible.

 

Now there is a way in CSS that lets you use custom, downloadable fonts on your website. You can download the font of your preference, let’s say copse_font.ttf, and upload it to the server where your website is hosted.

 

Then, from within your CSS file (or wherever you define your styles), refer to that custom font in the following manner:

 

@font-face {
font-family: copse_font;
src: url('copse_font.ttf');
}

 

span.custom_font{
font-family: copse_font; /* no .ttf */
}
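
Any element with that class then renders in the custom font, for example:

<span class="custom_font">This text uses the downloadable font.</span>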

 

You can get some free font files from the sites below.

 

http://www.webpagepublicity.com/free-fonts.html
http://www.1001freefonts.com/fonts/afonts.htm

 

 

Another option for using custom fonts on your site is the Google Fonts API. However, not that many fonts are available so far in the Google Font Directory.

 

 

You can get there via the link below.
Google Web Fonts API
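
Using the API boils down to one stylesheet link plus a normal font-family rule. A minimal sketch, assuming the Copse family is available in the directory:

<link href="http://fonts.googleapis.com/css?family=Copse" rel="stylesheet" type="text/css">

h1 { font-family: 'Copse', serif; }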

 

What is text-based web browser?

Most of you know what a web browser is – it’s a program/software by which you surf the internet, view websites and web pages, send emails, etc.

 

The browsers we are familiar with (Internet Explorer, Firefox, Google Chrome, Apple Safari, Opera, Camilla) have a good interface and are called graphical browsers because they display all the contents of a web page, including text, images, videos and Flash.

 

What is a text web browser?
A text web browser, just like a graphical browser, lets you surf websites. One big difference is the complete absence of a nice-looking interface. You have to use keys like Tab, the arrow keys and Enter (instead of the mouse), and opening a website shows you only the text content – no images, no other Flash media and no colours 🙂
It displays only the text on a website, along with its links.

 

Importance and use of a text web browser
A text web browser is a very good way to check how a search engine bot views/reads your website’s page content. Such services are also available for free on the web, if you don’t want to download and install a text-based web browser on your computer.

 

Which text browser is a good one?
Lynx was the first text-based web browser and is still good today. It is a free program that can be installed on multiple operating systems (it’s cross-platform), including Windows, Mac and Unix/Linux. There are several other such programs, like Emacs/W3, Edbrowse, ELinks and W3M. Lynx has been around since about 1992 and can be downloaded from the Lynx homepage.
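
For a quick look at what a crawler sees, assuming Lynx is installed, you can dump a page’s rendered text straight to the terminal (the URL is just a placeholder):

lynx -dump http://www.domainname.com/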

 

Google has been the most popular search engine, and people always strive hard to have their sites rank high on its search results pages. To help developers with the various aspects of their websites, Google has posted tons of articles and videos. They’ve also put up a Webmaster Tools section, which has many utilities that can assist people in improving their websites. One of them is “Fetch as Googlebot”, located under “Diagnostics”, through which you can check how an individual page appears to Google’s crawler.

 

“Fetch as Googlebot” is not a text browser :). However, it’s a great tool that can reveal problems/bugs within your site. The Webmaster Tools site is free, and you only need a free Google Account to access all of its features.