How to control or stop search engines from crawling your website using robots.txt

Website owners can use a robots.txt file to tell search engines which pages to crawl and index.

When a search engine robot wants to visit a website URL, say http://www.domainname.com/index.html (as defined in the directory index), it first checks http://www.domainname.com/robots.txt to see whether there are specific directives to follow. Let’s suppose it finds the following code in robots.txt.

User-agent: *
Disallow: /

 

The “User-agent: *” means this is a directive for all robots. The * symbol means all.
The “Disallow: /” tells the robot that it should not visit any pages on the site.

 

Important considerations when using a robots.txt file.

1) Robots that choose to follow the instructions look for this file and read it before visiting the website. If the file doesn’t exist, robots assume that the site owner wishes to provide no specific instructions.

2) A robots.txt file on a website functions as a request that the specified robots ignore the specified files or directories when crawling.

3) For websites with multiple subdomains, each subdomain must have its own robots.txt file. If domainname.com had a robots.txt file but sub.domainname.com did not, the rules for domainname.com would not apply to sub.domainname.com.

4) The robots.txt file is publicly viewable. Anyone can see which sections of your server you don’t want robots to visit.

5) Robots can ignore your /robots.txt. It is a request, not an access control.

6) Your robots.txt file should be in the root of your domain. In our server configuration this is the public_html folder in your account. If your domain is “domainname.com”, the bots will look for the file at http://domainname.com/robots.txt. If you have add-on domains and want to use a robots.txt file for those as well, you need to place a robots.txt file in the folder you specified as the root for each add-on domain.

 

Some examples:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/

 

In this example, the site owner told ALL robots (remember, the * means all) not to crawl four directories on the site (cgi-bin, images, tmp, private). Any files or folders you do not explicitly exclude are understood to be open for the bot to crawl.
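To make the matching behaviour concrete, here is a minimal PHP sketch of how a well-behaved robot might apply rules like the ones above. It is a simplification under stated assumptions: only User-agent and Disallow lines are handled, rules are treated as plain path prefixes, and the function name is_crawl_allowed is made up for this illustration. Real crawlers implement more than this.

```php
<?php
/**
 * Returns true if $path may be crawled by $userAgent.
 * Assumptions: only User-agent and Disallow are handled, rules are plain
 * path prefixes (no wildcards), and a group naming the agent takes
 * priority over the "*" group.
 */
function is_crawl_allowed(string $robotsTxt, string $userAgent, string $path): bool
{
    $groups = [];      // lowercased agent name => list of disallowed prefixes
    $agents = [];      // agents named by the group currently being read
    $inRules = false;  // true once the current group has seen a rule line

    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line));  // drop comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            if ($inRules) {          // a rule was already seen, so a new group starts
                $agents = [];
                $inRules = false;
            }
            $agent = strtolower($value);
            $agents[] = $agent;
            $groups[$agent] = $groups[$agent] ?? [];
        } elseif ($field === 'disallow') {
            $inRules = true;
            foreach ($agents as $agent) {
                if ($value !== '') { // an empty Disallow blocks nothing
                    $groups[$agent][] = $value;
                }
            }
        }
    }

    // Prefer the group for this agent; otherwise fall back to "*".
    $rules = $groups[strtolower($userAgent)] ?? $groups['*'] ?? [];

    foreach ($rules as $prefix) {
        if (strncmp($path, $prefix, strlen($prefix)) === 0) {
            return false;
        }
    }
    return true;
}

$robots = "User-agent: *\nDisallow: /cgi-bin/\nDisallow: /tmp/\n";
var_dump(is_crawl_allowed($robots, 'AnyBot', '/tmp/cache.html'));  // bool(false)
var_dump(is_crawl_allowed($robots, 'AnyBot', '/index.html'));      // bool(true)
```

The prefix comparison is why a trailing slash matters: a rule of /tmp would also match /tmpfile.html, while /tmp/ only matches paths inside that folder.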

 

To exclude ALL bots from crawling the whole server.
User-agent: *
Disallow: /

 

To allow ALL bots to crawl the whole server.
User-agent: *
Disallow:

 

To exclude A SINGLE bot from crawling the whole server.
User-agent: BadBot
Disallow: /

 

To allow A SINGLE bot to crawl the whole server.
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /

 

To exclude ALL bots from crawling the ENTIRE server except for one file.
This is tricky because the original robots.txt convention has no ‘Allow’ directive (many major crawlers now support one, but not every robot does). The portable approach is to place all the files you do not want crawled into one folder and leave the file to be crawled outside it. So if we placed all the files we didn’t want crawled in a folder called SCT, we’d write the robots.txt rule like this.

User-agent: *
Disallow: /SCT/

Or you can disallow each page individually like this.
User-agent: *
Disallow: /SCT/home.html

 

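To request a crawl delay for the whole server (Crawl-delay is a non-standard directive; some crawlers honor it, while others, including Googlebot, ignore it).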
User-agent: *
Crawl-delay: 10

 

If you wish to block a single page, you can add a robots <meta> tag to that page.
<meta name="robots" content="noindex" />

You can learn more about the robots.txt file at http://www.robotstxt.org/

 

Basic differences between the GET and POST methods in PHP

What are the basic differences between the GET and POST methods in PHP?

 

When a form is submitted with the GET method, the browser constructs a URL by taking the value of the action attribute, appending a ? to it, and then appending the form data set (encoded as application/x-www-form-urlencoded). The browser then processes this URL as if following a link (or as if the user had typed the URL directly).
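As a rough illustration of that URL construction, the sketch below builds the same kind of URL in PHP with http_build_query(). The action value search.php and the field names are hypothetical and only serve the example.

```php
<?php
// Hypothetical form: action="search.php" with two fields.
$action = 'search.php';
$fields = ['q' => 'php curl', 'page' => 2];

// http_build_query() encodes the data as application/x-www-form-urlencoded,
// so the space in "php curl" becomes a "+".
$url = $action . '?' . http_build_query($fields);

echo $url, "\n";  // search.php?q=php+curl&page=2
```

On the receiving side, PHP decodes these values again and exposes them in $_GET.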

 

Submitting a form with the POST method causes a POST request to be sent to the URL given in the action attribute, and the data is encoded according to the content type specified by the enctype attribute.

 

Here is a list of some differences between the GET and POST methods.

1) GET data travels in the URL, so it is limited to ASCII characters (anything else must be percent-encoded), whereas POST has no such restriction and also allows binary data (images and other files).

2) Data sent with GET remains in the browser history as part of the URL, but data sent with POST is never saved in the history.

3) GET requests can be cached, but POST requests are generally not cached.

4) Data sent with GET is easier to read and tamper with than POST data because it is exposed in the URL.

5) With GET, the complete URL string including the data can be bookmarked, which is not possible with POST.

6) GET can only use application/x-www-form-urlencoded encoding, whereas POST can also opt for multipart/form-data.

7) When the user reloads the page or navigates back, a GET request is simply re-executed, whereas for POST the browser usually alerts the user that the data will need to be re-submitted.

8) GET is limited to roughly 2000 characters (the exact limit depends on the browser and server). For POST, the limit is set by the server; PHP’s post_max_size setting defaults to 8 MB.

9) GET data is visible to everyone (it is displayed in the browser’s address bar) and there are limits on the amount of information it can send. POST variables are not displayed in the URL.

10) GET is less secure than POST, because the data can end up in browser history, bookmarks, and server logs.

11) The GET method should not be used when sending passwords or other sensitive information.

12) There are restrictions on form data length in GET.

 

Difference between $this and self in PHP

What is the difference between $this and self in PHP OOP?

 

PHP classes can have static functions and static variables. Declaring class methods or properties as static makes them accessible without needing an instance of the class.

Static functions and variables are not associated with any object but with the class definition itself.
You can say that all instances of a class share the same static variables and functions.

Inside a class definition, $this refers to the current object, while self (not $self) refers to the current class.
self has no preceding $ because it does not represent a variable but the class itself. $this does reference a specific variable (the current object), so it has a $ prefix.

Refer to class (static) elements using self and to object elements using $this: use $this->var for non-static variables and self::$var for static variables, and likewise $this->method() and self::method() for methods.

 

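[php]

<?php

class demoClass
{
    public $var;          // instance property, one per object
    public static $svar;  // static property, stored on the class

    public function regular_function()
    {
        echo $this->var;
    }

    public static function static_function()
    {
        echo self::$svar;
    }

    public static function another_static_fn()
    {
        self::static_function();
    }

    // A regular method can also read a static property.
    public function regular_fn_using_static_var()
    {
        echo self::$svar;
    }
}

demoClass::$svar = "Script";

$obj = new demoClass();
$obj->var = "Article";

// Both methods echo their value themselves, so they are called directly.
demoClass::static_function();  // prints Script
$obj->regular_function();      // prints Article

?>

[/php]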

 

Note:

  • Static functions have no $this, so they can only use static variables directly. Static functions and static variables are referenced as self::functionName() and self::$variableName.
  • Regular functions and variables of a class need an object to be referenced.
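The point that all instances share one static variable, while each object keeps its own regular variables, can be sketched as follows. The class name Visitor and its properties are invented for this example, and the typed properties assume PHP 7.4 or later.

```php
<?php
class Visitor
{
    public static int $count = 0;  // one copy, stored on the class
    public string $name;           // one copy per object

    public function __construct(string $name)
    {
        $this->name = $name;  // object element, so $this
        self::$count++;       // class element, so self
    }
}

$a = new Visitor('Asha');
$b = new Visitor('Ben');

echo Visitor::$count, "\n";            // 2
echo $a->name, ' ', $b->name, "\n";    // Asha Ben
```

Each constructor call increments the same counter, which is why it reads 2 even though neither object was asked for it.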

 

PHP cURL functions and examples

cURL stands for client URL.

It is a library (libcurl) which allows you to connect to and communicate with many different types of servers using many different types of protocols.

libcurl supports the http, https, ftp, gopher, telnet, dict, file, and ldap protocols. libcurl also supports HTTPS certificates, HTTP POST, HTTP PUT, FTP uploading, HTTP form based upload, proxies, cookies, and user+password authentication (available in PHP 4 and later).

 

Using cURL you can:

  • Implement payment gateway scripts (communication between the payment gateway and your website script).
  • Log in to other websites and access their members-only sections (read mails or get contacts).
  • Check whois/domain availability.
  • Download and upload files from and to a remote server.

PHP cURL wraps up the major parts of its functionality in just four functions, which are called in the following sequence.

  • curl_init: Initializes the session and returns a cURL handle (e.g. $ch), which is passed to the other cURL functions.
  • curl_setopt: Called multiple times to specify what we want the cURL library to do. Commonly used options include:
      CURLOPT_URL: The URL you want to process.
      CURLOPT_RETURNTRANSFER: Setting this option to 1 makes curl_exec return the content instead of echoing it.
      CURLOPT_FILE: Writes the contents to a file as a web page or file downloads.
    The full list of options is in the PHP manual entry for curl_setopt.
  • curl_exec: Executes the cURL session.
  • curl_close: Closes the cURL session.
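The CURLOPT_FILE option can be sketched as a small download helper that follows the same four-step sequence. This is only an illustration: the function name download_to_file, its parameters, and the 30-second timeout are assumptions made for the example, and it requires the PHP cURL extension.

```php
<?php
/**
 * Download $url into $destination using the four-step cURL sequence.
 * Returns true on success. Name and parameters are illustrative only.
 */
function download_to_file(string $url, string $destination): bool
{
    $fp = fopen($destination, 'wb');
    if ($fp === false) {
        return false;
    }

    $ch = curl_init();                               // 1. initialize the session
    curl_setopt($ch, CURLOPT_URL, $url);             // 2. set options
    curl_setopt($ch, CURLOPT_FILE, $fp);             //    write the body to $fp
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  //    follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);           //    give up after 30 seconds

    $ok = curl_exec($ch);                            // 3. execute
    $failed = ($ok === false) || curl_errno($ch) !== 0;

    curl_close($ch);                                 // 4. close the session
    fclose($fp);

    if ($failed) {
        unlink($destination);                        // do not keep a partial file
    }
    return !$failed;
}

// Example call (the path is a placeholder):
// download_to_file('http://www.scriptarticle.com/feed/', '/tmp/feed.xml');
```

Because the body is streamed straight into the file handle, the whole download never has to fit in memory, which is the main reason to prefer CURLOPT_FILE over CURLOPT_RETURNTRANSFER for large files.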

 

Below are some examples.

[php]

<?php

/** reading the content/feed of a website */
/* Initialize the cURL session */

$ch = curl_init();

/* Set the URL of the page or file to download or read content */
curl_setopt($ch, CURLOPT_URL, 'http://www.scriptarticle.com/feed/');

/* ask cURL to return the contents in a variable instead of simply echoing them to the browser */
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

/* execute the cURL session */
$content = curl_exec ($ch);

/* Close cURL session */
curl_close($ch);

?>

[/php]

Another example is a rough domain availability check. Note that this is not a true whois lookup: it only tests whether an HTTP request to the domain succeeds.

[php]

<?php
/** Rough domain availability check (not a real whois lookup) */

$domain = "scriptarticle.com";

$data = 'http://' . $domain;

$ch = curl_init($data);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

curl_exec($ch);

// Check if any error occurred (for example, the host name did not resolve)
if (curl_errno($ch)) {
    echo 'The domain is available!';
} else {
    echo 'The domain is not available';
}

curl_close($ch);

?>

[/php]

Hope it all makes sense! Post your comment or suggestion below if you need any more assistance.

HTML form enctype Attribute


The main purpose of the HTML form enctype attribute is to indicate how the form data should be encoded before it is sent to the location defined in the form’s action attribute.

For example:

<form action="scriptarticle/post_page.php" method="post" enctype="multipart/form-data">

 

By default, form data is encoded as “application/x-www-form-urlencoded”, so spaces are converted to “+” symbols, and special characters such as apostrophes and percent signs are converted to their hexadecimal values prefixed with % (for example, an apostrophe becomes %27).
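PHP’s urlencode() applies the same kind of encoding, so it can serve as a quick way to see what a value looks like after it has been encoded. The sample string below is made up for the illustration.

```php
<?php
// A value containing a space, an apostrophe, and a percent sign.
$value = "Tom's 100% pure juice";

$encoded = urlencode($value);
echo $encoded, "\n";             // Tom%27s+100%25+pure+juice

// urldecode() reverses the encoding.
echo urldecode($encoded), "\n";  // Tom's 100% pure juice
```

PHP performs this decoding automatically for incoming form fields, so values read from $_GET or $_POST are already decoded.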

 

The form’s enctype attribute is supported in almost all browsers.

The enctype attribute is used only when method="post".

The possible values of the enctype attribute are:

  • application/x-www-form-urlencoded
    Default value. All characters are encoded before being sent to the server (spaces are converted to “+” symbols and special characters are converted to percent-encoded hexadecimal values).
  • multipart/form-data
    No characters are encoded.
    This value should be used for submitting forms that contain files, non-ASCII data, and binary data.
  • text/plain
    The data is sent as plain text without percent-encoding; it is meant for debugging rather than for real form handling.

For more information about the HTML form enctype attribute, see the W3Schools reference.

Show weather information with Google Weather API & PHP

If you need to show weather information on your website, a very simple way to get it for any location is the Google Weather API. Note that this was an unofficial endpoint which Google shut down in 2012, so the requests below no longer return data.

The API returns the weather in a very simple XML format that you can easily parse and integrate into any page.

The API does not require a key; you simply pass a city name or postal code (US only), such as Jaipur, Rajasthan.

 

http://www.google.com/ig/api?weather=YOURADDRESS

[php]
<?php
$xml = simplexml_load_file('http://www.google.com/ig/api?weather=YOURADDRESS');
$information = $xml->xpath("/xml_api_reply/weather/forecast_information");
$current = $xml->xpath("/xml_api_reply/weather/current_conditions");
$forecast_list = $xml->xpath("/xml_api_reply/weather/forecast_conditions");
?>

[/php]

A Sample Code for Jaipur, Rajasthan

[php]

<?php

$xml = simplexml_load_file('http://www.google.com/ig/api?weather=Jaipur,Rajasthan');

$information = $xml->xpath("/xml_api_reply/weather/forecast_information");

$current = $xml->xpath("/xml_api_reply/weather/current_conditions");

$forecast_list = $xml->xpath("/xml_api_reply/weather/forecast_conditions");
?>

[/php]
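To see how the xpath() calls and the ['data'] attribute access behave without making a network request, the sketch below parses a small hand-written XML string shaped like the structure the code above expects. The sample document and all of its values are invented for illustration and are not real API output.

```php
<?php
// Hand-written sample shaped like the reply the code above expects.
// All values are invented for illustration.
$sample = <<<XML
<xml_api_reply version="1">
  <weather>
    <forecast_information>
      <city data="Jaipur, Rajasthan"/>
    </forecast_information>
    <current_conditions>
      <condition data="Clear"/>
      <temp_f data="86"/>
    </current_conditions>
    <forecast_conditions>
      <day_of_week data="Mon"/>
      <low data="70"/>
      <high data="95"/>
      <condition data="Sunny"/>
    </forecast_conditions>
  </weather>
</xml_api_reply>
XML;

$xml = simplexml_load_string($sample);

// xpath() always returns an array of matching elements.
$information   = $xml->xpath('/xml_api_reply/weather/forecast_information');
$current       = $xml->xpath('/xml_api_reply/weather/current_conditions');
$forecast_list = $xml->xpath('/xml_api_reply/weather/forecast_conditions');

// Attributes are SimpleXMLElement objects, so cast them before using them.
$city  = (string) $information[0]->city['data'];
$tempF = (int) $current[0]->temp_f['data'];

echo "$city: $tempF F\n";  // Jaipur, Rajasthan: 86 F
```

The values of every element live in a data attribute rather than in the element text, which is why the page code reads ['data'] everywhere.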

HTML page code

[php]

<html>
<head>
    <title>Google Weather API</title>
</head>
<body>
    <h1><?php echo $information[0]->city['data']; ?></h1>

    <h2>Today’s weather</h2>
    <div> <img src="<?php echo 'http://www.google.com' . $current[0]->icon['data']?>" />
      <div> <?php echo $current[0]->temp_f['data'] ?>&deg; F, <?php echo $current[0]->condition['data']; ?> </div>
    </div>

    <h2>Forecast (Next 4 Days)</h2>
    <?php foreach ($forecast_list as $forecast) : ?>
    <div> <img src="<?php echo 'http://www.google.com' . $forecast->icon['data']?>"  />
      <div><?php echo $forecast->day_of_week['data']; ?></div>
      <div> <?php echo $forecast->low['data'] ?>&deg; F – <?php echo $forecast->high['data'] ?>&deg; F, <?php echo $forecast->condition['data']; ?> </div>
    </div>
    <?php endforeach; ?>
    
</body>
</html>

[/php]

 

Weather parameters in Google’s Weather XML API
1) US or Canadian zip code (http://www.google.com/ig/api?weather=24558)
2) City, state (http://www.google.com/ig/api?weather=New York,US)

hl parameter (language parameter)
The default setting, if not defined, is hl=en.
You can test it with French, hl=fr (i.e. google.com/ig/api?weather=24558&hl=fr).
The language code will NOT change the XML tags; it only changes the data in those tags.

 

You can convert the temperature between degrees Fahrenheit (°F) and Celsius (°C, also known as centigrade) with the formulas below.

°C * 9/5 + 32 = °F

(°F - 32) * 5/9 = °C
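The two formulas translate directly into PHP. The function names below are chosen for this sketch only.

```php
<?php
// °C * 9/5 + 32 = °F
function celsius_to_fahrenheit(float $celsius): float
{
    return $celsius * 9 / 5 + 32;
}

// (°F - 32) * 5/9 = °C
function fahrenheit_to_celsius(float $fahrenheit): float
{
    return ($fahrenheit - 32) * 5 / 9;
}

echo celsius_to_fahrenheit(100), "\n";  // 212
echo fahrenheit_to_celsius(32), "\n";   // 0
```

Since the API data shown above is read from temp_f, fahrenheit_to_celsius() is the one you would apply before displaying values in °C.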