Category Archives: Networking

Search Engine Spider and User Agent Identification with “Ultimate User Agent Blacklist”

A user-agent is software (a software agent) that is acting on behalf of a user.
In many cases, a user-agent acts as a client in a network protocol used in communications within a client–server distributed computing system.

 

For more information, see the Wikipedia article:
http://en.wikipedia.org/wiki/User_agent

 

Automated agents are called bots.
http://user-agents.org maintains a large list of known spiders, bots, and user agents.
There are also anonymous agents that appear in no list and use unfamiliar names.

 

If you develop a website and want to control access for specific user agents, or for bots from a particular country, you can add rules to your root .htaccess file.

 

Bot-blocking blacklists have a built-in limitation: some rogue spiders generate random user-agent strings, so they will never appear in any list. Even so, we have tried to list as many as we can in the zip file below.

 

Ultimate User Agent Blacklist

 

Unzip the file and paste the code into your root .htaccess file. It will help protect your website from unwanted crawling and indexing by the listed bots.
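To see why a blacklist catches known bots but misses ones with random names, here is a minimal Python sketch of substring matching on the User-Agent header. The entries in BLACKLIST are made-up names for illustration, not taken from the zip file, and is_blacklisted is a hypothetical helper; on a live site the blocking is done by the rules in .htaccess, not by a script like this.

```python
import re

# Hypothetical blacklist entries, for illustration only.
BLACKLIST = ["BadBot", "EvilScraper", "ExampleHarvester"]

# One case-insensitive pattern that matches any listed name as a substring.
_PATTERN = re.compile("|".join(re.escape(name) for name in BLACKLIST), re.IGNORECASE)


def is_blacklisted(user_agent):
    """Return True if the User-Agent string contains a blacklisted name."""
    return _PATTERN.search(user_agent) is not None


print(is_blacklisted("Mozilla/5.0 (compatible; BadBot/2.1)"))  # a listed name
print(is_blacklisted("xK3f9QzT/1.0"))  # a random string no list can anticipate
```

The second call shows the limitation: a randomly generated string matches nothing in the list, so the request passes through.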

 

If your website uses WordPress, you can also use the “Better WP Security” plug-in.
Download and install the plug-in, go to the “Better WP Security – Ban Users” tab, and turn on “Enable Default Banned List”. You can also update the list according to your needs.

 

Please be careful when doing this, as it may affect your website’s core files and plug-ins. Back up your website’s files and database first.

 

Use of Environment variables

Environment variables windows

 

Environment variables are a set of dynamic named values that can affect the way running processes will behave on a computer.

 

Put another way, an environment variable is a named value that can be referenced by one or more programs running on Windows. Environment variables help programs know what directory to install files in, where to store temporary files, where to find user profile settings, and many other things.

 

Variable names are NOT case sensitive in Windows OS.

 

Environment variables are dynamic because they can change. The values stored can be changed to match the current system’s setup and design (environment). They can also differ between computer systems because each computer can have a different setup.

 

There are a number of environment variables that get referenced by programs and can come in handy for a computer user to find needed information about their computer environment.

Below is a list of some common and important environment variables.

 

%appdata%
%commonprogramfiles%
%local%
%localappdata%
%programfiles%
%temp%
%userprofile%
%windir%

 

You can quickly open any of the above folders by entering the environment variable in the Windows Run box or Windows Search box.
For example, to open the Application Data folder, type %appdata% in the Run box and press Enter.
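Programs can resolve the same %name% references themselves. The sketch below is a small Python illustration of the idea, including the case-insensitive lookup mentioned earlier. The function name expand_percent_vars and the sample paths are assumptions made for the example; on a real Windows machine, Python's own os.path.expandvars does this job against the live environment.

```python
import re


def expand_percent_vars(text, env):
    """Expand %name% references using env, ignoring the case of names.

    References to names that are not in env are left untouched.
    """
    lookup = {key.lower(): value for key, value in env.items()}

    def replace(match):
        return lookup.get(match.group(1).lower(), match.group(0))

    return re.sub(r"%([^%]+)%", replace, text)


# Hypothetical values; pass dict(os.environ) to use the real environment.
sample_env = {"APPDATA": r"C:\Users\Me\AppData\Roaming", "windir": r"C:\Windows"}
print(expand_percent_vars(r"%appdata%\MyApp", sample_env))
```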

 

%appdata%
The %appdata% environment variable contains the directory path to the Application Data folder for your user profile. This folder stores settings and logs, among other things, for various software programs. The settings and logs stored there are specific to your user profile.

 

%commonprogramfiles%
The %commonprogramfiles% environment variable contains the directory path to the Common Files folder, within the main Program Files folder. This folder contains various files for common programs and utilities on a computer, mostly system and services related. The default directory path this variable points to is c:\Program Files\Common Files.

 

%local%
The %local% environment variable points to where the security policies and rules are located for the user’s account, Windows in general, Windows Firewall, Network, and various software programs on the computer. This variable is described for Windows 7, but it may not be defined on every system; if typing %local% in the Run box opens nothing, it is not set on your machine.

 

%localappdata%
The %localappdata% environment variable contains the directory path to where programs store their temporary files. Common temporary files to be stored here are Desktop Themes, Windows Error Reporting, program caching and Internet browser profiles. This environment variable is native to Windows Vista & Windows 7.

 

%programfiles%
The %programfiles% environment variable contains the directory path to where programs are installed. This directory contains sub-directories for each program, which contain the primary files needed by each program in order to run on a computer. The default directory path this variable points to is c:\Program Files.

 

%temp%
The %temp% environment variable contains the directory path to where temporary files are stored. These temp files are often Internet temporary files and other user application temporary files (Microsoft Word, Excel, Outlook, etc.).

 

%userprofile%
The %userprofile% environment variable points to the currently logged-in user’s profile, the directory where user profile data is stored. In this directory a user can find the following folders: My Documents, My Music, My Pictures, Desktop, and Favorites (Internet Explorer bookmarks).

 

%windir%
The %windir% environment variable points to the Windows directory, where Windows system files are located. The default directory path for most versions of Windows is c:\Windows (for Windows NT 4 and 2000, it is c:\WinNT).

 

Automatically set permissions for various file types using .htaccess

Setting file permissions with .htaccess is a handy way to keep the CHMOD settings consistent for various file types.

 

Apply the following rules in the root .htaccess file to affect all specified file types, or place them in a specific directory to affect only the files there (add or update file types according to your needs). Note that stock Apache does not document a chmod directive, so if your server rejects these lines with an error, apply the same modes with the chmod command or your FTP client instead.

# ensure CHMOD settings for specified file types
# never set CHMOD 777 unless you know what you are doing
# files requiring write access should use CHMOD 766 rather than 777
# keep specific file types private by setting their CHMOD to 400

chmod .htpasswd files 640
chmod .htaccess files 644
chmod php files 600
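
File modes live on the file system, so another way to apply the values listed above is a small script run on the server. The sketch below uses Python, which is a choice made for this example (the post itself only lists the modes); MODES and apply_modes are hypothetical names, and the script assumes a Unix-like host where chmod modes are meaningful.

```python
import fnmatch
import os

# Modes from the list above, keyed by file-name pattern (octal, as chmod expects).
MODES = {".htpasswd": 0o640, ".htaccess": 0o644, "*.php": 0o600}


def apply_modes(root):
    """Walk root and chmod every file whose name matches a pattern in MODES.

    Returns the list of paths that were changed.
    """
    changed = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            for pattern, mode in MODES.items():
                if fnmatch.fnmatch(name, pattern):
                    path = os.path.join(folder, name)
                    os.chmod(path, mode)
                    changed.append(path)
                    break  # first matching pattern wins
    return changed
```

Call apply_modes with your site's root directory; files that match no pattern are left as they are.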

 

Require SSL (Secure Sockets Layer)

Here is an easy way to do it using the .htaccess file:

# require SSL without mod_ssl
# RewriteEngine must be on for the rules below to take effect
RewriteEngine On
RewriteCond %{HTTPS} !=on [NC]
RewriteRule ^.*$ https://%{SERVER_NAME}%{REQUEST_URI} [R,L]

 

Guys, if you still face any issues with this, leave a comment in the comment box.

 

Integrate a Twitter Widget into Your Website

Twitter updates are always short, 140 characters or fewer each. Plus, you can post updates and follow your friends using the Twitter website, software in your browser, a mobile phone, or instant messages, so people aren’t tied to one device. This is part of what makes Twitter so popular.

 

It has grown far beyond its microblogging roots to become a fabulous tool for social messaging.

Many people use Twitter clients to update their status and read tweets, but there are also a host of useful Twitter widgets that allow you to check your tweets from your blog/website or even let people re-tweet your blog entries.

 

Display your Twitter updates on your website like this.

twitter-widget

This is one of the Twitter widgets that let you take your status updates and put them anywhere that allows custom widgets. The great thing about the Twitter Profile Widget is that you can put your tweets on a loop.

 

Twitter provides rich widgets whose width, height, background colour, text colour, links, number of tweets, loop scrolling, and so on are easy to configure.

 

You can generate the code by following the link below:
https://twitter.com/about/resources/widgets/widget_profile

 

Googlebot & Site Crawl

Advise Googlebot (Google) to Crawl Your Site

 

Googlebot is Google’s web crawling bot, or spider. It collects data from web pages to build a searchable index for the Google search engine. Crawling is simply the process by which Googlebot visits new and updated pages. It uses algorithmic programs to determine which sites to crawl, how often, and how many pages to fetch from each site.

 

As Googlebot visits a website, it detects links (src and href) on each page and adds them to its list of pages to crawl. New sites, changes to existing sites, and dead links are noted and used to update the Google index.
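
The link-detection step can be illustrated with a few lines of Python. This is only a sketch of the general idea of collecting href and src values from a page, not Googlebot's actual code; LinkCollector and the sample HTML are made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


collector = LinkCollector()
collector.feed('<a href="/about.html">About</a><img src="/logo.png">')
print(collector.links)
```

A crawler would add each collected link to its queue of pages to visit, which is how one known page leads to the next.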

 

If a webmaster wishes to control the information on their site available to Googlebot, they can do so with the appropriate directives in a robots.txt file, or by adding the meta tag

<meta name="Googlebot" content="nofollow" />

to the web page.

 

Once you’ve created your robots.txt file, there may be a small delay before Googlebot discovers your changes.
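
Before waiting for Googlebot to pick up the file, you can check how a set of robots.txt rules would be read. The sketch below uses Python's standard urllib.robotparser module; the rules and the www.example.com URLs are hypothetical and only serve as an illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: keep Googlebot out of /private/,
# and place no restrictions on other crawlers.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# True means the named crawler may fetch the URL under these rules.
print(parser.can_fetch("Googlebot", "http://www.example.com/private/report.html"))
print(parser.can_fetch("Googlebot", "http://www.example.com/index.html"))
```

Keep in mind that robots.txt is advisory: well-behaved crawlers such as Googlebot follow it, but it does not stop a bot that chooses to ignore it.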

 

Googlebot discovers pages by visiting the links on every page it finds and following them to other web pages. New web pages must be linked from other known pages on the web, or manually submitted by the webmaster, in order to be crawled and indexed.

 

Setting up Virtual Hosts with XAMPP running on Windows 7 & Windows XP

Virtual Hosts simply give you the ability to “host” more than one Website and domain on your computer.

 

With a virtual host you can have separate local domain names for each of your Websites, for example http://website1/ for one site and http://website2/ for another. When you type the URL for the virtual host in your Web browser, the browser doesn’t go out onto the internet to find the site, but instead asks for the proper file from the Web server running on your computer. Virtual hosts not only let you run multiple Web sites on your computer, they also let you store the files for those sites anywhere on your computer and not just in the D:\XAMPP\htdocs folder.

 

Adding virtual hosts with XAMPP on a Windows system is a two-step process.

 

1. Add a new entry to your computer’s hosts file.
Open the hosts file located at D:\WINDOWS\system32\drivers\etc.

At the end of that file, you will find
127.0.0.1  localhost
127.0.0.1 is how a computer refers to itself; it is the computer’s own (loopback) IP address. The second part, “localhost”, is the “domain” of the virtual host.

Add a new line like
127.0.0.1  webaddress
Use 127.0.0.1 when the site is served from this same computer; a LAN address such as 192.168.0.6 only works if it is this machine’s own address.

Save and close the hosts file.
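
To double-check what the hosts file now maps, a short script can read it back. The sketch below is a minimal Python illustration that parses hosts-file text into a name-to-address table; parse_hosts and the sample text are hypothetical, and keeping the first entry for a repeated name is a choice made for this sketch.

```python
def parse_hosts(text):
    """Map each host name in hosts-file text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Host names are not case sensitive; keep the first entry seen.
            mapping.setdefault(name.lower(), address)
    return mapping


# Illustrative content; read the real file from the path given in step 1.
sample = """\
# sample hosts file
127.0.0.1  localhost
127.0.0.1  webaddress
"""
print(parse_hosts(sample))
```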

 

2. In Notepad open the Apache configuration file located at D:\xampp\apache\conf\extra\httpd-vhosts.conf

At the bottom of that file add

NameVirtualHost *
<VirtualHost *>
DocumentRoot "D:\xampp\htdocs"
ServerName localhost
</VirtualHost>


<VirtualHost *>
DocumentRoot "D:\Documents and Settings\Me\My Documents\clientA\website"
ServerName webaddress
<Directory "D:\Documents and Settings\Me\My Documents\clientA\website">
Order allow,deny
Allow from all
</Directory>
</VirtualHost>

 

The first five lines of code turn on the Virtual Host feature in Apache, and set up the D:\xampp\htdocs folder as the default location for http://localhost. That’s important since you need to be able to access the XAMPP web pages at http://localhost/ so that you can use phpMyAdmin.

 

You can add more virtual hosts according to your needs.
The first item, DocumentRoot, indicates where the files for this site are located on your computer.
The second item, ServerName, is the name you provided in step 1, the virtual host name.
The third item, the <Directory> part, uses the same path you provided for the DocumentRoot.
It is required to give Apache permission to serve these files to your Web browser.

 

Save and close the Apache configuration file, and restart Apache from the XAMPP control panel or from Windows Control Panel => Administrative Tools => Services.

Open a web browser and type the URL for the virtual host, for example http://webaddress/, and you will see the homepage of your website.

 

For more information, see the link below.
http://httpd.apache.org/docs/2.2/vhosts/