Category Archives: Web Testing

Search Engine Spider and User Agent Identification with “Ultimate User Agent Blacklist”

A user agent is software (a software agent) that acts on behalf of a user.
In many cases, a user agent acts as a client in a network protocol used in communications within a client–server distributed computing system.

For more information, see the Wikipedia article:
http://en.wikipedia.org/wiki/User_agent

Automated agents are called bots.
http://user-agents.org maintains a large list of spiders, bots and user agents.
There are many others as well that are anonymous (unknown, and often using very different names).

If you run a website and want to control which user agents can access it (specific bots, or bots from a particular country), you can add rules to the .htaccess file in your site’s root directory.

Bot-blocking blacklists have a built-in weakness: some rogue spiders generate random user-agent strings, so they will never be on any list in the first place. Even so, we have tried to list as many as we can in the zip file below.

Ultimate User Agent Blacklist

Unzip the file and paste the code into your root .htaccess file. It will help protect your website from unwanted crawling and indexing by known bad bots.
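
Blacklists of this kind are commonly written as mod_rewrite conditions. The following minimal sketch shows roughly what such code looks like. It assumes mod_rewrite is enabled on your host and allowed in .htaccess. “BadBot” and “EvilScraper” are hypothetical placeholder names, not entries taken from the zip file.

```apache
# Minimal user-agent blacklist sketch (requires mod_rewrite).
# "BadBot" and "EvilScraper" are hypothetical placeholder names.
<IfModule mod_rewrite.c>
    RewriteEngine On
    # NC = case-insensitive match; OR = this condition OR the next one
    RewriteCond %{HTTP_USER_AGENT} BadBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} EvilScraper [NC]
    # "-" leaves the URL unchanged; F answers with 403 Forbidden
    RewriteRule .* - [F,L]
</IfModule>
```

To block another bot, add one more RewriteCond line with [NC,OR]. Only the last condition in the chain should leave out OR.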

If your website runs on WordPress, you can also use the “Better WP Security” plugin.
Download and install the plugin, go to the “Better WP Security – Ban Users” tab and enable “Enable Default Banned List”. You can also edit the list to suit your needs.

But be careful before doing this, as it may affect your website’s core files and plugins. Back up your website’s files and database first.

Test and debug your code online

During development we run into lots of issues and bugs, and we need to debug along the way.
Without a step debugger such as the Xdebug extension, there is no easy way to debug a PHP page line by line, so you end up using statements and functions like echo, die, print_r and var_dump.

It takes extra time to open the page, add these calls, and then view the page again by refreshing the browser.
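
You can save some of this back-and-forth by letting PHP print its errors straight into the page while you develop. The following minimal sketch assumes PHP runs as an Apache module (mod_php). Under PHP-FPM or CGI, Apache does not recognise php_flag/php_value and the site will return a 500 error, so set the same options in php.ini instead. Use this on a development site only, never in production.

```apache
# Development only: show PHP errors in the browser (mod_php only).
php_flag display_errors On
# -1 enables every error level. PHP constants such as E_ALL mean
# nothing outside PHP, so a numeric value is used here.
php_value error_reporting -1
```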

Many websites offer the same checks online, so you can inspect a value quickly without editing and reloading your code again and again.

Here are some of them:

1. json_encode and json_decode parser online tool
http://json.parser.online.fr/

2. base64_decode and base64_encode

3. Array serialize and unserialize

4. Date formatting and other date-related functions

5. md5, preg_replace, preg_match

6. String related functions
http://www.tools4noobs.com/online_php_functions/

7. Count characters in a string and words in a sentence
http://allworldphone.com/count-words-characters.htm

8. JS beautifier (to format your JavaScript file)
http://jsbeautifier.org/

9. HTML formatter (to format your HTML page)
http://www.freeformatter.com/html-formatter.html

10. W3C Markup Validation Service (validate by direct input)
http://validator.w3.org/#validate_by_input

Run small PHP snippets online against many different PHP versions.
http://sandbox.onlinephpfunctions.com/

Run your JavaScript code and view the result online.
http://writecodeonline.com/javascript/

Run and test jQuery code online.
http://jsfiddle.net/

List of All PHP Functions
http://php.net/manual/en/indexes.functions.php
http://php.net/quickref.php

Okay guys, try these out. I hope they help you debug your code.
All the best!!

5 most useful .htaccess tricks every webmaster should know

1) Show visitors a holding page while you update or test your website

order deny,allow
deny from all
allow from 117.117.117.117

ErrorDocument 403 /showpage.html

<Files showpage.html>
allow from all
</Files>

Replace 117.117.117.117 with your IP address. Also replace showpage.html with the name of the page you want visitors to see.
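
The directives above use the Apache 2.2 access-control syntax. Apache 2.4 only understands that syntax through the mod_access_compat module. The following sketch shows the same idea on Apache 2.4 with the newer Require directive. It assumes your host allows authorization directives in .htaccess (AllowOverride AuthConfig). As before, replace 117.117.117.117 and showpage.html with your own values.

```apache
# Apache 2.4+ sketch (mod_authz_core): only this IP can browse the site
Require ip 117.117.117.117

# Everyone else gets a 403, answered with the holding page
ErrorDocument 403 /showpage.html

# The holding page itself must be reachable by everyone
<Files "showpage.html">
    Require all granted
</Files>
```

In both versions, any images or stylesheets the holding page uses are blocked as well, unless you allow them the same way.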

2) Display a custom 404 error page

Your server displays a “404 File Not Found” error page whenever a visitor tries to access a page on your site that doesn’t exist.
You can replace the server’s default error page with one of your own that explains the error and links visitors to your home page.

ErrorDocument 404 /404.html

Replace 404.html with the name of the page you want visitors to see.

3) Handle moved or renamed pages

You’ve moved or renamed a page on your site and you want visitors automatically sent to the new page when they try to access the old one.

Use a 301 redirect

Redirect 301 /oldpage.html http://yourwebsite.com/newpage.html

Using a 301 redirect also tells search engines the move is permanent, which helps the new page keep the old page’s search engine ranking.
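
If you move a whole folder instead of a single page, mod_alias’s RedirectMatch directive can redirect every page in it with one line. In this sketch, /old-folder/ and /new-folder/ are hypothetical paths:

```apache
# Permanently (301) redirect everything under /old-folder/ to the
# same path under /new-folder/; $1 is the text captured by (.*)
RedirectMatch 301 ^/old-folder/(.*)$ http://yourwebsite.com/new-folder/$1
```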

4) Prevent directory browsing

When a directory has no index page, visitors may be able to see what’s inside it (the directory structure and a listing of its files).
Some servers are already configured to prevent directory browsing like this. If yours isn’t, here’s how to set it up:

Options -Indexes

5) Create user friendly URLs

Which of the two URLs below looks better?

http://yourwebsite.com/about
http://yourwebsite.com/pages/aboutus.html

Short, readable URLs are usually better.

With .htaccess and an Apache module called mod_rewrite, you can set up URLs however you want. Your server can show the contents of “/pages/aboutus.html” whenever anyone visits “http://yourwebsite.com/about”. Here are a few examples:

RewriteEngine on
RewriteRule ^about/?$ /pages/aboutus.html [L]
RewriteRule ^features/?$ /features.php [L]
RewriteRule ^buy/?$ /buy.html [L]
RewriteRule ^contactus/?$ /pages/contactus.htm [L]
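
The old, longer address still works after this, so the same page can be reached at two URLs. A common fix is to send a 301 redirect whenever a browser asks for the old address directly. This sketch assumes a rule like the ones above already serves /pages/aboutus.html at /about. The condition checks THE_REQUEST, the original request line sent by the browser, rather than the rewritten path. That way the internal rewrite back to /pages/aboutus.html does not cause a redirect loop:

```apache
RewriteEngine on
# Match only when the browser itself asked for /pages/aboutus.html
# (THE_REQUEST looks like "GET /pages/aboutus.html HTTP/1.1")
RewriteCond %{THE_REQUEST} \s/pages/aboutus\.html[\s?] [NC]
# Permanently redirect to the short URL
RewriteRule ^ /about [R=301,L]
```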

What is a text-based web browser?

Most of you know what a web browser is: a program you use to surf the internet, view websites and web pages, send email and so on.

The browsers we are familiar with (Internet Explorer, Firefox, Google Chrome, Apple Safari, Opera, Camino) have a rich interface and are called graphical browsers because they display all the contents of a web page, including text, images, videos and Flash.

What is a text web browser?
Like a graphical browser, a text web browser lets you surf websites. The big difference is the complete absence of a graphical interface. You navigate with keys such as Tab, the arrow keys and Enter instead of the mouse, and opening a website shows you only text: no images, no Flash or other media, and no colours 🙂
It displays only the text of a website, along with its links.

Importance and use of text web browsers
A text web browser is a good way to get an idea of how a search engine bot reads the content of your pages. If you don’t want to download and install a text-based browser on your computer, similar services are also available for free on the web.

Which text browser is good?
Lynx is one of the earliest text-based web browsers and is still a good choice today. It is a free, cross-platform program that can be installed on Windows, Mac and Unix/Linux. Other such programs include Emacs/W3, Edbrowse, ELinks and w3m. Lynx has been around since about 1992 and can be downloaded from the Lynx homepage.

Google has long been the most popular search engine, and people work hard to get their sites ranked high on its results pages. To help developers with various aspects of their websites, Google has published many articles and videos. It also provides a Webmaster Tools section with many utilities that can help people improve their websites. One of them is “Fetch as Googlebot”, located under “Diagnostics”, which lets you check how individual pages appear to Google’s crawler. (Webmaster Tools has since been renamed Google Search Console, and this feature has been replaced by the URL Inspection tool.)

“Fetch as Googlebot” is not a text browser :). However, it’s a great tool for finding problems and bugs within your site. Webmaster Tools is free; you only need a Google Account to access all of its features.