Robots.txt file and its syntax
admin — August 24, 2009 - 19:20
The Robots Exclusion Standard, also known as the Robots Exclusion Protocol or robots.txt protocol, is a convention for preventing cooperating web spiders and other web robots from accessing all or part of a website that is otherwise publicly viewable.
A robots.txt file is a short plain-text file placed in the root directory of your website (for example, http://www.example.com/robots.txt). Before crawling your site, well-behaved search engine robots read this file to see which files, file types and/or directories they are not permitted to visit. The syntax of the directives that go into robots.txt is described below.
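To make the location rule concrete, here is a minimal Python sketch that derives the robots.txt address a crawler would check for any page on a site: it keeps only the scheme and host (plus port, if any) and replaces the path with /robots.txt. The helper name robots_url and the example.com URLs are illustrative assumptions, not part of any standard.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would consult for page_url.

    Hypothetical helper: robots.txt lives at the root of the host,
    so the page's path, query string and fragment are all discarded.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.example.com/locations/info1/page.html?x=1"))
# https://www.example.com/robots.txt
```

A robots.txt file placed in a subdirectory (for example /locations/robots.txt) is therefore never consulted.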
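The general syntax is a User-agent line naming the robot the rules apply to (* matches every robot), followed by one or more Disallow lines: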
User-agent:*
Disallow:
These rules allow every robot to visit the entire site, because an empty Disallow value blocks nothing.
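One way to check how a parser interprets these two lines is Python's standard urllib.robotparser module. The sketch below feeds them to RobotFileParser.parse() and asks can_fetch() whether pages may be crawled; the example.com URLs and the ExampleBot agent name are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The "allow everything" rules, written exactly as in robots.txt.
ALLOW_ALL = """\
User-agent:*
Disallow:
"""

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())

# An empty Disallow value blocks nothing, so every path is allowed
# for every user agent.
print(parser.can_fetch("*", "https://www.example.com/"))  # True
print(parser.can_fetch("ExampleBot", "https://www.example.com/private/x.html"))  # True
```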
The following rules prevent all robots from crawling any page on the site:
User-agent:*
Disallow:/
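As a quick sanity check, the same standard-library parser can be given these rules; since every URL path starts with /, the single Disallow:/ line matches everything. The URLs and the ExampleBot agent name are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# The "block everything" rules, written exactly as in robots.txt.
DISALLOW_ALL = """\
User-agent:*
Disallow:/
"""

parser = RobotFileParser()
parser.parse(DISALLOW_ALL.splitlines())

# "/" is a prefix of every path, so nothing may be fetched.
print(parser.can_fetch("*", "https://www.example.com/"))  # False
print(parser.can_fetch("ExampleBot", "https://www.example.com/index.html"))  # False
```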
You can also allow robots to access only a few directories. Assume that you want search engines to crawl only these directories of the site:
/locations/info1
/locations/info2
/locations/info3
and that you do not want robots to access any other directory. In that case, use the following rules, listing the Allow lines before the catch-all Disallow:/ line:
User-agent:*
Allow:/locations/info1
Allow:/locations/info2
Allow:/locations/info3
Disallow:/
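The sketch below runs these rules through Python's urllib.robotparser to show which paths end up crawlable. The example.com host and the sample paths are illustrative assumptions; only the rules themselves come from the example above.

```python
from urllib.robotparser import RobotFileParser

# Allow three directories, block everything else.
ALLOW_SOME = """\
User-agent:*
Allow:/locations/info1
Allow:/locations/info2
Allow:/locations/info3
Disallow:/
"""

parser = RobotFileParser()
parser.parse(ALLOW_SOME.splitlines())

for path in ("/locations/info1/page.html", "/locations/info3",
             "/locations/other/", "/about.html"):
    # urllib.robotparser applies the first rule whose path is a prefix
    # of the requested path, so the Allow lines are checked first.
    print(path, parser.can_fetch("*", "https://www.example.com" + path))
# /locations/info1/page.html True
# /locations/info3 True
# /locations/other/ False
# /about.html False
```

Two details are worth keeping in mind. First, rule paths are prefix matches, so Allow:/locations/info1 also admits /locations/info10; end the path with a slash (Allow:/locations/info1/) if only that directory is meant. Second, Python's parser uses the first matching rule, while Google and RFC 9309 pick the most specific (longest) matching rule. Listing the Allow lines first gives the same result under both. Note also that Allow was not part of the original 1994 standard; it began as an extension supported by the major crawlers and was later standardized in RFC 9309.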
Note: You need a robots.txt file only if your site contains content that you don't want search engines to index. If you want search engines to index everything on your site, you do not need a robots.txt file (not even an empty one).
Check out the posting at webmasterworld.com/robots_txt/3944227.htm, and also read the important comment posted by goodroi.

