What is robots.txt?

robots.txt, found in the root directory of a website, is a specially formatted file that allows a web site administrator to indicate which parts of the site should not be visited by a robot.  Its format is specified by the Robots Exclusion Protocol.  The file must be named robots.txt, in lower case.

You can use robots.txt to help legitimate robots (e.g. Googlebot) find what you want them to find, avoid traps that could cause them to be banned by a tool like NukeSentinel, and most importantly, conserve your limited bandwidth.  Unfortunately, bad robots (e.g. spambots, linkbots) often ignore the robots.txt instructions.  Fortunately, there are ways to stop them.

From The Web Robots Pages:
The Robots Exclusion Protocol is a method that allows Web site administrators to indicate to visiting robots which parts of their site should not be visited by the robot.

In a nutshell, when a Robot visits a Web site, say http://www.foobar.com/, it first checks for http://www.foobar.com/robots.txt. If it can find this document, it will analyse its contents for records like:

User-agent: *
Disallow: /
to see if it is allowed to retrieve the document. The precise details on how these rules can be specified, and what they mean, can be found in:
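
As a rough illustration of the check a well-behaved robot performs, the sketch below uses Python's standard urllib.robotparser module to test a URL against the record quoted above. The helper name is_allowed, the user-agent string "ExampleBot" and the page paths are assumptions made up for this example; a real robot would download robots.txt from the site rather than use an inline string.

```python
from urllib.robotparser import RobotFileParser

# Inline copy of the record quoted above (a real robot would fetch
# http://www.foobar.com/robots.txt instead of using a string).
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to retrieve url.

    is_allowed is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    # parse() takes the file's contents as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" is an assumed robot name; "*" in the record matches it.
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://www.foobar.com/index.html"))
```

With "Disallow: /" every path on the site is off limits to every robot, so the check fails for any page. Changing the rule to something like "Disallow: /private/" would block only URLs under that directory.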

This Q&A was found on: http://nukeseo.com/modules.php?name=FAQ&file=index&myfaq=yes&id_cat=1