MD * Web Design

Robot Exclusion Standard

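Here are a few recommendations for using robot exclusion. Used well, it can hide content from spiders and may help your search ranking. Keep in mind that it does not protect your files: it is a request that well-behaved robots honor, not access control.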

Introduction

Web robots (also called spiders) are programs that traverse the World Wide Web by recursively retrieving linked pages. Occasionally robots visit pages where they are not welcome, for various reasons: some parts of a server are simply not suitable for crawling, such as very deep virtual trees, temporary information, or CGI scripts.

These incidents showed the need for a mechanism that tells robots which parts of a server should not be accessed.
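The recursive retrieval described above can be sketched as a breadth-first walk over a link graph. This is a minimal illustration, assuming a hypothetical in-memory site (the `site` dictionary and the `crawl` function are made up for this example; a real robot would fetch each page over HTTP and extract its links). The depth limit shows one way a robot can protect itself from an endlessly deep virtual tree.

```python
from collections import deque


def crawl(start: str, links: dict[str, list[str]], max_depth: int) -> list[str]:
    """Return pages in the order visited, following links breadth-first from start."""
    visited = {start}                # never queue the same page twice
    order: list[str] = []
    queue = deque([(start, 0)])      # (page, number of links followed to reach it)
    while queue:
        page, depth = queue.popleft()
        order.append(page)           # a real robot would retrieve the page here
        if depth == max_depth:
            continue                 # stop descending: guards against very deep trees
        for target in links.get(page, []):
            if target not in visited:
                visited.add(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical site: "/deep/..." stands in for a generated, arbitrarily deep virtual tree.
site = {
    "/": ["/a.html", "/b.html"],
    "/a.html": ["/", "/deep/1"],
    "/deep/1": ["/deep/2"],
    "/deep/2": ["/deep/3"],
}
print(crawl("/", site, max_depth=2))  # ['/', '/a.html', '/b.html', '/deep/1']
```

With a depth limit of 2 the walk stops at "/deep/1". Without any limit, a server-generated tree could keep a robot busy indefinitely, which is exactly the kind of area a site owner would rather exclude.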

Idea

The method used to exclude robots from a server is to create a file that specifies an access policy for robots. This file must be accessible via HTTP at the local URL "/robots.txt".
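Because the file always lives at "/robots.txt" on the server, a robot can derive its location from the URL of any page on that server. Here is a minimal Python sketch. The function name `robots_url` is hypothetical, and `urlsplit`/`urlunsplit` come from the standard library.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the URL of /robots.txt on the server that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (with any port); replace path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.example.com/xxx/yyy/page.html?id=7"))
# http://www.example.com/robots.txt
```

A robot would fetch this URL once per server before retrieving other pages from it.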

Format

The format and semantics of the "/robots.txt" file are:
  • The file consists of one or more records separated by one or more blank lines (each line is terminated by CR, CR/NL, or NL). Each record contains lines of the form "<field>:<optionalspace><value><optionalspace>". The field name is case-insensitive.
  • Comments can be included in the file using UNIX Bourne shell conventions: the '#' character indicates that any preceding whitespace and the rest of the line, up to the line termination, are discarded. Lines containing only a comment are discarded completely and therefore do not mark a record boundary.
  • Each record starts with one or more User-agent lines, followed by one or more Disallow lines, as detailed below.
  • User-agent. The value of this field is the name of the robot whose access policy the record describes. If the value is '*', the record describes the default access policy for any robot that has not matched any of the other records. Only one such record may appear in the "/robots.txt" file.
  • Disallow. The value of this field specifies a partial URL that is not to be visited. This can be a full path or a partial path; any URL that starts with this value will not be retrieved. For example, Disallow: /xxx disallows both /xxx.html and /xxx/index.html, whereas Disallow: /xxx/ disallows /xxx/index.html but allows /xxx.html. An empty value indicates that all URLs can be retrieved. At least one Disallow field must be present in a record.
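The rules in the list above can be sketched as a small parser and matcher. This is a minimal illustration, not a complete implementation. The names `Record`, `parse_robots`, `select_record` and `is_allowed` are hypothetical. Fields other than User-agent and Disallow are ignored, and an empty Disallow value is treated as excluding nothing. Robot names are matched with a case-insensitive substring test, which is a lenient choice assumed here; real robots may match differently.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """One record: the robots it applies to and the path prefixes they must not visit."""
    user_agents: list[str] = field(default_factory=list)
    disallows: list[str] = field(default_factory=list)


def parse_robots(text: str) -> list[Record]:
    """Split a robots.txt body into records of User-agent and Disallow values."""
    records: list[Record] = []
    current = Record()
    for raw in text.splitlines():            # accepts CR, CR/NL and NL terminators
        if raw.lstrip().startswith("#"):
            continue                         # comment-only line: dropped, not a boundary
        line = raw.split("#", 1)[0].strip()  # strip a trailing comment and whitespace
        if not line:                         # blank line ends the current record
            if current.user_agents:
                records.append(current)
            current = Record()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue                         # not a "<field>:<value>" line; ignored here
        name, value = name.strip().lower(), value.strip()  # field names are case-insensitive
        if name == "user-agent":
            current.user_agents.append(value)
        elif name == "disallow":
            current.disallows.append(value)
    if current.user_agents:
        records.append(current)
    return records


def select_record(records: list[Record], robot: str) -> Optional[Record]:
    """Return the record naming this robot, else the first '*' record, else None."""
    robot = robot.lower()
    default = None
    for record in records:
        for agent in record.user_agents:
            if agent == "*":
                if default is None:
                    default = record
            elif agent and agent.lower() in robot:  # assumed: substring match
                return record
    return default


def is_allowed(records: list[Record], robot: str, path: str) -> bool:
    """True unless path starts with one of the selected record's Disallow values."""
    record = select_record(records, robot)
    if record is None:
        return True                          # no record applies: nothing is excluded
    # An empty Disallow value excludes nothing, so it is skipped in the prefix test.
    return not any(prefix and path.startswith(prefix) for prefix in record.disallows)
```

For instance, `is_allowed(parse_robots("User-agent: *\nDisallow: /xxx/\n"), "AnyBot", "/xxx.html")` is True, while the same call with "/xxx/index.html" is False. This mirrors the /xxx versus /xxx/ example above. Matching is a plain string prefix test, so the trailing slash alone decides whether "/xxx.html" is covered.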


Examples

# robots.txt for http://www.example.com/

User-agent: *
Disallow: /xxx/yyy/
Disallow: /zzz/
Disallow: /qqq.html
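Python's standard library includes a parser for this format, `urllib.robotparser`. The sketch below checks the example file above against a few paths. `parse()` takes lines that are already in hand, whereas `set_url()` and `read()` would fetch the file over HTTP. The robot name "AnyBot" and the sample paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt above, split into lines.
EXAMPLE = """# robots.txt for http://www.example.com/

User-agent: *
Disallow: /xxx/yyy/
Disallow: /zzz/
Disallow: /qqq.html
""".splitlines()

parser = RobotFileParser()
parser.parse(EXAMPLE)  # parse local lines instead of fetching over HTTP

for path in ("/index.html", "/xxx/index.html", "/xxx/yyy/page.html", "/zzz/", "/qqq.html"):
    url = "http://www.example.com" + path
    print(path, parser.can_fetch("AnyBot", url))
```

Only "/index.html" and "/xxx/index.html" are reported as fetchable. The record disallows "/xxx/yyy/" but not the rest of "/xxx/".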



© 2003-2008 MAGNETIC-DESIGN.com
