By using a robots.txt file on your site, you can tell
search engine spiders and bots to index or ignore certain
parts of your site. There are other types of bots and spiders,
but most of us just care about the search engine type.
A few important things to note:
- There are bad bots out there that disguise themselves
as a browser and do not follow your robots.txt rules.
If you want to forbid all possible access to a certain
area, use a .htaccess file or similar.
- Your "robots.txt" file must be put in
your root folder. In other words, the same folder where
your homepage is placed.
Example 1: Allow all robots to access all areas:
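A minimal robots.txt for this case, built from the directives explained in the command breakdown below (an empty Disallow value blocks nothing):

```text
User-agent: *
Disallow:
```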
1. Copy the rules for this example into a plain-text file and
save it as robots.txt.
2. Upload it to the root folder of your website.
Example 2: Deny all robots access to all areas:
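A minimal robots.txt for this case, built from the directives explained in the command breakdown below (a lone slash covers the root level and everything deeper):

```text
User-agent: *
Disallow: /
```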
1. Copy the rules for this example into a plain-text file and
save it as robots.txt.
2. Upload it to the root folder of your website.
Example 3: Allow all robots to access your site except
for the /cgi-bin folder:
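A minimal robots.txt for this case, following the same folder pattern as the /images/ entry in the command breakdown below. The trailing slash is an assumption carried over from that entry; it limits the rule to the contents of the folder:

```text
User-agent: *
Disallow: /cgi-bin/
```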
1. Copy the rules for this example into a plain-text file and
save it as robots.txt.
2. Upload it to the root folder of your website.
You can also specify individual rules for individual bots.
A list of bots can be found by entering "search engine bots"
(without quotes) into Google.
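As a rough sketch of per-bot rules, the file below gives Googlebot its own group and keeps a catch-all group for every other bot. The choice of Googlebot and of the /images/ and /cgi-bin/ folders is only for illustration, borrowed from the examples and command breakdown on this page; substitute your own bot names and paths.

```text
# Rules for Google's bot only
User-agent: Googlebot
Disallow: /images/

# Rules for every other bot
User-agent: *
Disallow: /cgi-bin/
```

A bot that finds a group naming it follows that group and ignores the catch-all group, so in this sketch Googlebot is kept out of /images/ but is not kept out of /cgi-bin/. Repeat a Disallow line in the named group if it should apply there too.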
Command Breakdown:
User-agent: * = This is telling all incoming robots that
this rule applies to them.
User-agent: Googlebot = This specifically addresses Google's
bot.
Disallow: = This essentially says, allow everything, since
nothing is specified.
Disallow: / = This means, do not spider at the root level,
or anything deeper.
Disallow: /images/ = This means, do not spider the entire images
folder.