Tuesday, December 29, 2009

Robots.txt - what is robots.txt

What Is Robots.txt?

Robots.txt is a plain text file in your site's root folder. It tells search engine robots, such as Googlebot, which pages you would like them not to crawl. A website is not required to have a robots.txt file, but you should add one when you need to keep crawlers out of certain areas. Think of robots.txt as a note for search engine bots: it tells them which web folders they are allowed to crawl and which they are not.


Remember that robots.txt must be in the main directory (the root directory of your website) so that search engine bots such as Googlebot can find it. Bots do not search the whole site for a file named robots.txt. Instead, they request it only from the main directory, at an address like this:

http://www.yourdomain.com/robots.txt
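
Because crawlers only look in the root, the robots.txt address for a site can be worked out from any page URL by keeping the scheme and host and replacing the rest. Here is a minimal sketch in Python using only the standard library. The function name robots_url and the sample page URL are made up for this illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the root-level robots.txt address for the site hosting page_url.

    robots_url is a hypothetical helper name used only for this example.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host; drop the original path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The page path here is invented for the example.
print(robots_url("http://www.yourdomain.com/blog/2009/12/post.html"))
# prints http://www.yourdomain.com/robots.txt
```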

Robots.txt has been around since the early days of the web. You can visit http://www.robotstxt.org/ to learn more about robots.txt.


The structure of a robots.txt file is simple.

User-agent: - the name of the search engine crawler the rules apply to (* means all crawlers).

Disallow: - lists the files and directories to be excluded from crawling.

You can include comment lines; just put the # sign at the beginning of the comment. Refer to the example below.

# This is a comment in robots.txt

--------------------SAMPLE CONTENTS OF robots.txt ----------------------

User-agent: *

Disallow: /temp/

------------------------------------------------------------------------
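
To see how a crawler would read the sample, you can feed it to the robots.txt parser that ships with Python (urllib.robotparser). This is only a sketch: the helper name allowed, the "Googlebot" agent string, and the page paths are assumptions made for the example, and real crawlers may interpret rules slightly differently.

```python
from urllib.robotparser import RobotFileParser

# The same two rules as the sample robots.txt.
SAMPLE = """\
User-agent: *
Disallow: /temp/
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())  # parse from memory, no network request


def allowed(path, agent="Googlebot"):
    """Hypothetical helper: may `agent` crawl `path` under the sample rules?"""
    return parser.can_fetch(agent, "http://www.yourdomain.com" + path)


print(allowed("/temp/draft.html"))  # False: inside the disallowed folder
print(allowed("/index.html"))       # True: no rule matches this path
```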

To avoid serious problems and logic errors, follow the basic syntax of robots.txt, or use a robots.txt validator to check your file for errors before uploading it.
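
As a rough idea of what such a check involves, here is a small Python sketch that flags the most common slips. It is not a real validator: it knows only the two fields described in this post (User-agent and Disallow), so it will also flag other fields that real files may legitimately use. The function name lint_robots and its messages are invented for this example.

```python
def lint_robots(text):
    """Return a list of (line_number, message) for lines that look wrong.

    A deliberately small check covering only User-agent and Disallow.
    """
    problems = []
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # blank or comment-only line
        if ":" not in line:
            problems.append((number, "missing ':' between field and value"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            seen_agent = True
        elif field == "disallow":
            if not seen_agent:
                problems.append((number, "Disallow before any User-agent"))
            # An empty value is fine (it disallows nothing).
            if value and not value.startswith("/"):
                problems.append((number, "path should start with '/'"))
        else:
            problems.append((number, "field not covered by this sketch: " + field))
    return problems


# The sample file passes; a file with a missing slash does not.
print(lint_robots("User-agent: *\nDisallow: /temp/\n"))
print(lint_robots("User-agent: *\nDisallow: temp/\n"))
```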

1 comment:

  1. robots.txt is very useful. it is wise to have a robots.txt file in your website.
