Monday, September 8, 2008

Robots.txt Tips and Tricks

As I mentioned in my last post, here are some in-depth points on writing a well-formed robots.txt file. A well-formed robots.txt file helps search engines index your site properly. Conversely, if the file is formatted incorrectly, some or all files of your Web site might not get indexed. The points below will help you write a proper robots.txt file for your website.

1. Avoid using comments in robots.txt:
Comments (text after a # character) are allowed in a robots.txt file, but they may confuse some search engine spiders. It is safer to leave them out.

2. Avoid using white space at the beginning of a line:
Bad Practice:
  User-agent: *
  Disallow: /support

Best Practice:
User-agent: *
Disallow: /support

3. Don't change the order of the commands:
Don't change the order of the commands if you want your robots.txt file to work 100 percent. The User-agent line must come before the Disallow lines that belong to it.
Bad Practice:
Disallow: /support
User-agent: *

Best Practice:
User-agent: *
Disallow: /support
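
To see why the order matters, here is a small sketch using Python's standard urllib.robotparser module as a stand-in for a spider. Real search engine spiders may behave differently; the helper name blocks_support and the page /support/index.html are made up for this illustration.

```python
from urllib.robotparser import RobotFileParser


def blocks_support(lines):
    """Return True if these robots.txt lines block /support/index.html for all spiders."""
    parser = RobotFileParser()
    parser.parse(lines)
    return not parser.can_fetch("*", "/support/index.html")


wrong_order = ["Disallow: /support", "User-agent: *"]
right_order = ["User-agent: *", "Disallow: /support"]

print(blocks_support(wrong_order))  # False: the Disallow line has no User-agent before it
print(blocks_support(right_order))  # True
```

Python's parser simply drops a Disallow line that is not preceded by a User-agent line, so the wrongly ordered file blocks nothing.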

4. Always use one directory per line:
Search engine spiders may not understand the rule if you list more than one directory on a single line.
Bad Practice:
User-agent: *
Disallow: /support /cgi-bin/ /images/

Best Practice:
User-agent: *
Disallow: /support
Disallow: /cgi-bin/
Disallow: /images/
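
The difference can be checked with a short sketch based on Python's standard urllib.robotparser module. It only shows how this one parser reads the two versions, not how any particular search engine does; the helper is_allowed and the three test pages are invented for the example.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, path, agent="*"):
    """Parse robots.txt text and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Three directories crammed into one Disallow line.
combined = """\
User-agent: *
Disallow: /support /cgi-bin/ /images/
"""

# One directory per Disallow line.
separate = """\
User-agent: *
Disallow: /support
Disallow: /cgi-bin/
Disallow: /images/
"""

for path in ("/support/index.html", "/cgi-bin/search", "/images/logo.png"):
    print(path, "combined:", is_allowed(combined, path), "separate:", is_allowed(separate, path))
```

Python's parser treats everything after "Disallow:" as one single path, so the combined line matches none of the three pages, while the one-per-line version blocks all of them.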

5. Always use the right case:
Paths in robots.txt are case-sensitive. If your directory is named Images, write Disallow: /Images/ and not Disallow: /images/
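
A quick way to check the case rule is Python's standard urllib.robotparser module, used here only as an example parser. The file name logo.png is made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /images/",  # lowercase rule
])

# Assume the directory on the server is really named "Images".
print(parser.can_fetch("*", "/Images/logo.png"))  # True: the lowercase rule does not match
print(parser.can_fetch("*", "/images/logo.png"))  # False: the rule matches
```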

6. Write concise rules:
If you want to block all files under the folder “support”, don’t list them one by one like this:
User-agent: *
Disallow: /support/orders.html
Disallow: /support/technical.html
Disallow: /support/helpdesk.html
Disallow: /support/index.html

Instead, you can replace this with:
User-agent: *
Disallow: /support
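
The short rule works because a Disallow value is matched as a prefix of the requested path. Here is a rough sketch with Python's standard urllib.robotparser module; the last two pages in the list are hypothetical and only there to show where the match starts and stops.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /support"])

pages = [
    "/support/orders.html",
    "/support/technical.html",
    "/support/helpdesk.html",
    "/support/index.html",
    "/support-faq.html",  # hypothetical page outside the folder
    "/contact.html",      # hypothetical unrelated page
]

# Collect every page that the single rule blocks.
blocked = [page for page in pages if not parser.can_fetch("*", page)]
print(blocked)
```

Note that /support-faq.html is blocked as well, since it also starts with /support. If you only want the folder itself, write Disallow: /support/ with a trailing slash.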

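7. There is no "Allow" command in the original standard:
The original robots.txt standard has no "Allow" command. Some spiders, such as Googlebot, understand it as an extension, but others may ignore it, so don't rely on it. Only mention files and directories that you don't want to be indexed. All other files can be indexed automatically if they are linked on your site.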

General Tips and tricks:

1. How to allow all search engine spiders to index all files
Use the following content for your robots.txt file if you want to allow all search engine spiders to index all files of your Web site:
User-agent: *
Disallow:

2. How to stop all spiders from indexing any file
If you don't want search engines to index any file of your Web site, use the following:
User-agent: *
Disallow: /
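
The two files differ by a single slash, so it is worth testing which one you have. Below is a minimal sketch using Python's standard urllib.robotparser module; the helper name can_fetch_path and the page /any/page.html are made up for the example.

```python
from urllib.robotparser import RobotFileParser


def can_fetch_path(lines, path):
    """Return True if any spider ("*") may fetch `path` under these robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser.can_fetch("*", path)


allow_all = ["User-agent: *", "Disallow:"]    # empty value: nothing is blocked
block_all = ["User-agent: *", "Disallow: /"]  # "/" is a prefix of every path

print(can_fetch_path(allow_all, "/any/page.html"))  # True
print(can_fetch_path(block_all, "/any/page.html"))  # False
```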

3. Reference URLs:
I have listed some website URLs here for your reference, where you can find some detailed robots.txt files.
http://www.cnn.com/robots.txt
http://www.whitehouse.gov/robots.txt
http://www.nytimes.com/robots.txt
http://www.spiegel.com/robots.txt

Bottom line: A properly written robots.txt file helps search engines index your Web site correctly, which you need if you want good rankings.

3 comments:

chaitanya said...

Mercy,

well explained
really had good time reading your blog.
Look forward to see more posts.

Chaitnaya.

Mercy said...

@Chaitanya - Thanks! You are a real inspiration for me and you motivate me to write more posts. Thanks for all the supportive words and comments.

chaitanya said...

Mercy,
I am thrilled to know this
