Monday, 1 April 2013

Using robots.txt on Your Website, WordPress Site, Blog, etc.

One simple thing you can do to help web crawlers crawl your site is to have a robots.txt file. This is especially important if there are sections of your website that you do not want search engines to crawl.
In addition, your robots.txt file can store the location of your site’s sitemap, making it easier for crawlers to find and crawl every page on your site.
“The robots exclusion standard, also known as the Robots Exclusion Protocol or robots.txt protocol is a convention to prevent cooperating web spiders and other web robots from accessing all or part of a website which is, otherwise, publicly viewable.” – Wikipedia
robots.txt files are quite simple and easy to create.
An example robots.txt file:
User-agent: *
Disallow: /tmp/
Disallow: /private/
Sitemap:
The first line, “User-agent: *”, tells crawlers that the rules that follow apply to every crawler.
The next 2 lines tell crawlers not to crawl anything in the tmp and private folders.
The last line tells the crawler where to find the site’s sitemap.
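To see how a well-behaved crawler might read these rules, here is a minimal sketch using Python's standard urllib.robotparser module. The sitemap URL, the "ExampleBot" crawler name, and the allowed() helper are made up for illustration; they are not from the original example.

```python
from urllib.robotparser import RobotFileParser

# The example rules from above. The sitemap URL is a hypothetical
# placeholder, since the original example does not show one.
RULES = """\
User-agent: *
Disallow: /tmp/
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())  # parse the rules without fetching anything


def allowed(path, agent="ExampleBot"):
    """Return True if the (hypothetical) crawler `agent` may fetch `path`."""
    return parser.can_fetch(agent, path)


print(allowed("/index.html"))      # not covered by any Disallow rule
print(allowed("/tmp/cache.html"))  # falls under Disallow: /tmp/
print(parser.site_maps())          # sitemap URLs listed in the file (Python 3.8+)
```

Because the rules are listed under “User-agent: *”, the answer is the same whatever crawler name is passed in.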
A robots.txt file must sit in the top-level directory of your domain. Crawlers only look for it there, so a robots.txt file placed in a subfolder is ignored.
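As a rough illustration of where a crawler looks, the sketch below works out the robots.txt address for any page on a site. The robots_url() helper and the example.com address are assumptions made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Build the top-level robots.txt URL for the site that hosts page_url.

    Hypothetical helper: keeps the scheme and host, and replaces the
    path, query and fragment with /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.example.com/blog/2013/04/post.html?page=2"))
```

However deep the page is, the result always points at the root of the domain.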
I used to create my own robots.txt file but I recently found a site that will generate one for me for free:
Read more about robots.txt:
