When everyone is trying to get spiders to index their websites and increase their SEO, you may be wondering why anyone would want to block spiders from their websites. But if you really think about it, there are a few reasons why you would want to do this.
1. Keeping your “under construction” pages out of searches
2. Keeping pages with email addresses from being indexed
3. Protecting directories like cgi-bin
4. Keeping your admin panel from being indexed
To do this you need to implement a “Robots.txt” file. Doing this is not difficult at all. All you need to do is create a text file, fill it with the appropriate directives and place it in the root of the web server. It’s as simple as that. The file tells search engine bots what they can and cannot index. The syntax is so simple and versatile that you can block off some parts or the whole website to some or all bots.
The code is very simple and looks like this:
User-agent: [insert name of the bot here]
Disallow: [insert location here]
For example, if you wanted Google to avoid indexing the page “confidential.html”, the file would look like this:
User-agent: Googlebot
Disallow: /confidential.html
If you wanted to block it for all search engines, it would be:
User-agent: *
Disallow: /confidential.html
If you want to block off the whole site for some reason or another, you can use:
User-agent: *
Disallow: /
In addition to the above, you may also use the NOINDEX meta tag. To generate this tag, you can use the Submit Express Meta Tags Generator Tool.
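For reference, the NOINDEX meta tag is a standard HTML tag that goes in the head section of the page you want kept out of search results:

<meta name="robots" content="noindex">

Note the difference: robots.txt tells bots not to crawl a page, while the NOINDEX tag lets them crawl it but asks them not to show it in results.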
There are various ways in which this file can be used. Its simplicity makes it a very useful and versatile tool. However, you should use it with care. After all, you don’t want to accidentally block off the whole site, do you?
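One easy way to avoid that kind of accident is to test your rules before uploading the file. As a sketch, Python’s standard urllib.robotparser module can parse a set of robots.txt lines and tell you whether a given URL would be blocked; the domain example.com and the rules below are just placeholders:

```python
# Check how a crawler would interpret a set of robots.txt rules,
# using Python's built-in urllib.robotparser module.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Normally you would call rp.set_url(...) and rp.read() to fetch a live
# robots.txt; here we parse the rules directly from a list of lines.
rp.parse([
    "User-agent: *",
    "Disallow: /confidential.html",
])

# can_fetch(user_agent, url) returns True if the URL is allowed.
print(rp.can_fetch("*", "https://example.com/confidential.html"))  # False
print(rp.can_fetch("*", "https://example.com/index.html"))         # True
```

If the second call ever prints False, your rules are broader than you intended and you should review them before they go live.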