A robots.txt file is a text file associated with your website that tells search engines which of your website's pages you would and would not like them to visit.
The structure of a robots.txt file is very simple. Essentially, it's a note that tells search engines how you want them to crawl your pages. The most basic robots.txt file looks like this:
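A minimal file consistent with the description that follows, assuming nothing on the site is blocked, would be:

```text
# Applies to every search engine robot
User-agent: *
# Nothing listed after Disallow, so nothing is blocked
Disallow:
```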
"User-agent" refers to the search engine robots a rule applies to. A * means the rule applies to all search engines. "Disallow:" is then followed by the page or directory you do not want the search engines to crawl.
So the example above tells all search engines that they can access everything, because nothing is listed after "Disallow:".
We automatically set up your robots.txt file to be the following:
You can fully customise your robots.txt file however you wish by following the instructions below:
Then, to upload it and replace the default Create file, please follow the steps below:
Your robots.txt will now be changed.
Below are some common scenarios you may want for your website, and how you can amend your robots.txt file to allow for each one:
1. Allow all search engines access to images
To specify all search engines you will need to add a "*" symbol as your user-agent, as this represents all search engines:
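As a sketch of this scenario: the folder name /images/ is an assumption for illustration, so replace it with wherever your images are actually stored.

```text
# Applies to every search engine robot
User-agent: *
# /images/ is an assumed folder name, not necessarily your site's real path
Allow: /images/
```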
2. Disallow all search engines access to images
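A sketch of how this could look, again assuming your images sit in a folder called /images/ (a hypothetical path):

```text
# Applies to every search engine robot
User-agent: *
# /images/ is an assumed folder name; use your site's real image path
Disallow: /images/
```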
3. Allow only some search engines
If you would like to allow only certain search engines you would need to specify these, as below:
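One way this might be written, assuming you want Google and Bing to have full access and every other robot to be blocked from the whole site:

```text
# Google may crawl everything
User-agent: googlebot
Disallow:

# Bing may crawl everything
User-agent: bingbot
Disallow:

# Every other robot is blocked from the whole site
User-agent: *
Disallow: /
```

A robot follows the group that names it most specifically, so googlebot and bingbot use their own groups and ignore the * group.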
In the example above, all search engines are blocked from crawling your files and images apart from Bing (bingbot) and Google (googlebot).
4. Only allow some search engines access to images
If you want only certain search engines to crawl your images you will need to specify these, as below:
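A possible version of this, assuming your images are stored under /images/ (a hypothetical path) and that Google and Bing are the search engines you want to allow:

```text
# Google may crawl the image folder
User-agent: googlebot
Allow: /images/

# Bing may crawl the image folder
User-agent: bingbot
Allow: /images/

# Every other robot is kept out of the image folder only
# (/images/ is an assumed folder name)
User-agent: *
Disallow: /images/
```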
5. Allow your images to be crawled by Google but not appear in Google Images
If you would like Google to crawl your images but for them not to appear in Google Images you will need to specify this by listing the Google Image robot in your robots.txt, as shown below:
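A sketch of this set-up, assuming you want to keep every image on the site out of Google Images while leaving all other robots unrestricted:

```text
# Google's image robot is blocked from the whole site
User-agent: Googlebot-Image
Disallow: /

# All other robots, including Google's main web robot, are unrestricted
User-agent: *
Disallow:
```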
By specifically listing this Google Image robot, you stop your images from appearing in a Google Images search. However, because Google is still allowed to crawl them, they may still appear in a Google web search.
6. Disallow some search engines so they cannot crawl anything
If you would like a specific search engine to not crawl your website at all you will need to add a "/" symbol, as this represents all of your content:
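For instance, a minimal sketch that blocks only Bing, using the bingbot name mentioned earlier:

```text
# Bing's robot is blocked from the whole site
User-agent: bingbot
# "/" stands for all of your content
Disallow: /
```

Robots not named in the file have no rules applied to them, so they can crawl as normal.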
In the example above, your robots.txt file blocks Bing from accessing your website, while all other search engines, such as Google, can still access it.
7. Allow all search engines access to all pages on the site
If you would like to allow all search engines access to everything you will need to add the following to your robots.txt file:
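A minimal sketch of a file that leaves everything open:

```text
# Applies to every search engine robot
User-agent: *
# An empty Disallow means no page is blocked
Disallow:
```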
8. Disallow search engines access to some pages on the site
If you would like all search engines to not have access to certain pages you will need to add the page filename to your file as below:
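As an illustration, the two filenames below are hypothetical placeholders; swap in the filenames of the pages you want to block, one per line:

```text
# Applies to every search engine robot
User-agent: *
# Hypothetical page filenames, each on its own Disallow line
Disallow: /private-page.html
Disallow: /members-only.html
```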
Password-protected pages can be crawled by search engine robots, but they cannot be accessed by a site visitor without a username and password. If you do not want such a page indexed at all, you can add it to your robots.txt file, but this is not necessary because visitors cannot open the page without logging in.
9. Disallow search engines access to private documents