How To Use Robots.txt

What Is Robots.txt?

A robots.txt file is a plain text file associated with your website. Search engines read it to find out which of your website's pages you want them to visit and which you do not.

How Does The Robots.txt File Work?

The structure of a robots.txt file is very simple. Essentially, it is a note that tells search engines which parts of your site you want them to crawl. The most basic robots.txt file looks like the example below, which allows any search engine to crawl everything it can find:

 

User-agent: *
Disallow:

 

This rule is broken down into two parts. The first is the User-agent line, which names the robots the rule applies to. These are (for the most part) the search engines that are crawling your site. You can structure your robots.txt file to apply rules to specific search engines. For example, you could use the following rule to refer to Bing's robot:

 

User-agent: bingbot
Disallow: 

 

In most cases, User-agent is followed by a *, which means that the rules apply to all robots.

The second part is the Disallow: line, which specifies a page or directory you do not want the search engines to crawl. The example above tells Bing that it can access everything, because no path has been given after Disallow:.
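If you would like to test how a rule behaves before using it, the sketch below uses Python's standard urllib.robotparser module to read two small rule sets and ask whether a robot may fetch a page. The /private/ folder and the page names are made-up examples, and Python's parser is only an approximation of how each search engine interprets the file.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything online."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# An empty Disallow: line blocks nothing, so everything may be crawled.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# Hypothetical rule: block one folder for every robot.
BLOCK_FOLDER = """\
User-agent: *
Disallow: /private/
"""

open_site = parse_rules(ALLOW_ALL)
blocked_site = parse_rules(BLOCK_FOLDER)

print(open_site.can_fetch("bingbot", "/private/page.html"))     # True
print(blocked_site.can_fetch("bingbot", "/private/page.html"))  # False
print(blocked_site.can_fetch("bingbot", "/about.html"))         # True
```

Note that Python's parser treats a blank line as the end of a group of rules, so the rules here are written without blank lines between a User-agent line and its Disallow lines.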

 

How Is The Robots.txt File Set Up On My Website? 

We automatically set up your robots.txt file to be the following:

user-agent: *

Sitemap: http://yourdomain.co.uk/sitemap.xml

disallow: /include/

disallow: /shop/basket_new.php

disallow: /shop/checkout_process.php

disallow: /account/

Disallow: /cdn-cgi/

disallow: /websiteusers.html
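As a rough illustration of what these default rules mean in practice, the Python sketch below parses them with the standard urllib.robotparser module and checks a few addresses. The pages tested (/account/login.html and /index.html) are assumed examples, not pages that necessarily exist on your site, and site_maps() needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# The default rules shown above, written without the blank lines
# (Python's parser treats a blank line as the end of a group).
DEFAULT_RULES = """\
user-agent: *
Sitemap: http://yourdomain.co.uk/sitemap.xml
disallow: /include/
disallow: /shop/basket_new.php
disallow: /shop/checkout_process.php
disallow: /account/
Disallow: /cdn-cgi/
disallow: /websiteusers.html
"""

parser = RobotFileParser()
parser.parse(DEFAULT_RULES.splitlines())

# Field names such as "disallow" are not case sensitive.
print(parser.can_fetch("googlebot", "/account/login.html"))   # False
print(parser.can_fetch("googlebot", "/shop/basket_new.php"))  # False
print(parser.can_fetch("googlebot", "/index.html"))           # True

# The Sitemap line is collected separately from the crawl rules.
print(parser.site_maps())  # ['http://yourdomain.co.uk/sitemap.xml']
```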


Where Is the Robots.txt File?

To access your robots.txt file as the search engines will, enter your full domain name into the address bar of your browser and add "/robots.txt" to the end of your website's address.

For example - https://www.yourdomain.co.uk/robots.txt
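If you are working this address out in a script rather than typing it into a browser, a minimal Python sketch could look like the one below. The helper name robots_url is our own invention for illustration, and it assumes the address you pass in starts with http:// or https://.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(site_address: str) -> str:
    """Return the robots.txt address for the site that hosts site_address.

    Hypothetical helper: keeps the scheme and domain, and replaces
    whatever path follows with /robots.txt.
    """
    parts = urlsplit(site_address)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.yourdomain.co.uk/shop/index.html"))
# https://www.yourdomain.co.uk/robots.txt
```

The file always sits at the top level of the domain, which is why any page path in the original address is discarded.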


How To Customise The Robots.txt File

You can fully customise your robots.txt file by following the instructions below:

  1. On your computer, open Notepad (or TextEdit on a Mac)

  2. Use this program to write your new robots file in plain text without styling or formatting

  3. Save the file with the name robots.txt

Examples

Below are several scenarios you may want for your website, along with the robots.txt rules that achieve each one:

1. Allow all search engines access to images

To apply a rule to all search engines, use the * symbol as your user-agent, as this represents every robot:

User-agent: *

Allow: /siteimages/


2. Disallow all search engines access to images

User-agent: *

Disallow: /siteimages/


3. Allow only some search engines

If you would like to allow only certain search engines you would need to specify these, as below:

User-agent: *

Disallow: /sitefiles/

Disallow: /siteimages/

User-agent: googlebot

Disallow:

User-agent: bingbot

Disallow:

In the example above, all search engines are blocked from crawling your files and images apart from Bing (bingbot) and Google (googlebot).
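To double-check a set of per-robot rules like this, you could run them through Python's standard urllib.robotparser module. The sketch below is only an approximation of how real search engines read the file; "examplebot" and the logo.png path are made-up names standing in for any other robot and any image.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above. A blank line separates each group,
# but there are none between a User-agent line and its own rules.
RULES = """\
User-agent: *
Disallow: /sitefiles/
Disallow: /siteimages/

User-agent: googlebot
Disallow:

User-agent: bingbot
Disallow:
"""


def who_can_crawl(rules: str, agents: list[str], path: str) -> dict[str, bool]:
    """Report, for each robot name, whether the rules let it fetch path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in agents}


results = who_can_crawl(
    RULES, ["googlebot", "bingbot", "examplebot"], "/siteimages/logo.png"
)
print(results)
# {'googlebot': True, 'bingbot': True, 'examplebot': False}
```

A robot that has its own User-agent group follows only that group, which is why Google and Bing ignore the Disallow lines listed under *.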


4. Only allow some search engines access to images

If you want only certain search engines to crawl your images you will need to specify these, as below:

User-agent: *

Disallow: /siteimages/

User-agent: googlebot-image

Disallow:


5. Allow your images to be crawled by Google but not appear in Google Images

If you would like Google to crawl your images but for them not to appear in Google Images you will need to specify this by listing the Google Image robot in your robots.txt, as shown below:

User-agent: *

Disallow: /sitefiles/

User-agent: googlebot-image

Disallow: /siteimages/

By specifically listing the Google Image robot you stop your images from appearing in a Google Images search. However, because Google's other robots are still allowed to crawl them, they may still appear in a Google web search.


6. Disallow some search engines so they cannot crawl anything

If you would like a specific search engine to not crawl your website at all you will need to add a / symbol, as this represents all of your content:

User-agent: *

Disallow: /sitefiles/

Disallow: /siteimages/

User-agent: bingbot

Disallow: /

In the example above, your robots.txt file does not allow Bing to access your website at all, while all other search engines, such as Google, can still crawl everything except your files and images.


7. Allow all search engines access to all pages on the site

If you would like to allow all search engines access to everything, you will need to add the following to your robots.txt file:

User-agent: *

Disallow:


8. Disallow search engines access to some pages on the site

If you would like to stop all search engines from accessing certain pages, add the path of each page or folder to your file, as below:

User-agent: *

Disallow: /guestbook/

Disallow: /onlineshop/

Password-protected pages are not blocked from search engine robots by default, but a site visitor cannot open them without a username and password. Because of this, and because such a page is likely to have very little SEO benefit, you could add it to your robots.txt file if you do not want it indexed at all. This is not necessary, however, as the page cannot be opened without logging in.


9. Disallow search engines access to private documents

User-agent: *

Disallow: /sitefiles/27/3/6/273678/contact_form.pdf


Please bear in mind that your robots.txt file is public. If you use it to disallow pages or private documents, anyone who looks up the file can still see their addresses and open them. If you want to restrict access to these resources on your website, we recommend password protecting the page instead.
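To see why this matters, the short Python sketch below lists every path named on a Disallow line, which is all a curious visitor would need to do after opening your robots.txt. The function name disallowed_paths is our own, chosen for illustration, and the code needs Python 3.9 or later.

```python
def disallowed_paths(robots_text: str) -> list[str]:
    """Collect every non-empty path that appears on a Disallow line."""
    paths = []
    for raw_line in robots_text.splitlines():
        # Drop any trailing comment, then split "Field: value".
        line = raw_line.split("#", 1)[0].strip()
        field, separator, value = line.partition(":")
        if separator and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


PRIVATE_DOCUMENT_RULES = """\
User-agent: *
Disallow: /sitefiles/27/3/6/273678/contact_form.pdf
"""

print(disallowed_paths(PRIVATE_DOCUMENT_RULES))
# ['/sitefiles/27/3/6/273678/contact_form.pdf']
```

The rule asks well-behaved robots to stay away, but it also tells everyone exactly where the document is.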

 

How To Upload Your Own Robots.txt File

To upload your own robots.txt file and replace the one Create sets up automatically, please follow the steps below:

  1. Log in to your Create account

  2. Click on Content on the top menu

  3. Click on Files on the left-hand menu

  4. Click on the green button Add File in the top right-hand corner

  5. Click on the Upload button and choose your file

  6. Click the green button Upload The File

  7. Publish your website for the change to take effect

Your robots.txt will now be changed. 

More Questions?

If you have any further questions, please get in touch and we will be happy to help.