When to use NOINDEX or the robots.txt?

One of the questions we are most often asked is what the difference is between the NOINDEX robots meta tag and the robots.txt, and when each should be used. This article addresses this question.

The NOINDEX robots meta tag

The NOINDEX robots meta tag prevents content from appearing in search results. It sits in the source code of a page and tells search engines not to include that page in their results.

The NOINDEX robots meta tag looks like this in your page source code:

<meta name="robots" content="noindex" />
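
If you want to check whether a page carries this tag, the following is a minimal sketch in Python using only the standard library. The names RobotsMetaParser and has_noindex are made up for this illustration and are not part of All in One SEO or any search engine; the sketch only looks at the generic "robots" meta tag and ignores crawler-specific tags and HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # The content attribute may hold several comma-separated directives.
        for directive in attr_map.get("content", "").split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Return True if the page source contains a NOINDEX robots meta tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex" /></head></html>'
print(has_noindex(page))
```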

The robots.txt file

The robots.txt file tells search engines where their crawlers can and cannot go on a website. It includes “Allow” and “Disallow” directives that guide a search engine as to which directories and files it should or should not crawl. 

However, it does not stop your content from being listed in search results. If a blocked directory or file is linked from any page on your website or on another website, search engines can still discover its URL and list it in search results, even though they have not crawled it.

An example of how you'd use the robots.txt file is to instruct search engines not to crawl the “/cgi-bin/” directory that may exist on your server, because there's nothing in the directory that is of use to search engines.

The default robots.txt for WordPress looks like this:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
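
To illustrate how a crawler might read these rules, here is a simplified sketch in Python. It assumes the common convention that the longest matching path wins and that "Allow" wins a tie. It only reads the "User-agent: *" group and does not handle wildcards, so treat it as an illustration rather than a full robots.txt parser. The function names parse_rules and is_crawl_allowed are hypothetical.

```python
WORDPRESS_DEFAULT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
"""


def parse_rules(robots_txt):
    """Return (directive, path) pairs from the 'User-agent: *' group only."""
    rules = []
    in_star_group = False
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            in_star_group = value == "*"
        elif in_star_group and field in ("allow", "disallow") and value:
            rules.append((field, value))
    return rules


def is_crawl_allowed(rules, path):
    """Apply the longest matching rule; with no match, crawling is allowed."""
    best_len = -1
    allowed = True
    for directive, rule_path in rules:
        if not path.startswith(rule_path):
            continue
        longer = len(rule_path) > best_len
        tie_prefers_allow = len(rule_path) == best_len and directive == "allow"
        if longer or tie_prefers_allow:
            best_len = len(rule_path)
            allowed = directive == "allow"
    return allowed


rules = parse_rules(WORDPRESS_DEFAULT)
for path in ("/wp-admin/options.php", "/wp-admin/admin-ajax.php", "/blog/hello-world/"):
    print(path, is_crawl_allowed(rules, path))
```

With the default WordPress rules, this sketch treats everything under /wp-admin/ as blocked except admin-ajax.php, and leaves the rest of the site open to crawling.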

The difference between NOINDEX and robots.txt

The difference between the two is as follows:

  • The robots.txt file is used to guide a search engine as to which directories and files it should crawl. It does not stop content from being indexed and listed in search results.
  • The NOINDEX robots meta tag tells search engines not to include content in search results and, if the content has already been indexed, to drop it from their index. It does not stop search engines from crawling content.

The biggest difference to understand is that if you want search engines to not include content in search results, then you MUST use the NOINDEX tag and you MUST allow search engines to crawl the content. If search engines CANNOT crawl the content then they CANNOT see the NOINDEX meta tag and therefore CANNOT exclude the content from search results.

So if you want content not to be included in search results, then use NOINDEX. If you want to stop search engines from crawling a directory on your server because it contains nothing they need to see, then use a “Disallow” directive in your robots.txt file.
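
The way the two settings interact can be summarized in a small Python sketch. The function search_outcome and its wording are hypothetical and only restate the rules described in this article; real search engines apply more signals than these two.

```python
def search_outcome(crawl_allowed, has_noindex):
    """Describe what a search engine can do with a page, per the rules above."""
    if not crawl_allowed:
        # The crawler never fetches the page, so a NOINDEX tag on it goes unseen.
        return "not crawled; may still be listed in search results"
    if has_noindex:
        # The crawler fetches the page, sees NOINDEX, and leaves it out.
        return "crawled; excluded from search results"
    return "crawled; may be listed in search results"


# Print all four combinations of the two settings.
for crawl_allowed in (True, False):
    for has_noindex in (True, False):
        print(crawl_allowed, has_noindex, "->", search_outcome(crawl_allowed, has_noindex))
```

Note that the only combination that reliably keeps a page out of search results is crawling allowed together with NOINDEX.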

You can find documentation on using the NOINDEX feature in All in One SEO in our article on Showing or Hiding Your Content in Search Results.

You can find documentation on using the Robots.txt feature in All in One SEO in our article on Using the Robots.txt Tool in All in One SEO.
