Resolving Indexed, though Blocked by robots.txt in Google Search Console

When managing your website’s SEO, you may encounter the Indexed, though Blocked by robots.txt status in the Indexing report in Google Search Console.

This lets you know that a page is indexed by Google but blocked from being crawled due to rules in your robots.txt file. While this situation is not always problematic, it can lead to incomplete or inaccurate indexing.

In this article, we’ll explain what this means, why it occurs, and how to resolve it using All in One SEO.

Understanding the Indexed, though Blocked by robots.txt Status

The Indexed, though Blocked by robots.txt status arises when Google successfully indexes a page but can’t crawl its content due to a rule in your robots.txt file. Crawling and indexing are distinct processes:

  • Crawling is how search engines access and analyze the content on your site.
  • Indexing involves adding the page to the search engine’s database so it can appear in search results.

When a page is blocked from crawling, Google may index it using only metadata or information from external links, leading to incomplete indexing.
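
For example, a rule like the hypothetical one below tells all crawlers not to fetch anything under /private/, yet URLs in that directory that Google has already discovered, for instance through backlinks, can still end up indexed:

    User-agent: *
    Disallow: /private/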

Identifying the Indexed, though Blocked by robots.txt status in Google Search Console

To identify pages with the Indexed, though Blocked by robots.txt status in Google Search Console (GSC), follow these steps:

  1. Log in to your Google Search Console account and select the appropriate property (website) in the Search property drop-down (if you manage multiple websites).
  2. Click on Pages under Indexing in the left-hand sidebar.
  3. In the Page indexing report, scroll down to the Improve page appearance section and look for Indexed, though blocked by robots.txt. Click on this to see a detailed list of all pages flagged for this reason.
  4. After you click on Indexed, though blocked by robots.txt, scroll down to the Examples section to view the list of affected URLs. This will help you determine whether the blocking is intentional or caused by an overly broad or outdated rule in your robots.txt file.

How to Find the Indexed, though Blocked by robots.txt Status Using Index Status in All in One SEO’s Search Statistics Feature

The Index Status feature enables you to see Google Search Console errors directly within your WordPress dashboard. To do this, follow these steps:

  1. Click on Search Statistics in the All in One SEO menu and then click on the SEO Statistics tab.
  2. In the Content Performance report, you’ll find a column labeled Indexed, which shows the index status of your pages using color-coded icons.
  3. If any of these icons are orange or red, hover over them to reveal a detailed popup. If the issue is an Indexed, though blocked by robots.txt status, the popup will provide specific details about the problem.
  4. Alternatively, navigate to the All Posts or All Pages screen in WordPress. The AIOSEO Details column on this page displays the same index status icons as the Content Performance report. Hovering over an icon here will also show details of any errors.

By using these methods in All in One SEO, you can effectively locate and address URLs affected by the Indexed, though blocked by robots.txt status, ensuring a smooth user experience and optimal indexing for your site. You can learn more about Checking the Index Status of Content in our article here.

How to Fix the Indexed, Though Blocked by robots.txt Status with AIOSEO

AIOSEO provides powerful tools to diagnose and address robots.txt issues. The sections below walk through each fix:

  • Updating Your robots.txt Rules

If the affected page should be indexed and crawled, you’ll need to update your robots.txt file. You can access the Robots.txt Editor by going to the Tools section in the All in One SEO menu of your WordPress dashboard. Here, you can view and edit your robots.txt file directly.

Look for any disallow rules that may be blocking Googlebot from accessing the affected page. For example: Disallow: /example-page/

If the page should be accessible, modify or remove the rule. Save the changes to apply the updates.
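
For instance, if your robots.txt file looked something like the hypothetical example below, removing the first Disallow line would let Googlebot crawl the affected page again while leaving the other rules untouched:

    # Before
    User-agent: *
    Disallow: /example-page/
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php

    # After
    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php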

  • Editing Rules Using the Rule Builder

To edit any rule you've added, just change the details in the rule builder and click the Save Changes button.

  • Deleting a Rule in the Rule Builder

To delete a rule you've added, click the trash icon to the right of the rule.

You can read our article Using the Robots.txt Tool in All in One SEO to learn more.

  • Test Your robots.txt Block Using the GSC URL Inspection Tool

Now that you've updated your robots.txt file, you can test if Google is still blocked from crawling the page using the Google Search Console URL Inspection Tool.

  1. Log in to Google Search Console and use the search bar at the top to inspect the affected URL.
  2. After the initial inspection, click the TEST LIVE URL button in the top right corner. This shows how Googlebot currently sees the page.
  3. Wait for the live test to complete.
  4. In the live test result, check the Crawl allowed? status.
  • If it shows Yes, your robots.txt fix was successful, and Google can now crawl the page.
  • If it still shows No, expand the Crawl error section and check the Robots.txt blocking details. You may need to revisit the Robots.txt Editor in AIOSEO to make further adjustments.
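
If you’d like a quick local check in addition to the live test, the sketch below uses Python’s built-in urllib.robotparser to test whether Googlebot is allowed to crawl a URL under your current robots.txt rules (the example.com URLs are placeholders; swap in your own):

    # Minimal sketch: check whether Googlebot may crawl a URL under the
    # site's current robots.txt, using only the Python standard library.
    from urllib.robotparser import RobotFileParser

    ROBOTS_URL = "https://example.com/robots.txt"   # placeholder robots.txt URL
    PAGE_URL = "https://example.com/example-page/"  # URL flagged in GSC

    parser = RobotFileParser()
    parser.set_url(ROBOTS_URL)
    parser.read()  # fetches and parses the live robots.txt

    if parser.can_fetch("Googlebot", PAGE_URL):
        print("Crawl allowed: Googlebot can fetch", PAGE_URL)
    else:
        print("Crawl blocked: a robots.txt rule still disallows", PAGE_URL)

This is only an approximation of Google’s own robots.txt parser, but it is a fast way to sanity-check a rule change before re-running the live test in Google Search Console.
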
  • Applying Noindex Meta Tags for Unwanted Pages

When you want a page to be excluded from search results, it’s essential to use a No Index Robots Meta directive rather than relying on robots.txt rules. The key distinction is that robots.txt only manages crawling, not indexing; crawling and indexing are two separate processes.

For example, blocking a page in robots.txt prevents search engines from accessing it, but it doesn’t stop the page from being indexed if it has already been discovered through other means, such as backlinks.

To ensure a page is not indexed, you should add a No Index Robots Meta tag. This tells search engines to exclude the page from search results. However, if you block the same page in robots.txt, search engines like Google won’t be able to crawl it to see the No Index directive. As a result, the page might remain indexed because search engines are unaware of the directive.

For instance, imagine you have a page that you don’t want indexed. If you block it in robots.txt and add a No Index tag, search engines won’t crawl the page to recognize the No Index directive, defeating its purpose. 

Instead, you should allow search engines to crawl the page, so they can detect the No Index Robots Meta and drop it from their index.

Example Scenario:

  • Correct Method: Allow search engines to crawl a URL, such as https://example.com/private-info, and add a No Index tag in the page's head section.
  • Incorrect Method: Block https://example.com/private-info in robots.txt and add a No Index tag simultaneously.
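
In markup terms, the Correct Method above means the head section of https://example.com/private-info contains a robots meta tag like the one below, while robots.txt contains no rule blocking that URL (AIOSEO adds this tag for you, as described next):

    <meta name="robots" content="noindex">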

Finally, robots.txt is best used for blocking files like PDFs, images, or feeds, where adding a No Index directive isn’t possible. For example, you might block https://example.com/files/document.pdf in robots.txt because PDFs don’t support Robots Meta tags.
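
For that PDF example, the corresponding robots.txt rule would simply be:

    User-agent: *
    Disallow: /files/document.pdf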

To add a noindex tag to a page, edit the page in WordPress and scroll to the AIOSEO Settings section. Under the Advanced tab, you'll see a setting for Robots Settings with a toggle that's set to Use Default Settings.

Change the toggle to off, and you'll see some checkboxes under the Robots Meta heading.

Check the box for No Index and click the Update button for your post. The post will no longer be indexed by search engines and will not appear in search results, though it may take some time for Google to de-index the URL.
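
Once the post is updated, the tag search engines see in the page’s head will typically look something like the following (the exact attributes AIOSEO outputs may vary):

    <meta name="robots" content="noindex, follow" />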

  • Handling External Links to Blocked Pages

If external sites link to a blocked page, Google may still index it using limited data. To resolve this, contact the external site and request they update their link to a more relevant URL. Alternatively, use AIOSEO’s Redirect Manager to create a 301 redirect from the blocked page to a suitable URL. This ensures both users and search engines are directed to the correct content.
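
If you go the redirect route, you can confirm that the old URL now answers with a 301 using a small standard-library Python check like the sketch below (the URL is a placeholder):

    # Minimal sketch: confirm that the old URL returns a 301 redirect and
    # points to the intended destination, without following the redirect.
    from urllib.parse import urlparse
    import http.client

    OLD_URL = "https://example.com/blocked-page/"  # placeholder old URL

    parts = urlparse(OLD_URL)
    conn = http.client.HTTPSConnection(parts.netloc)
    conn.request("HEAD", parts.path or "/")
    response = conn.getresponse()

    print("Status:", response.status)                       # expect 301
    print("Redirects to:", response.getheader("Location"))  # expect the new URL
    conn.close()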

  • Ask Google to Revalidate Your URLs

After applying these fixes, log in to your Google Search Console and use the URL Inspection Tool to test the affected URL.

Click Request Indexing to notify Google of the changes. This step prompts Google to re-crawl the page and update its status accordingly.

If you’ve resolved all instances of the error, you can ask Google to revalidate your URLs in bulk. On the Page Indexing page in Google Search Console, click the Validate Fix button. This informs Google that the issues have been addressed and the URLs are ready for indexing.

Occasionally, Google might report false positives. In such cases, revalidation ensures these URLs are reviewed again.

Avoiding Similar Issues in the Future

To prevent this error from recurring:

  • Regularly review your robots.txt file to ensure it aligns with your indexing goals.
  • Use Noindex Meta tags for pages that should not appear in search results, rather than blocking them in robots.txt.
  • Monitor your site’s crawling and indexing issues using Google Search Console and AIOSEO’s built-in tools.
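
As a simple starting point for that kind of monitoring, the sketch below (placeholder URLs; adjust the list for your own site) flags pages that are either blocked for Googlebot in robots.txt or marked noindex, which helps catch conflicting directives early:

    # Minimal monitoring sketch: report, for a handful of important URLs,
    # whether robots.txt blocks Googlebot and whether a noindex directive
    # is present in the X-Robots-Tag header or a robots meta tag.
    import re
    import urllib.request
    from urllib.parse import urljoin
    from urllib.robotparser import RobotFileParser

    IMPORTANT_URLS = [  # placeholder list of URLs to watch
        "https://example.com/",
        "https://example.com/example-page/",
    ]

    def check(url):
        parser = RobotFileParser()
        parser.set_url(urljoin(url, "/robots.txt"))
        parser.read()
        blocked = not parser.can_fetch("Googlebot", url)

        noindex = False
        if not blocked:
            with urllib.request.urlopen(url) as response:
                header = response.headers.get("X-Robots-Tag", "")
                html = response.read().decode("utf-8", errors="ignore")
            meta = re.search(r'<meta[^>]+name=["\']robots["\'][^>]*>', html, re.I)
            noindex = "noindex" in header.lower() or (
                meta is not None and "noindex" in meta.group(0).lower()
            )
        return blocked, noindex

    for url in IMPORTANT_URLS:
        blocked, noindex = check(url)
        print(f"{url} -> blocked by robots.txt: {blocked}, noindex: {noindex}")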

The “Indexed, though Blocked by robots.txt” error can be resolved by updating your robots.txt file, adjusting meta directives, or managing external links. By leveraging AIOSEO’s comprehensive tools, you can address these issues effectively and maintain a healthy, optimized website.