
You might have been wondering how Google Search interacts with your website—a process generally referred to as crawling. Let’s dive into troubleshooting pages that are not getting into Google Search from this perspective. If you’ve read our “How Search Works” article series or gone through our documentation on this topic, you already know that the first stage of getting your pages into Google Search is crawling. But if pages aren’t getting into Search, how can you troubleshoot this issue, starting at the crawling stage?

Here’s my first tip. It’s relatively well-known but still often forgotten: just because you can access a page in your browser doesn’t mean that Googlebot can access it. This can happen for several reasons. For instance, the robots.txt file might prevent a crawler from accessing a URL, or a firewall or bot protection might be blocking Googlebot. There could also be networking or routing issues between Google’s data centers and your web server, among other possibilities. So simply opening the URL in a browser isn’t a good test. Instead, use the URL Inspection Tool in Google Search Console or the Rich Results Test to see if Googlebot can access the page. These tools show the rendered HTML of the page. If you can find your content in the rendered HTML, crawling is not the issue. Otherwise, something went wrong.
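If you want a quick supplementary check of the robots.txt side of this before reaching for those tools, here’s a minimal sketch using Python’s standard library robotparser module. The site and URL are placeholders I made up for illustration, and keep in mind that Python’s parser doesn’t implement every nuance of Google’s robots.txt handling, nor can it detect firewall or bot-protection blocks, so the URL Inspection Tool remains the authoritative test.

```python
# A minimal sketch (not an official Google tool): does this site's robots.txt
# allow a crawler identifying as Googlebot to fetch a given URL?
# The site and URL below are hypothetical placeholders.
from urllib.robotparser import RobotFileParser

site = "https://www.example.com"
url_to_check = f"{site}/products/widget"  # placeholder URL to test

parser = RobotFileParser()
parser.set_url(f"{site}/robots.txt")
parser.read()  # fetch and parse the live robots.txt file

if parser.can_fetch("Googlebot", url_to_check):
    print("robots.txt allows Googlebot to crawl this URL")
else:
    print("robots.txt blocks Googlebot here, so review your Disallow rules")
```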

Tip number two is to use the Crawl Stats Report, specifically the response section within the report, to see how your server responds to crawl requests. Pay attention to the responses your server gives to Googlebot, especially a high number of 500 responses, fetch errors, timeouts, DNS problems, and similar issues. These errors can be transient and resolve themselves without intervention. However, if they occur frequently or spike unexpectedly, you may want to investigate further. If your site is particularly large, with more than a million pages, errors in the 500 range might also slow down crawling. When you spot such errors (for example, 500 errors or fetch errors), run a live test in the URL Inspection Tool on some sample URLs to see if they still produce these errors. If Googlebot can now reach these URLs, no further action is necessary. If the problem persists, you can use the tool to gather more insights and troubleshoot further.
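If you’d like to spot-check a handful of sample URLs yourself in addition to the live test, the following sketch (standard library only) requests each URL and prints the status code or error your server returns. The URLs and the Googlebot-style user-agent string are assumptions for illustration; requests from your own machine won’t necessarily be treated the same as real Googlebot traffic, so treat the output as a rough signal, not a verdict.

```python
# A rough spot-check, not a replacement for the URL Inspection Tool live test:
# request a few sample URLs and print the status code or error your server
# returns. The URLs are placeholders; adjust them to the pages you're checking.
import urllib.error
import urllib.request

sample_urls = [
    "https://www.example.com/",
    "https://www.example.com/category/widgets/",
]

# Sending a Googlebot-style user agent from your own machine is NOT the same
# as a real Googlebot request; firewalls and bot protection may treat
# Google's actual crawlers differently.
headers = {
    "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
}

for url in sample_urls:
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            print(f"{response.status}  {url}")
    except urllib.error.HTTPError as err:   # 4xx and 5xx responses
        print(f"{err.code}  {url}")
    except urllib.error.URLError as err:    # DNS failures, timeouts, refused connections
        print(f"ERR  {url}  ({err.reason})")
```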

The final step in addressing crawling issues is an advanced one, which might require assistance from your hosting company or development team. Examining your web server logs is a powerful way to understand what’s happening on your server. These logs reveal patterns, the volume and timing of requests, and how your web server responded. Be mindful, though, that not everyone claiming to be Googlebot actually is Googlebot. Some requests might come from third-party scrapers pretending to be Googlebot.
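As a starting point for that kind of log analysis, here’s a minimal sketch that assumes a combined-format access log at a hypothetical path. It filters requests whose user agent claims to be Googlebot, tallies the status codes your server returned, and applies the reverse-DNS-plus-forward-confirmation check that Google documents for verifying Googlebot, flagging any client IPs that fail it.

```python
# A minimal sketch, assuming a combined-format access log at a hypothetical
# path: tally the status codes served to requests claiming to be Googlebot,
# and flag client IPs that fail the reverse DNS check.
import re
import socket
from collections import Counter
from functools import lru_cache

LOG_PATH = "/var/log/nginx/access.log"  # hypothetical path; adjust for your server

# Loose pattern for the "combined" log format:
# client IP ... [timestamp] "request" status bytes "referrer" "user agent"
LINE_RE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"'
)

@lru_cache(maxsize=None)  # avoid repeating DNS lookups for the same IP
def is_real_googlebot(ip: str) -> bool:
    """Reverse DNS must point at googlebot.com or google.com, and the forward
    lookup of that hostname must resolve back to the same IP."""
    try:
        host = socket.gethostbyaddr(ip)[0]
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        return ip in socket.gethostbyname_ex(host)[2]
    except (socket.herror, socket.gaierror):
        return False

status_counts = Counter()
suspect_ips = set()

with open(LOG_PATH) as log:
    for line in log:
        match = LINE_RE.match(line)
        if not match:
            continue
        ip, status, user_agent = match.groups()
        if "Googlebot" not in user_agent:
            continue
        status_counts[status] += 1
        if not is_real_googlebot(ip):
            suspect_ips.add(ip)

print("Status codes for requests claiming to be Googlebot:", dict(status_counts))
print("IPs that failed Googlebot verification:", sorted(suspect_ips))
```

Caching the DNS results per IP keeps the lookups manageable; for very large logs, you could instead cross-check client IPs against the Googlebot IP ranges that Google publishes.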

To sum up:

  1. Check the URL Inspection Tool and the Crawl Stats Report to identify and address crawling issues.
  2. Analyze your web server logs to understand how your server responded to requests.
  3. Be cautious of fake Googlebots.

Leave a comment if you want more technical content on Google Search Central or have suggestions for future topics. See you soon!

Author

bangaree
