This last week I have been getting a lot of notifications from Google Search Console about a drastic increase in soft 404 errors. Usually, when you get a notification directly from Google, there is something you need to fix, and generally you should never ignore these. Below is one way to fix soft 404 errors in WordPress.
What are Soft 404 Errors?
You are probably familiar with standard 404 errors, which mean a page doesn’t exist. A soft 404 error occurs when a non-existent page displays a “page not found” message to anyone trying to access it, but fails to return an HTTP 404 status code. It can also occur when a non-existent page redirects users to an irrelevant page, such as the homepage, instead of returning an HTTP 404 status code. The important thing to remember here is that the content of a web page is entirely independent of the HTTP status code returned by the server.
Google has a great analogy to explain soft 404 errors:
Just because a page displays a 404 File Not Found message doesn’t mean that it’s a 404 page. It’s like a giraffe wearing a name tag that says “dog.” Just because it says it’s a dog, doesn’t mean it’s actually a dog. Similarly, just because a page says 404, doesn’t mean it’s returning a 404.
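To make that distinction concrete, here is a minimal Python sketch of the idea. The `classify_response` helper and the list of “not found” phrases are my own illustration, not anything Google publishes; real soft 404 detection is more sophisticated.

```python
def classify_response(status_code: int, body: str) -> str:
    """Classify a fetched page as 'ok', 'hard 404', or 'soft 404'.

    A soft 404 is a page whose content says it doesn't exist
    while the server still answers with a 200 status code.
    """
    # Illustrative phrases suggesting the page admits it doesn't exist.
    not_found_phrases = ("page not found", "404", "nothing found")
    looks_missing = any(p in body.lower() for p in not_found_phrases)

    if status_code == 404:
        return "hard 404"   # content and status code agree
    if status_code == 200 and looks_missing:
        return "soft 404"   # the giraffe wearing a "dog" name tag
    return "ok"

print(classify_response(404, "<h1>Page not found</h1>"))  # hard 404
print(classify_response(200, "<h1>Page not found</h1>"))  # soft 404
print(classify_response(200, "<h1>My latest post</h1>"))  # ok
```

The point of the sketch: the body text and the status code are independent signals, and a soft 404 is simply the case where they disagree.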
Hallam Internet also has an in-depth article on the effect of soft 404 errors on rankings, which I recommend reading.
How to Fix Soft 404 Errors in WordPress
Below is the soft 404 warning I got in Google Search Console.
A bunch of them were being linked from a weird domain: 88Q82019309.com.
The first thing you will want to do is click into “Crawl Errors” in Google Search Console, and then into the “Soft 404” tab. As you can see below, I just started getting a bunch of soft 404 errors all of a sudden.
Click into one of the errors. You can see that in my case they are coming from the search functionality on my WordPress site. Most likely this is the work of a spammer: they rapidly run query strings through the search URL, which generates soft 404 errors because those result pages obviously don’t exist.
One way to prevent this is to simply block the WordPress search URL from being crawled. This means modifying the robots.txt file on your WordPress site. The robots.txt file allows you to control how Google accesses and crawls your site. It is typically located at the root of your site. You will need to download it via FTP, edit it, and re-upload it.
This was how my default robots.txt file looked:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
After editing it, it now looks like this:
User-agent: *
Disallow: /wp-admin/
Disallow: /?s=
Disallow: /search/
Allow: /wp-admin/admin-ajax.php
The two new rules, Disallow: /?s= and Disallow: /search/, prevent Google from crawling WordPress search result URLs. Be very careful when editing your robots.txt file, as you can harm your site’s indexing if you don’t do it correctly.
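Before re-uploading the file, you can sanity-check the new rules locally with Python’s built-in robotparser. This is just a quick verification sketch using a hypothetical example.com domain; note that Python applies the first matching rule while Google uses the longest matching rule, so it is a rough check, not an exact simulation of Googlebot.

```python
from urllib import robotparser

# The edited robots.txt from above.
robots_txt = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /?s=
Disallow: /search/
Allow: /wp-admin/admin-ajax.php
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# Search URLs should now be blocked from crawling...
print(rp.can_fetch("*", "https://example.com/?s=spam-query"))  # False
print(rp.can_fetch("*", "https://example.com/search/spam/"))   # False

# ...while normal content stays crawlable.
print(rp.can_fetch("*", "https://example.com/my-blog-post/"))  # True
```

If `can_fetch` returns False for the search URLs and True for your real posts, the rules are doing what you intended.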
After editing and re-uploading your robots.txt file, you should mark the warnings as fixed in Google Search Console. You can then wait a few days and ensure that no more come back. It is also important to note that this might not always fix them. As Google’s John Mueller explains:
One thing you need to keep in mind is that disallows in the robots.txt will just disallow crawling, it will have less of an impact on actual indexing. If we have reason to believe that a URL which is disallowed from crawling is relevant, we may include it in our search results with whatever information we may have (if we’ve never crawled it, we may just include the URL — if we’ve crawled it in the past, we may include that information).
To prevent URLs like these from being indexed, I would recommend that you have the server 301 redirect to the appropriate canonical (and of course not link to the incorrect one). – John Mueller, Webmaster Trends Analyst at Google (src: Builtvisible)
There are other scenarios that will cause soft 404 errors to show up in Google Search Console. But the example above shows you one way to combat them. If this tutorial on how to fix soft 404 errors in WordPress was helpful, let me know below.