Generally speaking, most website owners welcome search engine indexing. Indexing is the process by which a search engine parses, collects, and stores data so that information retrieval is efficient and, more importantly, accurate. Experienced site owners and bloggers, however, know that not all parts of their website serve a purpose in search results.
How come, you might ask? The reason is simple: not all pages or posts are meant for visitors to land on from a search. Case in point: shopping cart pages, payment processing pages, newsletter confirmations, email verification prompts, and so on. These action-related pages obviously do not need to be indexed.
Another example is a page that is still being developed. No one wants an incomplete page showing up in searches. When these situations come up, your best bet is to make sure your WordPress post or page is not indexed.
So, how does one go about discouraging search engines from indexing an unfinished site or page? Since it is necessary now and then to stop search engines from crawling all over your WordPress site, you need to implement noindex and nofollow. Check out these compiled solutions.
Method 1: Implementing Meta tags
Blocking URLs with robots meta tags is Google's suggested method for site owners. To start, robots meta tags follow this simple format:
<meta name="value" content="value">
This robots meta tag should be placed within the WordPress theme's header section, meaning between <head> and </head>. Several name values and content attributes are available; the content value Google suggests for blocking search engines from indexing a page is noindex:
<meta name="robots" content="noindex">
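This exact tag can be hard-coded into the theme's header.php, but a tidier option is to print it through the wp_head hook so it only appears on the pages you choose. The snippet below is a minimal sketch, assuming a hypothetical page slug of "checkout-confirmation" and a made-up function name; swap in whichever page or condition you actually need.
// In functions.php (or a small plugin): print a noindex tag on selected pages.
function my_noindex_meta() {
    // "checkout-confirmation" is a placeholder slug; replace it with your own page.
    if ( is_page( 'checkout-confirmation' ) ) {
        echo '<meta name="robots" content="noindex">' . "\n";
    }
}
add_action( 'wp_head', 'my_noindex_meta' );
Because the tag is echoed during wp_head, it lands between <head> and </head>, exactly where search engines expect to find it.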
To disallow a particular search engine spider from indexing a page, replace the robots value with that crawler's name. Here is a list of the most common spider names:
- bingbot – Bing
- googlebot-news – Google News
- googlebot – Google
- googlebot-image – Google Images
- teoma – Ask
So if you are targeting a specific spider, this is how you replace the name correctly:
<meta name="teoma" content="noindex">
Or, if you need to target more than one spider, add a separate meta tag for each crawler, as in this example:
<meta name="teoma" content="noindex">
<meta name="bingbot" content="noindex">
Method 2: Using Plugins
Aside from using meta tags, another solution is to block search engines on WordPress with a plugin. One plugin that has proven useful for this purpose is PC Hide Pages.
What the plugin does is apply the appropriate meta tag to the specific pages you wish to hide. It is one of the most efficient methods, as it shows at a glance which pages have been hidden, and everything can be accomplished from the WordPress admin area.
Perhaps the biggest downside of this plugin is that it mainly supports WordPress pages, not custom post types or blog posts.
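If you do need to cover a custom post type, a variant of the earlier wp_head snippet can fill that gap. This is only a sketch: the "portfolio" post type and the function name are made-up examples, so adjust them to your own setup.
// Print a noindex tag on every entry of a custom post type.
function my_noindex_custom_type() {
    // "portfolio" is a hypothetical post type; use your own registered type here.
    if ( is_singular( 'portfolio' ) ) {
        echo '<meta name="robots" content="noindex">' . "\n";
    }
}
add_action( 'wp_head', 'my_noindex_custom_type' );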
Another plugin supported by many WordPress users is Yoast SEO. What does Yoast do? It provides several ways to set a URL, or several URLs, to noindex:
- With Yoast SEO running, open the post and scroll to the Yoast SEO meta box, then open the "Advanced" tab.
- The question "Allow search engines to show this post in search results?" should be visible. Selecting "Yes" lets search engines index the post, while selecting "No" sets it to noindex.
*Note that posts that were already visible in search results before being set to noindex will take some time to disappear. Since they already exist in the index, search engines need to re-crawl them to pick up the newly placed noindex tag.
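For those who prefer code over checkboxes, Yoast can also be adjusted programmatically. The sketch below assumes Yoast's wpseo_robots filter, which lets you override the string Yoast prints in its robots meta tag; the filter name and the "newsletter-confirmation" slug are assumptions, so verify both against the Yoast SEO version you run.
// Force noindex on a specific page through Yoast's robots output.
// The wpseo_robots filter and the page slug below are assumptions to check for your setup.
function my_yoast_noindex( $robots ) {
    if ( is_page( 'newsletter-confirmation' ) ) {
        return 'noindex, follow';
    }
    return $robots;
}
add_filter( 'wpseo_robots', 'my_yoast_noindex' );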
Method 3: Coding with robots.txt
Similar to the meta tag protocol, robots.txt follows a couple of basic rules:
- User-agent – the search engine crawler the rule applies to; it takes the same spider names listed in the meta tag section of this article.
- Disallow – the directory or URL you wish to block from search engines.
Example:
User-agent: teoma
Disallow: /checkout-confirmed/
or
User-agent: *
Disallow: /checkout-confirmed/
Many website owners use the asterisk (*) as a wildcard that applies a rule to all search engines, which saves them the trouble of typing out several spider names. It is also vital to know that with the Disallow rule, the directory or URL being blocked is defined by its path relative to your domain.
Furthermore, the rules specified in the robots.txt file are case sensitive. Be conscious of this fact when writing them; otherwise, things can quickly go awry. For example, if you wish to block search engines from indexing an ebook with the file name How_to_Noindex_eBook.epub but end up using lowercase letters, such as /downloads/how_to_noindex_ebook.epub, in your robots.txt, the rule will not work.
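Put correctly, and keeping the same hypothetical /downloads/ path, the rule needs to preserve the file's exact casing:
User-agent: *
Disallow: /downloads/How_to_Noindex_eBook.epub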
Method 4: Public View Removal
In some instances, it is not enough to prohibit the indexing of a page. There are certain types of content that site owners would prefer to keep hidden altogether, such as restricted content reserved only for premium site members. A practical approach here is to use a plugin known as Paid Memberships Pro.
This nifty little plugin permits site owners to limit content access to those who are deemed eligible. It is perfect for premium content as well as downloads. Once installed, the plugin offers a simple set of instructions to get you on your way to protected content.
Final Thoughts
Unfortunately, the internet is filled with loopholes. Surprise, surprise: search engine crawlers do not always play by the rules. There are still instances when a crawler chooses to ignore a rule or a request not to index a particular page. This is often the case with spammers, hackers, low-level search engine spiders, and malicious software. For the most part, widely established search engines are more likely to honor these rules and codes.