How to Manually Create the Best Robots.txt File for SEO

Posted on May 8 2017 - 6:30pm by Nigel Quadros

It is a known fact that everyone loves “hacks”.

I am no exception. I love searching for ways to make my life easier and better.

That is why the technique I am going to share today is one of my long-time favorites. It is a pure SEO hack/tweak that you can start using right away.

With this technique, you can improve the SEO of your website or blog by taking advantage of a basic part of every website that people hardly talk about. Trust me, it is not at all painful to implement.

You guessed right! It is the robots.txt file, a simple text file.

This small text file is part of most websites on the Internet, but many people have not heard of it, and many of those who have do not put it to use.

It is designed to work with search engines, and surprisingly, it is also a source of SEO juice that is waiting to be unleashed.

I have come across many clients who try all sorts of ways to enhance the SEO of their personal or corporate websites. I often tell them that they can see wonders happen just by editing a tiny text file, and they don’t believe me.

There are many methods of supercharging SEO that are neither difficult nor time consuming, and this is one of them.

You do not need any technical experience to harness the power of the robots.txt file. Put simply, if you can find the source code of your website, you can do this with one eye closed.

When you are ready, follow along, and I will show you how to change your robots.txt file so that search engines are drawn to it.

Why the robots.txt file is important for your website

First, let us look at why the robots.txt file matters so much.

The robots.txt file, more technically known as the robots exclusion protocol, is a text file that tells web robots which pages on your website to crawl. It also tells them which pages not to crawl.

For example, say a search engine is about to visit a site. Before it visits the target page, a well-behaved crawler checks the robots.txt file for instructions.

There are different types of robots.txt files, so let us look at a few examples.

Suppose a search engine finds a robots.txt file that contains just two lines: “User-agent: *” followed by “Disallow: /”.

This is the basic skeleton of a robots.txt file.

The asterisk after “User-agent” means that the robots.txt file applies to all web robots that visit the site.

The slash after “Disallow” tells the robot not to visit any pages on the website.
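If you want to see how a crawler reads those two lines, here is a minimal sketch using Python's standard `urllib.robotparser` module. The bot names and the yoursite.com addresses are placeholders I made up for illustration, and Python's parser is only an approximation of how any particular search engine behaves.

```python
from urllib.robotparser import RobotFileParser

# The two-line skeleton: applies to every robot, blocks every page.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder bot names and URLs, for illustration only.
print(is_allowed(BLOCK_ALL, "Googlebot", "https://yoursite.com/"))          # False
print(is_allowed(BLOCK_ALL, "SomeOtherBot", "https://yoursite.com/blog/"))  # False
```

Because the asterisk matches every robot and the slash matches every path, both checks come back as not allowed.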

Now you might be wondering why anyone would not want their website visited by web robots.

After all, one of the main goals of SEO is to get search engines to crawl your website easily so that your ranking increases and your domain becomes more trustworthy.

This is where the SEO hack comes in.

You probably have a lot of pages on your site, right? Even if you do not think you have many, it is worth checking. You might be surprised.

When a search engine crawls your website, it tries to crawl every single one of your pages.

If you have a lot of pages, it will take the bot a while to get through them all, which can have a negative effect on your ranking.

This is because Googlebot has a “crawl budget.”

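The crawl budget breaks down into two parts. The first is the crawl rate limit, which Google describes as a limit on the maximum fetching rate for a given site, so that crawling does not degrade the experience of users visiting it.

The second part is crawl demand: how much Google actually wants to crawl your URLs, which depends on factors such as how popular they are and how stale Google’s copy has become.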

Put simply, crawl budget is “the number of URLs Googlebot can and wants to crawl.”

The idea is to help Googlebot spend its crawl budget on your site in the best way possible. In other words, it should be crawling only your most valuable pages.

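According to Google, having many low-value URLs can “negatively affect a site’s crawling and indexing.”

The factors Google lists include faceted navigation and session identifiers, on-site duplicate content, soft error pages, hacked pages, infinite spaces and proxies, and low quality and spam content.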

Let’s get back to the robots.txt now.

Once you create the right robots.txt file, you can tell search engine bots to avoid certain pages (the ones you want to exclude).

As Google puts it:

“You don’t want your server to be overwhelmed by Google’s crawler or to waste crawl budget crawling unimportant or similar pages on your site.”

If you use robots.txt the right way, you can tell search engine bots to spend their crawl budgets wisely. That is what makes the robots.txt file so important for SEO.

Are you surprised by the power of robots.txt?

You definitely should be! Let us talk about how to find and use it properly.

Locating your robots.txt file

If you want a quick look at your site’s robots.txt file, there is a super easy way to do it.

Better still, this method works for any website, so you can peek at other sites’ files and see what they are doing.

Type the base URL of the site into your browser’s address bar (e.g., androguru.com, moradiadosquadros.com, etc.). Then add /robots.txt to the end and hit Enter. Voila!
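If you check many sites, the same step is easy to script. This is only a small sketch: `robots_url` is a helper name I made up, and defaulting to HTTPS when no scheme is typed is my own assumption.

```python
from urllib.parse import urlsplit


def robots_url(site: str) -> str:
    """Return the robots.txt address for a site typed with or without a scheme."""
    if "://" not in site:
        site = "https://" + site  # assumption: default to HTTPS
    parts = urlsplit(site)
    # robots.txt sits at the root of the host, whatever page was typed.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


print(robots_url("androguru.com"))
# https://androguru.com/robots.txt
print(robots_url("http://www.yoursite.com/blog/some-post/"))
# http://www.yoursite.com/robots.txt
```

Notice that any path after the domain is dropped, because the file is always looked up at the root.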

You will run into one of three situations:

1) You will find a robots.txt file with rules in it.

2) You will find an empty/blank file.

For example, Disney seems to lack a robots.txt file.

3) You will get a 404 (a “not found” error).

For example, Method returns a 404 for robots.txt.

Take a second and view your own site’s robots.txt file now.

If you find an empty file or a 404 (situation 2 or 3 above), you will want to fix that.
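The three situations above can also be told apart in code. Here is a rough sketch, assuming you only care about the HTTP status code and whether the body is blank. The function names and labels are my own, and `fetch_robots` needs network access, so it is defined but not called here.

```python
import urllib.error
import urllib.request


def fetch_robots(url: str):
    """Fetch a robots.txt URL and return (status_code, body_text)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for responses such as 404.
        return error.code, ""


def classify_robots_response(status: int, body: str) -> str:
    """Map a status code and body to one of the three situations."""
    if status == 404:
        return "missing (404)"
    if status == 200 and not body.strip():
        return "empty file"
    if status == 200:
        return "has rules"
    return f"unexpected status {status}"


# Hard-coded examples so the sketch runs without a network connection.
print(classify_robots_response(200, "User-agent: *\nDisallow:\n"))  # has rules
print(classify_robots_response(200, "   \n"))                       # empty file
print(classify_robots_response(404, ""))                            # missing (404)
```

To try it on a live site, pass the result of `fetch_robots(...)` into `classify_robots_response(...)`.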

If you do find a valid file, it is most likely still set to the defaults that were created when you first deployed your site.

Personally, I like looking at other sites’ robots.txt files. It is a valuable exercise for learning what works and what does not.

We are almost there! Let us look at how to find and change the robots.txt file.

Locating your robots.txt file

First, check whether your website already has a robots.txt file. If it does not, you will need to create one from scratch, which is very easy. Simply open a plain text editor like TextEdit (Mac) or Notepad (Windows).

Please use only a plain text editor for this, because word processors such as Microsoft Word can insert extra formatting into the file. One of my favorites is EditPad.

The robots.txt file lives in your site’s root directory. If you are using WordPress, you can check by heading over to http://www.yoursite.com/robots.txt; if the page is blank, there is no file. If you are using a custom-designed website, use an FTP manager to find the file so that you can edit or delete it.

Creating a normal robots.txt file

You can create a new robots.txt file with the plain text editor of your choice. (Again, use only a plain text editor such as Notepad or Sublime Text.)

If you already have a robots.txt file, make sure you have deleted the text inside it (but not the file itself).

First, you will need to get used to some of the syntax used in a robots.txt file.

Google has a pretty good explanation of some basic robots.txt terminology.

Now, I am going to show you how to set up a simple robots.txt file.

Start by setting the user-agent term. To make it apply to all web robots, type “User-agent: *”.

Next, type “Disallow:” but do not type anything after it.

Since there is nothing after the disallow, web robots are free to crawl your entire site. Right now, everything on your site is fair game.

So far so good, right? Your robots.txt file should now contain just two lines: “User-agent: *” and “Disallow:”.

Trust me, these two lines are kind of a big deal.

I would also recommend listing your XML sitemap, although it is not mandatory. If you want to, add a line that starts with “Sitemap:” followed by the full URL of your sitemap.
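Here is a quick way to convince yourself that the basic file does what I described. It is a sketch using Python's standard `urllib.robotparser`. The sitemap address is a made-up placeholder, and `site_maps()` needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# The basic allow-everything file plus an optional sitemap line.
# The sitemap URL is a placeholder; use your real sitemap address.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://yoursite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# An empty Disallow blocks nothing, so every path is crawlable.
for path in ("/", "/blog/", "/thank-you/"):
    print(path, parser.can_fetch("*", "https://yoursite.com" + path))

# Lists the sitemap URLs declared in the file.
print(parser.site_maps())
```

Every path should print True, followed by a list that contains the one sitemap URL.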

And we are there! You have created a robots.txt file on your own.

Let us turn this little file into an SEO booster now. Are you ready?

Optimizing the robots.txt file for SEO

I am going to go over some of the most common ways to use it. How you optimize robots.txt depends on the type and value of the content on your site, and there are several ways to use the file to your advantage.

I personally feel that one of the best uses of the robots.txt file is to make the most of search engines’ crawl budgets by telling them not to crawl the parts of your site that are not meant for the public.

For example, if you visit the robots.txt file for this site (nigelquadros.com), you will see that it disallows the login page. Why?

Because that page is only used for logging into the content management system of the site, it makes no sense for search engine bots to spend time crawling it.

(If you are using WordPress, you can use that exact same disallow line.)

In the same way, you can use a similar directive to keep search engine bots away from specific pages, categories and tag pages.

Now, if you want to tell a bot not to crawl your page http://yoursite.com/page/, type “Disallow: /page/”.
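One thing worth knowing is that a disallow rule works as a path prefix. The sketch below uses Python's standard `urllib.robotparser` to show this. The /page/2/ and /about/ addresses are hypothetical pages I added for comparison.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /page/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

blocked = parser.can_fetch("*", "http://yoursite.com/page/")
nested = parser.can_fetch("*", "http://yoursite.com/page/2/")  # hypothetical URL
other = parser.can_fetch("*", "http://yoursite.com/about/")    # hypothetical URL

print(blocked, nested, other)  # False False True
```

The rule blocks /page/ and everything beneath it, while the rest of the site stays open to bots.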

You might be wondering which pages to exclude from indexation. Here are a couple of common scenarios:

Purposeful duplicate content. In most cases duplicate content is bad, but sometimes it is necessary. In those cases you should tell Google not to crawl the duplicate, because your ranking may drop if it does.

For example, if you have a printer-friendly version of a page on your site, you have duplicate content. You could tell bots not to crawl one of the versions, preferably the printer-friendly one.

This is also handy if you are split-testing pages that have the same information but different designs.

Thank you pages. The thank you page is one of every marketer’s favorite pages because it means a new lead or new business.

…Am I right?

As it turns out, some thank you pages are accessible through Google, which means people can reach them without going through the lead capture process. That is a bad scenario.

By blocking your thank you pages, you can make sure only those who convert see them.

Let us say the thank you page is found at https://yoursite.com/thank-you/. To block it in your site’s robots.txt file, add the line “Disallow: /thank-you/”.
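If you end up with a handful of pages to block, you may prefer to generate the file rather than type it by hand. This is just a sketch: `build_robots_txt` is a helper name I invented, and the paths in the example are placeholders.

```python
def build_robots_txt(disallowed_paths, user_agent="*", sitemap=None):
    """Render robots.txt text that blocks the given paths for one user agent."""
    lines = [f"User-agent: {user_agent}"]
    if disallowed_paths:
        lines += [f"Disallow: {path}" for path in disallowed_paths]
    else:
        lines.append("Disallow:")  # empty value means nothing is blocked
    if sitemap:
        lines += ["", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"


# Placeholder paths, for illustration only.
print(build_robots_txt(["/thank-you/", "/page/"]))
```

Each path gets its own Disallow line under the same User-agent line, which is the layout described in this section.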

Keep in mind that there are no universal rules for which pages to disallow. It all depends on your judgement.

There are two other things you should know: noindex and nofollow.

You know the disallow directive we have been using? It does not actually prevent a page from being indexed.

So in theory, you can disallow a page and it can still end up in the index.

Usually, you would not want that.

That is why you need the noindex directive. Some older guides add a “Noindex:” line next to “Disallow:” in robots.txt, but Google never officially supported that and stopped honoring it on September 1, 2019. The supported way to keep a page out of the index is a robots meta tag on the page itself (or an X-Robots-Tag HTTP header).

If you do not want your thank you page indexed, add the noindex tag to that page. Keep in mind that a bot can only see the tag if it is allowed to crawl the page, so a page that is disallowed in robots.txt can stay in the index even with the tag in place.

Great! Once the bot has seen the tag, that page will not show up in the SERPs.

Finally, there is the nofollow directive. It works just like a nofollow link: it tells web robots not to crawl the links on a page.

Like noindex, the nofollow directive is not part of the robots.txt file. It still instructs web robots, so it is the same concept. The only difference is where it lives.

Find the source code of the page you want to change, and make sure you’re in between the <head> tags.

Thereafter, paste this line:

<meta name="robots" content="nofollow">

Make sure you are not inserting this line between any other tags, just the <head> tags.

If you want to add both the noindex and nofollow directives, use this line of code:

<meta name="robots" content="noindex,nofollow">

This gives web robots both directives at once.
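After pasting the tag, you can double check that it reads the way you intended. The sketch below uses Python's standard `html.parser`. The class name, the helper name and the sample page are my own, and it only looks at the generic “robots” meta tag, not bot-specific ones.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" ...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Sample page, for illustration only.
PAGE = '<html><head><meta name="robots" content="noindex,nofollow"></head><body></body></html>'
print(sorted(robots_directives(PAGE)))  # ['nofollow', 'noindex']
```

If the result is empty for your own page, check that the tag sits in the page source and uses plain straight quotes.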

Testing everything out to see if things work 

Now that you have finished, you need to test your robots.txt file to make sure it is valid and works the way you want. To do that, sign in to your Google Webmasters account by clicking “Sign In” in the top right corner.

In case you did not know, Google provides a free robots.txt tester as part of the Webmaster Console, along with its other features.

Select your property (i.e., your website) and click “Crawl” in the left-hand sidebar.

You will see the “robots.txt Tester” on the right. Click on that.

If there is any old code in the box, delete it and replace it with your new robots.txt file.

Click “Test” at the bottom of the screen.

If the “Test” text changes to “Allowed,” your robots.txt is valid.
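Before you paste anything into Google's tester, you can also run a quick local sanity check. This is a sketch, not a replacement for Google's tool: `check_rules` is a helper name I made up, the paths are placeholders, and Python's `urllib.robotparser` does simple prefix matching, so it may not treat wildcard rules the way Googlebot does.

```python
from urllib.robotparser import RobotFileParser


def check_rules(robots_txt, expectations, user_agent="*", base="https://yoursite.com"):
    """Return the paths whose allowed/blocked result differs from what you expect."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    failures = []
    for path, should_be_allowed in expectations.items():
        if parser.can_fetch(user_agent, base + path) != should_be_allowed:
            failures.append(path)
    return failures


NEW_RULES = """\
User-agent: *
Disallow: /thank-you/
"""

# True means the path should stay crawlable, False means it should be blocked.
EXPECTED = {"/": True, "/blog/": True, "/thank-you/": False}

print(check_rules(NEW_RULES, EXPECTED) or "all checks passed")
```

An empty result means every path behaved as expected. Any path that comes back is a rule worth looking at again.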

It is worth reading a little more about the Console so that you can learn the rest in detail.

And we are done! Now upload your robots.txt file to your root directory. Your website is now armed with a file you created yourself that can help your search engine visibility.

A simple conclusion

By setting up your robots.txt file the right way, you are not just enhancing your own website’s SEO. You are also helping your visitors.

I have always loved sharing little-known SEO “hacks” with friends who turn into clients (that’s why I thought I’d write something for the web) so that they can gain an advantage in several ways.

In practice, it does not take much effort to set up a robots.txt file. Whether you are starting your second or eighth website, using a robots.txt file can make a difference that you will love seeing in search results. I recommend trying it out if you have not done so before.

What is your experience creating robots.txt files for your websites? Share it with me.
