Lesson 24: The Robots.txt – Website SEO Tutorial

The robots.txt file is located in the root directory of a web server or web space and gives crawlers and bots instructions on how to handle the site.

Among other things, the robots.txt file specifies which pages crawlers may or may not visit and where the sitemap is located. The latter is especially important, because it lets search engines evaluate the sitemap directly and get an overview of all indexable pages and content.

Because Google now renders the complete page when crawling, much as a browser does, excluding individual resources, especially JavaScript and CSS files, can lead to poorer rankings: Google might suspect, for example, that content is being hidden via JavaScript. You should therefore not block page resources via the robots.txt file!
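One way to spot accidentally blocked resources is to test a list of asset URLs against the rules. The following is a minimal sketch using Python's standard `urllib.robotparser`; the helper name `blocked_assets`, the sample rules, and the `example.com` URLs are illustrative assumptions, not taken from a real site. Note that this parser does simple prefix matching and may not mirror every detail of how Google interprets rules.

```python
from urllib.robotparser import RobotFileParser


def blocked_assets(robots_txt, asset_urls, user_agent="Googlebot"):
    """Return the asset URLs that the given rules forbid for user_agent.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in asset_urls if not parser.can_fetch(user_agent, url)]


# Assumed sample rules that block a directory containing scripts.
rules = """\
User-agent: *
Disallow: /wp-includes/
"""

# Assumed asset URLs; replace with the CSS/JS files your pages load.
assets = [
    "https://example.com/wp-includes/js/jquery.js",
    "https://example.com/wp-content/themes/site/style.css",
]

print(blocked_assets(rules, assets))
```

Any URL that shows up in the result is a resource the crawler could not load while rendering the page.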

Excluding subpages can be useful, especially for automatically generated pages such as tag pages or category pages in content management systems. In WordPress, for example, tag pages and category pages are created automatically, but these pages have little value for the user and are often the cause of duplicate content. Such automatically generated pages can therefore be excluded from crawling with the robots.txt file, to prevent duplicate content and make the most of the crawling resources.

Whether a robots.txt file is accepted by the crawler and passes the correct instructions on to the bots can be checked in the Google Search Console using the “robots.txt tester”. An example robots.txt file might look like this:

User-agent: *
Disallow: /tag/
Sitemap: //

This robots.txt file consists of only three lines, which tell all crawlers the location of the sitemap and that the tag pages of the WordPress site should not be crawled. Here is a brief explanation of each line of this sample file:

  • User-agent: * – This line indicates that the rules that follow apply to all types of crawlers and bots
  • Disallow: /tag/ – This line instructs the crawler not to crawl pages within the tag directory. If you want the tag pages to be crawled, simply remove this line
  • Sitemap: … – This line gives the location of the site's sitemap, which should be specified as a full (absolute) URL
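The effect of these three lines can be tried out locally. The following is a rough sketch using Python's standard `urllib.robotparser`; the sitemap URL and the `example.com` page URLs are placeholder assumptions, since the sample file above does not give a concrete address.

```python
from urllib.robotparser import RobotFileParser

# Rules mirroring the sample file; the sitemap URL is an assumed placeholder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A page inside the tag directory is disallowed for every crawler.
tag_allowed = parser.can_fetch("*", "https://example.com/tag/seo/")

# A page outside the tag directory matches no rule and stays crawlable.
post_allowed = parser.can_fetch("*", "https://example.com/blog/post/")

# site_maps() lists the Sitemap entries found in the file (Python 3.8+).
sitemaps = parser.site_maps()

print(tag_allowed, post_allowed, sitemaps)
```

This only checks how the rules are parsed; it says nothing about whether a page ends up in a search index.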

In principle, it is an advantage to have as many individual pages as possible in the Google index, since this increases the probability of good rankings. However, bad pages, i.e. pages with duplicate content or with weak or too little content, damage the rankings of the whole domain much more.

Even if a subpage has been excluded via robots.txt, it can still appear in the search results, shown only as a bare URL. This is because robots.txt ONLY forbids crawling, not indexing. If the page must not end up in the index under any circumstances, use the noindex directive. Note that a crawler can only see a noindex directive on a page it is allowed to crawl, so such a page must not also be blocked in robots.txt.
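A noindex directive is commonly delivered either as a robots meta tag in the page's HTML or as an `X-Robots-Tag` HTTP response header. The sketch below checks a page for either form using only Python's standard library; the class `RobotsMetaParser` and the function `has_noindex` are hypothetical names chosen for this illustration, and the check is deliberately simplified (it ignores crawler-specific meta tags, for instance).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html, headers=None):
    """Return True if the HTML or the response headers carry a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # Assumes headers is a plain dict with the exact key "X-Robots-Tag".
    header_value = (headers or {}).get("X-Robots-Tag", "")
    header_directives = [d.strip().lower() for d in header_value.split(",")]
    return "noindex" in parser.directives or "noindex" in header_directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))
```

Running a check like this on pages that are also disallowed in robots.txt helps find directives that crawlers will never get to read.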
