Learning Online

Robots.txt

The Robot Exclusion Standard is also known as the Robots Exclusion Protocol or robots.txt protocol. It is a convention for advising cooperating web crawlers and other web robots which parts of an otherwise publicly viewable website they should not access. Robots are often used by search engines to categorize and archive websites, or by webmasters to proofread source code. The standard is different from, but can be used in conjunction with, Sitemaps, a robot inclusion standard for websites.

Example:

This example tells all robots that they may visit all files: the wildcard * matches every robot, and the empty Disallow directive blocks nothing:

User-agent: *
Disallow:

The same result can be accomplished with an empty or missing robots.txt file.
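
For experimenting with such rules programmatically, Python's standard library provides the urllib.robotparser module, which implements this protocol. The sketch below (the site example.com and the user-agent name "MyBot" are placeholders) feeds the allow-all rules above to the parser and confirms that any URL may be fetched:

import urllib.robotparser

# Parse the allow-all rules shown above directly, without any network access.
rules = """
User-agent: *
Disallow:
""".splitlines()

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

# An empty Disallow directive blocks nothing, so every path is allowed.
print(parser.can_fetch("MyBot", "https://example.com/any/page.html"))  # True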

This example tells all robots to stay out of a website:

User-agent: *
Disallow: /

This example tells all robots not to enter three directories:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/

This example tells all robots to stay away from one specific file:

User-agent: *
Disallow: /directory/file.html

Note that all other files in the specified directory will be processed.
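
The same standard-library parser can be used to verify this behaviour; example.com and "MyBot" are again placeholders:

import urllib.robotparser

rules = """
User-agent: *
Disallow: /directory/file.html
""".splitlines()

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

# Only the listed file is blocked; a sibling file in the same directory is still allowed.
print(parser.can_fetch("MyBot", "https://example.com/directory/file.html"))   # False
print(parser.can_fetch("MyBot", "https://example.com/directory/other.html"))  # True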

This example tells a specific robot to stay out of a website:

User-agent: BadBot # replace 'BadBot' with the actual user-agent of the bot
Disallow: /

This example tells a specific robot not to enter one specific directory:

User-agent: BadBot # replace 'BadBot' with the actual user-agent of the bot
Disallow: /private/

Example demonstrating how comments can be used:

# Comments appear after the "#" symbol at the start of a line, or after a directive
User-agent: * # matches all bots
Disallow: / # keep them out

It is also possible to list multiple robots, each with its own rules. The actual robot string is defined by the crawler. A few operators, such as Google, run several crawlers with distinct user-agent strings, which allows a webmaster to deny access to only a subset of their services by targeting the appropriate user-agent string.

Example demonstrating multiple user-agents:

User-agent: googlebot        # all services
Disallow: /private/          # disallow this directory

User-agent: googlebot-news   # only the news service
Disallow: /                  # on everything

User-agent: *                # all robots
Disallow: /something/        # on this directory
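
As a rough check of how different user-agents are treated, the same rules can be fed to urllib.robotparser (example.com and "SomeOtherBot" are placeholders). Note that this simple parser matches user-agent groups less strictly than the protocol requires of real crawlers, where the most specific matching group wins, so the googlebot-news case is not queried here:

import urllib.robotparser

rules = """
User-agent: googlebot
Disallow: /private/

User-agent: googlebot-news
Disallow: /

User-agent: *
Disallow: /something/
""".splitlines()

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

# googlebot is covered by its own group, which blocks only /private/.
print(parser.can_fetch("googlebot", "https://example.com/private/data.html"))       # False
# Any other robot falls back to the * group, which blocks only /something/.
print(parser.can_fetch("SomeOtherBot", "https://example.com/private/data.html"))    # True
print(parser.can_fetch("SomeOtherBot", "https://example.com/something/page.html"))  # False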

 

Note: The robots.txt file must be properly formatted; a malformed file may be ignored by crawlers.
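
In practice a crawler fetches robots.txt from the root of the site before crawling any pages. A minimal sketch of that workflow with the same standard-library parser (example.com and "MyBot" are placeholders, and the site must be reachable for read() to succeed):

import urllib.robotparser

parser = urllib.robotparser.RobotFileParser()
parser.set_url("https://example.com/robots.txt")  # the file lives at the site root
parser.read()                                     # fetch and parse it over HTTP

# Ask before fetching a page with our (hypothetical) user-agent.
if parser.can_fetch("MyBot", "https://example.com/some/page.html"):
    print("allowed to crawl this page")
else:
    print("blocked by robots.txt")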
