
How to Configure the robots.txt File

Getting your website properly indexed by search engines is crucial for visibility and for attracting visitors. This is where robots.txt, a simple text file, plays a key role: it tells search engine crawlers which parts of your site they may visit.

Understanding robots.txt - the gatekeeper of your website

Imagine your website as a bustling city, with search engine crawlers acting as curious explorers. The robots.txt file serves as the gatekeeper, telling these crawlers which areas to explore and which to leave untouched. It lets you control how search engines crawl your website, so that their attention goes to relevant and valuable content.

Key purposes of robots.txt:

  • Directing crawlers: Specify which pages and directories crawlers may visit.
  • Preventing overloading: Keep crawlers from overwhelming your server by limiting their access to specific areas.
  • Hiding sensitive content: Keep private or low-value areas out of the crawlers' path so they are less likely to appear in search results. The file itself is publicly readable and crawlers follow it voluntarily, so it does not replace access control.
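
Crawlers look for this file in one fixed place, /robots.txt at the root of the host, before they explore anything else. A minimal Python sketch of how that location can be derived from any page URL; the helper name robots_url and the example.com address are assumptions made for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.example.com/shop/item.html?id=1"))
```

Opening that address in a browser is a quick way to see the file that crawlers actually receive.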

 

Editing robots.txt for different purposes

The robots.txt generation tool in SEO Audit creates a file with exclusion directives for files and directories that are not meant to be public and should not be indexed. You can edit the content of the robots.txt file or restore it to the default configuration. To edit the file:

  • Access the SEO Audit module from the PrestaShop admin panel.
  • Navigate to the "Robot.txt" page.
  • Edit the content of your robots.txt file.
  • Save your changes.

Here are some examples that show how to edit the robots.txt file for different purposes:

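Each example gives the purpose, an explanation, and sample directives.

Disallowing crawling

To prevent search engine bots from crawling specific directories or pages, add the "Disallow" directive followed by the path to the directory or page. Rules take effect only inside a group that starts with a "User-agent" line.

User-agent: *
Disallow: /admin/
Disallow: /private-page.html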
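
One way to check Disallow rules before publishing them is Python's standard urllib.robotparser module. This is a minimal sketch under stated assumptions: the crawler name "ExampleBot", the example.com host, and the test paths are made up, and Python's parser uses plain prefix matching, which may differ in details from a given search engine.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private-page.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def allowed(path: str) -> bool:
    # "ExampleBot" is a hypothetical crawler name; it falls under the "*" group.
    return parser.can_fetch("ExampleBot", "https://www.example.com" + path)


print(allowed("/admin/settings"))       # False: under /admin/
print(allowed("/private-page.html"))    # False: exact page is disallowed
print(allowed("/products/shoes.html"))  # True: no rule matches
```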

Allowing crawling

Conversely, if you want search engine bots to crawl a page or directory that a broader "Disallow" rule would otherwise block, use the "Allow" directive.

User-agent: *
Allow: /images/
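
When an Allow and a Disallow rule both match a URL, crawlers that follow the Robots Exclusion Protocol (RFC 9309) apply the most specific rule, meaning the longest matching path, and prefer Allow on a tie. A minimal sketch of that precedence, assuming plain prefix rules without wildcards; the /media/ paths and the function name is_allowed are made up for illustration:

```python
def is_allowed(path, rules):
    """rules: list of (directive, prefix) pairs, e.g. ("Disallow", "/media/")."""
    best_len = -1
    best_allow = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        allow = directive.lower() == "allow"
        # Longer prefix wins; on equal length, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and allow):
            best_len = len(prefix)
            best_allow = allow
    return best_allow


rules = [("Disallow", "/media/"), ("Allow", "/media/images/")]
print(is_allowed("/media/images/logo.png", rules))  # True
print(is_allowed("/media/videos/demo.mp4", rules))  # False
```
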

Sitemap location

You can also specify the location of your sitemap using the "Sitemap" directive. This helps search engines discover and crawl your site's pages more efficiently.

Sitemap: https://www.example.com/sitemap.xml
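
To confirm that the Sitemap line is readable by a parser, the standard urllib.robotparser module can list the declared sitemaps. A small sketch, assuming Python 3.8 or later (where site_maps() is available); the rules text is an illustrative example, not the module's default file:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# site_maps() returns a list of sitemap URLs, or None if none are declared.
sitemaps = parser.site_maps()
print(sitemaps)
```
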

Blocking specific user agents

You can specify directives to control the behavior of specific user agents, such as search engine crawlers or bots from certain organizations.

User-agent: Googlebot
Disallow: /private-directory/

User-agent: Bingbot
Disallow: /admin-page.html
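
A file with several groups like this can also be assembled programmatically. The following is a minimal sketch and not the SEO Audit module's own code: the render_robots helper and the shape of its input are assumptions made for illustration.

```python
def render_robots(groups):
    """groups: mapping of user-agent name -> list of (directive, path) pairs."""
    blocks = []
    for agent, rules in groups.items():
        lines = [f"User-agent: {agent}"]
        lines += [f"{directive}: {path}" for directive, path in rules]
        blocks.append("\n".join(lines))
    # A blank line separates one user-agent group from the next.
    return "\n\n".join(blocks) + "\n"


text = render_robots({
    "Googlebot": [("Disallow", "/private-directory/")],
    "Bingbot": [("Disallow", "/admin-page.html")],
})
print(text)
```

Keeping each "User-agent" line directly above its rules matters: some parsers treat a blank line as the end of a group.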

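Allowing access to certain user agents

Conversely, you may want to grant access to specific user agents while restricting others. Use the "Allow" directive for this purpose. A crawler follows only the group that matches its name most specifically, so Googlebot in this example ignores the rules in the "*" group.

User-agent: *
Disallow: /restricted-directory/

User-agent: Googlebot
Allow: /allowed-directory/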
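
The effect of group selection can be checked with Python's standard urllib.robotparser module. In this sketch, "OtherBot" and the example.com URL are made-up names; the point is that a crawler with its own group does not inherit the restrictions of the "*" group.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /restricted-directory/

User-agent: Googlebot
Allow: /allowed-directory/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

url = "https://www.example.com/restricted-directory/page.html"
# Googlebot uses its own group; every other crawler falls back to "*".
results = {agent: parser.can_fetch(agent, url) for agent in ("Googlebot", "OtherBot")}
print(results)
```

If Googlebot should also stay out of a directory, the corresponding Disallow line has to be repeated inside the Googlebot group.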

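Crawling frequency

You can specify a crawl delay, usually read as the number of seconds between requests, to control how frequently search engine bots access your site. This can help manage server load and bandwidth usage. "Crawl-delay" is a non-standard directive: some crawlers honor it, while Googlebot ignores it.

User-agent: *
Crawl-delay: 10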
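
From the crawler's side, a polite bot could read the delay and space out its requests accordingly. A minimal sketch using the standard urllib.robotparser module (crawl_delay() exists since Python 3.6); the crawler name "ExampleBot" and the list of paths are assumptions, and no real requests are made:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# crawl_delay() returns None when no delay is declared, hence the fallback to 0.
delay = parser.crawl_delay("ExampleBot") or 0
paths = ["/", "/products.html", "/contact.html"]
# Planned start offset, in seconds, for each request.
schedule = [(i * delay, path) for i, path in enumerate(paths)]
print(schedule)
```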

Wildcard usage

Wildcards (*) can be used in robots.txt directives to match patterns of URLs, and a trailing "$" anchors the pattern to the end of the URL. The example uses the pattern "/*.pdf$" to block all URLs ending with the ".pdf" extension. This keeps search engines from crawling PDF files on your website.

User-agent: *
Disallow: /*.pdf$
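
Python's standard robots.txt parser does not interpret these wildcards, so the sketch below translates a pattern into a regular expression to show what it would match. The helper name pattern_to_regex and the sample paths are illustrative assumptions, and real crawlers may differ in edge cases such as percent-encoding.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex (sketch)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything, then turn the escaped "*" back into "match anything".
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile("^" + body + ("$" if anchored else ""))


pdf_rule = pattern_to_regex("/*.pdf$")
for path in ("/docs/manual.pdf", "/docs/manual.pdf?download=1", "/docs/manual.html"):
    print(path, bool(pdf_rule.match(path)))
```

Only the first path matches: the "$" anchor means a URL that continues after ".pdf", for example with a query string, is not covered by this rule.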
