Configure robots.txt and bot-related meta tags
## What does this MR do and why?
- Adds conditional logic for robots.txt.
- Adds an option for adding a `noindex` meta tag.
- Adds a canonical URL meta tag.
Closes #34
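Taken together, the meta-tag changes could look roughly like the following Hugo head partial. This is a sketch only: the partial's location, the `replaceRE` host rewrite, and the front matter parameter name are assumptions inferred from the validation steps, not the actual diff.

```
{{/* Hypothetical head partial; not the MR's actual code. */}}
{{/* Emit noindex when the page's front matter sets `noindex: true`. */}}
{{ if .Params.noindex }}
<meta name="robots" content="noindex">
{{ end }}

{{/* Canonical URL: always point at the production host, even on
     review apps and archive builds. (Rewriting versioned paths to
     the latest version is omitted from this sketch.) */}}
<link rel="canonical" href="{{ replaceRE `^https?://[^/]+` "https://new.docs.gitlab.com" .Permalink }}">
```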
## How to set up and validate locally
1. Configure a local GitLab Docs environment: https://gitlab.com/gitlab-org/technical-writing-group/gitlab-docs-hugo/-/blob/main/doc/setup.md.

Test the canonical URL tag:

1. Run a regular build: `make view`. The canonical URL on each page should be its URL on new.docs.gitlab.com.
2. Run an archive build: `CI_COMMIT_REF_NAME=17.2 make view-archive`. The canonical URL on each page should be its URL for the latest version (so, not 17.2) on new.docs.gitlab.com.
3. Review app: the canonical URL on each page should be its URL on new.docs.gitlab.com (and not the review app URL).
Test the ability to add a `noindex` tag:

1. Run a regular build: `make view`.
2. Edit a page, like `content/shortcodes.md`.
3. Add this as a new line to the front matter: `noindex: true`.
4. View the page and inspect the source. It should now include `<meta name="robots" content="noindex">`.
Test the robots file:

1. Run a build with the production URL (`hugo --baseURL="https://docs.gitlab.com" serve`) and load the robots page: http://localhost:1313/robots.txt. The robots file should allow all bots, but block them on review app and versioned URLs.
2. Run an archive build (`CI_COMMIT_REF_NAME=17.2 make view-archive`) and load the robots page: http://localhost:1313/robots.txt. All crawlers should be blocked, except Elastic.
3. A regular build with `make view` should, for now, apply the same logic as an archive build: no bots, except Elastic.
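The conditional robots logic above might be expressed as a `layouts/robots.txt` template along these lines. A sketch under stated assumptions: Hugo's `enableRobotsTXT` option is on, production is detected from the base URL, and `Elastic-Crawler` is a stand-in for whatever user-agent string the Elastic crawler actually sends.

```
{{/* Hypothetical layouts/robots.txt; not the MR's actual template. */}}
{{ if eq site.BaseURL "https://docs.gitlab.com/" }}
User-agent: *
# Production: allow all bots. Versioned paths would be
# disallowed here (exact paths omitted from this sketch).
{{ else }}
# Archive, review-app, and (for now) regular builds:
# block every crawler except Elastic.
User-agent: Elastic-Crawler
Allow: /

User-agent: *
Disallow: /
{{ end }}
```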
## Merge request acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this merge request.
Edited by Hiru Fernando