
Fix Duo Chat documentation links

Nicolas Dular requested to merge nd/fix-documentation-links into master

What does this MR do and why?

Related: #429813 (closed)

This MR introduces a new filter and pipeline for Duo Chat that converts relative links to absolute ones. This is necessary because when the LLM includes a link from the documentation in its answer, the user must be able to click it and land on the correct URL.
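To illustrate the idea, here is a minimal, standalone sketch of relative-to-absolute link resolution using only Ruby's standard `URI` library. It is not the filter added in this MR: the helper name `absolutize_markdown_links` and the base URL in the usage note are assumptions made for illustration, and the real pipeline works on rendered HTML rather than on raw Markdown.

```ruby
require "uri"

# Hypothetical helper (not part of this MR): rewrites relative Markdown link
# targets so they are absolute with respect to base_url, and maps ".md" to
# ".html" as the rendered help pages do.
def absolutize_markdown_links(markdown, base_url)
  markdown.gsub(/\[([^\]]+)\]\(([^)\s]+)\)/) do
    text = Regexp.last_match(1)
    target = Regexp.last_match(2)

    # Leave links that already have a scheme, and pure in-page anchors, untouched.
    next Regexp.last_match(0) if target.match?(/\A[a-z][a-z0-9+.-]*:/i) || target.start_with?("#")

    # Resolve "../" segments against the page URL, then swap the extension.
    absolute = URI.join(base_url, target).to_s.sub(/\.md(?=#|\z)/, ".html")
    "[#{text}](#{absolute})"
  end
end
```

Called with an assumed page URL such as `http://gdk.test:3000/help/ci/large_repositories/index.html`, the link `[pipeline efficiency](../pipelines/pipeline_efficiency.md)` resolves to `http://gdk.test:3000/help/ci/pipelines/pipeline_efficiency.html`. Note that this sketch keeps any `#fragment` on the target, which may differ from what the actual pipeline emits.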


How to set up and validate locally

1. Get an embedding

embedding = Embedding::Vertex::GitlabDocumentation.first

2. Original content

embedding.content
=> "# Optimize GitLab for large repositories **(FREE ALL)**\\n\\nLarge repositories consisting of more than 50k files in a worktree\\nmay require more optimizations beyond\\n[pipeline efficiency](../pipelines/pipeline_efficiency.md)\\nbecause of the time required to clone and check out.\\n\\nGitLab and GitLab Runner handle this scenario well\\nbut require optimized configuration to efficiently perform its\\nset of operations.\\n\\nThe general guidelines for handling big repositories are simple.\\nEach guideline is described in more detail in the sections below:\\n\\n- Always fetch incrementally. Do not clone in a way that results in recreating all of the worktree.\\n- Always use shallow clone to reduce data transfer. Be aware that this puts more burden\\n  on GitLab instance due to higher CPU impact.\\n- Control the clone directory if you heavily use a fork-based workflow.\\n- Optimize `git clean` flags to ensure that you remove or keep data that might affect or speed-up your build.\\n\\n## Shallow cloning\\n\\n> Introduced in GitLab Runner 8.9.\\n\\nGitLab and GitLab Runner perform a [shallow clone](../pipelines/settings.md#limit-the-number-of-changes-fetched-during-clone)\\nby default.\\n\\nIdeally, you should always use `GIT_DEPTH` with a small number\\nlike 10. This instructs GitLab Runner to perform shallow clones.\\nShallow clones make Git request only the latest set of changes for a given branch,\\nup to desired number of commits as defined by the `GIT_DEPTH` variable.\\n"

3. Modified content (see absolute link for "pipeline efficiency")

Banzai.render(embedding.content, { base_url: embedding.url, pipeline: :duo_chat_documentation })
=> "<h1 data-sourcepos=\"1:1-1:1468\">Optimize GitLab for large repositories <strong>(FREE ALL)</strong>\\n\\nLarge repositories consisting of more than 50k files in a worktree\\nmay require more optimizations beyond\\n<a href=\"http://gdk.test:3000/help/ci/pipelines/pipeline_efficiency.html\">pipeline efficiency</a>\\nbecause of the time required to clone and check out.\\n\\nGitLab and GitLab Runner handle this scenario well\\nbut require optimized configuration to efficiently perform its\\nset of operations.\\n\\nThe general guidelines for handling big repositories are simple.\\nEach guideline is described in more detail in the sections below:\\n\\n- Always fetch incrementally. Do not clone in a way that results in recreating all of the worktree.\\n- Always use shallow clone to reduce data transfer. Be aware that this puts more burden\\n  on GitLab instance due to higher CPU impact.\\n- Control the clone directory if you heavily use a fork-based workflow.\\n- Optimize <code>git clean</code> flags to ensure that you remove or keep data that might affect or speed-up your build.\\n\\n## Shallow cloning\\n\\n&gt; Introduced in GitLab Runner 8.9.\\n\\nGitLab and GitLab Runner perform a <a href=\"http://gdk.test:3000/help/ci/pipelines/settings.html\">shallow clone</a>\\nby default.\\n\\nIdeally, you should always use <code>GIT_DEPTH</code> with a small number\\nlike 10. This instructs GitLab Runner to perform shallow clones.\\nShallow clones make Git request only the latest set of changes for a given branch,\\nup to desired number of commits as defined by the <code>GIT_DEPTH</code> variable.\\n</h1>"
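When validating locally, it can help to confirm mechanically that no relative link survived in the rendered output. The following is a rough sketch, assuming the rendered HTML is available as a string; `relative_hrefs` is a hypothetical helper, not something provided by GitLab or Banzai.

```ruby
require "uri"

# Hypothetical helper: returns every href value in the HTML string that is
# not an absolute URI. Values that cannot be parsed are reported as well.
def relative_hrefs(html)
  html.scan(/href="([^"]*)"/).flatten.reject do |href|
    URI.parse(href).absolute?
  rescue URI::InvalidURIError
    false # keep unparseable values in the result so they get looked at
  end
end
```

In the console session above, `relative_hrefs(Banzai.render(embedding.content, { base_url: embedding.url, pipeline: :duo_chat_documentation }))` would be expected to return an empty array if every link was rewritten.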

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Nicolas Dular
