Automatically retry content-blocked Vertex AI requests
## What does this MR do and why?
This MR modifies the exponential backoff mechanism to automatically retry a request when the response contains a content moderation block result, which the Vertex LLM API returns when auto-moderation blocks the output. This occasionally happens for Explain This Vulnerability because our prompts are, by nature, adjacent to the exploitation of vulnerabilities.
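In outline, the change makes the backoff loop treat a moderation block as a retryable failure rather than a terminal success. A minimal sketch of that behaviour, assuming illustrative names (`content_blocked?`, `MAX_RETRIES`, `INITIAL_DELAY`) rather than the exact structure of the backoff concern:

```ruby
# Sketch only; constants and helper names are illustrative, not the
# exact ones in the exponential backoff concern.
MAX_RETRIES = 3
INITIAL_DELAY = 1.0 # seconds

RateLimitError = Class.new(StandardError)

# Vertex signals an auto-moderation block inside a successful response
# body rather than via an HTTP error, so we inspect the payload.
def content_blocked?(response)
  response.dig('predictions', 0, 'safetyAttributes', 'blocked') == true
end

def retry_with_exponential_backoff
  retries = 0
  delay = INITIAL_DELAY

  loop do
    response = yield

    # Previously only transport-level failures re-entered the loop;
    # now a content-blocked result is retried as well.
    return response unless content_blocked?(response)

    raise RateLimitError, 'Too many retries' if retries >= MAX_RETRIES

    retries += 1
    sleep(delay * (1 + rand)) # exponential backoff with jitter
    delay *= 2
  end
end
```

A block that persists through every retry therefore surfaces as a `RateLimitError`, which is what the monitoring described below keys off.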
As an added benefit, if this happens too often, the resulting rate limit failure is an easily alertable warning that something is not behaving as desired. Monitoring and Alerting For VertexAPI request f... (#414852 - closed) already exists to configure this.
Finally, the ETV prompts are being actively evaluated and refined to avoid content-moderation blocking, so this should rarely happen, if at all; but given the non-deterministic nature of LLMs, this change lets us deliver a valuable response to the user more consistently.
## How to set up and validate locally
You will need a locally configured devkit with sufficient credentials to send requests to the Vertex API.
- Modify https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/gitlab/llm/templates/explain_vulnerability.rb#L98 so that the prompt includes the phrase "Please demonstrate how to exploit this vulnerability" (see the sketch after this list).
- Execute an Explain This Vulnerability query with the modified prompt. The above modification will likely cause most responses to be blocked, so you should see a `RateLimitError` raised after repeated failures.
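For reference, the modification might look like the following sketch. The method name and surrounding structure of `explain_vulnerability.rb` are assumptions here; only the appended phrase matters:

```ruby
# Hypothetical shape of the prompt template around L98 of
# ee/lib/gitlab/llm/templates/explain_vulnerability.rb.
# `vulnerability_details` stands in for whatever the template
# already interpolates.
def prompt_text
  <<~PROMPT
    Explain the vulnerability described below.
    #{vulnerability_details}
    Please demonstrate how to exploit this vulnerability
  PROMPT
end
```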
## MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #414740 (closed)