Collect and track GitLab Duo Chat evaluations
What does this MR do and why?
Related to #429642
- Adds a new mode to `scripts/duo_chat/reporter.rb` ("the reporter script").
- Adds a spec for the reporter script.
- Updates the documentation to outline how the GitLab Duo Chat QA evaluation test works.
The reporter script is meant to be run by the CI job `rspec-ee unit gitlab-duo-chat-qa pg14` to process the job's CI artifacts and generate a Markdown report. The script also uploads its output to GitLab, either as snippets and an issue or as an MR note.
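The artifact-to-report step can be pictured with a minimal sketch. This MR does not describe the format of the `qa_*.json` artifacts, so the sketch assumes each file holds a JSON array of objects with hypothetical `question` and `answer` keys. `load_artifacts`, `table_cell`, and `build_report` are illustrative names, not functions from `reporter.rb`.

```ruby
require 'json'

# Hypothetical sketch: this MR does not document the artifact schema.
# Assumes each tmp/duo_chat/qa_*.json file holds a JSON array of objects
# with "question" and "answer" keys.
def load_artifacts(dir)
  Dir.glob(File.join(dir, 'qa_*.json')).sort.flat_map do |path|
    JSON.parse(File.read(path))
  end
end

# Escapes pipes and folds newlines so a value fits in one Markdown table cell.
def table_cell(text)
  text.to_s.gsub('|') { '\|' }.gsub(/\s*\n\s*/, ' ')
end

# Renders the loaded records as a Markdown report with one table row per record.
def build_report(records)
  lines = [
    '## GitLab Duo Chat QA evaluation',
    '',
    "Evaluated #{records.size} question(s).",
    '',
    '| Question | Answer |',
    '| --- | --- |'
  ]
  records.each do |record|
    lines << "| #{table_cell(record['question'])} | #{table_cell(record['answer'])} |"
  end
  lines.join("\n")
end
```

Escaping cell contents matters because a single `|` or newline in an LLM answer would otherwise break the table layout in the posted note.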
When the CI job `rspec-ee unit gitlab-duo-chat-qa pg14` runs in the pipeline for a merge request, the following happens:
- `rspec-ee unit gitlab-duo-chat-qa pg14` (ex. https://gitlab.com/gitlab-org/gitlab/-/jobs/5529680119) runs `ee/spec/lib/gitlab/llm/chain/agents/zero_shot/qa_evaluation_spec.rb`.
- `ee/spec/lib/gitlab/llm/chain/agents/zero_shot/qa_evaluation_spec.rb` saves the result of its run as CI artifacts (https://gitlab.com/gitlab-org/gitlab/-/jobs/5529680119/artifacts/browse/tmp/duo_chat/).
- `scripts/duo_chat/reporter.rb` is run. The script processes the artifacts to generate a Markdown report, then posts the report as a note to the MR. Check out this MR's note: !136799 (comment 1647272746)
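For the merge request case, posting the report comes down to one call to GitLab's notes REST endpoint (`POST /projects/:id/merge_requests/:merge_request_iid/notes`). The sketch below only builds the request. It assumes the predefined CI variables `CI_API_V4_URL`, `CI_PROJECT_ID`, and `CI_MERGE_REQUEST_IID` plus a caller-supplied token, since this MR does not say how `reporter.rb` authenticates; `mr_note_request` is a hypothetical name.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Sketch only: builds (but does not send) the REST request that posts a
# Markdown report as a merge request note. The env var names are GitLab's
# predefined CI variables; the token handling is an assumption.
def mr_note_request(report, token:, env: ENV)
  api = env.fetch('CI_API_V4_URL')
  project = env.fetch('CI_PROJECT_ID')
  mr_iid = env.fetch('CI_MERGE_REQUEST_IID')
  uri = URI("#{api}/projects/#{project}/merge_requests/#{mr_iid}/notes")

  request = Net::HTTP::Post.new(uri)
  request['PRIVATE-TOKEN'] = token
  request['Content-Type'] = 'application/json'
  # The notes endpoint takes the note text in the `body` attribute.
  request.body = JSON.generate({ body: report })
  request
end
```

Sending it would look like `Net::HTTP.start(request.uri.hostname, request.uri.port, use_ssl: true) { |http| http.request(request) }`. Keeping request construction separate makes the URL and payload testable without network access.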
This MR updates the reporter script so that, when it runs in the pipeline for the `master` branch, it:
- uploads the artifacts as snippets (ex. https://gitlab.com/gitlab-org/ai-powered/ai-framework/qa-evaluation/-/snippets/3621083),
- posts the Markdown report as an issue (ex. https://gitlab.com/gitlab-org/ai-powered/ai-framework/qa-evaluation/-/issues/9), and
- updates the tracker issue https://gitlab.com/gitlab-org/ai-powered/ai-framework/qa-evaluation/-/issues/1 with the information extracted from the Markdown report.
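The `master`-branch mode hinges on the environment check mentioned in the testing steps (comparing `CI_COMMIT_BRANCH` with `CI_DEFAULT_BRANCH`). Here is a hedged sketch of that decision, plus request payloads for GitLab's project snippets and issues endpoints (`POST /projects/:id/snippets`, `POST /projects/:id/issues`). Titles, visibility, and all method names are assumptions for illustration; the tracker-issue update is left out because this MR does not describe its format.

```ruby
# Mirrors the check described in this MR: the run counts as a default-branch
# (`master`) pipeline when these two predefined CI variables match.
def default_branch_pipeline?(env = ENV)
  branch = env['CI_COMMIT_BRANCH']
  !branch.nil? && !branch.empty? && branch == env['CI_DEFAULT_BRANCH']
end

# Hypothetical mode names: snippets + issue on master, MR note otherwise.
def mode(env = ENV)
  default_branch_pipeline?(env) ? :snippets_and_issue : :mr_note
end

# Hypothetical payload for POST /projects/:id/snippets (one snippet per artifact).
def snippet_payload(artifact_path, commit_sha)
  name = File.basename(artifact_path)
  {
    title: "Duo Chat QA artifact #{name} (#{commit_sha})",
    visibility: 'private',
    files: [{ file_path: name, content: File.read(artifact_path) }]
  }
end

# Hypothetical payload for POST /projects/:id/issues carrying the Markdown report.
def issue_payload(report, commit_sha, pipeline_url)
  {
    title: "Duo Chat QA evaluation for #{commit_sha}",
    description: "#{report}\n\nPipeline: #{pipeline_url}"
  }
end
```

The `nil` check matters because GitLab does not set `CI_COMMIT_BRANCH` in merge request pipelines, so an MR pipeline falls through to the note-posting mode.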
How to test the change
We can confirm that the existing functionality of the script continues to work by checking out the note !136799 (comment 1647272746).
To test that the script can successfully collect and track the evaluations when run in the `master` branch's pipeline, follow these steps:
- Create a new project access token at https://gitlab.com/gitlab-org/ai-powered/ai-framework/qa-evaluation/-/settings/access_tokens. ⚠ Be sure to use a short expiry or delete the access token after you're done testing.
- Download the CI artifacts (`qa_*.json` files) from https://gitlab.com/gitlab-org/gitlab/-/jobs/5529680119/artifacts/browse/tmp/duo_chat/. Place them under your GDK's `gitlab` project root directory like this:
  ```
  gitlab-development-environment/gitlab/tmp/duo_chat
  ├── qa_1699866498.json
  └── qa_1699866646.json
  ```
- Set these environment variables (use `export` so the reporter script, which runs as a child process, can read them):
  ```shell
  export CI_PIPELINE_URL="https://gitlab.com/gitlab-org/gitlab/-/pipelines/17983572039847129234" # The value does not matter.
  export CI_COMMIT_SHA="foobar123" # The value does not matter.
  export CHAT_QA_EVALUATION_PROJECT_TOKEN_FOR_CI_SCRIPTS_API_USAGE="<access token>"
  # Important! The reporter script knows it's running in a `master` branch's pipeline by comparing these environment variables.
  export CI_COMMIT_BRANCH="master"
  export CI_DEFAULT_BRANCH="master"
  ```
- Run the script:
  ```shell
  ./scripts/duo_chat/reporter.rb
  ```
- Check out https://gitlab.com/gitlab-org/ai-powered/ai-framework/qa-evaluation/-/issues/1 and confirm there's a new entry.
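Before running the reporter locally, it can help to confirm the setup from the steps above. This hypothetical `preflight_problems` helper (not part of this MR) checks that each listed variable is visible to the process, that the two branch variables match, and that at least one `qa_*.json` file sits in `tmp/duo_chat` under the given `gitlab` root. It returns a list of human-readable problems.

```ruby
# Hypothetical helper, not part of reporter.rb: validates the manual test setup.
REQUIRED_VARS = %w[
  CI_PIPELINE_URL
  CI_COMMIT_SHA
  CHAT_QA_EVALUATION_PROJECT_TOKEN_FOR_CI_SCRIPTS_API_USAGE
  CI_COMMIT_BRANCH
  CI_DEFAULT_BRANCH
].freeze

def preflight_problems(env: ENV, gitlab_root: Dir.pwd)
  # Variables that were set without `export` are invisible here, so they show up as missing.
  problems = REQUIRED_VARS.select { |name| env[name].to_s.strip.empty? }
                          .map { |name| "#{name} is not set" }

  unless env['CI_COMMIT_BRANCH'] == env['CI_DEFAULT_BRANCH']
    problems << 'CI_COMMIT_BRANCH must equal CI_DEFAULT_BRANCH to exercise the master-branch mode'
  end

  artifacts = Dir.glob(File.join(gitlab_root, 'tmp', 'duo_chat', 'qa_*.json'))
  problems << 'no qa_*.json files found in tmp/duo_chat' if artifacts.empty?
  problems
end
```

Called from an `irb` session started in the `gitlab` directory, `preflight_problems` returns an empty array when the environment is ready.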
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.