Auto-retry jobs when they fail due to a known flaky test
From Draft to Ready
- The found_known_flaky_tests bash function is implemented
- MR description is up-to-date, and I tested all the edge cases I want to test
- gitlab-org/ruby/gems/gitlab_quality-test_tooling!260 is merged, released, and we use that newly released version in this MR
What does this MR do and why?
- Auto-retry a CI/CD job (i.e., set a custom exit code for the job) when we detect that it failed due to a known flaky test, i.e. a test that has already made a CI/CD job fail on the main branch (tracked in issues with the ~"test-health:failures" label).
- Add a comment to the matching issue whenever a CI/CD job is about to fail after its two RSpec processes.
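As a rough illustration of the detection step, the bash sketch below checks whether a job's failures are all known flaky tests. It is a hypothetical stand-in, not the MR's found_known_flaky_tests function: the function name all_failures_are_known_flaky and its two input files (one list of failed test files, one list of known flaky test files, e.g. gathered from ~"test-health:failures" issues) are assumptions.

```shell
#!/usr/bin/env bash

# Hypothetical sketch (not the MR's found_known_flaky_tests): returns 0 only if
# at least one failed test is listed and every failed test file also appears
# in the known flaky tests list.
# $1: file with one failed test file path per line
# $2: file with one known flaky test file path per line
all_failures_are_known_flaky() {
  local failed_tests_file="$1"
  local known_flaky_file="$2"
  local test_path
  local checked=0

  # "|| [[ -n ... ]]" also handles a last line without a trailing newline.
  while IFS= read -r test_path || [[ -n "${test_path}" ]]; do
    # Skip blank lines.
    [[ -z "${test_path}" ]] && continue

    # -F: fixed string, -x: whole-line match, -q: no output.
    if ! grep -Fxq -- "${test_path}" "${known_flaky_file}"; then
      echo "Not a known flaky test: ${test_path}" >&2
      return 1
    fi
    checked=$((checked + 1))
  done < "${failed_tests_file}"

  # No failed tests recorded: nothing to attribute to flakiness.
  (( checked > 0 ))
}
```

Requiring every failed test to be a known flaky one is a deliberately conservative choice in this sketch: a genuine regression that fails alongside a flaky test would still fail the job instead of being retried away.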
Proof of work
$CI_AUTO_RETRY_JOBS_WITH_FLAKY_TESTS_ENABLED = true
$CI_AUTO_RETRY_JOBS_WITH_FLAKY_TESTS_NOTIFICATIONS_ENABLED = true
When: the CI/CD job failed due to a known flaky test, using #499936 as a reference. A test commit makes the test above fail on purpose.
Expected:
- The job should be auto-retried.
- We should see a comment for that job in gitlab-org/quality/engineering-productivity/team#573.
Actual:
$CI_AUTO_RETRY_JOBS_WITH_FLAKY_TESTS_ENABLED = false
Expected:
- It doesn't change the exit code.
- No comment for that job in gitlab-org/quality/engineering-productivity/team#573.
Actual:
- The logic wasn't run: https://gitlab.com/gitlab-org/gitlab/-/jobs/8150778528#L1703 (so no detection and no comment in the issue)
- The exit code was 1: https://gitlab.com/gitlab-org/gitlab/-/jobs/8150778528#L1806
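The two scenarios above can be summarized as a small decision function. This is a hypothetical bash sketch, not the MR's code: the function names final_exit_code and should_notify and the retry exit code 112 are placeholders. It only assumes that the job configuration retries the job when it exits with that code (for example via GitLab CI's retry:exit_codes keyword). When CI_AUTO_RETRY_JOBS_WITH_FLAKY_TESTS_ENABLED is not true, the original exit code passes through unchanged and no comment is posted, which matches the second scenario.

```shell
#!/usr/bin/env bash

# Hypothetical placeholder: the exit code the job configuration would treat
# as "retry this job". The value the MR actually uses is not stated here.
FLAKY_TEST_RETRY_EXIT_CODE=112

# Hypothetical sketch: print the exit code the RSpec job should end with.
# $1: original exit code of the RSpec run
# $2: "true" if the failure was attributed to a known flaky test
final_exit_code() {
  local original_exit_code="$1"
  local failed_due_to_known_flaky_test="$2"

  # Feature disabled: never touch the original exit code.
  if [[ "${CI_AUTO_RETRY_JOBS_WITH_FLAKY_TESTS_ENABLED:-false}" != "true" ]]; then
    echo "${original_exit_code}"
    return
  fi

  # Only a failing job caused by a known flaky test gets the retry exit code.
  if [[ "${original_exit_code}" -ne 0 && "${failed_due_to_known_flaky_test}" == "true" ]]; then
    echo "${FLAKY_TEST_RETRY_EXIT_CODE}"
  else
    echo "${original_exit_code}"
  fi
}

# Hypothetical sketch: succeed only when a comment should be posted on the
# matching flaky-test issue. Both variables must be "true".
# $1: "true" if the failure was attributed to a known flaky test
should_notify() {
  local failed_due_to_known_flaky_test="$1"
  [[ "${CI_AUTO_RETRY_JOBS_WITH_FLAKY_TESTS_ENABLED:-false}" == "true" \
    && "${CI_AUTO_RETRY_JOBS_WITH_FLAKY_TESTS_NOTIFICATIONS_ENABLED:-false}" == "true" \
    && "${failed_due_to_known_flaky_test}" == "true" ]]
}
```

For example, with CI_AUTO_RETRY_JOBS_WITH_FLAKY_TESTS_ENABLED=true, final_exit_code 1 true prints 112; with the variable set to false, it prints 1, as in the second scenario.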
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other merge requests.
Before | After |
---|---|
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
Edited by David Dieulivol