[Feature flag] Rollout of `ci_reset_bridge_with_subsequent_jobs`
Feature
This feature uses the :ci_reset_bridge_with_subsequent_jobs feature flag!
- Parent pipeline fails to proceed to next stage if job in child pipeline (strategy: depend) fails and then passes
- Introduced in: !60376 (merged)
TODO: Remove the technical debt in spec/models/ci/pipeline_spec.rb.
Owners
- Team: group::pipeline authoring
- Most appropriate Slack channel to reach out to: #g_pipeline-authoring
- Best individual to reach out to: @furkanayhan
- PM: @dhershkovitch
Stakeholders
The Rollout Plan
- Partial Rollout on GitLab.com with beta groups
- Rollout on GitLab.com for a certain period (How long)
- Percentage Rollout on GitLab.com
- Rollout Feature for everyone as soon as it's ready
Beta Groups/Projects:
- gitlab-org/gitlab project
- gitlab-org / gitlab-com groups
- ...
Expectations
What are we expecting to happen?
This feature flag fixes the bug described in #297678 (closed).
Without the fix, resetting an upstream bridge does not take into account the subsequent jobs that were skipped because the bridge failed.
As in the example, we expect that when a failed job of a downstream pipeline is retried, the skipped subsequent jobs of the dependent upstream bridge job are reset to created/pending.
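The expected behaviour can be illustrated with a toy model. This is only a minimal sketch and not GitLab's implementation: the `Job` class, the numeric stage index, the job names, and the `reset_bridge_with_subsequent_jobs` helper are all hypothetical, and the sketch collapses "created/pending" into a single `created` status.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    stage: int   # position of the job's stage in the upstream pipeline (assumed model)
    status: str  # e.g. "success", "failed", "skipped", "created"


def reset_bridge_with_subsequent_jobs(jobs, bridge_name):
    """Reset the bridge and every skipped job in a later stage to "created".

    Hypothetical helper that models what we expect to happen in the upstream
    pipeline when a failed job of the downstream pipeline is retried.
    """
    bridge = next(job for job in jobs if job.name == bridge_name)
    bridge.status = "created"
    for job in jobs:
        # Only jobs in later stages that were skipped are affected;
        # jobs that already succeeded are left untouched.
        if job.stage > bridge.stage and job.status == "skipped":
            job.status = "created"
    return jobs


upstream = [
    Job("build", stage=0, status="success"),
    Job("trigger-child", stage=1, status="failed"),  # bridge with strategy: depend
    Job("deploy", stage=2, status="skipped"),        # skipped because the bridge failed
]

reset_bridge_with_subsequent_jobs(upstream, "trigger-child")
print([(job.name, job.status) for job in upstream])
```

In this model, the `deploy` job is no longer stuck in `skipped` after the reset, which corresponds to the parent pipeline being able to proceed to the next stage.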
What might happen if this goes wrong?
Retrying a job/pipeline or playing a manual job may not work as expected.
What can we monitor to detect problems with this?
We should not see an increase of 500 statuses...
- Projects::PipelinesController#retry => https://log.gprd.gitlab.net/goto/bac2a11bcfe899fe231cc1ead3715cea
- Projects::JobsController#retry => https://log.gprd.gitlab.net/goto/b6ff8889b6f9ce43b3923e1057cd82e5
- Projects::JobsController#play => https://log.gprd.gitlab.net/goto/ae7e1c66c95871772c3f02fe9ef8fbe3
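As a rough illustration of the check behind those dashboards, the sketch below counts 500 responses per watched endpoint. The record layout (`controller`, `action`, `status` keys) and the sample data are assumptions made for this example and may not match the real log schema.

```python
from collections import Counter

# The three endpoints listed above.
WATCHED_ACTIONS = {
    "Projects::PipelinesController#retry",
    "Projects::JobsController#retry",
    "Projects::JobsController#play",
}


def count_server_errors(records):
    """Count HTTP 500 responses per watched controller action."""
    counts = Counter()
    for record in records:
        endpoint = f'{record["controller"]}#{record["action"]}'
        if endpoint in WATCHED_ACTIONS and record["status"] == 500:
            counts[endpoint] += 1
    return counts


# Hypothetical log records, for illustration only.
sample = [
    {"controller": "Projects::JobsController", "action": "retry", "status": 302},
    {"controller": "Projects::JobsController", "action": "retry", "status": 500},
    {"controller": "Projects::PipelinesController", "action": "retry", "status": 500},
    {"controller": "Projects::IssuesController", "action": "show", "status": 500},
]

errors = count_server_errors(sample)
print(dict(errors))
```

Comparing these counts before and after enabling the flag gives a simple signal for whether the rollout increased error rates.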
Rollout Timeline
Initial Rollout
Preparation Phase
- Enable on staging (/chatops run feature set ci_reset_bridge_with_subsequent_jobs true --staging)
- Test on staging
- Ensure that documentation has been updated (More info)
- Announce on the issue an estimated time this will be enabled on GitLab.com
Partial Rollout Phase
- Enable on GitLab.com for individual groups/projects listed above and verify behaviour (/chatops run feature set --project=gitlab-org/gitlab ci_reset_bridge_with_subsequent_jobs true)
- Verify behaviour (See Beta Groups) and add details with screenshots as a comment on this issue
- If it is possible to perform an incremental rollout, this should be preferred. Proposed increments are: 10%, 50%, 100%. Proposed minimum time between increments is 15 minutes.
  - When setting percentages, make sure that the feature works correctly between feature checks. See #327117 (closed) for more information
  - For actor-based rollout: /chatops run feature set ci_reset_bridge_with_subsequent_jobs 10 --actors
  - For time-based rollout: /chatops run feature set ci_reset_bridge_with_subsequent_jobs 10
- Make the feature flag enabled by default, i.e. change default_enabled to true
- Cross post chatops slack command to #support_gitlab-com (more guidance when this is necessary in the dev docs) and in your team channel
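The difference between the actor-based and time-based percentage rollouts in the list above can be sketched conceptually as follows. This is an illustration under stated assumptions, not the feature-flag library's actual code: the function names, the CRC32 bucketing scheme, and the `project:42` actor identifier are hypothetical.

```python
import random
import zlib

FLAG = "ci_reset_bridge_with_subsequent_jobs"


def enabled_for_actor(flag: str, actor_id: str, percentage: int) -> bool:
    """Actor-based: hash flag and actor into a bucket 0..99.

    The same actor always lands in the same bucket, so the answer is
    stable between checks and only changes when the percentage changes.
    """
    bucket = zlib.crc32(f"{flag}:{actor_id}".encode()) % 100
    return bucket < percentage


def enabled_for_time(percentage: int, rng: random.Random) -> bool:
    """Time-based: every check is an independent random draw."""
    return rng.random() * 100 < percentage


# The actor-based answer is repeatable for a given actor.
first = enabled_for_actor(FLAG, "project:42", 10)
second = enabled_for_actor(FLAG, "project:42", 10)
print(first == second)

# The time-based answer can differ between two consecutive checks,
# even within the same request.
rng = random.Random()
print([enabled_for_time(10, rng) for _ in range(5)])
```

The time-based variant is the reason the code must work correctly when the flag flips between two checks: nothing ties consecutive checks to the same result.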
Cleanup
This is an important phase that should be done either in the next milestone or as soon as possible. For the cleanup phase, please follow our documentation on how to clean up the feature flag.
- Announce on the issue that the flag has been enabled
- Remove :ci_reset_bridge_with_subsequent_jobs feature flag
  - Remove all references to the feature flag from the codebase
  - Remove the YAML definitions for the feature from the repository
  - Create a Changelog Entry
- Clean up the feature flag from all environments by running this chatops command in the #production channel: /chatops run feature delete ci_reset_bridge_with_subsequent_jobs
Final Step
- Close this rollout issue for the feature flag after the feature flag is removed from the codebase.
Rollback Steps
- This feature can be disabled by running the following Chatops command:
  /chatops run feature set --project=gitlab-org/gitlab ci_reset_bridge_with_subsequent_jobs false