Update rollback guidelines to be context-aware

Amy Phillips requested to merge ap/updated-rollback-guidelines into master

This MR suggests updating the rollback guidelines so that release managers have the context they need to assess a situation and make a decision, rather than relying on blanket rules that might not be suitable. Rollbacks can be disruptive, but as long as we're aware of that, I think they can be considered for lower-severity incidents.

This MR is intended to host the discussion that I originally suggested happen on gitlab-com/gl-infra/delivery#2630 (closed).

Context from the issue, and hopefully reflected in the change on this MR:

Delivery Group, as Release Managers, is often involved in incident discussions and can be asked to roll back a faulty package to a previous one not containing the issue.

While rollback was built as a fast way to resolve incidents, and the effort of a single rollback is not high (though it can become high if we roll back every time a problem occurs), there may still be times when we would want to roll forward with a fix instead.

Rolling back a deployment restores service for users but also blocks further deployments until a fix has been merged and a new package is ready to deploy. A rollback also reverts an entire package, not just the MR causing problems; for some packages we may revert several hundred changes for the sake of one. This is especially significant when we're close to the monthly release deadline.

Rollback itself is not a solution but a mitigation action. Fixing the problem with the right sense of urgency and rolling forward should always be considered as an alternative, depending on the scope and severity of the impact.

Edited by Amy Phillips
