[Feature flag] Rollout of `container_registry_expiration_policies_throttling`
What
Roll out the `container_registry_expiration_policies_throttling` feature flag.
Owners
- Team: Package
- Most appropriate Slack channel to reach out to: #s_package
- Best individual to reach out to: @10io
Expectations
What are we expecting to happen?
This flag will enable some limits around the container tag cleanup services and workers. See the analysis in #208193 (comment 362910703)
With the feature flag enabled
- A new application setting is available for the container registry: `container_registry_delete_tags_service_timeout`
- https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/services/projects/container_repository/gitlab/delete_tags_service.rb#L17 will run for at most `::Gitlab::CurrentSettings.current_application_settings.container_registry_delete_tags_service_timeout`
With the feature flag disabled
- https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/services/projects/container_repository/gitlab/delete_tags_service.rb#L17 can run for an arbitrary amount of time
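The difference between the two modes can be illustrated with a minimal sketch. This is not the actual `delete_tags_service.rb` code: the class `ThrottledTagDeleter`, its arguments, the error message, and the use of seconds as the timeout unit are all assumptions made for illustration. A `nil` timeout stands in for the flag being disabled.

```ruby
# Hypothetical stand-in for the tag deletion service; not GitLab's implementation.
class ThrottledTagDeleter
  # timeout: maximum run time (assumed to be seconds), or nil for no limit,
  #          which mimics the behaviour with the feature flag disabled.
  # clock:   callable returning the current time; injectable for testing.
  def initialize(timeout:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @timeout = timeout
    @clock = clock
  end

  # Yields each tag name to the caller's block, which performs the deletion
  # and returns a truthy value on success. Stops early once the time budget
  # is spent and reports which tags were deleted so far.
  def execute(tag_names)
    started_at = @clock.call
    deleted = []

    tag_names.each do |name|
      if timed_out?(started_at)
        return { status: :error, message: 'timeout while deleting tags', deleted: deleted }
      end

      deleted << name if yield(name)
    end

    { status: :success, deleted: deleted }
  end

  private

  def timed_out?(started_at)
    return false if @timeout.nil? # no limit: run for as long as it takes

    @clock.call - started_at >= @timeout
  end
end

# Usage: the block would call the registry API in a real service.
deleter = ThrottledTagDeleter.new(timeout: 5)
result = deleter.execute(%w[v1 v2 v3]) { |_tag| true }
puts result[:status]
```

Checking the elapsed time between deletions, rather than interrupting a deletion in progress, means a run can exceed the limit by roughly the duration of one deletion, but no request is cut off halfway.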
What might happen if this goes wrong?
- Tags that should be deleted might not get deleted
What can we monitor to detect problems with this?
- Container registry: https://dashboards.gitlab.net/d/registry-main/registry-overview?orgId=1&var-PROMETHEUS_DS=Global&var-environment=gprd&var-stage=main&var-sigma=2
- Sentry on both workers:
- Thanos dashboard on the current load (e.g. number of repositories to clean up) for these workers: https://thanos-query.ops.gitlab.net/graph?g0.range_input=30m&g0.max_source_resolution=0s&g0.expr=max(limited_capacity_worker_remaining_work_count%7Bworker%3D%22ContainerExpirationPolicies%3A%3ACleanupContainerRepositoryWorker%22%2C%20env%3D%22gprd%22%7D)&g0.tab=0&g1.range_input=1h&g1.max_source_resolution=0s&g1.expr=max(limited_capacity_worker_max_running_jobs%7Bworker%3D%22ContainerExpirationPolicies%3A%3ACleanupContainerRepositoryWorker%22%2C%20env%3D%22gprd%22%7D)&g1.tab=0&g2.range_input=1h&g2.max_source_resolution=0s&g2.expr=min(limited_capacity_worker_running_jobs%7Bworker%3D%22ContainerExpirationPolicies%3A%3ACleanupContainerRepositoryWorker%22%2C%20env%3D%22gprd%22%7D)&g2.tab=0
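The Thanos link above URL-encodes three queries, which are hard to read in that form. The sketch below rebuilds them as plain PromQL strings. The metric names, labels, and aggregations are decoded from the link; the helper `worker_query` and the `QUERIES` hash are hypothetical names used only for this illustration.

```ruby
WORKER = 'ContainerExpirationPolicies::CleanupContainerRepositoryWorker'

# Builds one PromQL expression in the same shape as those in the dashboard link.
def worker_query(aggregate, metric, worker: WORKER, env: 'gprd')
  format('%s(%s{worker="%s", env="%s"})', aggregate, metric, worker, env)
end

QUERIES = {
  # Repositories still waiting for cleanup
  remaining_work: worker_query('max', 'limited_capacity_worker_remaining_work_count'),
  # Configured ceiling on concurrent jobs
  max_running_jobs: worker_query('max', 'limited_capacity_worker_max_running_jobs'),
  # Jobs currently running
  running_jobs: worker_query('min', 'limited_capacity_worker_running_jobs')
}.freeze

QUERIES.each { |name, expr| puts "#{name}: #{expr}" }
```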
Beta groups/projects
n/a. This feature flag is global for the container registry tags cleanup system.
Roll Out Steps
- Enable on staging
- Test on staging - Impossible to fully test on staging due to https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/11509
- Ensure that documentation has been updated
- [-] Enable on GitLab.com for individual groups/projects listed above and verify behaviour
  - feature flag is global
- Coordinate a time to enable the flag with #production and #g_delivery on Slack.
- Announce on the issue an estimated time this will be enabled on GitLab.com
- Enable on GitLab.com by running chatops command in #production
- Cross post chatops slack command to #support_gitlab-com (more guidance when this is necessary in the dev docs) and in your team channel
- Announce on the issue that the flag has been enabled
- Remove feature flag and add changelog entry
  - Remove the `preloaded` option in `#with_runnable_policy` in `ContainerExpirationPolicyWorker`
    - !50858 (comment 496385677)
- After the flag removal is deployed, clean up the feature flag by running chatops command in #production channel
Edited by David Fernandez