Geo: Fix sync failure retry backoff (!157805)

Michael Kozono requested to merge mk/fix-retry-sync-backoff into master Jun 28, 2024

What does this MR do and why?

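Fixes the exponential backoff used when retrying failed syncs. Previously, a registry row that persistently failed to sync kept retry_count at 1, so it was resynced on every run of the sync worker instead of backing off.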

Blobs are not affected, since none of them are "mutable". I made the same changes for blobs anyway: for consistency, to avoid adding overrides (they reuse the same Scheduler Worker code), and for future safety.

Resolves #469587
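The general idea of exponential retry backoff can be sketched as follows, assuming the delay is derived from retry_count. This is a minimal, hypothetical sketch, not GitLab's implementation: next_retry_delay, BASE_DELAY_SECONDS, and MAX_DELAY_SECONDS are illustrative names and values. It shows why a retry_count stuck at 1 defeats the backoff: every retry would use the base delay.

```ruby
# Hypothetical values chosen for illustration; not GitLab's actual settings.
BASE_DELAY_SECONDS = 60
MAX_DELAY_SECONDS = 60 * 60

# Delay before the next retry: doubles with each consecutive failure
# (retry_count 1 => base delay) and is capped at MAX_DELAY_SECONDS.
def next_retry_delay(retry_count)
  exponent = [retry_count - 1, 0].max
  [BASE_DELAY_SECONDS * 2**exponent, MAX_DELAY_SECONDS].min
end

# Delays for retry_count 1 through 7.
delays = (1..7).map { |count| next_retry_delay(count) }
p delays # => [60, 120, 240, 480, 960, 1920, 3600]
```

The cap keeps the delay bounded for rows that fail for a long time; without it, the doubling would quickly produce impractically long waits.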

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Before: A registry row that persistently fails to sync always has retry_count: 1.

After: A registry row that persistently fails to sync has its retry_count incremented on each failed sync.
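The before and after behavior can be modeled in plain Ruby. This models only the observable symptom described above, not its cause or the actual code change in this MR; Registry, record_failure_before, and record_failure_after are hypothetical names.

```ruby
# Hypothetical stand-in for a registry row; only retry_count is modeled.
Registry = Struct.new(:retry_count)

# "Before" symptom: after each failed sync, retry_count ends up at 1.
def record_failure_before(registry)
  registry.retry_count = 1
end

# "After" behavior: each failed sync increments the stored retry_count.
def record_failure_after(registry)
  registry.retry_count = (registry.retry_count || 0) + 1
end

before = Registry.new(nil)
after = Registry.new(nil)

# Simulate six consecutive failed syncs of a persistently failing row.
6.times do
  record_failure_before(before)
  record_failure_after(after)
end

p before.retry_count # => 1
p after.retry_count  # => 6
```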

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

1. Set up Geo.
2. Cause a persistent sync failure. For example, on the primary GDK, run gdk stop rails-web.
3. Open a Rails console on the secondary GDK and trigger the first failed sync:

   r = Geo::ProjectRepositoryRegistry.first
   r.replicator.resync

4. On the secondary site, run tail -f /path/to/gdk/gitlab/log/geo.log and wait.
5. Notice that on the master branch, the repository is resynced every time RepositoryRegistrySyncWorker runs. In the Rails console, you can inspect the registry and see that retry_count does not change after multiple syncs.
6. Notice that on this branch, the repository is resynced a few times, but after about 5 minutes it is no longer resynced on every RepositoryRegistrySyncWorker run. In the Rails console, you can inspect the registry and see that retry_count has increased to around 6.
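The difference observed in the last two steps can be sketched as a simulation of periodic worker runs against one registry whose sync always fails. Everything here is a hypothetical assumption for illustration: the 60-second worker interval, the 60-second base delay, the doubling formula, and the backoff_delay and simulate helpers are not taken from GitLab, so the exact run numbers will not match a real GDK.

```ruby
# Hypothetical values for illustration only; the real worker schedule and
# backoff formula may differ.
WORKER_INTERVAL_SECONDS = 60
BASE_DELAY_SECONDS = 60
MAX_DELAY_SECONDS = 60 * 60

# Exponential backoff keyed on retry_count, capped at MAX_DELAY_SECONDS.
def backoff_delay(retry_count)
  [BASE_DELAY_SECONDS * 2**[retry_count - 1, 0].max, MAX_DELAY_SECONDS].min
end

# Simulates periodic worker runs against one registry whose sync always
# fails. Returns the indexes of the runs that attempted a resync, plus the
# final retry_count. increment: false reproduces the "before" symptom.
def simulate(runs, increment:)
  retry_count = 0
  retry_at = 0
  attempts = []

  runs.times do |run|
    now = run * WORKER_INTERVAL_SECONDS
    next if now < retry_at # Still backing off: skip this run.

    attempts << run
    retry_count = increment ? retry_count + 1 : 1
    retry_at = now + backoff_delay(retry_count)
  end

  [attempts, retry_count]
end

p simulate(10, increment: false) # => [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 1]
p simulate(10, increment: true)  # => [[0, 1, 3, 7], 4]
```

With retry_count stuck at 1, the delay never exceeds one worker interval, so every run resyncs; once retry_count increments, the gaps between resync attempts keep widening.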

Edited Jun 29, 2024 by Michael Kozono
