Geo: Fix sync failure retry backoff

Michael Kozono requested to merge mk/fix-retry-sync-backoff into master

What does this MR do and why?

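Fixes exponential backoff for retries of failed syncs, so that a registry row that keeps failing to sync is retried less and less often instead of on every scheduler run.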

Blobs are not affected, since none of them are "mutable". I made the same changes for them anyway, for consistency and future safety, and to avoid adding overrides, since they reuse the same Scheduler Worker code.

Resolves #469587 (closed)

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Before: A registry row that is persistently failing to sync will always have retry_count: 1.

After: A registry row that is persistently failing to sync will increment retry_count each time.
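
To illustrate why a stuck retry_count defeats backoff, here is a minimal sketch in plain Ruby. The formula, the `retry_delay_seconds` name, and the `base`/`cap` values are assumptions made up for illustration; they are not the actual Geo implementation. The point is only that the delay is a function of retry_count, so a counter that never moves past 1 yields a delay that never grows.

```ruby
# Hypothetical backoff formula (NOT GitLab's actual code):
# 30 seconds doubled per retry, capped at one hour.
def retry_delay_seconds(retry_count, base: 30, cap: 3600)
  [base * (2**retry_count), cap].min
end

# After the fix: retry_count increments on every failure, so the delay grows.
delays_after = (1..6).map { |n| retry_delay_seconds(n) }
# => [60, 120, 240, 480, 960, 1920]

# Before the fix: retry_count is always 1, so the delay stays constant.
delays_before = Array.new(6) { retry_delay_seconds(1) }
# => [60, 60, 60, 60, 60, 60]
```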

How to set up and validate locally

  1. Set up Geo
  2. Cause a persistent sync failure. For example, on the primary GDK, run gdk stop rails-web.
  3. Open a Rails console in the secondary GDK and trigger the first failed sync:
       r = Geo::ProjectRepositoryRegistry.first
       r.replicator.resync
  4. In the secondary site, run tail -f /path/to/gdk/gitlab/log/geo.log and wait.
  5. Notice that on the master branch, the repo gets resynced every time RepositoryRegistrySyncWorker runs. In the Rails console, you can look at the registry and see that retry_count doesn't change after multiple syncs.
  6. Notice that on this branch, the repo gets resynced a few times, but after 5 minutes or so it no longer gets resynced on every RepositoryRegistrySyncWorker run. In the Rails console, you can look at the registry and see that retry_count has increased to, say, 6 or so.
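
The difference between master and this branch seen in the last two steps can be approximated with a small standalone simulation. Everything in it is an assumption for illustration: the `backoff_seconds` formula, the 60-second scheduler tick, and the 30-minute window are made up and do not reflect the real worker schedule or backoff values. It only shows the shape of the behavior, where a row that always fails is picked up on every tick when retry_count is stuck, and progressively less often when it increments.

```ruby
# Hypothetical backoff (NOT GitLab's actual formula):
# 30 seconds doubled per retry, capped at one hour.
def backoff_seconds(retry_count, base: 30, cap: 3600)
  [base * (2**retry_count), cap].min
end

# Simulate a registry row whose sync always fails.
# increment: true  models this branch (retry_count grows on each failure).
# increment: false models master (retry_count stays at 1).
def simulate(increment:, tick: 60, duration: 1800)
  retry_count = 1                          # state after the first failed sync at t = 0
  retry_at = backoff_seconds(retry_count)  # earliest time of the next attempt
  attempts = []

  (tick..duration).step(tick) do |now|
    next if now < retry_at                 # not due yet, the scheduler skips the row

    attempts << now                        # the scheduler picks the row up and it fails again
    retry_count = increment ? retry_count + 1 : 1
    retry_at = now + backoff_seconds(retry_count)
  end

  { attempts: attempts, retry_count: retry_count }
end

fixed = simulate(increment: true)
buggy = simulate(increment: false)

puts "this branch: #{fixed[:attempts].size} retries, retry_count #{fixed[:retry_count]}"
puts "master:      #{buggy[:attempts].size} retries, retry_count #{buggy[:retry_count]}"
```

Under these made-up numbers, the incrementing row is retried 4 times in 30 minutes (at 60, 180, 420, and 900 seconds), while the stuck row is retried on all 30 ticks.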
Edited by Michael Kozono