Geo: Bandaid for Registry rows stuck in sync state Queued
From #419370 (comment 1596281181):
I think this may be where we are setting registry record state to pending but not clearing last_synced_at. I think it doesn't always trigger the issue since that line is immediately followed by sync_repository, which quickly moves the state to started.

I suspect that this issue only occurs when the lease is taken, since the service exits without moving state to started. So it is mostly an issue for frequently mutated resources.

Possible bandaid: Clear last_synced_at when setting state to pending.
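A minimal sketch of the bandaid idea in plain Ruby, not the actual GitLab code. RegistrySketch, its method names, and every state value other than pending = 0 (taken from the workaround query below) are assumptions made for illustration only.

```ruby
# Hypothetical stand-in for a Geo registry row. This is NOT GitLab's model;
# it only illustrates clearing last_synced_at when moving back to pending.
class RegistrySketch
  # pending = 0 matches the workaround query; the other values are assumed.
  STATES = { pending: 0, started: 1, synced: 2, failed: 3 }.freeze

  attr_reader :state, :last_synced_at

  def initialize(state: :synced, last_synced_at: Time.now)
    @state = STATES.fetch(state)
    @last_synced_at = last_synced_at
  end

  # Proposed bandaid: reset the timestamp together with the state, so the row
  # is never left pending with a stale last_synced_at if the sync service
  # exits early (for example, because the lease is already taken).
  def pending!
    @state = STATES.fetch(:pending)
    @last_synced_at = nil
  end

  # Normal path: the sync service moves the row to started.
  def start!
    @state = STATES.fetch(:started)
    @last_synced_at = Time.now
  end

  # Same condition the workaround query selects on:
  # state is pending and last_synced_at is not null.
  def matches_workaround_query?
    @state == STATES.fetch(:pending) && !@last_synced_at.nil?
  end
end

row = RegistrySketch.new(state: :synced, last_synced_at: Time.now)
row.pending!
puts row.matches_workaround_query?
```

With the timestamp cleared inside pending!, a row that goes back to pending no longer satisfies the condition used by the workaround, even if start! is never reached.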
Backport the fix to 16.3 and 16.4.
Workaround to unstick any permanently Queued items
On a Puma, Sidekiq, or Geo Log Cursor node in the secondary site:
gitlab-rails runner "Geo::ProjectRepositoryRegistry.where(state: ['0']).where('last_synced_at is not null').update_all(last_synced_at: nil)"
Use a cronjob to run this every 10 minutes, for example.
Edited by Michael Kozono