Avoid 6 hours of stale Geo status (!127810)

Michael Kozono requested to merge mk/avoid-hours-of-stale-status into master Jul 27, 2023

What does this MR do and why?

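Avoid up to 6 hours of stale Geo status when a Geo::MetricsUpdateWorker job is lost. A lost job leaves its deduplication key in place, so new metrics update jobs are deduplicated, and the Geo status is not refreshed, until that key expires.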

- Reduce the TTL of the Geo::MetricsUpdateWorker job's idempotency key from 6 hours to 20 minutes.
- Use the until_executed deduplication strategy, so duplicate jobs are dropped until the running job finishes (or the 20-minute TTL expires).
- Remove the exclusive lease (instead of reducing its timeout to 20 minutes), since it is redundant with until_executed.
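To make the until_executed semantics concrete, here is a minimal Ruby sketch. It is not GitLab's implementation: the UntilExecutedDeduplicator class and its methods are hypothetical, and a plain integer clock stands in for Redis key expiry. A duplicate enqueue is dropped while the idempotency key exists, and the key is cleared only after the job finishes. In this model that is also why a separate exclusive lease adds nothing: as long as a run finishes within the TTL, a second copy cannot be enqueued while the first is queued or running.

```ruby
# Hypothetical, in-memory model of "until_executed" deduplication with an
# idempotency-key TTL. Not GitLab's or Sidekiq's actual implementation.
class UntilExecutedDeduplicator
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @key_expires_at = nil # nil means no idempotency key is currently set
  end

  # Called on enqueue; returns :enqueued or :deduplicated.
  def try_enqueue(now)
    if @key_expires_at && now < @key_expires_at
      :deduplicated # an earlier job is still queued or running
    else
      @key_expires_at = now + @ttl # set the key, which expires after the TTL
      :enqueued
    end
  end

  # until_executed: the key is removed only once the job has finished.
  def job_finished
    @key_expires_at = nil
  end
end

TTL = 20 * 60 # 20 minutes, in seconds
dedup = UntilExecutedDeduplicator.new(TTL)

first  = dedup.try_enqueue(0)  # :enqueued
second = dedup.try_enqueue(30) # :deduplicated, the first job has not finished
dedup.job_finished             # the first job completes and clears the key
third  = dedup.try_enqueue(60) # :enqueued

puts [first, second, third].inspect
```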

Resolves #414047 (closed)

How to set up and validate locally


1. Have Geo configured locally.
2. cd to the gitlab directory in the primary GDK.
3. Run tail -f log/sidekiq_client.log | grep dedup
4. In another tab, run bin/rails console
5. Run 20.times { Geo::MetricsUpdateWorker.perform_async } to enqueue this job 20 times.
6. Notice the "deduplicated" log entries, which show that the worker is still not allowed to run concurrently.

Repeat the above steps on the secondary GDK, which also runs this worker.
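The console check can be modeled in plain Ruby to show what the log should contain. This is a hypothetical simulation, not GitLab code: it assumes one enqueue attempt per second, a 20-minute key TTL, and no job finishing during the burst. Under those assumptions only the first attempt is enqueued and the other 19 are deduplicated.

```ruby
# Hypothetical model of the console check:
#   20.times { Geo::MetricsUpdateWorker.perform_async }
# Assumes no job finishes during the burst, so the key is never cleared.
ttl_seconds = 20 * 60
key_expires_at = nil # no idempotency key set yet

outcomes = Array.new(20) do |i|
  now = i # one enqueue attempt per second
  if key_expires_at && now < key_expires_at
    :deduplicated # would show up as a "deduplicated" entry in the client log
  else
    key_expires_at = now + ttl_seconds
    :enqueued
  end
end

tally = outcomes.tally # counts each outcome (Ruby 2.7+)
puts tally.inspect
```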

If one of these jobs is enqueued and then unexpectedly lost, deduplication now blocks new jobs for at most 20 minutes (instead of 6 hours). Once the idempotency key expires, a new job can be enqueued again.
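The worst case can be sketched in Ruby. All of the following are hypothetical assumptions for illustration: time is counted in whole minutes, the lost job was enqueued at minute 0, and something attempts to enqueue the worker once per minute afterwards. Because the lost job never runs, it never clears its idempotency key, so every attempt is deduplicated until the key's TTL expires.

```ruby
# Hypothetical model: a job enqueued at minute 0 is lost, so it never runs
# and never clears its idempotency key. Assumes an enqueue attempt every minute.
def minutes_until_next_enqueue(ttl_minutes)
  lost_at = 0
  key_expires_at = lost_at + ttl_minutes # key set when the lost job was enqueued
  # First minute at which the key has expired and a new job can be enqueued.
  (1..).find { |minute| minute >= key_expires_at }
end

old_window = minutes_until_next_enqueue(6 * 60) # previous TTL: 6 hours
new_window = minutes_until_next_enqueue(20)     # TTL in this MR: 20 minutes
puts "old: #{old_window} min, new: #{new_window} min"
```

Under these assumptions, the first successful enqueue after a lost job moves from minute 360 to minute 20.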

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

I have evaluated the MR acceptance checklist for this MR.

Edited Aug 01, 2023 by Michael Kozono
