Avoid 6 hours of stale Geo status

Michael Kozono requested to merge mk/avoid-hours-of-stale-status into master

What does this MR do and why?

Avoid 6 hours of stale Geo status when a Geo::MetricsUpdateWorker job is lost.

  • Reduce the Geo::MetricsUpdateWorker job idempotency_key ttl to 20 minutes
  • Use the until_executed deduplication strategy to prevent duplicate work for 20 minutes
  • Remove the exclusive lease (rather than reducing its timeout to 20 minutes), since until_executed makes it redundant

Resolves #414047 (closed)

How to set up and validate locally


  • Have Geo configured locally
  • cd to the gitlab directory in the primary GDK
  • Run tail -f log/sidekiq_client.log | grep dedup
  • In another tab, run bin/rails console
  • Run 20.times { Geo::MetricsUpdateWorker.perform_async } to enqueue this job
  • Observe the "deduplicated" log entries, which show that the worker is still not allowed to run concurrently

Repeat the above steps for the secondary GDK (it also runs this same worker).

If one of these jobs is enqueued and then unexpectedly lost, deduplication continues for 20 minutes (instead of 6 hours). After that, a new job can be enqueued.
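
The behavior described above can be illustrated with a minimal in-memory sketch. This is not GitLab's implementation: the `DedupModel` class, its method names, and the use of plain integer seconds for time are all assumptions made for illustration. It only models the idea that an idempotency key blocks duplicate enqueues until the job executes or the key's TTL expires.

```ruby
# Simplified, hypothetical model of "until executed" deduplication with a TTL.
# It is an illustration only and does not reflect GitLab's actual code.
class DedupModel
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @expires_at = {} # idempotency key => expiry time, in seconds
  end

  # Returns :enqueued if no live key exists, otherwise :deduplicated.
  def enqueue(key, now)
    expiry = @expires_at[key]
    return :deduplicated if expiry && now < expiry

    @expires_at[key] = now + @ttl
    :enqueued
  end

  # A job that finishes normally clears its key, allowing the next enqueue.
  def executed(key)
    @expires_at.delete(key)
  end
end

TWENTY_MINUTES = 20 * 60
KEY = "Geo::MetricsUpdateWorker"

model = DedupModel.new(TWENTY_MINUTES)

# A burst of 20 enqueue attempts at t = 0: one is accepted, the rest are dropped.
burst = Array.new(20) { model.enqueue(KEY, 0) }
puts burst.count(:enqueued)      # 1
puts burst.count(:deduplicated)  # 19

# The job is lost, so `executed` is never called and the key stays live.
puts model.enqueue(KEY, 19 * 60) # deduplicated (TTL has not elapsed)
puts model.enqueue(KEY, 20 * 60) # enqueued (TTL has elapsed)
```

In this model, a lost job blocks new enqueues only until the TTL runs out, which is why shortening the TTL from 6 hours to 20 minutes bounds how long the status can stay stale.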

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Michael Kozono
