Avoid 6 hours of stale Geo status
What does this MR do and why?
Avoid 6 hours of stale Geo status when a `Geo::MetricsUpdateWorker` job is lost.
- Reduce the `Geo::MetricsUpdateWorker` job `idempotency_key` `ttl` to 20 minutes
- Use the `until_executed` strategy to prevent duplicate work for 20 minutes
- Remove the exclusive lease (instead of reducing its timeout to 20 minutes), since it is redundant with `until_executed`
Resolves #414047 (closed)
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
- Have Geo configured locally
- `cd` to the `gitlab` directory in the primary GDK
- Run `tail -f log/sidekiq_client.log | grep dedup`
- In another tab, run `bin/rails console`
- Run `20.times { Geo::MetricsUpdateWorker.perform_async }` to enqueue this job
- Notice all the "deduplicated" logs proving that the worker is still not allowed to run concurrently
Repeat the above steps for the secondary GDK (it also runs this same worker).
If one of these jobs is enqueued and then unexpectedly lost, deduplication continues for 20 minutes (instead of 6 hours). After that, a new job can be enqueued again.
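The effect of the shorter TTL on a lost job can be illustrated with a toy model. This is a minimal sketch and not GitLab's deduplication code: `DedupModel`, `seconds_blocked_after_lost_job`, and the `"geo_metrics"` key are hypothetical names, and time is passed in as plain seconds rather than read from a clock. It assumes only what is described above: a key blocks new jobs until the job executes or the TTL expires.

```ruby
# Toy in-memory model of TTL-based job deduplication.
# Hypothetical illustration only; this is not GitLab's implementation.
class DedupModel
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @keys = {} # idempotency key => expiry time in seconds
  end

  # Returns :enqueued if the job may be scheduled, :deduplicated otherwise.
  def enqueue(key, now)
    expiry = @keys[key]
    return :deduplicated if expiry && now < expiry

    @keys[key] = now + @ttl
    :enqueued
  end

  # until_executed-style release: the key is dropped once the job finishes.
  def executed(key)
    @keys.delete(key)
  end
end

TWENTY_MINUTES = 20 * 60
SIX_HOURS = 6 * 60 * 60

# Enqueue one job at t=0 and "lose" it (executed is never called), then
# probe once a minute and return the first time a new job is accepted.
def seconds_blocked_after_lost_job(ttl)
  model = DedupModel.new(ttl)
  model.enqueue("geo_metrics", 0)
  (1..24 * 60).map { |m| m * 60 }.find do |t|
    model.enqueue("geo_metrics", t) == :enqueued
  end
end

puts seconds_blocked_after_lost_job(TWENTY_MINUTES) / 60 # prints 20
puts seconds_blocked_after_lost_job(SIX_HOURS) / 60      # prints 360
```

In the normal case the job runs and `executed` releases the key immediately, so the TTL only matters when a job disappears before it completes.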
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Edited by Michael Kozono