
Reschedule issue rebalance jobs in case those get stuck

What does this MR do and why?

This MR adds a cron job that keeps rescheduling active rebalances and relies on job deduplication to drop the duplicates. This guards against these long-running jobs getting interrupted and sent to the dead queue. A rebalance job continues where it left off, so rescheduling does not mean starting over.
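To illustrate why rescheduling is cheap, here is a minimal sketch of a resumable job, not the code in this MR. It assumes the job persists a cursor after every batch; `rebalance`, the `store` dict and the `max_batches` interruption switch are hypothetical names used only for this example.

```python
def rebalance(items, store, batch_size=2, max_batches=None):
    """Process items in batches, saving a cursor so a rerun resumes.

    `store` is a plain dict standing in for shared state (hypothetical).
    `max_batches` simulates the job being interrupted part-way through.
    Returns True when the rebalance completed, False when interrupted.
    """
    cursor = store.get("cursor", 0)  # resume from the last saved position
    batches_done = 0

    while cursor < len(items):
        if max_batches is not None and batches_done >= max_batches:
            return False  # simulated interruption; the cursor is already saved

        batch = items[cursor:cursor + batch_size]
        store.setdefault("processed", []).extend(batch)

        cursor += len(batch)
        store["cursor"] = cursor  # checkpoint after every batch
        batches_done += 1

    store.pop("cursor", None)  # finished, nothing left to resume
    return True
```

A rescheduled run calls `rebalance` with the same `store` and picks up at the saved cursor instead of reprocessing earlier batches.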

There is a race condition, though: we could be in the middle of rescheduling (just before enqueuing) when the running job finishes. Deduplication does not apply because the job is already done, so the reschedule would start a new rebalance. We guard against this by setting another key when a rebalance finishes, so that a job enqueued for a recently finished rebalance is a no-op. There is no use case for running rebalances in quick succession anyway.
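The guard described above can be sketched roughly as follows. This is an in-memory simulation under stated assumptions, not the implementation in this MR: `RebalanceScheduler`, `finished_ttl` and the injected `clock` are hypothetical, and a real setup would keep the pending set and the "recently finished" key in a shared store rather than in process memory.

```python
import time


class RebalanceScheduler:
    """In-memory model of deduplication plus a 'recently finished' guard."""

    def __init__(self, finished_ttl=60.0, clock=time.monotonic):
        self.finished_ttl = finished_ttl  # how long a finished marker lasts
        self.clock = clock                # injectable so tests can control time
        self.pending = set()              # jobs enqueued but not yet completed
        self.finished_until = {}          # project_id -> marker expiry time

    def enqueue(self, project_id):
        """Return True if enqueued, False if dropped as a duplicate."""
        if project_id in self.pending:
            return False
        self.pending.add(project_id)
        return True

    def perform(self, project_id):
        """Run the job; no-op if this rebalance finished recently."""
        self.pending.discard(project_id)
        expiry = self.finished_until.get(project_id, float("-inf"))
        if self.clock() < expiry:
            return "noop"
        # ... the actual rebalance work would happen here ...
        self.finished_until[project_id] = self.clock() + self.finished_ttl
        return "rebalanced"
```

In this model, a cron tick that enqueues just after the running job finished is not deduplicated, but `perform` sees the unexpired marker and returns `"noop"` instead of starting a new rebalance.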

re #343366 (closed)

Screenshots or screen recordings

These are strongly recommended to assist reviewers and reduce the time to merge your change.

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Alexandru Croitor
