Re-run stuck merge request cleanup schedules
Problem to solve
When a MergeRequest::CleanupSchedule record has been started (its status is set to running) and the Sidekiq job working on it gets killed (Sidekiq deploy, OOM killer, or an unhandled exception), the record may never be worked on again.
Proposal
This was discussed in !65647 (diffs, comment 622356631).
The following ideas were suggested:
- Enqueue a scheduled job to set it back to unstarted after 6 hours (or whatever the highest execution time will be from the data we get when we enable this on production).
- Have another cron scheduled job that updates stuck running cleanup schedules to unstarted.
- Have the same mechanism as repo mirroring with jid...
- Do the cleanup in ScheduleMergeRequestCleanupRefsWorker.
Each idea has its own pros and cons, so they need to be weighed before choosing the most appropriate one.
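To make the cron-based idea more concrete, here is a minimal plain-Ruby sketch of the reset logic, not the actual implementation. It assumes each schedule carries a timestamp for when it entered the running state (called started_at here, which is a hypothetical name), uses an in-memory Struct as a stand-in for the real model, and takes the 6-hour figure mentioned above as a placeholder threshold.

```ruby
# Hypothetical in-memory stand-in for a cleanup schedule record.
# The real model and its column names may differ; started_at is an assumption.
CleanupSchedule = Struct.new(:id, :status, :started_at)

# Placeholder threshold taken from the "6 hours" suggestion above.
STUCK_AFTER_SECONDS = 6 * 60 * 60

# Resets schedules that have been running for longer than the threshold
# back to unstarted, and returns the ids of the schedules that were reset.
def reset_stuck_schedules(schedules, now: Time.now, stuck_after: STUCK_AFTER_SECONDS)
  cutoff = now - stuck_after

  stuck = schedules.select do |schedule|
    schedule.status == :running &&
      schedule.started_at &&
      schedule.started_at < cutoff
  end

  stuck.each do |schedule|
    schedule.status = :unstarted
    schedule.started_at = nil
  end

  stuck.map(&:id)
end
```

A periodic job could call something like this on every run. The main trade-off is the threshold: set too low, a schedule that is still being processed could be reset and picked up twice; set too high, stuck schedules wait longer before being retried.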