Geo: restarting sidekiq doesn't cause BaseSchedulerWorker leases to be returned
I just restarted GitLab on sync.geo.gitlab.com using a standard gitlab-ctl stop. Now Geo::RepositorySyncWorker is refusing to run with this message:
{"severity":"ERROR","time":"2017-09-27T10:35:14.406Z","class":"Geo::RepositorySyncWorker","message":"Cannot obtain an exclusive lease. There must be another worker already in execution."}
This will continue until the lease expires, which could be up to an hour.
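The blocking behavior can be modelled with a minimal in-memory sketch of a lease that carries a timeout. This is not GitLab's implementation, which stores leases in Redis. The class name TtlLease, the now: keyword used to fake the clock, the key name and the 3600-second timeout are all assumptions for illustration. A worker that dies without cancelling leaves its entry behind, so every later attempt fails until the timeout has passed.

```ruby
require 'securerandom'

# Hypothetical in-memory stand-in for an exclusive lease with a timeout.
# It only models the semantics; it is not Gitlab::ExclusiveLease.
class TtlLease
  @store = {} # key => [holder_uuid, expires_at]
  @mutex = Mutex.new

  class << self
    attr_reader :store, :mutex
  end

  def initialize(key, timeout:)
    @key = key
    @timeout = timeout
    @uuid = SecureRandom.uuid
  end

  # Returns our UUID on success, or false while another holder's lease is
  # unexpired. The now: keyword lets us simulate time instead of sleeping.
  def try_obtain(now: Time.now)
    self.class.mutex.synchronize do
      _holder, expires_at = self.class.store[@key]
      return false if expires_at && expires_at > now

      self.class.store[@key] = [@uuid, now + @timeout]
      @uuid
    end
  end

  # Releases the lease, but only if we are still the holder.
  def cancel
    self.class.mutex.synchronize do
      self.class.store.delete(@key) if self.class.store.dig(@key, 0) == @uuid
    end
  end
end

t0 = Time.now
# The old worker obtains the lease, then its process is killed: cancel never runs.
killed = TtlLease.new('geo_repository_sync_worker', timeout: 3600)
killed.try_obtain(now: t0)

# After the restart, the new worker is refused for the rest of the hour...
restarted = TtlLease.new('geo_repository_sync_worker', timeout: 3600)
blocked = restarted.try_obtain(now: t0 + 60)
# ...and only succeeds once the stale entry has expired.
recovered = restarted.try_obtain(now: t0 + 3601)

puts "blocked: #{blocked.inspect}, recovered: #{recovered ? 'yes' : 'no'}"
```

Running it prints blocked: false, recovered: yes. The stale entry, not any live worker, is what keeps the new job out.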
We release the Gitlab::ExclusiveLease in an ensure block, but I guess we need to do something more. Is omnibus-gitlab killing Sidekiq too aggressively? Does Sidekiq give jobs some "we've been asked to terminate, please clean up" signal before taking them down? Or do we need to watch for signals explicitly (yuck)?
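The ensure pattern only helps if Ruby gets a chance to unwind the job. The sketch below uses hypothetical helpers (try_obtain, cancel, perform_with_lease) and a made-up SimulatedShutdown exception. It assumes, without confirming it for Sidekiq, that a graceful shutdown surfaces inside the job thread as an exception, modelled here with Thread#raise. In that case the ensure block runs and the lease is freed. A SIGKILL, by contrast, ends the process before any ensure block can run.

```ruby
# Hypothetical sketch: a lease released in an ensure block.
# HELD stands in for the lease store; all names are illustrative only.
HELD = {}
HELD_LOCK = Mutex.new

# Assumed stand-in for whatever exception a graceful shutdown raises in a job.
class SimulatedShutdown < StandardError; end

def try_obtain(key)
  HELD_LOCK.synchronize do
    return false if HELD.key?(key)

    HELD[key] = true
  end
end

def cancel(key)
  HELD_LOCK.synchronize { HELD.delete(key) }
end

def perform_with_lease(key, started)
  return :skipped unless try_obtain(key)

  begin
    started << true # tell the main thread the job holds the lease
    sleep           # stand-in for a long-running repository sync
  ensure
    cancel(key)     # runs on normal exit and on exceptions raised into the thread
  end
end

started = Queue.new
worker = Thread.new do
  begin
    perform_with_lease('repository_sync', started)
  rescue SimulatedShutdown
    :interrupted
  end
end

started.pop                        # wait until the worker holds the lease
worker.raise(SimulatedShutdown)    # model a shutdown exception inside the job
result = worker.value

puts "result=#{result}, lease still held? #{HELD.key?('repository_sync')}"
# A SIGKILL (kill -9) gives the process no chance to run ensure,
# so the lease would stay held until its timeout.
```

This prints result=interrupted, lease still held? false. The design point is that ensure covers exceptions, not process death, so a hard kill still depends on the lease timeout.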
Presumably this affects other workers that hold exclusive leases, not just Geo::BaseSchedulerWorker.