Exit with failure code when sidekiq-cluster child process fails

Sean McGivern requested to merge sidekiq-cluster-exit-status into master

What does this MR do and why?

sidekiq-cluster handles process supervision for its child Sidekiq processes, and terminates itself and all child processes if any child Sidekiq process exits.

Previously, it always exited with a 0 status code (i.e. success), no matter how the child process had terminated.

Now it exits with 1 if any child process had a non-zero exit code. This allows a process supervisor one level up (like systemd) to detect the failure and restart the cluster.
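As a rough illustration of the behaviour described above (not the actual sidekiq-cluster code, which is Ruby), the following shell sketch starts a few stand-in child commands and collapses their exit statuses into a single 0 or 1. The function name `supervise` and the child commands are hypothetical. Unlike sidekiq-cluster, this sketch simply waits for every child instead of terminating the others when one exits.

```shell
# Hypothetical sketch: run each argument as a child command, wait for all of
# them, and return 1 if any child ended with a non-zero status, else 0.
# Uses plain (global) variables so it runs under both sh and bash.
supervise() {
  pids=""
  status=0
  for cmd in "$@"; do
    sh -c "$cmd" &          # start the stand-in child in the background
    pids="$pids $!"         # remember its PID
  done
  for pid in $pids; do
    # `wait PID` returns that child's exit status; a child killed by a
    # signal is also reported as non-zero.
    wait "$pid" || status=1
  done
  return "$status"
}
```

For example, `supervise "exit 0" "exit 3"; echo $?` prints 1, while `supervise "exit 0" "exit 0"; echo $?` prints 0. A supervisor one level up only needs to look at that single status.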

This came from !97694 (comment 1186537558)

How to set up and validate locally

  1. gdk start redis postgresql

  2. bin/sidekiq-cluster default default ; echo $?

  3. In a different shell session, ps ef | grep sidekiq

  4. Run kill -9 $pid for one of those Sidekiq processes

  5. Check the shell session in step 2; you should see:

    {"severity":"INFO","time":"2022-12-01T14:40:42.493Z","message":"A worker terminated, shutting down the cluster"}
    {"severity":"INFO","time":"2022-12-01T14:40:42.496Z","message":"Shutting down"}
    {"severity":"INFO","time":"2022-12-01T14:40:42.497Z","message":"Scheduler exiting..."}
    {"severity":"INFO","time":"2022-12-01T14:40:42.498Z","message":"Terminating quiet threads"}
    {"severity":"INFO","time":"2022-12-01T14:40:42.498Z","message":"Scheduler exiting..."}
    {"severity":"INFO","time":"2022-12-01T14:40:42.598Z","message":"Pausing to allow jobs to finish..."}
    {"severity":"INFO","time":"2022-12-01T14:40:44.113Z","message":"Bye!"}
    1

Running kill -TERM $pid instead will give:

{"severity":"INFO","time":"2022-12-01T14:41:52.939Z","message":"A worker terminated, shutting down the cluster"}
{"severity":"INFO","time":"2022-12-01T14:41:52.943Z","message":"Shutting down"}
{"severity":"INFO","time":"2022-12-01T14:41:52.943Z","message":"Scheduler exiting..."}
{"severity":"INFO","time":"2022-12-01T14:41:52.944Z","message":"Terminating quiet threads"}
{"severity":"INFO","time":"2022-12-01T14:41:52.944Z","message":"Scheduler exiting..."}
{"severity":"INFO","time":"2022-12-01T14:41:53.045Z","message":"Pausing to allow jobs to finish..."}
{"severity":"INFO","time":"2022-12-01T14:41:53.355Z","message":"Bye!"}
0

The exit status is 0 here because SIGTERM lets the child process exit gracefully.
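The difference between a graceful and an abrupt child exit can be reproduced without Sidekiq. The sketch below is an assumption-laden stand-in: the function names `graceful_status` and `abrupt_status` are hypothetical, and the child is a plain shell loop rather than a Sidekiq process. One child traps SIGTERM and exits 0, the other is killed outright with SIGKILL.

```shell
# Hypothetical stand-in for a worker that shuts down cleanly on SIGTERM.
graceful_status() {
  sh -c 'trap "exit 0" TERM; while :; do sleep 1; done' &
  pid=$!
  sleep 1                    # give the child time to install its trap
  kill -TERM "$pid"
  rc=0
  wait "$pid" || rc=$?       # collect the child's exit status
  echo "$rc"
}

# Hypothetical stand-in for a worker that is killed without any cleanup.
abrupt_status() {
  sh -c 'while :; do sleep 1; done' &
  pid=$!
  kill -KILL "$pid"          # SIGKILL cannot be trapped
  rc=0
  wait "$pid" || rc=$?       # shells report a signal death as non-zero
  echo "$rc"
}
```

Calling `graceful_status` prints 0, while `abrupt_status` prints a non-zero value (shells such as bash and dash typically report 128 plus the signal number). With this MR, only the second kind of child exit makes sidekiq-cluster exit with 1.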

Without this change, we always exit with 0.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Sean McGivern
