Log worker_id for Puma and Sidekiq
What does this MR do and why?
See #364539 (closed)
In !66694 (merged) we added a worker's process ID to application logs. This was done to support diagnosis during production incidents.
In &8105 I found that, in addition to the PID, having the logical worker_id would be even more useful. This is because in Thanos we do not collect PIDs, since that label value would be essentially unbounded. A PID is also ephemeral; for instance, puma_0 might get restarted and hence run under a new PID, but it is still the same worker. So when breaking down by the pid label in Thanos (which actually contains the worker ID, not the process ID), we cannot currently correlate these data with logs.
Here I extend InstrumentationHelper to also log PidProvider#worker_id. This helper is used by both Puma and Sidekiq, so it will work for both.
For Puma, we currently run 7 processes in SaaS: 1 master + 6 workers. So the label cardinality is 7. For Sidekiq, we run a single process, so label cardinality is 1.
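A minimal sketch of the idea, assuming the existing add_instrumentation_data hook in Gitlab::InstrumentationHelper (the actual diff may place this differently):

```ruby
# Sketch only: add the logical worker ID alongside the pid field
# that !66694 introduced. Prometheus::PidProvider.worker_id returns
# values such as "puma_0" or "sidekiq_0".
module Gitlab
  module InstrumentationHelper
    extend self

    def add_instrumentation_data(payload)
      # ... existing instrumentation (Gitaly, Redis, DB timings, etc.) ...
      payload[:pid] = Process.pid
      payload[:worker_id] = ::Prometheus::PidProvider.worker_id
    end
  end
end
```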
Screenshots or screen recordings
```shell
$ tail -n1 log/sidekiq.log
{"severity":"INFO",...,"worker_id":"sidekiq_0",...,"db_duration_s":0.099319}

$ tail -n1 log/development_json.log
{"method":"GET",...,"worker_id":"puma_1",...,"duration_s":2.04893}
```
How to set up and validate locally
Run Puma or Sidekiq and grep for worker_id in the logs (see above).
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #364539 (closed)