Expose load balancing DB metrics in Sidekiq logs
What does this MR do?
With this change we start logging db_primary_duration_s
and db_replica_duration_s
in Sidekiq logs, which are related to Sidekiq database load balancing, i.e. the time we spend in queries on the primary and replicas respectively.
We were already collecting these data, and are already publishing them for Puma, but due to allow-list logic in the Sidekiq instrumentation logger we weren't exposing these for SK.
We ended up deciding to not maintain this allow-list anymore; we generally want to make instrumentation metrics available in all loggers if we have them. The risk of leaking anything unintentionally from an instrumentation module seems low, since its sole purpose is to collect metrics meant to be published. As long as any metrics that do not apply and which might come back empty are filtered out gracefully (such as rack-attack, which wouldn't produce anything meaningful for sidekiq).
Screenshots (strongly suggested)
Structured JSON log output as emitted by a DB chaos job triggered via
[1] pry(main)> Chaos::DbSpinWorker.perform_async 10, 1
Before (or with Sidekiq LB disabled)
{
"severity": "INFO",
"time": "2021-05-18T11:55:49.259Z",
"class": "Chaos::DbSpinWorker",
"args": [
"10",
"1"
],
"retry": 3,
"queue": "chaos:chaos_db_spin",
"backtrace": true,
"version": 0,
"queue_namespace": "chaos",
"jid": "cdcabbda14a3aaefe6f5ca24",
"created_at": "2021-05-18T11:55:39.043Z",
"correlation_id": "f55c180ece2931568135967fd23453d5",
"worker_data_consistency": "always",
"enqueued_at": "2021-05-18T11:55:39.046Z",
"pid": 6,
"message": "Chaos::DbSpinWorker JID-cdcabbda14a3aaefe6f5ca24: done: 10.211164 sec",
"job_status": "done",
"scheduling_latency_s": 0.002807,
"job_size_bytes": 325,
"db_count": 10,
"db_write_count": 0,
"db_cached_count": 0,
"cpu_s": 9.933235,
"mem_objects": 7320507,
"mem_bytes": 352012464,
"mem_mallocs": 7320242,
"duration_s": 10.211164,
"completed_at": "2021-05-18T11:55:49.259Z",
"db_duration_s": 0.008025
}
After (Sidekiq load-balancing enabled)
{
"severity": "INFO",
"time": "2021-05-18T11:53:43.480Z",
"class": "Chaos::DbSpinWorker",
"args": [
"10",
"1"
],
"retry": 3,
"queue": "chaos:chaos_db_spin",
"backtrace": true,
"version": 0,
"queue_namespace": "chaos",
"jid": "46f7afb8d56189379d7da8e0",
"created_at": "2021-05-18T11:53:33.091Z",
"correlation_id": "f1158bc47eb6c471706744175dbac610",
"worker_data_consistency": "always",
"enqueued_at": "2021-05-18T11:53:33.112Z",
"pid": 6,
"message": "Chaos::DbSpinWorker JID-46f7afb8d56189379d7da8e0: done: 10.366137 sec",
"job_status": "done",
"scheduling_latency_s": 0.001477,
"job_size_bytes": 325,
"db_count": 10,
"db_write_count": 0,
"db_cached_count": 0,
"db_replica_count": 0,
"db_replica_cached_count": 0,
"db_replica_wal_count": 0,
"db_primary_count": 10,
"db_primary_cached_count": 0,
"db_primary_wal_count": 0,
"db_primary_duration_s": 0.018,
"db_replica_duration_s": 0.0,
"cpu_s": 9.915658,
"mem_objects": 6881359,
"mem_bytes": 330766472,
"mem_mallocs": 6877749,
"duration_s": 10.366137,
"completed_at": "2021-05-18T11:53:43.480Z",
"db_duration_s": 0.009456
}
Does this MR meet the acceptance criteria?
Conformity
-
I have included a changelog entry, or it's not needed. (Does this MR need a changelog?) -
I have added/updated documentation, or it's not needed. (Is documentation required?) -
I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?) -
I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?) -
I have self-reviewed this MR per code review guidelines. -
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines) -
I have followed the style guides.
Availability and Testing
-
I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.) -
I have tested this MR in all supported browsers, or it's not needed. -
I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.
Related to #330643 (closed)