The db-specific replica metrics are not properly stored in logs
Found in: #345118 (comment 749348860).
The db_replica_main_count and db_replica_ci_count values are always 0, regardless of db_replica_count.
Troubleshooting
- Metrics scraping is part of lib/gitlab/metrics/subscribers/active_record.rb
- We increment metrics by calling def increment(counter, db_config_name:, db_role:)
- The arguments passed for a CI replica connection are def increment(counter, db_config_name: 'ci_replica', db_role: 'replica')
- We increment the key log_key = compose_metric_key(counter, db_role, db_config_name), e.g. db_replica_ci_replica_count
- Then the invocation of load_balancing_metric_keys returns keys as db_replica_ci_replica_count
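One possible reading of the steps above is a key mismatch: the counter is incremented under a key that includes the full config name (ci_replica), while the log line in the Example section reports db_replica_ci_count. The sketch below reproduces that effect. The body of compose_metric_key here is a simplified, hypothetical stand-in, not the actual implementation in active_record.rb.

```ruby
# Hypothetical, simplified stand-in for the subscriber's key composition.
# The real compose_metric_key may be implemented differently.
def compose_metric_key(metric, db_role = nil, db_config_name = nil)
  [:db, db_role, db_config_name, metric].compact.join('_').to_sym
end

# Key incremented for a CI replica connection, using the arguments listed above.
incremented_key = compose_metric_key(:count, 'replica', 'ci_replica')

# Key that shows up in the structured log for the CI replica.
logged_key = :db_replica_ci_count

counters = Hash.new(0)
counters[incremented_key] += 1

puts incremented_key       # db_replica_ci_replica_count
puts counters[logged_key]  # 0, the logged key is never incremented
```

If this reading is right, the per-database replica counters stay at 0 because nothing ever writes to the keys that end up in the log.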
Proposal
The usage of db_role is in general redundant everywhere in this class, as db_config_name properly describes all possible configurations. The db_role is a left-over of old times when we had a single DB and had to find if a connection is primary or replica.
Maybe we change the metrics to take this form:
- db_count = SUM of all metrics for all databases and all roles: use metrics in the form of db_#{db_role}_#{metric} (as-is today)
- db_(primary/replica)_count - SUM of all metrics for all databases divided by roles: use metrics in the form of db_#{db_role}_#{metric} (as-is today) - I would also consider deprecating those usages
- db_(main|main_replica|ci|ci_replica)_count - per-database metrics: use metrics in the form of db_#{db_config_name}_#{metric} - Change here
The change will be only on the last one:
- From today's db_(primary|replica)_(main|ci|main_replica|ci_replica)_count into db_(main|main_replica|ci|ci_replica)_count
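The proposed scheme can be sketched as follows. The helpers role_for and proposed_keys are hypothetical names used only for illustration, and deriving the role from a _replica suffix is an assumption based on the config names listed above, not confirmed behaviour of the subscriber.

```ruby
# Hypothetical helpers illustrating the proposed key scheme; not GitLab's code.

# Assumption: replica configs are named with a "_replica" suffix.
def role_for(db_config_name)
  db_config_name.end_with?('_replica') ? 'replica' : 'primary'
end

# Keys that a single query against db_config_name would increment.
def proposed_keys(metric, db_config_name)
  [
    "db_#{metric}",                              # all databases, all roles
    "db_#{role_for(db_config_name)}_#{metric}",  # per role (as-is today)
    "db_#{db_config_name}_#{metric}"             # per database (the change)
  ]
end

counters = Hash.new(0)
%w[main main_replica ci_replica ci_replica].each do |name|
  proposed_keys('count', name).each { |key| counters[key] += 1 }
end

counters.each { |key, value| puts "#{key}=#{value}" }
```

With these four sample queries, the per-database keys no longer repeat the role, so a CI replica query lands in db_ci_replica_count instead of db_replica_ci_replica_count.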
Example
I see it as well in the development environment:
{"method":"GET","path":"/-/peek/results","format":"json","controller":"Peek::ResultsController","action":"show","status":200,"time":"2021-12-01T14:15:11.893Z","params":[{"key":"request_id","value":"01FNV5JN5G9S79KD3BSEZE0M5Z"}],"correlation_id":"01FNV5JV4EN7Q50AB3HVRRBJDM","meta.user":"root","meta.caller_id":"Peek::ResultsController#show","meta.remote_ip":"10.0.2.2","meta.client_id":"user/1","remote_ip":"10.0.2.2","user_id":1,"username":"root","ua":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.55 Safari/537.36","queue_duration_s":0.118461,"request_urgency":"default","target_duration_s":1,"redis_calls":5,"redis_duration_s":0.002277,"redis_read_bytes":399728,"redis_write_bytes":3107,"redis_cache_calls":2,"redis_cache_duration_s":0.001478,"redis_cache_read_bytes":399547,"redis_cache_write_bytes":2264,"redis_shared_state_calls":2,"redis_shared_state_duration_s":0.000488,"redis_shared_state_write_bytes":104,"redis_sessions_calls":1,"redis_sessions_duration_s":0.000311,"redis_sessions_read_bytes":181,"redis_sessions_write_bytes":739,"db_count":1,"db_write_count":0,"db_cached_count":0,"db_replica_count":1,"db_replica_main_count":0,"db_replica_ci_count":0,"db_replica_cached_count":0,"db_replica_main_cached_count":0,"db_replica_ci_cached_count":0,"db_replica_wal_count":0,"db_replica_main_wal_count":0,"db_replica_ci_wal_count":0,"db_replica_wal_cached_count":0,"db_replica_main_wal_cached_count":0,"db_replica_ci_wal_cached_count":0,"db_primary_count":0,"db_primary_main_count":0,"db_primary_ci_count":0,"db_primary_cached_count":0,"db_primary_main_cached_count":0,"db_primary_ci_cached_count":0,"db_primary_wal_count":0,"db_primary_main_wal_count":0,"db_primary_ci_wal_count":0,"db_primary_wal_cached_count":0,"db_primary_main_wal_cached_count":0,"db_primary_ci_wal_cached_count":0,"db_replica_duration_s":0.004,"db_replica_main_duration_s":0.0,"db_replica_ci_duration_s":0.0,"db_primary_duration_s":0.0,"db_primary_main_duration_s":0.0,"db_primary_ci_duration_s":0.0,"cpu_s":0.106572,"mem_objects":53950,"mem_bytes":5883824,"mem_mallocs":29254,"mem_total_bytes":8041824,"pid":83,"db_duration_s":0.0,"view_duration_s":0.00011,"duration_s":0.00721}
The snippet:
"db_replica_count":1,"db_replica_main_count":0,"db_replica_ci_count":0
Edited by Kamil Trzciński (Back 2025-01-01)
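The inconsistency in the snippet above can also be checked mechanically. This is a minimal sketch that parses just those three fields (wrapped in braces so they form valid JSON) and compares the replica total against the sum of the per-database replica counters. The variable names are illustrative only.

```ruby
require 'json'

# The three fields from the snippet, wrapped in braces to make valid JSON.
line = '{"db_replica_count":1,"db_replica_main_count":0,"db_replica_ci_count":0}'
fields = JSON.parse(line)

total = fields['db_replica_count']

# Per-database replica counters as they are named in the log today.
per_db = fields.select { |key, _| key.match?(/\Adb_replica_(main|ci)_count\z/) }
per_db_sum = per_db.values.sum

consistent = per_db_sum == total
puts "replica total=#{total}, per-database sum=#{per_db_sum}, consistent=#{consistent}"
# prints: replica total=1, per-database sum=0, consistent=false
```

The same check could be run over full log lines to find requests where the per-database counters do not add up to the role total.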