Fix Memory::Watchdog Prometheus gauge labels
What does this MR do and why?
Due to a bug in prometheus-client-mmap, we must not set the pid label manually for an aggregate: :all gauge, since otherwise the label will appear twice in the text output we serve to Prometheus.
The Prometheus client library inserts the pid label for a gauge with all aggregation so that samples are preserved for every process we sample from, but it does not check whether that label is already in the label list. Therefore, the label can appear twice.
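To illustrate the failure mode, here is a minimal Ruby sketch. It is a simplified model of the behaviour described above, not the actual prometheus-client-mmap code; the helper names labels_with_auto_pid and render_sample are hypothetical.

```ruby
# Simplified model of the duplicate-label behaviour; NOT the library's code.
# All method names below are hypothetical and exist only for illustration.

# Appends the pid label unconditionally, without checking whether the
# caller already supplied one (this models the bug).
def labels_with_auto_pid(labels, pid)
  labels.to_a + [[:pid, pid]]
end

# Renders one sample line in the Prometheus text exposition format.
def render_sample(name, label_pairs, value)
  rendered = label_pairs.map { |key, val| %(#{key}="#{val}") }.join(",")
  "#{name}{#{rendered}} #{value}"
end

# The application sets pid manually, then the auto-insert adds it again.
manual_labels = { pid: "puma_0" }
line = render_sample(
  "gitlab_memwd_heap_frag_limit",
  labels_with_auto_pid(manual_labels, "puma_0"),
  0.1
)
puts line
```

Passing an empty label hash instead of manual_labels yields a single pid label, which is the effect this MR relies on.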
Since we do not yet have a bug fix in place for the library, we can circumvent this in the application for now by not setting the pid label manually on the gauge. Note that we still need to set it for the counters, since the library does not auto-insert it there.
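The workaround can be sketched roughly as follows. This is an assumption-laden illustration, not the actual Memory::Watchdog change: gauge_labels and counter_labels are hypothetical helpers, and the real code may structure its labels differently.

```ruby
# Hypothetical helpers sketching the application-side workaround.
# They are NOT taken from the Memory::Watchdog implementation.

# Gauge with all aggregation: the client library adds pid itself,
# so the application must leave it out of the label set.
def gauge_labels(base_labels)
  base_labels.reject { |key, _value| key == :pid }
end

# Counters: the library does not add pid here, so set it explicitly.
def counter_labels(base_labels, pid)
  base_labels.merge(pid: pid)
end

puts gauge_labels({ pid: "puma_0" }).inspect
puts counter_labels({}, "puma_0").inspect
```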
See also:
- https://gitlab.com/gitlab-org/prometheus-client-mmap/-/issues/37
- gitlab-com/gl-infra/production#7478 (closed)
NOTE: I did not include a changelog trailer because this feature was introduced in the same milestone and moreover is behind several feature toggles.
Screenshots or screen recordings
Before
Grepping the /-/metrics endpoint, we can see that the pid label appears twice for this gauge:
# TYPE gitlab_memwd_heap_frag_limit gauge
gitlab_memwd_heap_frag_limit{pid="puma_0",pid="puma_0"} 0.10000000000000001
gitlab_memwd_heap_frag_limit{pid="puma_1",pid="puma_1"} 0.10000000000000001
This breaks the Prometheus scraper when it tries to ingest these samples.
After
The pid label now appears only once for all metrics:
# HELP gitlab_memwd_heap_frag_limit Multiprocess metric
# TYPE gitlab_memwd_heap_frag_limit gauge
gitlab_memwd_heap_frag_limit{pid="puma_0"} 0.10000000000000001
gitlab_memwd_heap_frag_limit{pid="puma_1"} 0.10000000000000001
# HELP gitlab_memwd_heap_frag_violations_handled_total Multiprocess metric
# TYPE gitlab_memwd_heap_frag_violations_handled_total counter
gitlab_memwd_heap_frag_violations_handled_total{pid="puma_0"} 3
gitlab_memwd_heap_frag_violations_handled_total{pid="puma_1"} 2
# HELP gitlab_memwd_heap_frag_violations_total Multiprocess metric
# TYPE gitlab_memwd_heap_frag_violations_total counter
gitlab_memwd_heap_frag_violations_total{pid="puma_0"} 7
gitlab_memwd_heap_frag_violations_total{pid="puma_1"} 6
The scraper is also happy.
How to set up and validate locally
See above
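As an optional aid for local validation, the following Ruby sketch scans metrics text for sample lines that repeat a label name. It is a rough check, not a full parser for the exposition format: it assumes label values contain no "=" characters, and the method name lines_with_duplicate_labels is hypothetical. The text to check would be whatever the /-/metrics endpoint returns locally.

```ruby
# Returns the sample lines whose label set repeats a label name.
# Rough heuristic: assumes label values contain no "=" characters.
def lines_with_duplicate_labels(metrics_text)
  metrics_text.each_line.map(&:chomp).select do |line|
    # Skip HELP/TYPE comments and samples without labels.
    next false if line.start_with?("#") || !line.include?("{")

    label_block = line[/\{(.*)\}/, 1].to_s
    names = label_block.scan(/([a-zA-Z_][a-zA-Z0-9_]*)=/).flatten
    names.uniq.length != names.length
  end
end

# Sample input copied from the "Before" output in this description.
before = <<~METRICS
  # TYPE gitlab_memwd_heap_frag_limit gauge
  gitlab_memwd_heap_frag_limit{pid="puma_0",pid="puma_0"} 0.10000000000000001
  gitlab_memwd_heap_frag_limit{pid="puma_1",pid="puma_1"} 0.10000000000000001
METRICS

puts lines_with_duplicate_labels(before)
```

An empty result for the "After" output indicates that no sample carries a repeated label name.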
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #365950 (closed)