Docs: clarify Gitaly latency bucket metrics
When buckets are configured to enable Prometheus, metrics like gitaly_praefect_replication_latency_bucket, gitaly_praefect_replication_delay_bucket and gitaly_praefect_node_latency_bucket become available. These metrics are not referenced on the GitLab Prometheus metrics page or anywhere else in the docs. With the milestone Enable Prometheus by default coming up soon, it would be very helpful to have information about these metrics in the docs. Today, we are referring straight to https://gitlab.com/gitlab-org/gitaly/-/blob/master/internal/praefect/metrics/prometheus.go for information about these metrics.
It looks like these metrics were added in 12.10.
TODO
Could we add an explainer about the following metrics to the appropriate docs?
- gitaly_praefect_replication_latency_bucket: The amount of time it takes for replication to complete once the replication job has started.
- gitaly_praefect_replication_delay_bucket (MR): A measure of how much time passes between when the replication job is created and when it is started.
- gitaly_praefect_node_latency_bucket: Latency in Gitaly returning health check information to Praefect. Increased latency may be indicative of Praefect connection saturation.
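Since all three metrics are histogram `_bucket` series, it may help the docs to show how cumulative bucket counts turn into a latency estimate. The following is a minimal sketch, not Gitaly or Prometheus code: `estimate_quantile` is a hypothetical helper that applies linear interpolation within a bucket, similar in spirit to PromQL's `histogram_quantile`, and the bucket bounds and counts are made up for illustration.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound, ending with the +Inf bucket (the `le="+Inf"` series).
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("buckets must end with the +Inf bucket")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded yet

    rank = q * total
    prev_bound, prev_count = 0.0, 0  # assume the first bucket starts at 0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf or count == prev_count:
                # Cannot interpolate into +Inf or an empty bucket:
                # fall back to the highest known finite bound.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Illustrative (invented) cumulative counts, bounds in seconds.
example_buckets = [(0.1, 10), (0.5, 30), (1.0, 40), (math.inf, 40)]
median = estimate_quantile(0.5, example_buckets)
p95 = estimate_quantile(0.95, example_buckets)
```

With these invented counts, half of the 40 observations fall at or below an estimated 0.3 seconds. The estimate is only as precise as the configured bucket boundaries, which is one reason the docs should say how the buckets are set.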
A Gitaly section in GitLab Prometheus metrics might be reasonable. Alternatively, we may wish to expand the Prometheus section in the Gitaly reference.
- Slack thread for GitLab team members.
Gitaly metrics
Gitaly can be configured to report metrics. These are some of the Gitaly metrics served from the /metrics path on the configured port (9090 by default).
Metric | Type | Since | Description | Labels |
---|---|---|---|---|
gitaly_praefect_replication_latency_bucket | Histogram | 12.10 MR | The amount of time it takes for replication to complete once the replication job has started. | |
gitaly_praefect_replication_delay_bucket | Histogram | 12.10 MR | A measure of how much time passes between when the replication job is created and when it is started. | |
gitaly_praefect_node_latency_bucket | Histogram | 12.10 MR | Latency in Gitaly returning health check information to Praefect. Increased latency may indicate Praefect connection saturation. | |
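To make the table more concrete, the docs could also show what these series look like in a scrape of the /metrics path. The sketch below is an assumption-laden illustration: `parse_buckets` is a hypothetical helper, the sample text is invented rather than captured from a real Praefect node, and it assumes a single label set per metric with no trailing timestamps.

```python
import re

# Matches the `le` (upper bound) label of a histogram bucket series.
LE_PATTERN = re.compile(r'le="([^"]+)"')

# Invented sample in the Prometheus text exposition format.
SAMPLE_SCRAPE = """\
# TYPE gitaly_praefect_replication_delay histogram
gitaly_praefect_replication_delay_bucket{le="0.5"} 3
gitaly_praefect_replication_delay_bucket{le="1"} 7
gitaly_praefect_replication_delay_bucket{le="+Inf"} 8
gitaly_praefect_replication_delay_sum 4.2
gitaly_praefect_replication_delay_count 8
"""


def parse_buckets(metrics_text, metric_name):
    """Return sorted (upper_bound, cumulative_count) pairs for one metric."""
    buckets = []
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and series belonging to other metrics.
        if not line or line.startswith("#"):
            continue
        if not line.startswith(metric_name + "{"):
            continue
        match = LE_PATTERN.search(line)
        if match is None:
            continue
        # The sample value is the last whitespace-separated field.
        count = float(line.rsplit(" ", 1)[1])
        buckets.append((float(match.group(1)), count))
    return sorted(buckets)


delay_buckets = parse_buckets(
    SAMPLE_SCRAPE, "gitaly_praefect_replication_delay_bucket"
)
```

Each pair reads as "this many replication jobs waited at most this many seconds", and the counts are cumulative, so the `+Inf` bucket equals the total number of observations.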
I am opening this issue on behalf of a Large Premium customer who is interested.