Expose queueing duration histogram metric
What does this MR do?
Exposes a new histogram metric named gitlab_runner_job_queue_duration_seconds, based on data received with the job payload from GitLab.
Why was this MR needed?
Follow-up for gitlab!90653 (merged). For the reasoning, please read the description of the GitLab MR.
For this metric to work properly, the GitLab change must be merged. When it is available, the output looks like this:
# HELP gitlab_runner_job_queue_duration_seconds Histogram of job queuing duration
# TYPE gitlab_runner_job_queue_duration_seconds histogram
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="0",runner="oG2wMbsy",le="1"} 0
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="0",runner="oG2wMbsy",le="3"} 0
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="0",runner="oG2wMbsy",le="10"} 0
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="0",runner="oG2wMbsy",le="30"} 0
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="0",runner="oG2wMbsy",le="60"} 0
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="0",runner="oG2wMbsy",le="120"} 1
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="0",runner="oG2wMbsy",le="300"} 1
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="0",runner="oG2wMbsy",le="900"} 1
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="0",runner="oG2wMbsy",le="1800"} 1
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="0",runner="oG2wMbsy",le="3600"} 1
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="0",runner="oG2wMbsy",le="+Inf"} 1
gitlab_runner_job_queue_duration_seconds_sum{project_jobs_running="0",runner="oG2wMbsy"} 101
gitlab_runner_job_queue_duration_seconds_count{project_jobs_running="0",runner="oG2wMbsy"} 1
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="1",runner="oG2wMbsy",le="1"} 0
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="1",runner="oG2wMbsy",le="3"} 0
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="1",runner="oG2wMbsy",le="10"} 0
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="1",runner="oG2wMbsy",le="30"} 0
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="1",runner="oG2wMbsy",le="60"} 0
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="1",runner="oG2wMbsy",le="120"} 1
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="1",runner="oG2wMbsy",le="300"} 19
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="1",runner="oG2wMbsy",le="900"} 19
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="1",runner="oG2wMbsy",le="1800"} 19
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="1",runner="oG2wMbsy",le="3600"} 19
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="1",runner="oG2wMbsy",le="+Inf"} 19
gitlab_runner_job_queue_duration_seconds_sum{project_jobs_running="1",runner="oG2wMbsy"} 3581
gitlab_runner_job_queue_duration_seconds_count{project_jobs_running="1",runner="oG2wMbsy"} 19
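The project_jobs_running="0" series above (one job, a sum of 101 seconds, first counted in the le="120" bucket) follows from the cumulative way Prometheus histograms count observations. Below is a minimal Python sketch of that bookkeeping. It is not the Runner's implementation: the class QueueDurationHistogram and its method names are hypothetical, and only the bucket bounds are taken from the output.

```python
from bisect import bisect_left

# Upper bounds in seconds, copied from the `le` labels in the output above.
BUCKETS = [1, 3, 10, 30, 60, 120, 300, 900, 1800, 3600]


class QueueDurationHistogram:
    """Hypothetical stand-in for a Prometheus histogram (illustration only)."""

    def __init__(self, bounds):
        self.bounds = list(bounds)
        # One slot per bound, plus a final slot for the implicit +Inf bucket.
        self.per_bucket = [0] * (len(self.bounds) + 1)
        self.total = 0
        self.count = 0

    def observe(self, seconds):
        # Index of the first bound that is >= the observed value
        # (`le` is inclusive, so a value equal to a bound lands in that bucket).
        idx = bisect_left(self.bounds, seconds)
        self.per_bucket[idx] += 1
        self.total += seconds
        self.count += 1

    def cumulative(self):
        # Prometheus exposes buckets cumulatively: each `le` bucket also
        # contains every observation counted in the smaller buckets.
        labels = [str(b) for b in self.bounds] + ["+Inf"]
        running = 0
        result = {}
        for label, n in zip(labels, self.per_bucket):
            running += n
            result[label] = running
        return result


hist = QueueDurationHistogram(BUCKETS)
hist.observe(101)  # a job that waited 101 seconds in the queue
for label, value in hist.cumulative().items():
    print(f'le="{label}" {value}')
print("sum", hist.total, "count", hist.count)
```

A single 101-second observation leaves every bucket up to le="60" at 0 and sets le="120" and all larger buckets to 1, which matches the first series in the output.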
When used with a GitLab installation that has not been updated, the output looks like this:
# HELP gitlab_runner_job_queue_duration_seconds Histogram of job queuing duration
# TYPE gitlab_runner_job_queue_duration_seconds histogram
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="",runner="c8d11a2a",le="1"} 3
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="",runner="c8d11a2a",le="3"} 3
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="",runner="c8d11a2a",le="10"} 3
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="",runner="c8d11a2a",le="30"} 3
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="",runner="c8d11a2a",le="60"} 3
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="",runner="c8d11a2a",le="120"} 3
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="",runner="c8d11a2a",le="300"} 3
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="",runner="c8d11a2a",le="900"} 3
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="",runner="c8d11a2a",le="1800"} 3
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="",runner="c8d11a2a",le="3600"} 3
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="",runner="c8d11a2a",le="+Inf"} 3
gitlab_runner_job_queue_duration_seconds_sum{project_jobs_running="",runner="c8d11a2a"} 0
gitlab_runner_job_queue_duration_seconds_count{project_jobs_running="",runner="c8d11a2a"} 3
Without the GitLab data, the metric records 0 as the queuing duration, so every observation falls into every histogram bucket and all buckets grow together. The resulting data is unusable, but this doesn't break the Runner. Updating GitLab to a newer version is all that is required to make the metric work.
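The effect of recording 0 for every job can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the function cumulative_buckets is hypothetical, the bucket bounds are copied from the `le` labels in the output, and the three zero observations mirror the count of 3 shown for runner c8d11a2a.

```python
# Upper bounds in seconds, copied from the `le` labels in the output.
BUCKETS = [1, 3, 10, 30, 60, 120, 300, 900, 1800, 3600]


def cumulative_buckets(observations, bounds):
    """Return cumulative counts per `le` bound (hypothetical helper)."""
    counts = {}
    for bound in bounds:
        # A bucket counts every observation less than or equal to its bound.
        counts[str(bound)] = sum(1 for value in observations if value <= bound)
    # The +Inf bucket always equals the total number of observations.
    counts["+Inf"] = len(observations)
    return counts


# Three jobs received from a GitLab version that does not send queue data:
# each one is recorded with a queuing duration of 0.
observations = [0, 0, 0]
buckets = cumulative_buckets(observations, BUCKETS)

for label, value in buckets.items():
    print(f'le="{label}" {value}')
print("sum", sum(observations), "count", len(observations))
```

Because 0 is below every bound, each bucket reports the same value as the total count and the sum stays at 0, so the histogram carries no information about the real queuing time.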
What's the best way to test this MR?
What are the relevant issue numbers?
Edited by Tomasz Maczukin