
Expose queueing duration histogram metric

Tomasz Maczukin requested to merge expose-queueing-duration-histogram-metric into main

What does this MR do?

Exposes a new histogram metric, gitlab_runner_job_queue_duration_seconds, based on the queueing data that GitLab sends with the job payload.

Why was this MR needed?

Follow-up to gitlab!90653 (merged). For the reasoning, please read the description of that GitLab MR.

For this metric to work properly, the GitLab change must be merged. Once it is available, the output looks like this:

# HELP gitlab_runner_job_queue_duration_seconds Histogram of job queuing duration
# TYPE gitlab_runner_job_queue_duration_seconds histogram
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="0",runner="oG2wMbsy",le="1"} 0
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="0",runner="oG2wMbsy",le="3"} 0
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="0",runner="oG2wMbsy",le="10"} 0
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="0",runner="oG2wMbsy",le="30"} 0
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="0",runner="oG2wMbsy",le="60"} 0
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="0",runner="oG2wMbsy",le="120"} 1
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="0",runner="oG2wMbsy",le="300"} 1
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="0",runner="oG2wMbsy",le="900"} 1
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="0",runner="oG2wMbsy",le="1800"} 1
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="0",runner="oG2wMbsy",le="3600"} 1
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="0",runner="oG2wMbsy",le="+Inf"} 1
gitlab_runner_job_queue_duration_seconds_sum{project_jobs_running="0",runner="oG2wMbsy"} 101
gitlab_runner_job_queue_duration_seconds_count{project_jobs_running="0",runner="oG2wMbsy"} 1
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="1",runner="oG2wMbsy",le="1"} 0
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="1",runner="oG2wMbsy",le="3"} 0
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="1",runner="oG2wMbsy",le="10"} 0
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="1",runner="oG2wMbsy",le="30"} 0
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="1",runner="oG2wMbsy",le="60"} 0
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="1",runner="oG2wMbsy",le="120"} 1
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="1",runner="oG2wMbsy",le="300"} 19
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="1",runner="oG2wMbsy",le="900"} 19
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="1",runner="oG2wMbsy",le="1800"} 19
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="1",runner="oG2wMbsy",le="3600"} 19
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="1",runner="oG2wMbsy",le="+Inf"} 19
gitlab_runner_job_queue_duration_seconds_sum{project_jobs_running="1",runner="oG2wMbsy"} 3581
gitlab_runner_job_queue_duration_seconds_count{project_jobs_running="1",runner="oG2wMbsy"} 19

When used with a GitLab installation that has not been updated, the output looks like this:

# HELP gitlab_runner_job_queue_duration_seconds Histogram of job queuing duration
# TYPE gitlab_runner_job_queue_duration_seconds histogram
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="",runner="c8d11a2a",le="1"} 3
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="",runner="c8d11a2a",le="3"} 3
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="",runner="c8d11a2a",le="10"} 3
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="",runner="c8d11a2a",le="30"} 3
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="",runner="c8d11a2a",le="60"} 3
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="",runner="c8d11a2a",le="120"} 3
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="",runner="c8d11a2a",le="300"} 3
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="",runner="c8d11a2a",le="900"} 3
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="",runner="c8d11a2a",le="1800"} 3
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="",runner="c8d11a2a",le="3600"} 3
gitlab_runner_job_queue_duration_seconds_bucket{project_jobs_running="",runner="c8d11a2a",le="+Inf"} 3
gitlab_runner_job_queue_duration_seconds_sum{project_jobs_running="",runner="c8d11a2a"} 0
gitlab_runner_job_queue_duration_seconds_count{project_jobs_running="",runner="c8d11a2a"} 3

Without the GitLab data, the metric records 0 as the queuing duration, so every observation is counted in every histogram bucket, and the project_jobs_running label is left empty. The data is therefore unusable, but it doesn't break the Runner. Updating GitLab to a version that includes the change is the only thing required to make the metric work.

What's the best way to test this MR?

What are the relevant issue numbers?

Edited by Tomasz Maczukin
