Distinguish job failure in worker processing failures metric
What does this MR do?
Distinguish job failure in worker processing failures metric
With !4001 (merged) we added a few new metrics that show details of what's happening with the Runner worker and worker slots.
One of these metrics is gitlab_runner_worker_processing_failures_total, which counts failures in worker processing.
Currently that metric distinguishes only one specific failure type: no_free_executor. This is a feature of some Runner executors: before a job is requested, they may report whether there is capacity to handle it. When no_free_executor is reported, the request to GitLab is skipped until the capacity is restored.
Everything else is mixed into the other failure type.
With this commit we're adding the job_failure failure type, which makes it possible to distinguish processing errors that are job failures - which in many cases are an EXPECTED result - from anything else, which may suggest that something is wrong with Runner's internal concurrency handling mechanism.