Provides users the option to force-cancel a canceling pipeline (was Jobs stuck for hours in "Canceling" state with "Waiting for Resource" message)

Status update (2024-10-17)

Specific to the related bug with the Runner Kubernetes executor: a fix was merged in v17.3 and backported as patch releases to GitLab Runner v16.11 through v17.2.
The following patches have been released as of 2024-07-27:

* GitLab Runner v17.2.1 / GitLab Runner Helm Chart v0.67.1
* GitLab Runner v17.1.1 / GitLab Runner Helm Chart v0.66.1
* GitLab Runner v17.0.2 / GitLab Runner Helm Chart v0.65.2
* GitLab Runner v16.11.3 / GitLab Runner Helm Chart v0.64.3
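
To check whether a given runner version already carries the Kubernetes executor fix, the list above can be turned into a small lookup. This is only a sketch: `runner_has_fix?`, `PATCHED`, and `FIXED_IN` are hypothetical names invented here, and it assumes that v17.3.0 and every later release include the fix, and that release lines older than v16.11 never received a backport.

```ruby
require "rubygems"

# First patched release for each backported minor line (from the list above).
PATCHED = {
  "17.2"  => "17.2.1",
  "17.1"  => "17.1.1",
  "17.0"  => "17.0.2",
  "16.11" => "16.11.3"
}.transform_values { |v| Gem::Version.new(v) }.freeze

# Assumption: the fix merged in v17.3 shipped in 17.3.0 and all later releases.
FIXED_IN = Gem::Version.new("17.3.0")

# Hypothetical helper: true when the runner version contains the fix.
def runner_has_fix?(version_string)
  version = Gem::Version.new(version_string.delete_prefix("v"))
  return true if version >= FIXED_IN

  minor = version.segments.first(2).join(".")
  floor = PATCHED[minor]
  # Lines without a listed patch (older than 16.11) are treated as unpatched.
  !floor.nil? && version >= floor
end
```

For example, `runner_has_fix?("v17.2.0")` is false under these assumptions, while `runner_has_fix?("v17.2.1")` is true.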

A separate issue (#483290 (closed)) can also cause jobs to be stuck in canceling, but only in specific circumstances. See #483290 (comment 2097069681) to determine the appropriate fix.

Overview

This issue is being opened as per the documentation.

Description: A GitLab Premium customer reports that jobs are stuck with a "Waiting for resource" message, but the UI shows a status of canceling rather than running or pending.

Project: /sparksuite-family/hoa-express/main-stack/
Job IDs: 7027551018, 7027751534
Job status: canceling
How often the problem occurs: Problem began occurring last week and has been seen sporadically since then.
Steps to reproduce the problem:

They have not been able to reproduce this issue consistently. It has been seen multiple times in the last week.

The job has been re-run since the initial failure, but you can see a recording of the issue here: https://images.sparksuite.com/v/4QCsZEKKoJOs7jcmEyku

Zendesk ticket (internal link only)

Troubleshooting notes

User/Customer	GitLab Hosted or Self-Managed Runner	Runner Executor
Wes Cossick	Self-Managed Runner	Docker Machine
Niklas van Schrick	Self-Managed Runner	Kubernetes
SFDC	Self-Managed Runner	Kubernetes
Internal link		Kubernetes
Jon Benson	Self-Managed Runner

Implementation Guide

Allow users to force-cancel a pipeline that is stuck in canceling. A job can end up stuck in canceling due to infrastructure issues (such as a runner running out of memory) or because users mistakenly run logic that takes longer than they expected.

It might be worth getting a UX proposal for this. Do we want a separate force-cancel button, or do we want the cancel button to remain available so that clicking it again transitions the job from canceling to canceled?

From a backend perspective we can add to CommitStatus:

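    event :cancel do
      # New: canceling a job that is already canceling forces it to canceled.
      transition canceling: :canceled
      # Jobs that support graceful cancellation pass through canceling first.
      transition running: :canceling, if: :supports_canceling?
      # Anything else that is cancelable, plus manual, goes straight to canceled.
      transition CANCELABLE_STATUSES.map(&:to_sym) + [:manual] => :canceled
    end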

Then we need to ensure any stage/pipeline and cross project status changes work well with the new logic.
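
To make the intended behavior concrete, here is a minimal plain-Ruby sketch of how the proposed cancel event would resolve for a few statuses. It is not GitLab's CommitStatus and does not use the real state machine: `SketchJob` is a hypothetical class, and the `CANCELABLE_STATUSES` list below is an assumed, illustrative subset rather than the real constant. The branches are checked in the same order as the transitions above, on the assumption that the first matching transition wins.

```ruby
# Hypothetical stand-in for the proposed cancel event; not GitLab's CommitStatus.
class SketchJob
  # Assumed, illustrative subset of the real CANCELABLE_STATUSES constant.
  CANCELABLE_STATUSES = %i[running pending created scheduled].freeze

  attr_reader :status

  def initialize(status, supports_canceling: true)
    @status = status
    @supports_canceling = supports_canceling
  end

  # Mirrors the order of the proposed transitions: first match wins.
  def cancel
    @status =
      if status == :canceling
        :canceled   # a second cancel acts as the force-cancel
      elsif status == :running && @supports_canceling
        :canceling  # graceful cancel, waiting on the runner
      elsif CANCELABLE_STATUSES.include?(status) || status == :manual
        :canceled   # everything else cancelable is canceled immediately
      else
        status      # finished jobs are left untouched
      end
  end
end

job = SketchJob.new(:running)
job.cancel # status is now :canceling
job.cancel # status is now :canceled, even if the runner never responded
```

Whether the second call is triggered by the existing cancel button or by a dedicated force-cancel button is the open UX question; the backend transition is the same in either case.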

Edited Jan 02, 2025 by Caroline Simpson