"View job currently using resource" button produces error 500 | Jobs remain in `canceling` state following pipeline deletion
Note for Support
Please see #483290 (comment 2097069681)
Summary
When a pipeline uses resource groups and a job is waiting on a resource, the job page shows a "View job currently using resource" button that should list the jobs currently using that resource. However, the page it opens returns error 500.
The issue occurs because the job holding the resource is stuck in the `canceling` state after its pipeline was deleted.
Steps to reproduce
- Create multiple pipelines and jobs that use the same resource group.
- The most recent jobs will wait for the resource to become available.
- When this happens, click the "View job currently using resource" button.
Example Project
What is the current bug behavior?
"View job currently using resource" button produces error 500.
What is the expected correct behavior?
"View job currently using resource" button should list the jobs currently using the resource.
Relevant logs and/or screenshots
Kibana: https://log.gprd.gitlab.net/app/r/s/lbmTo
Logs:
When clicking the view button:
No route matches {:action=>"test_report", :controller=>"projects/pipelines", :id=>nil, :namespace_id=>#<Group id:70201388 @redacted>, :project_id=>#<Project id:61095439 redacted>>}, possible unmatched constraints: [:id]
Did you mean? test_report_namespace_project_pipeline_path
When trying to GET the orphaned job via API:
CommitStatus#commit delegated to pipeline.commit, but pipeline is nil: #<Ci::Build status: "canceling", finished_at: nil, created_at: "2024-09-05 14:09:59.806209000 +0000", updated_at: "2024-09-05 14:28:42.448311000 +0000", started_at: "2024-09-05 14:13:18.740669000 +0000", coverage: nil, name: "plan", options: nil, allow_failure: false, stage: "build", stage_idx: 2, tag: false, ref: "main", type: "Ci::Build", target_url: nil, description: nil, erased_at: nil, artifacts_expire_at: nil, environment: "${PROJECT}", when: "on_success", yaml_variables: nil, queued_at: "2024-09-05 14:13:16.861124000 +0000", lock_version: 4, coverage_regex: nil, retried: true, protected: true, failure_reason: "unknown_failure", scheduled_at: nil, token_encrypted: "|redacted...", resource_group_id: 5468674, waiting_for_resource_at: "2024-09-05 14:13:11.656192000 +0000", processed: true, scheduling_type: "stage", id: 7755056568, stage_id: 3190523159, partition_id: 102, auto_canceled_by_partition_id: nil, auto_canceled_by_id: nil, commit_id: 1441775080, erased_by_id: nil, project_id: 61095439, runner_id: 12270845, trigger_request_id: nil, upstream_pipeline_id: nil, user_id: 22548519, execution_config_id: nil, upstream_pipeline_partition_id: nil, tag_list: nil>
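Both log entries look like the same root cause: the job's pipeline is `nil`. The sketch below reproduces the two failure shapes in plain Ruby. It is only an illustration under stated assumptions. `Job`, `Pipeline`, `DelegationError`, `safe_commit` and `pipeline_test_report_path` are hypothetical stand-ins, not GitLab's actual classes or route helpers, and the nil-safe variant is one possible direction rather than a confirmed fix.

```ruby
# Hypothetical stand-ins for illustration; these are NOT GitLab's real classes.
Pipeline = Struct.new(:id, :commit)

class DelegationError < StandardError; end

class Job
  attr_reader :status, :pipeline

  def initialize(status:, pipeline:)
    @status = status
    @pipeline = pipeline
  end

  # Strict delegation: raises when the pipeline is missing,
  # similar in shape to the API error in the log above.
  def commit
    raise DelegationError, "Job#commit delegated to pipeline.commit, but pipeline is nil" if pipeline.nil?

    pipeline.commit
  end

  # Nil-safe variant: returns nil for an orphaned job instead of raising.
  def safe_commit
    pipeline&.commit
  end
end

# Mimics a route helper whose :id segment is required (assumed path format).
def pipeline_test_report_path(pipeline_id)
  raise ArgumentError, "No route matches: missing required key [:id]" if pipeline_id.nil?

  "/pipelines/#{pipeline_id}/test_report"
end

# An orphaned job: still "canceling", but its pipeline no longer exists.
orphan = Job.new(status: "canceling", pipeline: nil)

begin
  orphan.commit
rescue DelegationError => e
  puts "delegation failed: #{e.message}"
end

begin
  pipeline_test_report_path(orphan.pipeline&.id)
rescue ArgumentError => e
  puts "route failed: #{e.message}"
end

puts orphan.safe_commit.inspect
```

In this sketch, any code path that renders the job (page or API) fails as soon as it reaches through the missing pipeline, which would be consistent with the error 500 seen for the button and the API request.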
Output of checks
This bug happens on GitLab.com
Workaround
Use a different `resource_group` name.
Possible fixes
Example tickets
- https://gitlab.zendesk.com/agent/tickets/564815
- https://gitlab.zendesk.com/agent/tickets/564960
- https://gitlab.zendesk.com/agent/tickets/564729
- https://gitlab.zendesk.com/agent/tickets/564544
- https://gitlab.zendesk.com/agent/tickets/564748
- https://gitlab.zendesk.com/agent/tickets/565195
- https://gitlab.zendesk.com/agent/tickets/564941
Edited by Donique Smit