The ProcessRequirementsReportsWorker retries on an unrecoverable error
I came here from looking at the error budgets for Certify. The biggest cause of spend is `RequirementsManagement::ProcessRequirementsReportsWorker` failing and being retried 3 times. This worker fails on roughly 30% of its runs with a `Gitlab::Access::AccessDeniedError`. Over the past 7 days we've spent about 8 hours of compute running jobs that cannot succeed.
| json.class.keyword | json.job_status.keyword | json.error_class.keyword | Count | Count % | Sum of json.duration_s (seconds) |
|---|---|---|---|---|---|
| RequirementsManagement::ProcessRequirementsReportsWorker | done | Missing | 523,534 | 69.45% | 87,031.19 |
| RequirementsManagement::ProcessRequirementsReportsWorker | fail | Gitlab::Access::AccessDeniedError | 230,299 | 30.55% | 30,303.60 |
(https://log.gprd.gitlab.net/goto/44537aca483b44655c01d987e84e9fa9)
This error is raised from the `ProcessTestReportsService`. This kind of error does not warrant a Sidekiq retry, since it is unlikely to succeed on a subsequent run. Perhaps we could also skip scheduling this worker from the `BuildFinishedWorker` if it is going to be a no-op anyway?
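As a rough sketch of the retry side, one option is to rescue the error inside the worker so the job completes instead of raising, which means Sidekiq never re-enqueues it. The worker body, the `perform` argument, and the service call signature below are assumptions for illustration, not the current implementation; `Gitlab::ErrorTracking.track_exception` is the existing helper for reporting swallowed exceptions.

```ruby
# frozen_string_literal: true

module RequirementsManagement
  class ProcessRequirementsReportsWorker
    include ApplicationWorker

    def perform(build_id)
      build = Ci::Build.find_by_id(build_id)
      return unless build

      # Existing service call (arguments assumed for illustration).
      ProcessTestReportsService.new(build.project, build.user).execute(build)
    rescue Gitlab::Access::AccessDeniedError => error
      # The permission check will not pass on a later attempt either, so don't
      # re-raise: the job finishes as "done" and Sidekiq never retries it.
      Gitlab::ErrorTracking.track_exception(error, build_id: build_id)
    end
  end
end
```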
Acceptance Criteria
There are two changes to make here:
- Don't retry this job if it's unlikely to succeed on subsequent attempts (see the sketch above).
- Don't schedule this job in the first place if it won't succeed. A strategy for this is detailed in #338717 (comment 653146872); a sketch follows the quoted comment below:
> So in this particular example, I think we could consider wrapping the 4 checks in a method that we can check before scheduling the job.
>
> In our codebase, we have this kind of pattern using a nested `Async` class; perhaps we could do something similar here?
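Something along these lines could work for the scheduling side. The method name `processable?` and the individual conditions below are placeholders standing in for the 4 checks mentioned in the comment, only meant to show the checks wrapped in one place that both the service and `Ci::BuildFinishedWorker` can consult before enqueueing; the same guard could equally live behind a nested `Async` class as the comment suggests.

```ruby
# frozen_string_literal: true

module RequirementsManagement
  class ProcessTestReportsService
    # Hypothetical guard wrapping the existing pre-conditions (placeholder
    # checks shown here, not the real four) so callers can ask the same
    # question before enqueueing the worker.
    def self.processable?(build)
      project = build.project

      project.licensed_feature_available?(:requirements) &&
        project.requirements.exists? &&
        build.user.present? &&
        Ability.allowed?(build.user, :create_requirement_test_report, project)
    end
  end
end

# In Ci::BuildFinishedWorker, only schedule the job when it can succeed
# (hook name and placement assumed):
#
#   if RequirementsManagement::ProcessTestReportsService.processable?(build)
#     RequirementsManagement::ProcessRequirementsReportsWorker.perform_async(build.id)
#   end
```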