Follow-up: consistently failing job in the ci_build_trace_sections migration
Follow-up issue for visibility from #328432 (comment 592233257):
How do we deal with consistently failing jobs?
Copying Yannis' (@iroussos) comment here for reference:
I think that we are in a stalemate in this case, which shows that we have to add a smarter retry mechanism. If we pick a batch size so large (or hit a batch with very expensive updates, or a combination of both) that the job cannot finish on time with the set batch size, we have no way to retry the job by breaking it into smaller jobs. I think that there are 2 paths to consider here:
1. Ability to run a failed job with an increased ExclusiveLease timeout (an easy solution, but we will always hit a threshold).
2. Ability to split a job into two (or more) new jobs with smaller batch sizes, whose min/max values are chunks of the original range.
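The two paths can be combined into one retry policy: extend the lease timeout while it stays under a cap, then fall back to splitting. The following is a minimal Python sketch under stated assumptions, not GitLab's implementation. `BatchedJob`, `split_job`, `next_attempt`, the inclusive `min_value`/`max_value` range, the doubling growth factor and the timeout cap are all hypothetical names and parameters chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BatchedJob:
    """Hypothetical job model: an inclusive ID range plus a batch size."""
    min_value: int
    max_value: int
    batch_size: int


def split_job(job: BatchedJob, parts: int = 2) -> List[BatchedJob]:
    """Path 2: split a job's [min_value, max_value] range into `parts`
    contiguous sub-jobs, each with a proportionally smaller batch size."""
    if parts < 2:
        raise ValueError("parts must be at least 2")
    total = job.max_value - job.min_value + 1
    if total < parts:
        raise ValueError("range too small to split")
    chunk, remainder = divmod(total, parts)
    sub_batch = max(1, job.batch_size // parts)
    jobs = []
    start = job.min_value
    for i in range(parts):
        # Spread the remainder over the first sub-jobs so sizes differ by at most 1.
        size = chunk + (1 if i < remainder else 0)
        end = start + size - 1
        jobs.append(BatchedJob(start, end, sub_batch))
        start = end + 1
    return jobs


def next_attempt(job: BatchedJob, lease_timeout: int,
                 max_lease_timeout: int, growth: int = 2) -> Tuple[List[BatchedJob], int]:
    """Path 1 first: retry the same job with a longer lease (in seconds) until
    the hypothetical cap is reached; past the cap, split the job instead."""
    extended = lease_timeout * growth
    if extended <= max_lease_timeout:
        return [job], extended
    return split_job(job), lease_timeout
```

Because the cap bounds path 1, a job that keeps failing always ends up on path 2, and splitting can be applied again to a failing sub-job until its range can no longer be divided.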