CounterJobWorker exceeds 300 seconds
Problem
The Analytics::UsageTrends::CounterJobWorker job is running over 300s and eating into our error budget.
Solution
This is a similar story to the consistency worker: we scan tables and count rows, and for large tables this takes longer than 5 minutes.
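As a rough illustration of how a count could be kept within a time budget, the sketch below counts rows in fixed-size id ranges and stops between batches once the budget is spent. It is not the worker's actual code: `count_in_batches`, `BATCH_SIZE`, the `ids` array standing in for a table's primary keys, and the injectable `clock` are all hypothetical names chosen for this example.

```ruby
# Hypothetical sketch, not code from the GitLab codebase.
# `ids` stands in for the primary keys of a large table.
BATCH_SIZE = 1_000
MONOTONIC_CLOCK = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }

def count_in_batches(ids, start_id:, max_runtime:, batch_size: BATCH_SIZE, clock: MONOTONIC_CLOCK)
  deadline = clock.call + max_runtime
  count = 0
  cursor = start_id
  max_id = ids.max || 0

  while cursor <= max_id
    # Check the budget between batches, so the job overruns by at most
    # the duration of one batch.
    if clock.call >= deadline
      return { count: count, next_id: cursor, finished: false }
    end

    upper = cursor + batch_size
    count += ids.count { |id| id >= cursor && id < upper }
    cursor = upper
  end

  { count: count, next_id: nil, finished: true }
end
```

A follow-up job could pass the returned `next_id` as `start_id` and add its partial count to the previous one, assuming the partial result is stored somewhere between runs.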
Consistency worker
In the consistency worker, we limit the maximum runtime to 5 minutes. However, we only check the limit after one "item" (group) has finished processing, and processing a single item can take longer than 5 minutes.
- Iterate over each aggregated issue and MR.
- If the issue or MR is already deleted, remove the aggregated row.

For large groups, this will take some time. We can fix it by tracking the execution: when the 5-minute limit is reached, we stop processing, and the next job continues where the previous job finished.
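The stop-and-resume idea described here can be sketched roughly as follows. This is an illustration under assumptions, not the change from the linked MR: `run_consistency_check`, the `groups` hash (group id to aggregated row ids), the `existing` set of ids whose issue or MR still exists, and the cursor shape are all hypothetical. The key point is that the time limit is checked per row rather than per group.

```ruby
require 'set'

# Hypothetical sketch, not code from the GitLab codebase.
DEFAULT_CLOCK = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }

def run_consistency_check(groups, existing, cursor:, max_runtime:, clock: DEFAULT_CLOCK)
  deadline = clock.call + max_runtime
  removed = []

  groups.keys.sort.each do |group_id|
    # Skip groups that a previous job already finished.
    next if cursor && group_id < cursor[:group_id]

    groups[group_id].sort.each do |row_id|
      # Skip rows already handled in the group where the previous job stopped.
      next if cursor && group_id == cursor[:group_id] && row_id < cursor[:row_id]

      # Check the limit per row, so one large group cannot overrun the job.
      if clock.call >= deadline
        return { removed: removed, cursor: { group_id: group_id, row_id: row_id } }
      end

      # The aggregated row is stale when its issue or MR no longer exists.
      removed << row_id unless existing.include?(row_id)
    end
  end

  { removed: removed, cursor: nil }
end
```

The returned cursor would need to be persisted (for example in a cache or a database row, which this sketch leaves out) so that the next scheduled job can pass it back in; a `nil` cursor means the whole pass completed.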
MR that solves this for the consistency worker: !86463 (merged).
Edited by Magdalena Frankiewicz