Capture errors raised for a batch when resetting CI minutes
What does this MR do?
Related to #243762 (closed)
While investigating this ~bug, we realized that Sentry did not contain the actual error that caused a batch to fail. This could mask a consistent error without us knowing about it.
We should always capture the actual error that was raised when processing a batch of namespaces. This way, if a process fails with multiple errors (at most one error per batch), we will know whether they are all related or whether there are different issues.
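The idea above can be illustrated with a minimal sketch. It is not the code in this MR: the names `reset_in_batches`, `reset_batch`, `capture_error`, and `BatchFailures` are hypothetical, and `capture_error` stands in for whatever reports to Sentry. The sketch also assumes that processing continues after a failed batch and that a summary error is raised at the end.

```python
from typing import Callable, Dict, Iterable, List, Sequence


class BatchFailures(Exception):
    """Hypothetical summary error raised once all batches have been attempted."""

    def __init__(self, errors: List[Exception]) -> None:
        super().__init__(f"{len(errors)} batch(es) failed")
        self.errors = errors


def reset_in_batches(
    batches: Iterable[Sequence[int]],
    reset_batch: Callable[[Sequence[int]], None],
    capture_error: Callable[[Exception, Dict[str, List[int]]], None],
) -> None:
    """Process every batch, reporting the real error of each failed batch."""
    errors: List[Exception] = []
    for batch in batches:
        try:
            reset_batch(batch)
        except Exception as error:
            # Report the original exception (at most one per batch) together
            # with the affected namespace IDs, instead of a generic failure.
            capture_error(error, {"namespace_ids": list(batch)})
            errors.append(error)
    if errors:
        # Assumption: the job is still marked as failed so that it can be retried.
        raise BatchFailures(errors)
```

With each batch's own exception reported separately, it becomes possible to see whether several failing batches share one cause, such as a timeout, or fail for different reasons.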
In addition, we are increasing the retry count from 3 to 10 in case the issues are related to timeouts. These jobs only run once a month, so it should be safe to increase it. We normally see about 2-3 workers failing out of 70+.
Screenshots
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Availability and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- Tested in all supported browsers
- Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- Label as security and @ mention @gitlab-com/gl-security/appsec
- The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- Security reports checked/validated by a reviewer from the AppSec team