Fix BulkImport pipeline retries
What does this MR do and why?
BulkImport pipeline retries were not working because the pipeline never raised BulkImports::NetworkError exceptions: they were being rescued by the catch-all rescue StandardError.
With this change, the pipeline runner handles BulkImports::NetworkError exceptions explicitly. When the exception is retriable, for example because it was caused by a Net::ReadTimeout error, the pipeline re-raises it as BulkImports::PipelineRetryError so that the PipelineWorker can rescue it and retry the worker.
In addition, the maximum number of tries is increased to 10, since retrying a few more times before marking the pipeline as failed does no harm.
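The flow described above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the code in this MR: the Sketch module, its class names, the retriable? check based on the exception cause, and the in-process retry loop are all hypothetical stand-ins (the real worker presumably re-enqueues itself rather than looping).

```ruby
require 'net/http' # only needed for the Net::ReadTimeout constant

module Sketch
  # Hypothetical stand-in for BulkImports::NetworkError.
  class NetworkError < StandardError
    RETRIABLE_CAUSES = [Net::ReadTimeout, Net::OpenTimeout].freeze

    # Assumption: an error is retriable when it wraps a timeout.
    def retriable?
      RETRIABLE_CAUSES.any? { |klass| cause.is_a?(klass) }
    end
  end

  # Hypothetical stand-in for the retry error raised by the runner.
  class PipelineRetryError < StandardError; end

  class Runner
    def run
      yield
    rescue NetworkError => e
      # Listed before the catch-all, so it is no longer swallowed.
      raise PipelineRetryError, e.message if e.retriable?

      :failed
    rescue StandardError
      :failed
    end
  end

  class Worker
    MAX_TRIES = 10

    def perform(&block)
      tries = 0
      begin
        tries += 1
        Runner.new.run(&block)
      rescue PipelineRetryError
        retry if tries < MAX_TRIES
        :failed
      end
    end
  end
end

# Usage: the first two attempts time out, the third one succeeds.
attempts = 0
result = Sketch::Worker.new.perform do
  attempts += 1
  begin
    raise Net::ReadTimeout if attempts < 3
  rescue Net::ReadTimeout
    # Raising inside the rescue sets NetworkError#cause automatically.
    raise Sketch::NetworkError, 'read timeout'
  end
  :finished
end

p result   # prints :finished
p attempts # prints 3
```

The point of the sketch is the order of the rescue clauses: without the dedicated NetworkError clause, the catch-all would mark the pipeline as failed on the first timeout.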
Related to: Make BulkImport to handle Net::ReadTimeout (#365131 - closed)
How to set up and validate locally
To test the retry mechanism, we need to simulate a retriable error or make the API return a 429 status.
To simulate a Net::ReadTimeout, we can add sleep 60 to one of the API actions used by BulkImport, for example in the GraphQL API for the group.
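Changing the GroupResolver as in the code below makes the endpoint time out on its first calls, while the groups_timeout counter is still below 3. Assuming the counter starts at 1, that covers the first two calls.
module Resolvers
  class GroupResolver < BaseResolver
    prepend FullPathResolver

    type Types::GroupType, null: true

    def resolve(full_path:)
      # Temporary change for local testing only: delay the response while the cached counter is below 3
      if Gitlab::Cache::Import::Caching.increment('groups_timeout', timeout: 10.minutes) < 3
        sleep 60
      end

      model_by_full_path(Group, full_path)
    end
  end
end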
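To check how many calls the guard in the resolver actually delays, here is a small in-memory sketch. FakeCounter and delayed_calls are hypothetical helpers, and the sketch assumes that the cached increment returns the value after incrementing (1 on the first call), as a Redis INCR does.

```ruby
# Hypothetical in-memory stand-in for the cached counter. Assumes the real
# increment call returns the value after incrementing (1 on the first call).
class FakeCounter
  def initialize
    @counts = Hash.new(0)
  end

  def increment(key)
    @counts[key] += 1
  end
end

# Returns the 1-based call numbers that the guard would delay.
def delayed_calls(total_calls, threshold: 3)
  counter = FakeCounter.new
  (1..total_calls).select { counter.increment('groups_timeout') < threshold }
end

p delayed_calls(5)               # prints [1, 2]
p delayed_calls(5, threshold: 4) # prints [1, 2, 3]
```

Under that assumption, a comparison with < 3 delays two calls. To delay three calls, raise the threshold to 4 or compare with <= 3.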
- Enable the feature flag: Feature.enable(:bulk_import)
- Create a top-level group.
- Go to the /groups/new#import-group-pane page and enter the instance URL and access token (it needs the api and read_repository scopes).
- Select the newly created group and click Import.
- Wait for the group import to complete and verify the imported group data.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.