Process one record at a time in Bulk Import pipelines (!52330)

George Koltsov requested to merge georgekoltsov/graphql-extractor-yield-one-record into master Jan 22, 2021

What does this MR do?

This MR:

Updates Bulk Import ETL pipelines to process one record at a time instead of operating on a whole collection at once. This removes complexity from transformers and loaders, since they no longer need to loop through the whole collection.
Adds an ExtractedData object that wraps the raw hash data from GraphQL for easier use in the pipelines.
Removes the hash digger transformer, since it is no longer needed.
Removes the underscorify transformer in favor of GraphQL field aliasing.
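The changes above can be illustrated with a minimal, self-contained sketch. It is not the code from this MR: the class bodies, StripNameTransformer, InMemoryLoader, run_pipeline, and the GROUP_QUERY fields are hypothetical and only show the general shape of a wrapper object that yields one record at a time, transformers and loaders that handle a single entry, and a GraphQL alias that returns a snake_case key directly.

```ruby
# Illustrative sketch only; names and behavior are assumptions, not GitLab's implementation.

# Wraps raw hash data from an extractor and yields it one record at a time.
class ExtractedData
  include Enumerable

  attr_reader :data, :page_info

  def initialize(data: nil, page_info: {})
    # Accept a single hash or an array of hashes; nil becomes an empty list.
    @data = data.is_a?(Array) ? data : [data].compact
    @page_info = page_info || {}
  end

  def each(&block)
    @data.each(&block)
  end
end

# A transformer receives a single entry, so it needs no loop of its own.
class StripNameTransformer
  def transform(entry)
    entry.merge('name' => entry['name'].strip)
  end
end

# A loader also receives a single entry (here it just collects them in memory).
class InMemoryLoader
  attr_reader :loaded

  def initialize
    @loaded = []
  end

  def load(entry)
    @loaded << entry
  end
end

# The runner is the only place that iterates over the extracted records.
def run_pipeline(extracted, transformers:, loader:)
  extracted.each do |entry|
    transformed = transformers.reduce(entry) { |acc, t| t.transform(acc) }
    loader.load(transformed)
  end
end

# GraphQL aliasing (alias: field) returns snake_case keys without a transformer.
# The selected fields are assumptions chosen for illustration.
GROUP_QUERY = <<~GRAPHQL
  query($full_path: ID!) {
    group(fullPath: $full_path) {
      name
      full_path: fullPath
    }
  }
GRAPHQL

extracted = ExtractedData.new(
  data: [
    { 'name' => ' Group A ', 'full_path' => 'group-a' },
    { 'name' => 'Group B', 'full_path' => 'group-b' }
  ]
)
loader = InMemoryLoader.new
run_pipeline(extracted, transformers: [StripNameTransformer.new], loader: loader)
```

Because iteration lives in one place, each transformer and loader stays a plain single-entry method, which is the simplification this MR describes.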

To test

Seed your local environment with groups via the rake task: bundle exec rake "gitlab:seed:group_seed[3,root]"
Copy the name and path of the top-level group that was generated
Open a Rails console and run the following (replace the placeholders with your values):

# Enable the Bulk Import feature flag
Feature.enable(:bulk_import)

# Random suffix so that repeated runs get a unique destination name
rand = (1..1000).to_a.sample
user = User.first
# The access token is a string and needs the api scope
credentials = { url: 'http://gdk.test:3000', access_token: '<api scope token>' }
params = [{ source_type: 'group_entity', source_name: '<source group name>', source_full_path: '<source group path>', destination_name: "foo#{rand}", destination_namespace: 'root' }]

BulkImportService.new(user, params, credentials).execute

# Check whether the most recent import has finished
bulk_import = BulkImport.last
bulk_import.finished?

Alternatively, you can import the group via the UI by opening the Import tab on the '/groups/new' page.

Wait for the bulk import to finish (this requires Sidekiq to be running)
Once finished, verify that a new group (and its subgroups) was imported under the root namespace
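When testing from the console, the waiting step can be scripted rather than re-running finished? by hand. The helper below is a hypothetical convenience, not part of GitLab or this MR: wait_until polls a block until it returns a truthy value or a timeout elapses. The commented console usage assumes the bulk_import variable from the steps above.

```ruby
# Hypothetical helper (not part of GitLab): polls the given block until it
# returns a truthy value, or gives up after `timeout` seconds.
def wait_until(timeout: 60, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep(interval)
  end
end

# In the Rails console, assuming `bulk_import` was assigned as shown earlier:
#   wait_until(timeout: 300, interval: 5) { bulk_import.reload.finished? }
```

The helper returns false on timeout instead of raising, so a stalled import (for example, when Sidekiq is not running) shows up as a plain false in the console.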

Mentions #299527 (closed)

Screenshots (strongly suggested)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
Tested in all supported browsers
Informed Infrastructure department of a default or new setting change, if applicable per definition of done

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

Label as security and @ mention @gitlab-com/gl-security/appsec
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
Security reports checked/validated by a reviewer from the AppSec team

Edited Jan 26, 2021 by George Koltsov
