Draft: Add user mapping to github take 2

Carla Drago requested to merge 466355-github-user-mapping-2 into master

What does this MR do and why?

This adds the user mapping feature to the GitHub importer. It's still a draft because user mapping in the GitHub importer has several parts, and this MR does not implement all of them:

  1. Updating the user finder to find or create an import_source_user on import. This MR has a version of this implemented, but it doesn't use the existing GH user finder/mapper cache. @.luke has commented below with a possible solution and will be working on that. (Update: we don't need the old caching as it's for old user mapping.)
  2. Pushing placeholder references for every record imported. These references will indicate which user_id or author_id belongs to which import_source_user so that when placeholder users are re-assigned, we know which user_ids and author_ids to update.
  3. Loading the placeholder references upon each stage completion. This saves the references to the import_source_user_placeholder_references table in the DB.
  4. Ensuring the reference store is finalized before finishing the import.
  5. Adding a feature flag for all of this work to be behind.
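
To make the lifecycle in steps 2 to 5 concrete, here is a rough, self-contained sketch. Everything in it is a stand-in written for illustration: `SketchReferenceStore`, `run_sketch_import`, and the `flag_enabled` argument are hypothetical names, not the classes or the feature flag used in this MR, and the "table" is just an array.

```ruby
# Hypothetical in-memory stand-in for the placeholder reference store.
class SketchReferenceStore
  attr_reader :pending, :saved

  def initialize
    @pending = [] # references pushed during the current stage
    @saved = []   # stands in for import_source_user_placeholder_references
  end

  # Step 2: remember which column of which record points at a source user.
  def push(reference)
    @pending << reference
  end

  # Step 3: on stage completion, move the pushed references into the "table".
  def load!
    @saved.concat(@pending)
    @pending.clear
  end

  # Step 4: the store counts as finalized once nothing is waiting to be loaded.
  def finalized?
    @pending.empty?
  end
end

# Hypothetical driver. `flag_enabled` stands in for the feature flag (step 5).
def run_sketch_import(stages, store, flag_enabled:)
  stages.each do |records|
    records.each do |record|
      # Creating the record itself is omitted; only the reference push is shown.
      store.push(record) if flag_enabled
    end
    store.load! if flag_enabled
  end

  raise 'reference store is not finalized' unless store.finalized?

  :finished
end

stages = [
  [{ model: 'Issue', id: 1, user_reference_column: 'author_id', source_user_id: 10 }],
  [{ model: 'Note', id: 7, user_reference_column: 'author_id', source_user_id: 11 }]
]
store = SketchReferenceStore.new
run_sketch_import(stages, store, flag_enabled: true)
puts store.saved.size
```

With the flag disabled nothing is pushed or loaded, so the import behaves as it did before.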

This MR has changes for:

  1. Adding a feature flag.
  2. Ensuring the reference store is finalized
  3. Loading references at the end of each stage
  4. Pushing references in the following importers:
  • issue_importer
  • note_importer
  • pull_request_importer
  • pull_requests:
    - merged_by_importer
    - review_importer
  • diff_note_importer
  • events:
    - base_importer
    - changed_assignee
    - changed_label
    - changed_milestone
    - changed_reviewer
    - closed
    - cross_referenced
    - renamed
    - reopened

Normally, the push uses the record, the reference, and the source author or user_id, and goes through a push_placeholder_references method.
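
As an illustration of the per-record push, here is a minimal sketch. The method name matches the one above, but its signature (`attribute:`, `source_user:`), the `reference_store` array, and all the `Sketch*` structs and classes are assumptions made for this example, not the code in this MR.

```ruby
# Hypothetical value objects; the real models are ActiveRecord classes.
SketchReference = Struct.new(:model, :numeric_key, :user_reference_column,
                             :source_user_id, keyword_init: true)
SketchSourceUser = Struct.new(:id, keyword_init: true)
SketchIssue = Struct.new(:id, :author_id, keyword_init: true)

module SketchPushPlaceholderReferences
  # Records that `attribute` on `record` currently points at the placeholder
  # user belonging to `source_user`, so it can be updated on reassignment.
  def push_placeholder_references(record, attribute:, source_user:)
    reference_store << SketchReference.new(
      model: record.class.name,
      numeric_key: record.id,
      user_reference_column: attribute.to_s,
      source_user_id: source_user.id
    )
  end
end

class SketchIssueImporter
  include SketchPushPlaceholderReferences

  attr_reader :reference_store

  def initialize
    @reference_store = [] # stand-in for the real reference store
  end

  def import(id:, placeholder_user_id:, source_user:)
    issue = SketchIssue.new(id: id, author_id: placeholder_user_id)
    push_placeholder_references(issue, attribute: :author_id, source_user: source_user)
    issue
  end
end

importer = SketchIssueImporter.new
importer.import(id: 5, placeholder_user_id: 99, source_user: SketchSourceUser.new(id: 10))
puts importer.reference_store.first.user_reference_column
```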

When records are created using legacy_bulk_insert, an array of ids, one for each row created, is returned. Notes are created this way, so they use a push_placeholder_note_refs_by_ids method instead.
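
A sketch of the by-ids variant, under stated assumptions: `SketchNotesTable#bulk_insert` is a hypothetical stand-in for a bulk insert that returns the new ids in row order, and the argument list of `push_placeholder_note_refs_by_ids` is invented for this example.

```ruby
# Hypothetical stand-in for a table whose bulk insert returns the new ids.
class SketchNotesTable
  attr_reader :rows

  def initialize
    @rows = {}
  end

  # Returns the ids of the created rows, in the same order as the input.
  def bulk_insert(attribute_rows)
    attribute_rows.map do |attributes|
      id = @rows.size + 1
      @rows[id] = attributes
      id
    end
  end
end

# Pairs each returned id with the source user of the row that produced it.
def push_placeholder_note_refs_by_ids(ids, source_user_ids, store)
  raise ArgumentError, 'ids and source users must line up' unless ids.size == source_user_ids.size

  ids.zip(source_user_ids).each do |id, source_user_id|
    store << {
      model: 'Note',
      numeric_key: id,
      user_reference_column: 'author_id',
      source_user_id: source_user_id
    }
  end
end

table = SketchNotesTable.new
store = []
ids = table.bulk_insert([{ note: 'first' }, { note: 'second' }])
push_placeholder_note_refs_by_ids(ids, [10, 11], store)
puts store.length
```

This only works if the returned ids keep the order of the inserted rows, which the sketch assumes.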

When records require a composite key instead of a numeric key, push_placeholder_ref_with_composite_key is used.
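
For the composite key case, a minimal sketch follows. The keyword arguments, the hash-based reference, and the `IssueAssignee` example with an `issue_id`/`user_id` key are assumptions chosen for illustration, not necessarily what this MR does.

```ruby
# Hypothetical signature: identifies the row by several columns instead of an id.
def push_placeholder_ref_with_composite_key(store, model:, composite_key:,
                                            user_reference_column:, source_user_id:)
  raise ArgumentError, 'composite_key must not be empty' if composite_key.empty?

  store << {
    model: model,
    numeric_key: nil, # no single numeric id exists for this kind of row
    composite_key: composite_key.transform_keys(&:to_s),
    user_reference_column: user_reference_column,
    source_user_id: source_user_id
  }
end

store = []
push_placeholder_ref_with_composite_key(
  store,
  model: 'IssueAssignee',
  composite_key: { issue_id: 3, user_id: 99 },
  user_reference_column: 'user_id',
  source_user_id: 10
)
puts store.first[:composite_key].keys.join(',')
```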

All these methods are in the newly created Gitlab::GithubImport::PushPlaceholderReferences module.

Please note: not all of the push additions work as expected when the placeholder user is reassigned, so it's likely the implementation is incorrect.

The requested_reviewer_importer has not been updated because it uses a bulk_insert method that creates the records, bundles them, then uses legacy_bulk_insert to save them to the DB in batches. Because legacy_bulk_insert is nested in this way, it's tricky to push the references by ids. I think this importer will need to write its own, basically identical, implementation of bulk_insert that pushes the references by ids once each batch is saved.
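
One possible shape for such a batch-aware insert is sketched below. It is not the existing bulk_insert: `SketchReviewersTable`, `bulk_insert_with_references`, the entry layout, and the `MergeRequestReviewer`/`user_id` values are all hypothetical, and the table stand-in simply hands out sequential ids.

```ruby
# Hypothetical stand-in for the table; records how large each saved batch was.
class SketchReviewersTable
  attr_reader :rows, :batch_sizes

  def initialize
    @rows = {}
    @batch_sizes = []
  end

  # Saves one batch and returns the new ids in row order.
  def bulk_insert(attribute_rows)
    @batch_sizes << attribute_rows.size
    attribute_rows.map do |attributes|
      id = @rows.size + 1
      @rows[id] = attributes
      id
    end
  end
end

# Saves entries in batches and pushes references right after each batch,
# while the ids and the source users of that batch are both still at hand.
def bulk_insert_with_references(entries, table, store, batch_size: 100)
  entries.each_slice(batch_size) do |batch|
    ids = table.bulk_insert(batch.map { |entry| entry[:attributes] })

    ids.zip(batch).each do |id, entry|
      store << {
        model: 'MergeRequestReviewer',
        numeric_key: id,
        user_reference_column: 'user_id',
        source_user_id: entry[:source_user_id]
      }
    end
  end
end

entries = (1..5).map { |n| { attributes: { merge_request_id: 1, user_id: 99 }, source_user_id: n } }
table = SketchReviewersTable.new
store = []
bulk_insert_with_references(entries, table, store, batch_size: 2)
puts table.batch_sizes.inspect
```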

This MR also includes an update to the user finder that creates placeholder and source users. QA testing can be done against it, but it needs improving.

Also Note: Not all specs are updated. I've been focused on getting the wires to connect and didn't want to spend time on specs before I knew that the implementation was actually working.

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

Related to #466355

Edited by Luke Duncalfe
