
Add service to merge project exported relations

Rodrigo Tomonari requested to merge rodrigo/combine-relation-exports into master

What does this MR do and why?

Related to: #360685 (closed)

This change is part of a series of changes that will export projects using multiple Sidekiq Jobs rather than running the whole export process in a single job.

In the previous MR, a worker and a service were introduced to export each project relation (issues, labels, releases, uploads, etc.) in a separate Sidekiq Job. The service exports each relation into a folder that has the same structure as the final project export file. For example:

A project export file has the following structure:

├── GITLAB_REVISION
├── GITLAB_VERSION
├── VERSION
├── lfs-objects
│   ├── 47997ea7ecff33be61e3ca1cc287ee72a2125161518f1a169f2893a5a82e9d95
│   ├── 8c1e8de917525f83104736f6c64d32f0e2a02f5bf2ee57843a54f222cba8c813
│   ├── 96f74c6fe7a2979eefb9ec74a5dfc6888fb25543cf99b77586b79afea1da6f97
│   ├── 9c5cd2d86e8cc6559da4e265c1c2647fc7a25945f47aed47e21f71ef753edbef
│   ├── a9416306b78ba699d4aba4c10dd1dd82d4fc821e54fc4843245a40c2b5040120
│   ├── bad71f905b60729f502ca339f7c9f001281a3d12c68a5da7f15de8009f4bd63d
│   └── f2b0a1e7550e9b718dafc9b525a04879a766de62e4fbdfc46593d47f7ab74636
├── lfs-objects.json
├── project.bundle
├── snippets
├── tree
│   ├── project
│   │   ├── auto_devops.ndjson
│   │   ├── boards.ndjson
│   │   ├── ci_cd_settings.ndjson
│   │   ├── ci_pipelines.ndjson
│   │   ├── container_expiration_policy.ndjson
│   │   ├── custom_attributes.ndjson
│   │   ├── error_tracking_setting.ndjson
│   │   ├── external_pull_requests.ndjson
│   │   ├── issues.ndjson
│   │   ├── labels.ndjson
│   │   ├── merge_requests.ndjson
│   │   ├── metrics_setting.ndjson
│   │   ├── milestones.ndjson
│   │   ├── pipeline_schedules.ndjson
│   │   ├── project_badges.ndjson
│   │   ├── project_feature.ndjson
│   │   ├── project_members.ndjson
│   │   ├── prometheus_metrics.ndjson
│   │   ├── protected_branches.ndjson
│   │   ├── protected_environments.ndjson
│   │   ├── protected_tags.ndjson
│   │   ├── push_rule.ndjson
│   │   ├── releases.ndjson
│   │   ├── security_setting.ndjson
│   │   ├── service_desk_setting.ndjson
│   │   └── snippets.ndjson
│   └── project.json
└── uploads
    ├── 04ef49f61ffc5404e8d9d34c0aa44a82
    │   └── file1.txt
    ├── 05dc1217de5b310510fe7794c778f22e
    │   └── file2.txt

When the issues relation is exported separately, it is written using only its part of that folder structure:

├── tree
│   ├── project
│   │   ├── issues.ndjson

Likewise, the uploads relation is exported using this folder structure:

└── uploads
    ├── 04ef49f61ffc5404e8d9d34c0aa44a82
    │   └── file1.txt
    ├── 05dc1217de5b310510fe7794c778f22e
    │   └── file2.txt

Since each relation is exported using the same structure as the final export file, the previously exported relations need to be merged into a single folder to produce the final export file.
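The merge step can be illustrated with a minimal Ruby sketch. It assumes each relation has already been exported into its own local directory that mirrors the final export layout (as in the trees above). The method name `merge_relation_dirs` is hypothetical; this is not the MR's `ExportedRelationsMerger` implementation.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: copy the contents of every per-relation export
# directory into one target directory. Because each relation directory
# mirrors the final export layout, shared subdirectories such as
# tree/project end up merged rather than duplicated.
def merge_relation_dirs(relation_dirs, target_dir)
  FileUtils.mkdir_p(target_dir)
  relation_dirs.each do |dir|
    # "dir/." copies the directory's contents, not the directory itself.
    FileUtils.cp_r(File.join(dir, "."), target_dir)
  end
  target_dir
end

# Usage: merge an issues export and an uploads export into one folder.
Dir.mktmpdir do |root|
  issues = File.join(root, "issues")
  uploads = File.join(root, "uploads_relation")
  FileUtils.mkdir_p(File.join(issues, "tree", "project"))
  File.write(File.join(issues, "tree", "project", "issues.ndjson"), "{}\n")
  FileUtils.mkdir_p(File.join(uploads, "uploads", "04ef49f61ffc5404e8d9d34c0aa44a82"))
  File.write(File.join(uploads, "uploads", "04ef49f61ffc5404e8d9d34c0aa44a82", "file1.txt"), "hello")

  target = merge_relation_dirs([issues, uploads], File.join(root, "export"))
  puts Dir.glob("**/*", base: target).sort
end
```

`FileUtils.cp_r` silently overwrites a file that already exists at the same relative path, so this sketch assumes no two relations write the same file.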

This change introduces the code that downloads the previously exported relations, extracts them, and merges their contents into a target folder. That folder will eventually be compressed into a tar.gz file, similar to the current approach to exporting projects.
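The extraction part could look roughly like the sketch below, which unpacks one relation's `.tar.gz` archive into the target folder using only Ruby's standard library (`Zlib::GzipReader` and `Gem::Package::TarReader`). It assumes the archive has already been downloaded to a local path; `extract_relation_archive` is a hypothetical name, and the MR itself may download and extract the files differently.

```ruby
require "fileutils"
require "rubygems/package"
require "zlib"

# Hypothetical helper: extract one relation's .tar.gz archive into
# target_dir, merging with whatever earlier archives already put there.
def extract_relation_archive(archive_path, target_dir)
  root = File.expand_path(target_dir)
  FileUtils.mkdir_p(root)

  Zlib::GzipReader.open(archive_path) do |gz|
    Gem::Package::TarReader.new(gz) do |tar|
      tar.each do |entry|
        dest = File.expand_path(entry.full_name, root)
        # Reject entries such as "../x" that would escape target_dir.
        unless dest == root || dest.start_with?(root + File::SEPARATOR)
          raise ArgumentError, "unsafe path in archive: #{entry.full_name}"
        end

        if entry.directory?
          FileUtils.mkdir_p(dest)
        elsif entry.file?
          FileUtils.mkdir_p(File.dirname(dest))
          # read returns nil for empty files, hence to_s.
          File.binwrite(dest, entry.read.to_s)
        end
        # Symlinks and other entry types are skipped in this sketch.
      end
    end
  end
  root
end
```

Calling it once per downloaded archive with the same `target_dir` produces the merged folder, which can then be compressed into the final tar.gz.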

Note: the GITLAB_REVISION, GITLAB_VERSION, and VERSION files aren't project relation exports, so they aren't generated in this step and will need to be added to the final project export file later.

Screenshots or screen recordings

(Screenshot: merge-folders)

How to set up and validate locally

Because this is a work in progress, the change has to be tested from the Rails console.

Open the Rails console and run the commands below. They enqueue several Sidekiq Jobs; each job exports one project relation and uploads (or copies) the resulting file to object storage.

You can check when the export has completed in the project_export_jobs and project_relation_export_uploads tables.

project = Project.first # pick a project to export

project_export_job = project.export_jobs.create(status: 0, jid: SecureRandom.hex(10)) # export job that groups the relation exports

Projects::ImportExport::RelationExport.relation_names_list.each do |relation|
  relation_export = project_export_job.relation_exports.create(relation: relation)
  Projects::ImportExport::RelationExportWorker.perform_async(relation_export.id)
end

When all relations have been exported, use the commands below to test the class introduced in this MR, which downloads the exported files, extracts them, and merges their contents into a new folder.

The last command prints the path to the new folder. Use it to inspect the folder's contents, which should match the contents of a regular project export file.

shared = project.import_export_shared
saver = Gitlab::ImportExport::Project::ExportedRelationsMerger.new(export_job: project_export_job, shared: shared)
saver.save # downloads, extracts, and merges the exported relations
project_export_job.finish

puts shared.export_path # folder containing the merged export
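To check that the merged folder matches a regular export, one option is to compare the relative file paths in the folder printed above with those of an extracted project export, skipping GITLAB_REVISION, GITLAB_VERSION, and VERSION because this step does not generate them. The helper `export_tree_diff` below is hypothetical and compares paths only, not file contents.

```ruby
require "set"

# Files that the relation exports do not produce; they are added to the
# final export later, so they are excluded from the comparison.
NOT_FROM_RELATIONS = %w[GITLAB_REVISION GITLAB_VERSION VERSION].freeze

# Hypothetical helper: report relative paths present in only one of the
# two directories. Dotfiles are ignored because Dir.glob skips them by default.
def export_tree_diff(merged_dir, reference_dir)
  relative_paths = lambda do |dir|
    Dir.glob("**/*", base: dir).reject { |path| NOT_FROM_RELATIONS.include?(path) }.to_set
  end

  merged = relative_paths.call(merged_dir)
  reference = relative_paths.call(reference_dir)
  { missing: (reference - merged).sort, unexpected: (merged - reference).sort }
end
```

For example, `export_tree_diff(shared.export_path, reference_dir)`, where `reference_dir` is a hypothetical folder into which a regular project export was extracted. Empty `:missing` and `:unexpected` arrays mean both folders contain the same set of paths.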

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
