Add service to merge project exported relations
What does this MR do and why?
Related to: #360685 (closed)
This change is part of a sequence of changes that will export projects using multiple Sidekiq jobs rather than running the whole export process in a single job.
In the previous MR, a worker and service were introduced to export each project relation (issues, labels, releases, uploads, etc.) in a separate Sidekiq job. The service exports each relation file into a folder with the same structure as the final project export file.
For example, a project export file has the following structure:
├── GITLAB_REVISION
├── GITLAB_VERSION
├── VERSION
├── lfs-objects
│ ├── 47997ea7ecff33be61e3ca1cc287ee72a2125161518f1a169f2893a5a82e9d95
│ ├── 8c1e8de917525f83104736f6c64d32f0e2a02f5bf2ee57843a54f222cba8c813
│ ├── 96f74c6fe7a2979eefb9ec74a5dfc6888fb25543cf99b77586b79afea1da6f97
│ ├── 9c5cd2d86e8cc6559da4e265c1c2647fc7a25945f47aed47e21f71ef753edbef
│ ├── a9416306b78ba699d4aba4c10dd1dd82d4fc821e54fc4843245a40c2b5040120
│ ├── bad71f905b60729f502ca339f7c9f001281a3d12c68a5da7f15de8009f4bd63d
│ └── f2b0a1e7550e9b718dafc9b525a04879a766de62e4fbdfc46593d47f7ab74636
├── lfs-objects.json
├── project.bundle
├── snippets
├── tree
│ ├── project
│ │ ├── auto_devops.ndjson
│ │ ├── boards.ndjson
│ │ ├── ci_cd_settings.ndjson
│ │ ├── ci_pipelines.ndjson
│ │ ├── container_expiration_policy.ndjson
│ │ ├── custom_attributes.ndjson
│ │ ├── error_tracking_setting.ndjson
│ │ ├── external_pull_requests.ndjson
│ │ ├── issues.ndjson
│ │ ├── labels.ndjson
│ │ ├── merge_requests.ndjson
│ │ ├── metrics_setting.ndjson
│ │ ├── milestones.ndjson
│ │ ├── pipeline_schedules.ndjson
│ │ ├── project_badges.ndjson
│ │ ├── project_feature.ndjson
│ │ ├── project_members.ndjson
│ │ ├── prometheus_metrics.ndjson
│ │ ├── protected_branches.ndjson
│ │ ├── protected_environments.ndjson
│ │ ├── protected_tags.ndjson
│ │ ├── push_rule.ndjson
│ │ ├── releases.ndjson
│ │ ├── security_setting.ndjson
│ │ ├── service_desk_setting.ndjson
│ │ └── snippets.ndjson
│ └── project.json
└── uploads
  ├── 04ef49f61ffc5404e8d9d34c0aa44a82
  │ └── file1.txt
  ├── 05dc1217de5b310510fe7794c778f22e
  │ └── file2.txt
But the issues relation, when exported separately, uses the folder structure:
├── tree
│ ├── project
│ │ ├── issues.ndjson
And the uploads relation is exported using the folder structure:
├── uploads
│ ├── 04ef49f61ffc5404e8d9d34c0aa44a82
│ │ └── file1.txt
│ ├── 05dc1217de5b310510fe7794c778f22e
│ │ └── file2.txt
Since each relation is exported using the same structure as the final export file, the previously exported relations need to be merged into the same folder to generate the final export file.
This change introduces the code that downloads the previously generated relation exports and merges their content into a target folder, which will eventually be compressed into a tar.gz file, similar to the current approach to exporting projects.
Note: The GITLAB_REVISION, GITLAB_VERSION, and VERSION files aren't project relation exports, which means they aren't generated in this step and will need to be added later to the final project export file.
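The merge step described above can be illustrated with a minimal, standalone sketch. It is not the code added by this MR: `merge_relation_dirs` and `demo_merge` are hypothetical names, and the sketch assumes each relation export has already been downloaded and extracted into its own local directory.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper (not part of this MR): copies the contents of each
# extracted relation directory into target_dir, preserving relative paths.
def merge_relation_dirs(relation_dirs, target_dir)
  FileUtils.mkdir_p(target_dir)

  relation_dirs.each do |dir|
    # "dir/." copies the directory's contents rather than the directory itself,
    # so tree/project from different relations ends up in the same folder.
    FileUtils.cp_r(File.join(dir, '.'), target_dir)
  end

  target_dir
end

# Builds three fake relation exports in a temporary folder, merges them,
# and returns the sorted relative paths of the merged files.
def demo_merge
  Dir.mktmpdir do |root|
    issues_dir  = File.join(root, 'issues_export')
    labels_dir  = File.join(root, 'labels_export')
    uploads_dir = File.join(root, 'uploads_export')

    FileUtils.mkdir_p(File.join(issues_dir, 'tree', 'project'))
    File.write(File.join(issues_dir, 'tree', 'project', 'issues.ndjson'), "{}\n")

    FileUtils.mkdir_p(File.join(labels_dir, 'tree', 'project'))
    File.write(File.join(labels_dir, 'tree', 'project', 'labels.ndjson'), "{}\n")

    upload_folder = File.join(uploads_dir, 'uploads', '04ef49f61ffc5404e8d9d34c0aa44a82')
    FileUtils.mkdir_p(upload_folder)
    File.write(File.join(upload_folder, 'file1.txt'), 'content')

    target = merge_relation_dirs([issues_dir, labels_dir, uploads_dir], File.join(root, 'merged'))

    Dir.glob('**/*', base: target)
       .select { |path| File.file?(File.join(target, path)) }
       .sort
  end
end
```

Because every relation already uses the layout of the final export file, a plain recursive copy is enough in this sketch; no path rewriting is needed.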
Screenshots or screen recordings
How to set up and validate locally
Because this is a work in progress, the Rails console needs to be used to test the change.
Open the Rails console and execute the commands below. The commands will enqueue several Sidekiq jobs, and each job will export one relation of the project and upload/copy the file to object storage.
You can check when the export completes in the project_export_jobs and project_relation_export_uploads tables.
project = Project.first # pick a project to export
project_export_job = project.export_jobs.create(status: 0, jid: SecureRandom.hex(10))
Projects::ImportExport::RelationExport.relation_names_list.each do |relation|
  relation_export = project_export_job.relation_exports.create(relation: relation)
  Projects::ImportExport::RelationExportWorker.perform_async(relation_export.id)
end
When all relations are exported, use the commands below to test the class added in this MR, which downloads the exported files, extracts them, and merges their content into a new folder.
The last command prints the path to the new folder. Use it to check the folder's content, which should match the content of a project export file.
shared = project.import_export_shared
saver = Gitlab::ImportExport::Project::ExportedRelationsMerger.new(export_job: project_export_job, shared: shared)
saver.save
project_export_job.finish
puts shared.export_path
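One way to compare the merged folder with an extracted project export is to diff their relative file lists. The sketch below is an illustration only: `relative_file_list` and `missing_and_extra` are hypothetical helpers, not part of GitLab, and they compare file paths rather than file content.

```ruby
# Hypothetical helper: returns the sorted relative paths of all regular
# files under dir (directories themselves are skipped).
def relative_file_list(dir)
  Dir.glob('**/*', base: dir)
     .select { |path| File.file?(File.join(dir, path)) }
     .sort
end

# Hypothetical helper: reports which files exist in only one of the folders.
def missing_and_extra(expected_dir, actual_dir)
  expected = relative_file_list(expected_dir)
  actual   = relative_file_list(actual_dir)

  { missing: expected - actual, extra: actual - expected }
end
```

In the Rails console, this could be called as `missing_and_extra(extracted_export_dir, shared.export_path)`, where `extracted_export_dir` is assumed to be a folder containing an already extracted project export. Given the note above, GITLAB_REVISION, GITLAB_VERSION, and VERSION would be expected to show up under `:missing` at this stage.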
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.