Prune old RelationExport and RelationExportUpload records
## What does this MR do and why?
This MR prunes expired `ProjectExportJob`s, including upload files. Previously, just the `ProjectExportJob` records were pruned, along with `Projects::ImportExport::RelationExport`s and `Projects::ImportExport::RelationExportUpload`s via `ON DELETE CASCADE`. However, the export files remained. This MR addresses those remaining files.
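As a rough plain-Ruby sketch of the intended flow (the names `ExportJob`, `prune_expired`, and `EXPIRES_AFTER` are illustrative stand-ins, not the actual service API — the real code works through ActiveRecord models and removes upload files from object storage):

```ruby
# Illustrative stand-in for the real ActiveRecord model.
ExportJob = Struct.new(:id, :updated_at, :upload_file)

# Jobs not updated within the last 7 days are considered expired.
EXPIRES_AFTER = 7 * 24 * 60 * 60

def prune_expired(jobs, now: Time.now)
  cutoff = now - EXPIRES_AFTER
  expired, fresh = jobs.partition { |job| job.updated_at < cutoff }
  # Deleting a job cascades to its relation export records; the upload
  # *files* must be removed explicitly, which is what this MR adds.
  expired.each { |job| job.upload_file = nil }
  fresh
end

jobs = [
  ExportJob.new(1, Time.now - 10 * 24 * 60 * 60, "old.tar.gz"),
  ExportJob.new(2, Time.now, "new.tar.gz")
]
prune_expired(jobs).map(&:id) # => [2]
```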
## MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
## Screenshots or screen recordings
No visible changes
## Queries
### 1. `ProjectExportJob.prunable` `delete_all`

This query was covered in the original MR that created this job, but the query is now executed in the new `PruneExpiredExportJobsService` instead of in a model method. Here's the query explanation anyway, just so it's fresh:

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/26038/commands/81976
Generates SQL queries like this (IDs not real) from `.each_batch`:

```sql
SELECT "project_export_jobs".* FROM "project_export_jobs" WHERE (updated_at < '2024-01-30 20:16:30.838492') AND "project_export_jobs"."id" >= 2592430 AND "project_export_jobs"."id" < 3693430;
```
Explanation:

```
Index Scan using project_export_jobs_pkey on public.project_export_jobs  (cost=0.42..3.44 rows=1 width=59) (actual time=0.012..0.013 rows=0 loops=1)
  Index Cond: ((project_export_jobs.id >= 2592430) AND (project_export_jobs.id < 3693430))
  Filter: (project_export_jobs.updated_at < '2024-01-30 20:16:30.838492+00'::timestamp with time zone)
  Rows Removed by Filter: 0
  Buffers: shared hit=6
  I/O Timings: read=0.000 write=0.000
```
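The `id >= X AND id < Y` bounds above come from `each_batch` walking the primary key in ascending ranges. A simplified plain-Ruby sketch of how consecutive batch starts become those half-open bounds (`batch_bounds` is a toy helper, not the real implementation, which issues keyset queries against the database; the final, upper-unbounded batch is omitted here):

```ruby
# Given the table's ids, derive [lower, upper) bounds the way each_batch
# does: lower = first id of a batch, upper = first id of the next batch.
def batch_bounds(ids, of:)
  ids.sort.each_slice(of).map(&:first).each_cons(2).to_a
end

ids = [2_592_430, 2_592_431, 3_693_430, 3_693_431, 4_000_000]
batch_bounds(ids, of: 2)
# => [[2592430, 3693430], [3693430, 4000000]]
```

Each pair maps onto one `DELETE` batch's `id >= lower AND id < upper` condition, so batches never overlap and the scan stays on the primary-key index.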
### 2. `prunable_uploads` `delete_all`

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/26038/commands/81996

Generated SQL (without `each_batch`):

```sql
SELECT "uploads".* FROM "uploads" WHERE "uploads"."model_type" = 'Projects::ImportExport::RelationExportUpload' AND "uploads"."model_id" IN (SELECT "project_relation_export_uploads"."id" FROM "project_relation_export_uploads" INNER JOIN "project_relation_exports" "relation_export" ON "relation_export"."id" = "project_relation_export_uploads"."project_relation_export_id" WHERE "relation_export"."project_export_job_id" IN (SELECT "project_export_jobs"."id" FROM "project_export_jobs" WHERE (updated_at < '2024-01-30 21:18:11.186764')));
```
Explanation:

```
Nested Loop  (cost=4.02..7.06 rows=1 width=281) (actual time=0.008..0.010 rows=0 loops=1)
  I/O Timings: read=0.000 write=0.000
  ->  HashAggregate  (cost=3.46..3.47 rows=1 width=8) (actual time=0.008..0.009 rows=0 loops=1)
        Group Key: project_relation_export_uploads.id
        I/O Timings: read=0.000 write=0.000
        ->  Nested Loop  (cost=0.42..3.45 rows=1 width=8) (actual time=0.007..0.007 rows=0 loops=1)
              I/O Timings: read=0.000 write=0.000
              ->  Nested Loop  (cost=0.00..0.01 rows=1 width=16) (actual time=0.006..0.007 rows=0 loops=1)
                    I/O Timings: read=0.000 write=0.000
                    ->  Seq Scan on public.project_relation_export_uploads  (cost=0.00..0.00 rows=1 width=16) (actual time=0.006..0.006 rows=0 loops=1)
                          I/O Timings: read=0.000 write=0.000
                    ->  Seq Scan on public.project_relation_exports relation_export  (cost=0.00..0.00 rows=1 width=16) (actual time=0.000..0.000 rows=0 loops=0)
                          I/O Timings: read=0.000 write=0.000
              ->  Index Scan using project_export_jobs_pkey on public.project_export_jobs  (cost=0.42..3.44 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=0)
                    Index Cond: (project_export_jobs.id = relation_export.project_export_job_id)
                    Filter: (project_export_jobs.updated_at < '2024-01-30 21:18:11.186764+00'::timestamp with time zone)
                    Rows Removed by Filter: 0
                    I/O Timings: read=0.000 write=0.000
  ->  Index Scan using index_uploads_on_model_id_and_model_type on public.uploads  (cost=0.56..3.58 rows=1 width=281) (actual time=0.000..0.000 rows=0 loops=0)
        Index Cond: ((uploads.model_id = project_relation_export_uploads.id) AND ((uploads.model_type)::text = 'Projects::ImportExport::RelationExportUpload'::text))
        I/O Timings: read=0.000 write=0.000
```
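The nested `IN` subqueries above can be read inside-out: expired jobs → their relation exports → their relation export uploads → the matching `Upload` rows. A plain-Ruby analogue with toy rows standing in for the four tables (all ids and data here are made up purely to illustrate the join chain):

```ruby
export_jobs = [{ id: 1, updated_at: Time.now - 864_000 },  # 10 days old: prunable
               { id: 2, updated_at: Time.now }]            # fresh: kept
relation_exports = [{ id: 10, project_export_job_id: 1 },
                    { id: 11, project_export_job_id: 2 }]
relation_export_uploads = [{ id: 100, project_relation_export_id: 10 },
                           { id: 101, project_relation_export_id: 11 }]
uploads = [{ id: 1000, model_type: "Projects::ImportExport::RelationExportUpload", model_id: 100 },
           { id: 1001, model_type: "Projects::ImportExport::RelationExportUpload", model_id: 101 }]

# Innermost subquery: expired project_export_jobs.
cutoff = Time.now - 7 * 86_400
prunable_job_ids = export_jobs.select { |j| j[:updated_at] < cutoff }.map { |j| j[:id] }
# Middle subquery: relation export uploads joined through relation exports.
prunable_re_ids  = relation_exports
  .select { |re| prunable_job_ids.include?(re[:project_export_job_id]) }.map { |re| re[:id] }
prunable_reu_ids = relation_export_uploads
  .select { |u| prunable_re_ids.include?(u[:project_relation_export_id]) }.map { |u| u[:id] }
# Outer query: uploads of the right model_type whose model_id matches.
prunable_uploads = uploads.select do |u|
  u[:model_type] == "Projects::ImportExport::RelationExportUpload" &&
    prunable_reu_ids.include?(u[:model_id])
end
prunable_uploads.map { |u| u[:id] } # => [1000]
```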
## How to set up and validate locally
- Set the current time to more than a week in the past and start your local environment. I find it easiest to manually set your system's time.
- Trigger a parallel project export in the Rails console and let it complete. Do this multiple times with multiple projects if you'd like:

  ```ruby
  project_id = # ID of Project you want to export
  user_id = # ID of the user that performs the export
  Projects::ImportExport::CreateRelationExportsWorker.perform_async(user_id, project_id)
  ```

- Verify the export completed, and there's a `ProjectExportJob` with associated `RelationExport`s and `RelationExportUpload`s.
- Reset your local environment to use the actual time and date. If you'd like, run a fresh parallel project export.
- Run the `PruneProjectExportJobsWorker`:

  ```ruby
  Gitlab::Export::PruneProjectExportJobsWorker.perform_async
  ```

- Verify that the old `ProjectExportJob`s and their associations were pruned, and any fresh `ProjectExportJob`s were not pruned.
Related to #382885 (closed)