Enable removing import data on failure by default (!81404)

Igor Drozdov requested to merge id-enable-removing-import-data-on-failure into master Feb 23, 2022

What does this MR do and why?

When an import fails, there is no need to keep its import data. This MR enables the functionality by default; it has already been enabled globally for a while.

Project import data for failed imports is already being removed. The following query over the last week of data returns no rows, as the plan after it shows:

EXPLAIN SELECT "project_mirror_data".*
FROM "project_mirror_data"
INNER JOIN "projects" "project"
  ON "project"."id" = "project_mirror_data"."project_id"
INNER JOIN "project_import_data"
  ON "project_import_data"."project_id" = "project"."id"
WHERE "project_mirror_data"."status" = 'failed'
  AND "project_mirror_data"."last_update_scheduled_at" > '2022-02-15 17:52:50.670114'
  AND "project"."mirror" = false
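
To make the intent of the query concrete, here is a minimal sketch that runs the same joins against an in-memory SQLite database rather than PostgreSQL. The schema is heavily simplified, the helper names `build_db` and `failed_imports_with_data` are hypothetical, and `false` is written as `0` for SQLite. It is only meant to illustrate that a failed, non-mirror import stops matching once its `project_import_data` row is deleted; it is not GitLab's implementation.

```python
import sqlite3

# Same joins and filters as the query above, with a bound cutoff timestamp.
QUERY = """
SELECT project_mirror_data.*
FROM project_mirror_data
INNER JOIN projects AS project
  ON project.id = project_mirror_data.project_id
INNER JOIN project_import_data
  ON project_import_data.project_id = project.id
WHERE project_mirror_data.status = 'failed'
  AND project_mirror_data.last_update_scheduled_at > ?
  AND project.mirror = 0
"""


def build_db():
    """Create a simplified, hypothetical version of the three tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE projects (id INTEGER PRIMARY KEY, mirror INTEGER NOT NULL);
        CREATE TABLE project_mirror_data (
            project_id INTEGER, status TEXT, last_update_scheduled_at TEXT
        );
        CREATE TABLE project_import_data (project_id INTEGER);
        """
    )
    return conn


def failed_imports_with_data(conn, since):
    """Return failed, non-mirror imports that still have import data."""
    return conn.execute(QUERY, (since,)).fetchall()


SINCE = "2022-02-15 17:52:50.670114"

conn = build_db()
conn.execute("INSERT INTO projects VALUES (1, 0)")
conn.execute(
    "INSERT INTO project_mirror_data VALUES (1, 'failed', '2022-02-20 10:00:00')"
)
conn.execute("INSERT INTO project_import_data VALUES (1)")

# While the import data row exists, the failed import is returned.
before = failed_imports_with_data(conn, SINCE)

# Simulate removal of import data on failure.
conn.execute("DELETE FROM project_import_data WHERE project_id = 1")
after = failed_imports_with_data(conn, SINCE)

print(len(before), len(after))
```

An empty result on production data, as in the plan in this description, is therefore consistent with the removal already being in effect.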

https://files.slack.com/files-pri/T02592416-F0349TSCXV1/plan-text.txt

 Gather  (cost=1001.54..88293.81 rows=154 width=269) (actual time=20069.690..20069.938 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=25045 read=62580 dirtied=4163
   I/O Timings: read=59219.539 write=0.000
   ->  Nested Loop  (cost=1.54..87278.41 rows=64 width=269) (actual time=20023.972..20023.975 rows=0 loops=3)
         Buffers: shared hit=25045 read=62580 dirtied=4163
         I/O Timings: read=59219.539 write=0.000
         ->  Nested Loop  (cost=0.98..86712.37 rows=284 width=273) (actual time=11369.826..18766.191 rows=1062 loops=3)
               Buffers: shared hit=12867 read=58819 dirtied=4021
               I/O Timings: read=55517.442 write=0.000
               ->  Parallel Index Scan using index_project_mirror_data_on_status on public.project_mirror_data  (cost=0.56..83243.36 rows=2384 width=269) (actual time=4786.758..18529.918 rows=1235 loops=3)
                     Index Cond: ((project_mirror_data.status)::text = 'failed'::text)
                     Filter: (project_mirror_data.last_update_scheduled_at > '2022-02-15 17:52:50.670114'::timestamp without time zone)
                     Rows Removed by Filter: 68355
                     Buffers: shared hit=452 read=58054 dirtied=3924
                     I/O Timings: read=54873.931 write=0.000
               ->  Index Only Scan using index_project_import_data_on_project_id on public.project_import_data  (cost=0.42..1.45 rows=1 width=4) (actual time=0.187..0.188 rows=1 loops=3704)
                     Index Cond: (project_import_data.project_id = project_mirror_data.project_id)
                     Heap Fetches: 2050
                     Buffers: shared hit=12415 read=765 dirtied=97
                     I/O Timings: read=643.511 write=0.000
         ->  Index Scan using projects_pkey on public.projects project  (cost=0.56..1.98 rows=1 width=4) (actual time=1.183..1.183 rows=0 loops=3186)
               Index Cond: (project.id = project_import_data.project_id)
               Filter: (NOT project.mirror)
               Rows Removed by Filter: 1
               Buffers: shared hit=12178 read=3761 dirtied=142
               I/O Timings: read=3702.098 write=0.000
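
A rough reading of the plan, as a sketch: the figures below are copied from the plan above, and the interpretation assumes that `I/O Timings` are summed across the leader and the two workers and that the work was split roughly evenly between them. Under those assumptions, the arithmetic suggests the query was almost entirely bound by disk reads, most of them in the index scan on `project_mirror_data`.

```python
# Figures copied from the plan above (milliseconds unless noted).
processes = 3                      # leader + 2 launched workers (loops=3)
wall_ms = 20069.938                # Gather node, actual total time
total_io_read_ms = 59219.539       # Gather node, I/O Timings read
index_scan_io_read_ms = 54873.931  # Parallel Index Scan, I/O Timings read

rows_kept = 1235                   # per loop, after the timestamp filter
rows_removed = 68355               # per loop, "Rows Removed by Filter"

# Assumption: read time is a total over all processes, split evenly.
per_process_io_ms = total_io_read_ms / processes
io_share = per_process_io_ms / wall_ms
index_scan_io_share = index_scan_io_read_ms / total_io_read_ms
kept_fraction = rows_kept / (rows_kept + rows_removed)

print(f"read time per process: {per_process_io_ms / 1000:.1f} s")
print(f"share of wall time spent reading: {io_share:.0%}")
print(f"share of read time in the index scan: {index_scan_io_share:.0%}")
print(f"rows surviving the timestamp filter: {kept_fraction:.1%}")
```

With these numbers, each process spends close to 20 s reading, nearly the whole wall time, and fewer than 2% of the `failed` rows fetched through `index_project_mirror_data_on_status` pass the `last_update_scheduled_at` filter.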

Edited Feb 23, 2022 by Igor Drozdov
