Added BackgroundMigration for ProjectStats
What does this MR do and why?
Added Background Migration to update project_statistics.container_registry_size
for all relevant projects
which have migrated container_repositories and container_repository_size = 0
ref: https://gitlab.com/gitlab-org/gitlab/-/issues/362243
Screenshots or screen recordings
These are strongly recommended to assist reviewers and reduce the time to merge your change.
How to set up and validate locally
Database Queries
- Query to get distinct project_ids for a batch.. Using sub_batch_size of 100
SELECT DISTINCT "container_repositories"."project_id" FROM "container_repositories" WHERE "container_repositories"."project_id" BETWEEN 3806 AND 628939 AND ("container_repositories"."created_at" >= '2022-01-23' OR "container_repositories"."migration_state" = 'import_done') AND "container_repositories"."project_id" >= 3806 AND "container_repositories"."project_id" < 192676
Before Adding Index
Query Statistics:
Time: 2.771 s
- planning: 3.302 ms
- execution: 2.767 s
- I/O read: 2.735 s
- I/O write: 0.000 ms
Shared buffers:
- hits: 28 (~224.00 KiB) from the buffer pool
- reads: 1131 (~8.80 MiB) from the OS file cache, including disk I/O
- dirtied: 24 (~192.00 KiB)
- writes: 0
Query Plan:
Unique (cost=0.43..1672.92 rows=853 width=4) (actual time=9.052..2767.083 rows=100 loops=1)
Buffers: shared hit=28 read=1131 dirtied=24
I/O Timings: read=2734.903 write=0.000
-> Index Scan using index_container_repositories_on_project_id on public.container_repositories (cost=0.43..1670.78 rows=855 width=4) (actual time=9.049..2765.704 rows=1059 loops=1)
Index Cond: ((container_repositories.project_id >= 3806) AND (container_repositories.project_id <= 628939) AND (container_repositories.project_id >= 3806) AND (container_repositories.project_id < 192676))
Filter: ((container_repositories.created_at >= '2022-01-23 00:00:00'::timestamp without time zone) OR (container_repositories.migration_state = 'import_done'::text))
Rows Removed by Filter: 92
Buffers: shared hit=28 read=1131 dirtied=24
I/O Timings: read=2734.903 write=0.000
Plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10886/commands/39049
After Adding Index
exec CREATE INDEX tmp_index_migrated_container_registries ON container_repositories (project_id) WHERE migration_state = 'import_done' OR created_at >= '2022-01-23'
Query Statistics:
Time: 4.122 ms
- planning: 2.503 ms
- execution: 1.619 ms
- I/O read: 0.188 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 84 (~672.00 KiB) from the buffer pool
- reads: 5 (~40.00 KiB) from the OS file cache, including disk I/O
- dirtied: 18 (~144.00 KiB)
- writes: 0
Query Plan:
Unique (cost=0.43..98.79 rows=853 width=4) (actual time=0.280..1.563 rows=100 loops=1)
Buffers: shared hit=84 read=5 dirtied=18
I/O Timings: read=0.188 write=0.000
-> Index Only Scan using tmp_index_migrated_container_registries on public.container_repositories (cost=0.43..96.65 rows=855 width=4) (actual time=0.279..1.421 rows=1059 loops=1)
Index Cond: ((container_repositories.project_id >= 3806) AND (container_repositories.project_id <= 628939) AND (container_repositories.project_id >= 3806) AND (container_repositories.project_id < 192676))
Heap Fetches: 59
Buffers: shared hit=84 read=5 dirtied=18
I/O Timings: read=0.188 write=0.000
Plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10888/commands/39058
- Query for
ProjectStatistics.where(project_id: sub_batch).where(container_registry_size: 0)
SELECT "project_statistics".* FROM "project_statistics" WHERE "project_statistics"."project_id" IN (SELECT DISTINCT "container_repositories"."project_id" FROM "container_repositories" WHERE "container_repositories"."project_id" BETWEEN 3806 AND 628939 AND ("container_repositories"."created_at" >= '2022-01-23' OR "container_repositories"."migration_state" = 'import_done') AND "container_repositories"."project_id" >= 3806 AND "container_repositories"."project_id" < 192676) AND "project_statistics"."container_registry_size" = 0
Before Adding Index
Query Statistics
Time: 913.440 ms
- planning: 2.551 ms
- execution: 910.889 ms
- I/O read: 887.392 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 316 (~2.50 MiB) from the buffer pool
- reads: 176 (~1.40 MiB) from the OS file cache, including disk I/O
- dirtied: 15 (~120.00 KiB)
- writes: 0
Query Plan:
Nested Loop (cost=0.87..3421.05 rows=952 width=116) (actual time=17.162..910.707 rows=83 loops=1)
Buffers: shared hit=316 read=176 dirtied=15
I/O Timings: read=887.392 write=0.000
-> Unique (cost=0.43..110.62 rows=956 width=4) (actual time=0.072..1.897 rows=100 loops=1)
Buffers: shared hit=89
I/O Timings: read=0.000 write=0.000
-> Index Only Scan using idx_container_registry_size_proj_stat on public.container_repositories (cost=0.43..108.22 rows=958 width=4) (actual time=0.070..1.360 rows=1059 loops=1)
Index Cond: ((container_repositories.project_id >= 3806) AND (container_repositories.project_id <= 628939) AND (container_repositories.project_id >= 3806) AND (container_repositories.project_id < 192676))
Heap Fetches: 59
Buffers: shared hit=89
I/O Timings: read=0.000 write=0.000
-> Index Scan using index_project_statistics_on_project_id on public.project_statistics (cost=0.44..3.45 rows=1 width=116) (actual time=9.079..9.079 rows=1 loops=100)
Index Cond: (project_statistics.project_id = container_repositories.project_id)
Filter: (project_statistics.container_registry_size = 0)
Rows Removed by Filter: 0
Buffers: shared hit=226 read=176 dirtied=14
I/O Timings: read=887.392 write=0.000
Plan:
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10887/commands/39055
After Adding Index
exec CREATE INDEX tmp_index_project_statistics_cont_registry_size ON project_statistics (project_id) WHERE container_registry_size = 0
Query Statistics
Time: 16.369 ms
- planning: 2.055 ms
- execution: 14.314 ms
- I/O read: 9.691 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 311 (~2.40 MiB) from the buffer pool
- reads: 161 (~1.30 MiB) from the OS file cache, including disk I/O
- dirtied: 7 (~56.00 KiB)
- writes: 0
Query Plan:
Nested Loop (cost=0.87..3046.94 rows=850 width=116) (actual time=0.432..14.245 rows=83 loops=1)
Buffers: shared hit=311 read=161 dirtied=7
I/O Timings: read=9.691 write=0.000
-> Unique (cost=0.43..98.79 rows=853 width=4) (actual time=0.053..3.577 rows=100 loops=1)
Buffers: shared hit=89
I/O Timings: read=0.000 write=0.000
-> Index Only Scan using tmp_index_migrated_container_registries on public.container_repositories (cost=0.43..96.65 rows=855 width=4) (actual time=0.052..1.138 rows=1059 loops=1)
Index Cond: ((container_repositories.project_id >= 3806) AND (container_repositories.project_id <= 628939) AND (container_repositories.project_id >= 3806) AND (container_repositories.project_id < 192676))
Heap Fetches: 59
Buffers: shared hit=89
I/O Timings: read=0.000 write=0.000
-> Index Scan using tmp_index_project_statistics_cont_registry_size on public.project_statistics (cost=0.44..3.45 rows=1 width=116) (actual time=0.105..0.105 rows=1 loops=100)
Index Cond: (project_statistics.project_id = container_repositories.project_id)
Buffers: shared hit=222 read=161 dirtied=7
I/O Timings: read=9.691 write=0.000
Plan:
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10888/commands/39059
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.