Skip to content

Added BackgroundMigration for ProjectStats

What does this MR do and why?

Added Background Migration to update project_statistics.container_registry_size for all relevant projects which have migrated container_repositories and container_repository_size = 0

ref: https://gitlab.com/gitlab-org/gitlab/-/issues/362243

Screenshots or screen recordings

These are strongly recommended to assist reviewers and reduce the time to merge your change.

How to set up and validate locally

Database Queries

  1. Query to get distinct project_ids for a batch.. Using sub_batch_size of 100
SELECT DISTINCT "container_repositories"."project_id" FROM "container_repositories" WHERE "container_repositories"."project_id" BETWEEN 3806 AND 628939 AND ("container_repositories"."created_at" >= '2022-01-23' OR "container_repositories"."migration_state" = 'import_done') AND "container_repositories"."project_id" >= 3806 AND "container_repositories"."project_id" < 192676

Before Adding Index

Query Statistics:

Time: 2.771 s
  - planning: 3.302 ms
  - execution: 2.767 s
    - I/O read: 2.735 s
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 28 (~224.00 KiB) from the buffer pool
  - reads: 1131 (~8.80 MiB) from the OS file cache, including disk I/O
  - dirtied: 24 (~192.00 KiB)
  - writes: 0

Query Plan:

 Unique  (cost=0.43..1672.92 rows=853 width=4) (actual time=9.052..2767.083 rows=100 loops=1)
   Buffers: shared hit=28 read=1131 dirtied=24
   I/O Timings: read=2734.903 write=0.000
   ->  Index Scan using index_container_repositories_on_project_id on public.container_repositories  (cost=0.43..1670.78 rows=855 width=4) (actual time=9.049..2765.704 rows=1059 loops=1)
         Index Cond: ((container_repositories.project_id >= 3806) AND (container_repositories.project_id <= 628939) AND (container_repositories.project_id >= 3806) AND (container_repositories.project_id < 192676))
         Filter: ((container_repositories.created_at >= '2022-01-23 00:00:00'::timestamp without time zone) OR (container_repositories.migration_state = 'import_done'::text))
         Rows Removed by Filter: 92
         Buffers: shared hit=28 read=1131 dirtied=24
         I/O Timings: read=2734.903 write=0.000

Plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10886/commands/39049

After Adding Index

exec CREATE INDEX tmp_index_migrated_container_registries ON container_repositories (project_id) WHERE migration_state = 'import_done' OR created_at >= '2022-01-23'

Query Statistics:

Time: 4.122 ms
  - planning: 2.503 ms
  - execution: 1.619 ms
    - I/O read: 0.188 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 84 (~672.00 KiB) from the buffer pool
  - reads: 5 (~40.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 18 (~144.00 KiB)
  - writes: 0

Query Plan:

 Unique  (cost=0.43..98.79 rows=853 width=4) (actual time=0.280..1.563 rows=100 loops=1)
   Buffers: shared hit=84 read=5 dirtied=18
   I/O Timings: read=0.188 write=0.000
   ->  Index Only Scan using tmp_index_migrated_container_registries on public.container_repositories  (cost=0.43..96.65 rows=855 width=4) (actual time=0.279..1.421 rows=1059 loops=1)
         Index Cond: ((container_repositories.project_id >= 3806) AND (container_repositories.project_id <= 628939) AND (container_repositories.project_id >= 3806) AND (container_repositories.project_id < 192676))
         Heap Fetches: 59
         Buffers: shared hit=84 read=5 dirtied=18
         I/O Timings: read=0.188 write=0.000

Plan: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10888/commands/39058

  1. Query for ProjectStatistics.where(project_id: sub_batch).where(container_registry_size: 0)
SELECT "project_statistics".* FROM "project_statistics" WHERE "project_statistics"."project_id" IN (SELECT DISTINCT "container_repositories"."project_id" FROM "container_repositories" WHERE "container_repositories"."project_id" BETWEEN 3806 AND 628939 AND ("container_repositories"."created_at" >= '2022-01-23' OR "container_repositories"."migration_state" = 'import_done') AND "container_repositories"."project_id" >= 3806 AND "container_repositories"."project_id" < 192676) AND "project_statistics"."container_registry_size" = 0

Before Adding Index

Query Statistics

Time: 913.440 ms
  - planning: 2.551 ms
  - execution: 910.889 ms
    - I/O read: 887.392 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 316 (~2.50 MiB) from the buffer pool
  - reads: 176 (~1.40 MiB) from the OS file cache, including disk I/O
  - dirtied: 15 (~120.00 KiB)
  - writes: 0

Query Plan:

Nested Loop  (cost=0.87..3421.05 rows=952 width=116) (actual time=17.162..910.707 rows=83 loops=1)
   Buffers: shared hit=316 read=176 dirtied=15
   I/O Timings: read=887.392 write=0.000
   ->  Unique  (cost=0.43..110.62 rows=956 width=4) (actual time=0.072..1.897 rows=100 loops=1)
         Buffers: shared hit=89
         I/O Timings: read=0.000 write=0.000
         ->  Index Only Scan using idx_container_registry_size_proj_stat on public.container_repositories  (cost=0.43..108.22 rows=958 width=4) (actual time=0.070..1.360 rows=1059 loops=1)
               Index Cond: ((container_repositories.project_id >= 3806) AND (container_repositories.project_id <= 628939) AND (container_repositories.project_id >= 3806) AND (container_repositories.project_id < 192676))
               Heap Fetches: 59
               Buffers: shared hit=89
               I/O Timings: read=0.000 write=0.000
   ->  Index Scan using index_project_statistics_on_project_id on public.project_statistics  (cost=0.44..3.45 rows=1 width=116) (actual time=9.079..9.079 rows=1 loops=100)
         Index Cond: (project_statistics.project_id = container_repositories.project_id)
         Filter: (project_statistics.container_registry_size = 0)
         Rows Removed by Filter: 0
         Buffers: shared hit=226 read=176 dirtied=14
         I/O Timings: read=887.392 write=0.000

Plan:

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10887/commands/39055

After Adding Index

exec CREATE INDEX tmp_index_project_statistics_cont_registry_size ON project_statistics (project_id) WHERE container_registry_size = 0

Query Statistics

Time: 16.369 ms
  - planning: 2.055 ms
  - execution: 14.314 ms
    - I/O read: 9.691 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 311 (~2.40 MiB) from the buffer pool
  - reads: 161 (~1.30 MiB) from the OS file cache, including disk I/O
  - dirtied: 7 (~56.00 KiB)
  - writes: 0

Query Plan:

 Nested Loop  (cost=0.87..3046.94 rows=850 width=116) (actual time=0.432..14.245 rows=83 loops=1)
   Buffers: shared hit=311 read=161 dirtied=7
   I/O Timings: read=9.691 write=0.000
   ->  Unique  (cost=0.43..98.79 rows=853 width=4) (actual time=0.053..3.577 rows=100 loops=1)
         Buffers: shared hit=89
         I/O Timings: read=0.000 write=0.000
         ->  Index Only Scan using tmp_index_migrated_container_registries on public.container_repositories  (cost=0.43..96.65 rows=855 width=4) (actual time=0.052..1.138 rows=1059 loops=1)
               Index Cond: ((container_repositories.project_id >= 3806) AND (container_repositories.project_id <= 628939) AND (container_repositories.project_id >= 3806) AND (container_repositories.project_id < 192676))
               Heap Fetches: 59
               Buffers: shared hit=89
               I/O Timings: read=0.000 write=0.000
   ->  Index Scan using tmp_index_project_statistics_cont_registry_size on public.project_statistics  (cost=0.44..3.45 rows=1 width=116) (actual time=0.105..0.105 rows=1 loops=100)
         Index Cond: (project_statistics.project_id = container_repositories.project_id)
         Buffers: shared hit=222 read=161 dirtied=7
         I/O Timings: read=9.691 write=0.000

Plan:

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10888/commands/39059

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Suraj Tripathi

Merge request reports

Loading