Prepare excess storage data in project and namespace
What does this MR do?
Part of #255348
Depends on !43322 (merged)
Introduce and adjust methods in the project and namespace models related to excess repository storage data. These methods will be used in another merge request to pass data to the storage endpoint response.
- Include
lfs_object_size
in calculations for total repository size of a project.- This means a slight adjustment to the method/query introduced as part of !43322 (merged),
total_repository_size_excess
. Previously, we only consideredproject_statistics.repository_size
but now we will use the sum ofproject_statistics.repository_size
andproject_statistics.lfs_object_size
.
- This means a slight adjustment to the method/query introduced as part of !43322 (merged),
- Introduce new methods/queries below, some based on logic from above query:
-
repository_size_excess_project_count
: Number of projects wheretotal_repository_size
exceeds the limit. -
total_repository_size
: Sum total of all projects'total_repository_size
in the namespace. -
contains_locked_projects
: Boolean for whether the total repository excess exceeds the additional purchased storage size.
-
Database Review Notes
total_repository_size_excess
You can see the details of the query plan before these changes in !43322 (merged).
First query for projects with a set repository limit:
Formatted Query:
EXPLAIN SELECT
SUM((("project_statistics"."repository_size" + "project_statistics"."lfs_objects_size") - "projects"."repository_size_limit"))
FROM
"projects"
INNER JOIN routes rs ON rs.source_id = projects.id
AND rs.source_type = 'Project'
INNER JOIN "project_statistics" ON "project_statistics"."project_id" = "projects"."id"
WHERE (rs.path LIKE 'gbobject/%')
AND "projects"."repository_size_limit" != 0 AND ((project_statistics.repository_size + project_statistics.lfs_objects_size) > projects.repository_size_limit)
https://explain.depesz.com/s/sP8s):
Explain Output (Aggregate (cost=130846.63..130846.64 rows=1 width=32) (actual time=2797.824..2797.824 rows=1 loops=1)
Buffers: shared hit=154269 read=44687 dirtied=1734
I/O Timings: read=2402.392
-> Nested Loop (cost=1.43..130846.62 rows=1 width=24) (actual time=2797.817..2797.817 rows=0 loops=1)
Buffers: shared hit=154269 read=44687 dirtied=1734
I/O Timings: read=2402.392
-> Nested Loop (cost=1.00..130837.10 rows=3 width=16) (actual time=2797.816..2797.817 rows=0 loops=1)
Buffers: shared hit=154269 read=44687 dirtied=1734
I/O Timings: read=2402.392
-> Index Scan using index_routes_on_path_text_pattern_ops on public.routes rs (cost=0.56..36023.61 rows=31167 width=4) (actual time=6.516..638.718 rows=39840 loops=1)
Index Cond: (((rs.path)::text ~>=~ 'gbobject/'::text) AND ((rs.path)::text ~<~ 'gbobject0'::text))
Filter: (((rs.path)::text ~~ 'gbobject/%'::text) AND ((rs.source_type)::text = 'Project'::text))
Rows Removed by Filter: 0
Buffers: shared hit=30270 read=9127 dirtied=270
I/O Timings: read=536.476
-> Index Scan using projects_pkey on public.projects (cost=0.43..3.04 rows=1 width=12) (actual time=0.053..0.053 rows=0 loops=39840)
Index Cond: (projects.id = rs.source_id)
Filter: (projects.repository_size_limit <> 0)
Rows Removed by Filter: 1
Buffers: shared hit=123999 read=35560 dirtied=1464
I/O Timings: read=1865.916
-> Index Scan using index_project_statistics_on_project_id on public.project_statistics (cost=0.43..3.16 rows=1 width=20) (actual time=0.000..0.000 rows=0 loops=0)
Index Cond: (project_statistics.project_id = rs.source_id)
Summary:
Cold cache:
Time: 2.799 s
- planning: 1.460 ms
- execution: 2.798 s
- I/O read: 2.402 s
- I/O write: 0.000 ms
Shared buffers:
- hits: 154269 (~1.20 GiB) from the buffer pool
- reads: 44687 (~349.10 MiB) from the OS file cache, including disk I/O
- dirtied: 1734 (~13.50 MiB)
- writes: 0
Warm cache:
Time: 230.490 ms
- planning: 2.094 ms
- execution: 228.396 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 198956 (~1.50 GiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
Second query for projects that fall back to the namespace repository limit (example with 10 GB):
Formatted Query:
EXPLAIN SELECT
SUM((("project_statistics"."repository_size" + "project_statistics"."lfs_objects_size") - 10485760000))
FROM
"projects"
INNER JOIN routes rs ON rs.source_id = projects.id
AND rs.source_type = 'Project'
INNER JOIN "project_statistics" ON "project_statistics"."project_id" = "projects"."id"
WHERE (rs.path LIKE 'gbobject/%')
AND "projects"."repository_size_limit" IS NULL AND ((project_statistics.repository_size + project_statistics.lfs_objects_size) > '10485760000')
https://explain.depesz.com/s/1hW2):
Explain Output (Aggregate (cost=141081.05..141081.06 rows=1 width=32) (actual time=1237.196..1237.196 rows=1 loops=1)
Buffers: shared hit=166505 read=32550 dirtied=1017
I/O Timings: read=936.221
-> Nested Loop (cost=1.43..141006.09 rows=9994 width=16) (actual time=1237.190..1237.190 rows=0 loops=1)
Buffers: shared hit=166505 read=32550 dirtied=1017
I/O Timings: read=936.221
-> Nested Loop (cost=1.00..134575.02 rows=9995 width=24) (actual time=1237.190..1237.190 rows=0 loops=1)
Buffers: shared hit=166505 read=32550 dirtied=1017
I/O Timings: read=936.221
-> Index Scan using index_routes_on_path_text_pattern_ops on public.routes rs (cost=0.56..36023.61 rows=31167 width=4) (actual time=0.743..330.281 rows=39840 loops=1)
Index Cond: (((rs.path)::text ~>=~ 'gbobject/'::text) AND ((rs.path)::text ~<~ 'gbobject0'::text))
Filter: (((rs.path)::text ~~ 'gbobject/%'::text) AND ((rs.source_type)::text = 'Project'::text))
Rows Removed by Filter: 0
Buffers: shared hit=30270 read=9127 dirtied=270
I/O Timings: read=252.951
-> Index Scan using index_project_statistics_on_project_id on public.project_statistics (cost=0.43..3.16 rows=1 width=20) (actual time=0.022..0.022 rows=0 loops=39840)
Index Cond: (project_statistics.project_id = rs.source_id)
Filter: ((project_statistics.repository_size + project_statistics.lfs_objects_size) > '10485760000'::bigint)
Rows Removed by Filter: 1
Buffers: shared hit=136235 read=23423 dirtied=747
I/O Timings: read=683.270
-> Index Scan using projects_pkey on public.projects (cost=0.43..0.63 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
Index Cond: (projects.id = project_statistics.project_id)
Filter: (projects.repository_size_limit IS NULL)
Rows Removed by Filter: 0
Summary:
Cold cache:
Time: 1.239 s
- planning: 1.371 ms
- execution: 1.237 s
- I/O read: 936.221 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 166505 (~1.30 GiB) from the buffer pool
- reads: 32550 (~254.30 MiB) from the OS file cache, including disk I/O
- dirtied: 1017 (~7.90 MiB)
- writes: 0
Warm cache:
Time: 176.866 ms
- planning: 1.347 ms
- execution: 175.519 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 199055 (~1.50 GiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
repository_size_excess_project_count
First query for projects with a set repository limit:
Formatted Query:
EXPLAIN SELECT
COUNT(*)
FROM
"projects"
INNER JOIN routes rs ON rs.source_id = projects.id
AND rs.source_type = 'Project'
INNER JOIN "project_statistics" ON "project_statistics"."project_id" = "projects"."id"
WHERE (rs.path LIKE 'gbobject/%')
AND "projects"."repository_size_limit" != 0 AND ((project_statistics.repository_size + project_statistics.lfs_objects_size) > projects.repository_size_limit)
https://explain.depesz.com/s/kk5T):
Explain Output (Aggregate (cost=130846.62..130846.63 rows=1 width=8) (actual time=35115.591..35115.592 rows=1 loops=1)
Buffers: shared hit=154269 read=44687 dirtied=1734
I/O Timings: read=34285.585
-> Nested Loop (cost=1.43..130846.62 rows=1 width=0) (actual time=35115.582..35115.582 rows=0 loops=1)
Buffers: shared hit=154269 read=44687 dirtied=1734
I/O Timings: read=34285.585
-> Nested Loop (cost=1.00..130837.10 rows=3 width=16) (actual time=35115.580..35115.581 rows=0 loops=1)
Buffers: shared hit=154269 read=44687 dirtied=1734
I/O Timings: read=34285.585
-> Index Scan using index_routes_on_path_text_pattern_ops on public.routes rs (cost=0.56..36023.61 rows=31167 width=4) (actual time=0.696..1036.064 rows=39840 loops=1)
Index Cond: (((rs.path)::text ~>=~ 'gbobject/'::text) AND ((rs.path)::text ~<~ 'gbobject0'::text))
Filter: (((rs.path)::text ~~ 'gbobject/%'::text) AND ((rs.source_type)::text = 'Project'::text))
Rows Removed by Filter: 0
Buffers: shared hit=30270 read=9127 dirtied=270
I/O Timings: read=831.757
-> Index Scan using projects_pkey on public.projects (cost=0.43..3.04 rows=1 width=12) (actual time=0.853..0.853 rows=0 loops=39840)
Index Cond: (projects.id = rs.source_id)
Filter: (projects.repository_size_limit <> 0)
Rows Removed by Filter: 1
Buffers: shared hit=123999 read=35560 dirtied=1464
I/O Timings: read=33453.829
-> Index Scan using index_project_statistics_on_project_id on public.project_statistics (cost=0.43..3.16 rows=1 width=20) (actual time=0.000..0.000 rows=0 loops=0)
Index Cond: (project_statistics.project_id = rs.source_id)
Summary:
Cold cache:
Time: 1.561 s
- planning: 1.337 ms
- execution: 1.559 s
- I/O read: 1.223 s
- I/O write: 0.000 ms
Shared buffers:
- hits: 154269 (~1.20 GiB) from the buffer pool
- reads: 44687 (~349.10 MiB) from the OS file cache, including disk I/O
- dirtied: 1734 (~13.50 MiB)
- writes: 0
Warm cache:
Time: 213.707 ms
- planning: 1.246 ms
- execution: 212.461 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 198956 (~1.50 GiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
Second query for projects that fall back to the namespace repository limit (example with 10 GB):
Formatted Query:
EXPLAIN SELECT
COUNT(*)
FROM
"projects"
INNER JOIN routes rs ON rs.source_id = projects.id
AND rs.source_type = 'Project'
INNER JOIN "project_statistics" ON "project_statistics"."project_id" = "projects"."id"
WHERE (rs.path LIKE 'gbobject/%')
AND "projects"."repository_size_limit" IS NULL AND ((project_statistics.repository_size + project_statistics.lfs_objects_size) > 10485760000)
https://explain.depesz.com/s/WUES):
Explain Output (Aggregate (cost=141031.08..141031.09 rows=1 width=8) (actual time=1038.685..1038.685 rows=1 loops=1)
Buffers: shared hit=166505 read=32550 dirtied=1017
I/O Timings: read=805.359
-> Nested Loop (cost=1.43..141006.09 rows=9994 width=0) (actual time=1038.680..1038.680 rows=0 loops=1)
Buffers: shared hit=166505 read=32550 dirtied=1017
I/O Timings: read=805.359
-> Nested Loop (cost=1.00..134575.02 rows=9995 width=8) (actual time=1038.679..1038.680 rows=0 loops=1)
Buffers: shared hit=166505 read=32550 dirtied=1017
I/O Timings: read=805.359
-> Index Scan using index_routes_on_path_text_pattern_ops on public.routes rs (cost=0.56..36023.61 rows=31167 width=4) (actual time=0.613..287.664 rows=39840 loops=1)
Index Cond: (((rs.path)::text ~>=~ 'gbobject/'::text) AND ((rs.path)::text ~<~ 'gbobject0'::text))
Filter: (((rs.path)::text ~~ 'gbobject/%'::text) AND ((rs.source_type)::text = 'Project'::text))
Rows Removed by Filter: 0
Buffers: shared hit=30270 read=9127 dirtied=270
I/O Timings: read=226.977
-> Index Scan using index_project_statistics_on_project_id on public.project_statistics (cost=0.43..3.16 rows=1 width=4) (actual time=0.018..0.018 rows=0 loops=39840)
Index Cond: (project_statistics.project_id = rs.source_id)
Filter: ((project_statistics.repository_size + project_statistics.lfs_objects_size) > '10485760000'::bigint)
Rows Removed by Filter: 1
Buffers: shared hit=136235 read=23423 dirtied=747
I/O Timings: read=578.381
-> Index Scan using projects_pkey on public.projects (cost=0.43..0.63 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
Index Cond: (projects.id = project_statistics.project_id)
Filter: (projects.repository_size_limit IS NULL)
Rows Removed by Filter: 0
Summary:
Cold cache:
Time: 1.040 s
- planning: 1.063 ms
- execution: 1.039 s
- I/O read: 805.359 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 166505 (~1.30 GiB) from the buffer pool
- reads: 32550 (~254.30 MiB) from the OS file cache, including disk I/O
- dirtied: 1017 (~7.90 MiB)
- writes: 0
Warm cache:
Time: 312.032 ms
- planning: 1.278 ms
- execution: 310.754 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 345329 (~2.60 GiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
total_repository_size
Formatted Query:
EXPLAIN SELECT
SUM(("project_statistics"."repository_size" + "project_statistics"."lfs_objects_size"))
FROM "projects"
INNER JOIN routes rs ON rs.source_id = projects.id
AND rs.source_type = 'Project'
INNER JOIN "project_statistics" ON "project_statistics"."project_id" = "projects"."id"
WHERE (rs.path LIKE 'gbobject/%') LIMIT 1
https://explain.depesz.com/s/XkYe):
Explain Output (Limit (cost=100594.67..100594.68 rows=1 width=32) (actual time=1775.416..1775.417 rows=1 loops=1)
Buffers: shared hit=311055 read=34274 dirtied=1751
I/O Timings: read=1271.685
-> Aggregate (cost=100594.67..100594.68 rows=1 width=32) (actual time=1775.415..1775.415 rows=1 loops=1)
Buffers: shared hit=311055 read=34274 dirtied=1751
I/O Timings: read=1271.685
-> Nested Loop (cost=1.43..100443.27 rows=30279 width=16) (actual time=3.973..1757.792 rows=39840 loops=1)
Buffers: shared hit=311055 read=34274 dirtied=1751
I/O Timings: read=1271.685
-> Nested Loop (cost=1.00..85806.85 rows=29009 width=8) (actual time=2.382..712.791 rows=39840 loops=1)
Buffers: shared hit=175063 read=10852 dirtied=1041
I/O Timings: read=449.314
-> Index Scan using index_routes_on_path_text_pattern_ops on public.routes rs (cost=0.56..36136.56 rows=31215 width=4) (actual time=2.040..414.918 rows=39840 loops=1)
Index Cond: (((rs.path)::text ~>=~ 'gbobject/'::text) AND ((rs.path)::text ~<~ 'gbobject0'::text))
Filter: (((rs.path)::text ~~ 'gbobject/%'::text) AND ((rs.source_type)::text = 'Project'::text))
Rows Removed by Filter: 0
Buffers: shared hit=30270 read=9127 dirtied=207
I/O Timings: read=324.803
-> Index Only Scan using projects_pkey on public.projects (cost=0.43..1.59 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=39840)
Index Cond: (projects.id = rs.source_id)
Heap Fetches: 1583
Buffers: shared hit=144793 read=1725 dirtied=834
I/O Timings: read=124.511
-> Index Scan using index_project_statistics_on_project_id on public.project_statistics (cost=0.43..0.49 rows=1 width=20) (actual time=0.025..0.025 rows=1 loops=39840)
Index Cond: (project_statistics.project_id = projects.id)
Buffers: shared hit=135992 read=23422 dirtied=710
I/O Timings: read=822.371
Summary:
Cold cache:
Time: 1.777 s
- planning: 1.671 ms
- execution: 1.776 s
- I/O read: 1.272 s
- I/O write: 0.000 ms
Shared buffers:
- hits: 311055 (~2.40 GiB) from the buffer pool
- reads: 34274 (~267.80 MiB) from the OS file cache, including disk I/O
- dirtied: 1751 (~13.70 MiB)
- writes: 0
Shared buffers:
- hits: 0 from the buffer pool
- reads: 4 (~32.00 KiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
Warm cache:
Time: 1.917 ms
- planning: 1.808 ms
- execution: 0.109 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 4 (~32.00 KiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team
Edited by Corinna Gogolok