Prepare excess storage data in project and namespace (!43462) · Merge requests · GitLab.org / GitLab

Tyler Amos requested to merge 255348-excess-storage-info-in-storage-graphql into master Sep 25, 2020

What does this MR do?

Part of #255348

Introduce and adjust methods in the project and namespace models related to excess repository storage data. These methods will be used in another merge request to pass data to the storage endpoint response.

Include lfs_object_size in calculations for total repository size of a project.
- This means a slight adjustment to the method/query introduced as part of !43322 (merged), total_repository_size_excess. Previously, we only considered project_statistics.repository_size but now we will use the sum of project_statistics.repository_size and project_statistics.lfs_object_size.
Introduce new methods/queries below, some based on logic from above query:
- repository_size_excess_project_count: Number of projects where total_repository_size exceeds the limit.
- total_repository_size: Sum total of all projects' total_repository_size in the namespace.
- contains_locked_projects: Boolean for whether the total repository excess exceeds the additional purchased storage size.

Database Review Notes

`total_repository_size_excess`

You can see the details of the query plan before these changes in !43322 (merged).

First query for projects with a set repository limit:

Formatted Query:

EXPLAIN SELECT
    SUM((("project_statistics"."repository_size" + "project_statistics"."lfs_objects_size") - "projects"."repository_size_limit")) 
FROM
    "projects"
    INNER JOIN routes rs ON rs.source_id = projects.id
        AND rs.source_type = 'Project'
    INNER JOIN "project_statistics" ON "project_statistics"."project_id" = "projects"."id"
WHERE (rs.path LIKE 'gbobject/%')
    AND "projects"."repository_size_limit" != 0 AND ((project_statistics.repository_size + project_statistics.lfs_objects_size) > projects.repository_size_limit)

Explain Output (https://explain.depesz.com/s/sP8s):

Aggregate  (cost=130846.63..130846.64 rows=1 width=32) (actual time=2797.824..2797.824 rows=1 loops=1)
   Buffers: shared hit=154269 read=44687 dirtied=1734
   I/O Timings: read=2402.392
   ->  Nested Loop  (cost=1.43..130846.62 rows=1 width=24) (actual time=2797.817..2797.817 rows=0 loops=1)
         Buffers: shared hit=154269 read=44687 dirtied=1734
         I/O Timings: read=2402.392
         ->  Nested Loop  (cost=1.00..130837.10 rows=3 width=16) (actual time=2797.816..2797.817 rows=0 loops=1)
               Buffers: shared hit=154269 read=44687 dirtied=1734
               I/O Timings: read=2402.392
               ->  Index Scan using index_routes_on_path_text_pattern_ops on public.routes rs  (cost=0.56..36023.61 rows=31167 width=4) (actual time=6.516..638.718 rows=39840 loops=1)
                     Index Cond: (((rs.path)::text ~>=~ 'gbobject/'::text) AND ((rs.path)::text ~<~ 'gbobject0'::text))
                     Filter: (((rs.path)::text ~~ 'gbobject/%'::text) AND ((rs.source_type)::text = 'Project'::text))
                     Rows Removed by Filter: 0
                     Buffers: shared hit=30270 read=9127 dirtied=270
                     I/O Timings: read=536.476
               ->  Index Scan using projects_pkey on public.projects  (cost=0.43..3.04 rows=1 width=12) (actual time=0.053..0.053 rows=0 loops=39840)
                     Index Cond: (projects.id = rs.source_id)
                     Filter: (projects.repository_size_limit <> 0)
                     Rows Removed by Filter: 1
                     Buffers: shared hit=123999 read=35560 dirtied=1464
                     I/O Timings: read=1865.916
         ->  Index Scan using index_project_statistics_on_project_id on public.project_statistics  (cost=0.43..3.16 rows=1 width=20) (actual time=0.000..0.000 rows=0 loops=0)
               Index Cond: (project_statistics.project_id = rs.source_id)

Summary:

Cold cache:

Time: 2.799 s
  - planning: 1.460 ms
  - execution: 2.798 s
    - I/O read: 2.402 s
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 154269 (~1.20 GiB) from the buffer pool
  - reads: 44687 (~349.10 MiB) from the OS file cache, including disk I/O
  - dirtied: 1734 (~13.50 MiB)
  - writes: 0

Warm cache:

Time: 230.490 ms
  - planning: 2.094 ms
  - execution: 228.396 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 198956 (~1.50 GiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Second query for projects that fall back to the namespace repository limit (example with 10 GB):

Formatted Query:

EXPLAIN SELECT
    SUM((("project_statistics"."repository_size" + "project_statistics"."lfs_objects_size") - 10485760000))
FROM
    "projects"
    INNER JOIN routes rs ON rs.source_id = projects.id
        AND rs.source_type = 'Project' 
    INNER JOIN "project_statistics" ON "project_statistics"."project_id" = "projects"."id"
WHERE (rs.path LIKE 'gbobject/%')
    AND "projects"."repository_size_limit" IS NULL AND ((project_statistics.repository_size + project_statistics.lfs_objects_size) > '10485760000')

Explain Output (https://explain.depesz.com/s/1hW2):

Aggregate  (cost=141081.05..141081.06 rows=1 width=32) (actual time=1237.196..1237.196 rows=1 loops=1)
   Buffers: shared hit=166505 read=32550 dirtied=1017
   I/O Timings: read=936.221
   ->  Nested Loop  (cost=1.43..141006.09 rows=9994 width=16) (actual time=1237.190..1237.190 rows=0 loops=1)
         Buffers: shared hit=166505 read=32550 dirtied=1017
         I/O Timings: read=936.221
         ->  Nested Loop  (cost=1.00..134575.02 rows=9995 width=24) (actual time=1237.190..1237.190 rows=0 loops=1)
               Buffers: shared hit=166505 read=32550 dirtied=1017
               I/O Timings: read=936.221
               ->  Index Scan using index_routes_on_path_text_pattern_ops on public.routes rs  (cost=0.56..36023.61 rows=31167 width=4) (actual time=0.743..330.281 rows=39840 loops=1)
                     Index Cond: (((rs.path)::text ~>=~ 'gbobject/'::text) AND ((rs.path)::text ~<~ 'gbobject0'::text))
                     Filter: (((rs.path)::text ~~ 'gbobject/%'::text) AND ((rs.source_type)::text = 'Project'::text))
                     Rows Removed by Filter: 0
                     Buffers: shared hit=30270 read=9127 dirtied=270
                     I/O Timings: read=252.951
               ->  Index Scan using index_project_statistics_on_project_id on public.project_statistics  (cost=0.43..3.16 rows=1 width=20) (actual time=0.022..0.022 rows=0 loops=39840)
                     Index Cond: (project_statistics.project_id = rs.source_id)
                     Filter: ((project_statistics.repository_size + project_statistics.lfs_objects_size) > '10485760000'::bigint)
                     Rows Removed by Filter: 1
                     Buffers: shared hit=136235 read=23423 dirtied=747
                     I/O Timings: read=683.270
         ->  Index Scan using projects_pkey on public.projects  (cost=0.43..0.63 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
               Index Cond: (projects.id = project_statistics.project_id)
               Filter: (projects.repository_size_limit IS NULL)
               Rows Removed by Filter: 0

Summary:

Cold cache:

Time: 1.239 s
  - planning: 1.371 ms
  - execution: 1.237 s
    - I/O read: 936.221 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 166505 (~1.30 GiB) from the buffer pool
  - reads: 32550 (~254.30 MiB) from the OS file cache, including disk I/O
  - dirtied: 1017 (~7.90 MiB)
  - writes: 0

Warm cache:

Time: 176.866 ms
  - planning: 1.347 ms
  - execution: 175.519 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 199055 (~1.50 GiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

`repository_size_excess_project_count`

First query for projects with a set repository limit:

Formatted Query:

EXPLAIN SELECT
    COUNT(*)
FROM
    "projects"
    INNER JOIN routes rs ON rs.source_id = projects.id
        AND rs.source_type = 'Project'
    INNER JOIN "project_statistics" ON "project_statistics"."project_id" = "projects"."id"
WHERE (rs.path LIKE 'gbobject/%')
    AND "projects"."repository_size_limit" != 0 AND ((project_statistics.repository_size + project_statistics.lfs_objects_size) > projects.repository_size_limit)

Explain Output (https://explain.depesz.com/s/kk5T):

Aggregate  (cost=130846.62..130846.63 rows=1 width=8) (actual time=35115.591..35115.592 rows=1 loops=1)
   Buffers: shared hit=154269 read=44687 dirtied=1734
   I/O Timings: read=34285.585
   ->  Nested Loop  (cost=1.43..130846.62 rows=1 width=0) (actual time=35115.582..35115.582 rows=0 loops=1)
         Buffers: shared hit=154269 read=44687 dirtied=1734
         I/O Timings: read=34285.585
         ->  Nested Loop  (cost=1.00..130837.10 rows=3 width=16) (actual time=35115.580..35115.581 rows=0 loops=1)
               Buffers: shared hit=154269 read=44687 dirtied=1734
               I/O Timings: read=34285.585
               ->  Index Scan using index_routes_on_path_text_pattern_ops on public.routes rs  (cost=0.56..36023.61 rows=31167 width=4) (actual time=0.696..1036.064 rows=39840 loops=1)
                     Index Cond: (((rs.path)::text ~>=~ 'gbobject/'::text) AND ((rs.path)::text ~<~ 'gbobject0'::text))
                     Filter: (((rs.path)::text ~~ 'gbobject/%'::text) AND ((rs.source_type)::text = 'Project'::text))
                     Rows Removed by Filter: 0
                     Buffers: shared hit=30270 read=9127 dirtied=270
                     I/O Timings: read=831.757
               ->  Index Scan using projects_pkey on public.projects  (cost=0.43..3.04 rows=1 width=12) (actual time=0.853..0.853 rows=0 loops=39840)
                     Index Cond: (projects.id = rs.source_id)
                     Filter: (projects.repository_size_limit <> 0)
                     Rows Removed by Filter: 1
                     Buffers: shared hit=123999 read=35560 dirtied=1464
                     I/O Timings: read=33453.829
         ->  Index Scan using index_project_statistics_on_project_id on public.project_statistics  (cost=0.43..3.16 rows=1 width=20) (actual time=0.000..0.000 rows=0 loops=0)
               Index Cond: (project_statistics.project_id = rs.source_id)

Summary:

Cold cache:

Time: 1.561 s
  - planning: 1.337 ms
  - execution: 1.559 s
    - I/O read: 1.223 s
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 154269 (~1.20 GiB) from the buffer pool
  - reads: 44687 (~349.10 MiB) from the OS file cache, including disk I/O
  - dirtied: 1734 (~13.50 MiB)
  - writes: 0

Warm cache:

Time: 213.707 ms
  - planning: 1.246 ms
  - execution: 212.461 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 198956 (~1.50 GiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Second query for projects that fall back to the namespace repository limit (example with 10 GB):

Formatted Query:

EXPLAIN SELECT
    COUNT(*)
FROM
    "projects"
    INNER JOIN routes rs ON rs.source_id = projects.id
        AND rs.source_type = 'Project'
    INNER JOIN "project_statistics" ON "project_statistics"."project_id" = "projects"."id"
WHERE (rs.path LIKE 'gbobject/%')
    AND "projects"."repository_size_limit" IS NULL AND ((project_statistics.repository_size + project_statistics.lfs_objects_size) > 10485760000)

Explain Output (https://explain.depesz.com/s/WUES):

Aggregate  (cost=141031.08..141031.09 rows=1 width=8) (actual time=1038.685..1038.685 rows=1 loops=1)
   Buffers: shared hit=166505 read=32550 dirtied=1017
   I/O Timings: read=805.359
   ->  Nested Loop  (cost=1.43..141006.09 rows=9994 width=0) (actual time=1038.680..1038.680 rows=0 loops=1)
         Buffers: shared hit=166505 read=32550 dirtied=1017
         I/O Timings: read=805.359
         ->  Nested Loop  (cost=1.00..134575.02 rows=9995 width=8) (actual time=1038.679..1038.680 rows=0 loops=1)
               Buffers: shared hit=166505 read=32550 dirtied=1017
               I/O Timings: read=805.359
               ->  Index Scan using index_routes_on_path_text_pattern_ops on public.routes rs  (cost=0.56..36023.61 rows=31167 width=4) (actual time=0.613..287.664 rows=39840 loops=1)
                     Index Cond: (((rs.path)::text ~>=~ 'gbobject/'::text) AND ((rs.path)::text ~<~ 'gbobject0'::text))
                     Filter: (((rs.path)::text ~~ 'gbobject/%'::text) AND ((rs.source_type)::text = 'Project'::text))
                     Rows Removed by Filter: 0
                     Buffers: shared hit=30270 read=9127 dirtied=270
                     I/O Timings: read=226.977
               ->  Index Scan using index_project_statistics_on_project_id on public.project_statistics  (cost=0.43..3.16 rows=1 width=4) (actual time=0.018..0.018 rows=0 loops=39840)
                     Index Cond: (project_statistics.project_id = rs.source_id)
                     Filter: ((project_statistics.repository_size + project_statistics.lfs_objects_size) > '10485760000'::bigint)
                     Rows Removed by Filter: 1
                     Buffers: shared hit=136235 read=23423 dirtied=747
                     I/O Timings: read=578.381
         ->  Index Scan using projects_pkey on public.projects  (cost=0.43..0.63 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
               Index Cond: (projects.id = project_statistics.project_id)
               Filter: (projects.repository_size_limit IS NULL)
               Rows Removed by Filter: 0

Summary:

Cold cache:

Time: 1.040 s
  - planning: 1.063 ms
  - execution: 1.039 s
    - I/O read: 805.359 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 166505 (~1.30 GiB) from the buffer pool
  - reads: 32550 (~254.30 MiB) from the OS file cache, including disk I/O
  - dirtied: 1017 (~7.90 MiB)
  - writes: 0

Warm cache:

Time: 312.032 ms
  - planning: 1.278 ms
  - execution: 310.754 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 345329 (~2.60 GiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

`total_repository_size`

Formatted Query:

EXPLAIN SELECT
    SUM(("project_statistics"."repository_size" + "project_statistics"."lfs_objects_size"))
FROM "projects"
    INNER JOIN routes rs ON rs.source_id = projects.id
        AND rs.source_type = 'Project'
    INNER JOIN "project_statistics" ON "project_statistics"."project_id" = "projects"."id"
WHERE (rs.path LIKE 'gbobject/%') LIMIT 1

Explain Output (https://explain.depesz.com/s/XkYe):

Limit  (cost=100594.67..100594.68 rows=1 width=32) (actual time=1775.416..1775.417 rows=1 loops=1)
   Buffers: shared hit=311055 read=34274 dirtied=1751
   I/O Timings: read=1271.685
   ->  Aggregate  (cost=100594.67..100594.68 rows=1 width=32) (actual time=1775.415..1775.415 rows=1 loops=1)
         Buffers: shared hit=311055 read=34274 dirtied=1751
         I/O Timings: read=1271.685
         ->  Nested Loop  (cost=1.43..100443.27 rows=30279 width=16) (actual time=3.973..1757.792 rows=39840 loops=1)
               Buffers: shared hit=311055 read=34274 dirtied=1751
               I/O Timings: read=1271.685
               ->  Nested Loop  (cost=1.00..85806.85 rows=29009 width=8) (actual time=2.382..712.791 rows=39840 loops=1)
                     Buffers: shared hit=175063 read=10852 dirtied=1041
                     I/O Timings: read=449.314
                     ->  Index Scan using index_routes_on_path_text_pattern_ops on public.routes rs  (cost=0.56..36136.56 rows=31215 width=4) (actual time=2.040..414.918 rows=39840 loops=1)
                           Index Cond: (((rs.path)::text ~>=~ 'gbobject/'::text) AND ((rs.path)::text ~<~ 'gbobject0'::text))
                           Filter: (((rs.path)::text ~~ 'gbobject/%'::text) AND ((rs.source_type)::text = 'Project'::text))
                           Rows Removed by Filter: 0
                           Buffers: shared hit=30270 read=9127 dirtied=207
                           I/O Timings: read=324.803
                     ->  Index Only Scan using projects_pkey on public.projects  (cost=0.43..1.59 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=39840)
                           Index Cond: (projects.id = rs.source_id)
                           Heap Fetches: 1583
                           Buffers: shared hit=144793 read=1725 dirtied=834
                           I/O Timings: read=124.511
               ->  Index Scan using index_project_statistics_on_project_id on public.project_statistics  (cost=0.43..0.49 rows=1 width=20) (actual time=0.025..0.025 rows=1 loops=39840)
                     Index Cond: (project_statistics.project_id = projects.id)
                     Buffers: shared hit=135992 read=23422 dirtied=710
                     I/O Timings: read=822.371

Summary:

Cold cache:

Time: 1.777 s
  - planning: 1.671 ms
  - execution: 1.776 s
    - I/O read: 1.272 s
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 311055 (~2.40 GiB) from the buffer pool
  - reads: 34274 (~267.80 MiB) from the OS file cache, including disk I/O
  - dirtied: 1751 (~13.70 MiB)
  - writes: 0

Shared buffers:
  - hits: 0 from the buffer pool
  - reads: 4 (~32.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Warm cache:

Time: 1.917 ms
  - planning: 1.808 ms
  - execution: 0.109 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 4 (~32.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
Tested in all supported browsers
Informed Infrastructure department of a default or new setting change, if applicable per definition of done

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

Label as security and @ mention @gitlab-com/gl-security/appsec
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
Security reports checked/validated by a reviewer from the AppSec team

Edited Oct 08, 2020 by Corinna Gogolok

Prepare excess storage data in project and namespace

What does this MR do?

Database Review Notes

total_repository_size_excess

First query for projects with a set repository limit:

Formatted Query:

Explain Output (https://explain.depesz.com/s/sP8s):

Summary:

Cold cache:

Warm cache:

Second query for projects that fall back to the namespace repository limit (example with 10 GB):

Formatted Query:

Explain Output (https://explain.depesz.com/s/1hW2):

Summary:

Cold cache:

Warm cache:

repository_size_excess_project_count

First query for projects with a set repository limit:

Formatted Query:

Explain Output (https://explain.depesz.com/s/kk5T):

Summary:

Cold cache:

Warm cache:

Second query for projects that fall back to the namespace repository limit (example with 10 GB):

Formatted Query:

Explain Output (https://explain.depesz.com/s/WUES):

Summary:

Cold cache:

Warm cache:

total_repository_size

Formatted Query:

Explain Output (https://explain.depesz.com/s/XkYe):

Summary:

Cold cache:

Warm cache:

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

Merge request reports

`total_repository_size_excess`

`repository_size_excess_project_count`

`total_repository_size`