Skip to content

Show only personal snippets on explore page

Markus Koller requested to merge 30877-optimize-explore-snippets into master

What does this MR do?

This improves the performance of the "Explore snippets" page by excluding project snippets from the query.

Ref: #30877 (closed), follow-up to gitlab-foss!32576 (merged)

Query plans

Plans for the query change itself, taken from staging:

Before
gitlabhq_production=> EXPLAIN ANALYZE
SELECT "snippets".*
FROM
  (SELECT "snippets".*
   FROM "snippets"
   WHERE (snippets.visibility_level IN (0,
                                        10,
                                        20)
          OR snippets.author_id = 1)
     AND "snippets"."project_id" IS NULL
   UNION SELECT "snippets".*
   FROM "snippets"
   INNER JOIN "projects" ON "projects"."id" = "snippets"."project_id"
   INNER JOIN "project_features" ON "project_features"."project_id" = "projects"."id"
   WHERE (snippets.visibility_level IN (0,
                                        10,
                                        20)
          OR snippets.author_id = 1)
     AND (projects.visibility_level IN (0,
                                        10,
                                        20))
     AND "project_features"."snippets_access_level" IN (20,
                                                        30)
   UNION SELECT "snippets".*
   FROM "snippets"
   INNER JOIN "projects" ON "projects"."id" = "snippets"."project_id"
   INNER JOIN "project_features" ON "project_features"."project_id" = "projects"."id"
   WHERE "project_features". "snippets_access_level" IN (20,
                                                         30,
                                                         10)
     AND (EXISTS
            (SELECT 1
             FROM "project_authorizations"
             WHERE (project_id = snippets.project_id)
               AND "project_authorizations"."user_id" = 1))) snippets
ORDER BY created_at DESC
LIMIT 10
OFFSET 0;
                                                                                                                                                               
        QUERY PLAN                                                                                                                                             
                          
---------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------
 Limit  (cost=103221.02..103221.04 rows=10 width=1744) (actual time=4570.708..4570.727 rows=10 loops=1)
   ->  Sort  (cost=103221.02..103357.13 rows=54443 width=1744) (actual time=4570.706..4570.709 rows=10 loops=1)
         Sort Key: snippets.created_at DESC
         Sort Method: top-N heapsort  Memory: 30kB
         ->  HashAggregate  (cost=100955.66..101500.09 rows=54443 width=1744) (actual time=4496.468..4542.177 rows=46267 loops=1)
               Group Key: snippets.id, snippets.title, snippets.content, snippets.author_id, snippets.project_id, snippets.created_at, snippets.updated_at, sni
ppets.file_name, snippets.type, snippets.visibility_level, snippets.title_html, snippets.content_html, snippets.cached_markdown_version, snippets.description, 
snippets.description_html
               ->  Append  (cost=0.00..98914.05 rows=54443 width=1744) (actual time=0.013..2410.551 rows=46521 loops=1)
                     ->  Seq Scan on snippets  (cost=0.00..7082.38 rows=37556 width=1126) (actual time=0.012..31.345 rows=37610 loops=1)
                           Filter: ((project_id IS NULL) AND ((visibility_level = ANY ('{0,10,20}'::integer[])) OR (author_id = 1)))
                           Rows Removed by Filter: 13600
                     ->  Nested Loop  (cost=0.86..90297.65 rows=16884 width=1126) (actual time=0.109..2294.916 rows=8653 loops=1)
                           Join Filter: (snippets_1.project_id = projects.id)
                           ->  Nested Loop  (cost=0.43..86908.90 rows=5015 width=1130) (actual time=0.086..1509.970 rows=8653 loops=1)
                                 ->  Seq Scan on snippets snippets_1  (cost=0.00..7082.38 rows=51162 width=1126) (actual time=0.029..33.603 rows=51210 loops=1)
                                       Filter: ((visibility_level = ANY ('{0,10,20}'::integer[])) OR (author_id = 1))
                                 ->  Index Scan using index_project_features_on_project_id on project_features  (cost=0.43..1.55 rows=1 width=4) (actual time=0
.028..0.029 rows=0 loops=51210)
                                       Index Cond: (project_id = snippets_1.project_id)
                                       Filter: (snippets_access_level = ANY ('{20,30}'::integer[]))
                                       Rows Removed by Filter: 0
                           ->  Index Scan using projects_pkey on projects  (cost=0.43..0.66 rows=1 width=4) (actual time=0.089..0.090 rows=1 loops=8653)
                                 Index Cond: (id = project_features.project_id)
                                 Filter: (visibility_level = ANY ('{0,10,20}'::integer[]))
                     ->  Nested Loop  (cost=14.38..989.59 rows=3 width=1126) (actual time=27.655..79.280 rows=258 loops=1)
                           ->  Nested Loop  (cost=13.95..986.22 rows=7 width=1134) (actual time=27.626..78.134 rows=258 loops=1)
                                 ->  Nested Loop  (cost=13.66..841.75 rows=415 width=8) (actual time=0.294..45.374 rows=353 loops=1)
                                       ->  HashAggregate  (cost=13.23..17.38 rows=415 width=4) (actual time=0.268..0.511 rows=353 loops=1)
                                             Group Key: project_authorizations.project_id
                                             ->  Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorization
s  (cost=0.43..12.19 rows=415 width=4) (actual time=0.093..0.192 rows=353 loops=1)
                                                   Index Cond: (user_id = 1)
                                                   Heap Fetches: 2
                                       ->  Index Only Scan using projects_pkey on projects projects_1  (cost=0.43..1.98 rows=1 width=4) (actual time=0.125..0.1
26 rows=1 loops=353)
                                             Index Cond: (id = project_authorizations.project_id)
                                             Heap Fetches: 246
                                 ->  Index Scan using index_snippets_on_project_id on snippets snippets_2  (cost=0.29..0.33 rows=2 width=1126) (actual time=0.0
89..0.092 rows=1 loops=353)
                                       Index Cond: (project_id = projects_1.id)
                           ->  Index Scan using index_project_features_on_project_id on project_features project_features_1  (cost=0.43..0.47 rows=1 width=4) (
actual time=0.003..0.004 rows=1 loops=258)
                                 Index Cond: (project_id = projects_1.id)
                                 Filter: (snippets_access_level = ANY ('{20,30,10}'::integer[]))
 Planning time: 13.568 ms
 Execution time: 4571.079 ms
After
gitlabhq_production=> EXPLAIN ANALYZE
gitlabhq_production-> SELECT "snippets".*
gitlabhq_production-> FROM "snippets"
gitlabhq_production-> WHERE snippets.visibility_level IN (0,
gitlabhq_production(>                                     10,
gitlabhq_production(>                                     20)
gitlabhq_production->   AND "snippets"."project_id" IS NULL
gitlabhq_production-> ORDER BY created_at DESC
gitlabhq_production-> LIMIT 10
gitlabhq_production-> OFFSET 0;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=7768.30..7768.33 rows=10 width=1123) (actual time=258.436..258.442 rows=10 loops=1)
   ->  Sort  (cost=7768.30..7862.36 rows=37622 width=1123) (actual time=258.434..258.435 rows=10 loops=1)
         Sort Key: created_at DESC
         Sort Method: top-N heapsort  Memory: 30kB
         ->  Seq Scan on snippets  (cost=0.00..6955.30 rows=37622 width=1123) (actual time=1.254..251.282 rows=37719 loops=1)
               Filter: ((project_id IS NULL) AND (visibility_level = ANY ('{0,10,20}'::integer[])))
               Rows Removed by Filter: 13600
 Planning time: 3.704 ms
 Execution time: 258.512 ms
(9 rows)

Plans for the index changes, taken from #database-lab:

Before
explain SELECT "snippets".* FROM "snippets" WHERE snippets.visibility_level IN (0, 10, 20) AND "snippets"."project_id" IS NULL ORDER BY created_at DESC LIMIT 10 OFFSET 0;

 Limit  (cost=72914.70..72914.72 rows=10 width=1083) (actual time=1256.131..1256.141 rows=10 loops=1)
   Buffers: shared dirtied=1409 hit=64120 read=29583
   ->  Sort  (cost=72914.70..73276.58 rows=144754 width=1083) (actual time=1256.128..1256.135 rows=10 loops=1)
         Sort Key: snippets.created_at DESC
         Sort Method: top-N heapsort  Memory: 39kB
         Buffers: shared dirtied=1409 hit=64120 read=29583
         ->  Index Scan using index_snippets_on_project_id on public.snippets  (cost=0.42..69786.62 rows=144754 width=1083) (actual time=0.458..1201.158 rows=144755 loops=1)
               Index Cond: (snippets.project_id IS NULL)
               Filter: (snippets.visibility_level = ANY ('{0,10,20}'::integer[]))
               Rows Removed by Filter: 0
               Buffers: shared dirtied=1409 hit=64117 read=29583

Cost: 73276.58

Time: 1.259 s
  - planning: 2.848 ms
  - execution: 1.256 s
    - I/O read: 968.999 ms

Shared buffers:
  - hits: 64120 (~500.90 MiB) from the buffer pool
  - reads: 29583 (~231.10 MiB) from the OS file cache, including disk I/O
  - dirtied: 1409 (~11.00 MiB)
  - writes: 0
After (filtering on both project_id and visibility_level)
explain SELECT "snippets".* FROM "snippets" WHERE snippets.visibility_level IN (0, 10, 20) AND "snippets"."project_id" IS NULL ORDER BY created_at DESC LIMIT 10 OFFSET 0;

 Limit  (cost=68674.70..68674.72 rows=10 width=1083) (actual time=179.753..179.762 rows=10 loops=1)
   Buffers: shared hit=56452 read=558
   ->  Sort  (cost=68674.70..69036.58 rows=144755 width=1083) (actual time=179.750..179.756 rows=10 loops=1)
         Sort Key: snippets.created_at DESC
         Sort Method: top-N heapsort  Memory: 39kB
         Buffers: shared hit=56452 read=558
         ->  Index Scan using index_snippets_on_project_id_and_visibility_level on public.snippets  (cost=0.42..65546.59 rows=144755 width=1083) (actual time=0.167..123.456 rows=144755 loops=1)
               Index Cond: ((snippets.project_id IS NULL) AND (snippets.visibility_level = ANY ('{0,10,20}'::integer[])))
               Buffers: shared hit=56449 read=558

Cost: 69036.58

Time: 186.205 ms
  - planning: 6.386 ms
  - execution: 179.819 ms
    - I/O read: 9.709 ms

Shared buffers:
  - hits: 56452 (~441.00 MiB) from the buffer pool
  - reads: 558 (~4.40 MiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
After (filtering only on project_id)
explain SELECT "snippets".* FROM "snippets" WHERE "snippets"."project_id" IS NULL ORDER BY created_at DESC LIMIT 10 OFFSET 0;

 Limit  (cost=68573.96..68573.98 rows=10 width=1083) (actual time=160.831..160.840 rows=10 loops=1)
   Buffers: shared hit=57004
   ->  Sort  (cost=68573.96..68935.85 rows=144755 width=1083) (actual time=160.829..160.834 rows=10 loops=1)
         Sort Key: snippets.created_at DESC
         Sort Method: top-N heapsort  Memory: 39kB
         Buffers: shared hit=57004
         ->  Index Scan using index_snippets_on_project_id_and_visibility_level on public.snippets  (cost=0.42..65445.86 rows=144755 width=1083) (actual time=0.039..103.798 rows=144755 loops=1)
               Index Cond: (snippets.project_id IS NULL)
               Buffers: shared hit=57001

Cost: 68935.85

Time: 163.925 ms
  - planning: 3.042 ms
  - execution: 160.883 ms

Shared buffers:
  - hits: 57004 (~445.30 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Does this MR meet the acceptance criteria?

Conformity

Performance and Testing

Edited by 🤖 GitLab Bot 🤖

Merge request reports

Loading