Improve performance of public snippets API endpoint
What does this MR do?
Solves issue #35389
Improve performance of /api/:version/snippets/public API and only return public personal snippets.
The majority of the performance gains were found by ignoring project snippets in the query. With project snippets we need to run a three query UNION with some inner joins. Without project snippets we can simply scan the snippets table.
There is also an index added to the snippets created_at
column to improve the performance of sort ordering.
Query plans
The core query change without the created_at index:
Before
SELECT "snippets".* FROM ((SELECT "snippets".* FROM "snippets" WHERE (snippets.visibility_level IN (10,20) OR snippets.author_id = 4092171) AND "snippets"."project_id" IS NULL)
UNION
(SELECT "snippets".* FROM "snippets" INNER JOIN "projects" ON "projects"."id" = "snippets"."project_id" INNER JOIN "project_features" ON "project_features"."project_id" = "projects"."id" WHERE (snippets.visibility_level IN (10,20) OR snippets.author_id = 4092171) AND (projects.visibility_level IN (10,20)) AND "project_features"."snippets_access_level" IN (20, 30))
UNION
(SELECT "snippets".* FROM "snippets" INNER JOIN "projects" ON "projects"."id" = "snippets"."project_id" INNER JOIN "project_features" ON "project_features"."project_id" = "projects"."id" WHERE "project_features"."snippets_access_level" IN (20, 30, 10) AND (EXISTS (SELECT 1 FROM "project_authorizations" WHERE (project_id = snippets.project_id) AND "project_authorizations"."user_id" = 4092171)))) snippets WHERE "snippets"."visibility_level" = 20 ORDER BY created_at DESC
--- QUERY PLAN ---
Sort (cost=93639.06..93671.54 rows=12989 width=1744) (actual time=13982.518..14011.068 rows=49992 loops=1)
Sort Key: snippets.created_at DESC
Sort Method: quicksort Memory: 76120kB
Buffers: shared hit=306865 read=80163
I/O Timings: read=8209.162
-> HashAggregate (cost=92491.81..92621.70 rows=12989 width=1744) (actual time=13851.068..13900.894 rows=49992 loops=1)
Group Key: snippets.id, snippets.title, snippets.content, snippets.author_id, snippets.project_id, snippets.created_at, snippets.updated_at, snippets.file_name, snippets.type, snippets.visibility_level, snippets.title_html, snippets.content_html, snippets.cached_markdown_version, snippets.description, snippets.description_html
Buffers: shared hit=306862 read=80163
I/O Timings: read=8209.162
-> Append (cost=215.80..92004.72 rows=12989 width=1744) (actual time=13.185..503.752 rows=50316 loops=1)
Buffers: shared hit=137176 read=458
I/O Timings: read=25.433
-> Bitmap Heap Scan on snippets (cost=215.80..27777.04 rows=11193 width=1080) (actual time=13.184..178.426 rows=45568 loops=1)
Recheck Cond: (((project_id IS NULL) AND (visibility_level = ANY ('{10,20}'::integer[])) AND (visibility_level = 20)) OR (author_id = 4092171))
Filter: ((project_id IS NULL) AND (visibility_level = 20))
Heap Blocks: exact=21778
Buffers: shared hit=22002 read=1
I/O Timings: read=0.035
-> BitmapOr (cost=215.80..215.80 rows=11196 width=0) (actual time=9.055..9.055 rows=0 loops=1)
Buffers: shared hit=224 read=1
I/O Timings: read=0.035
-> Bitmap Index Scan on index_snippets_on_project_id_and_visibility_level (cost=0.00..208.27 rows=11193 width=0) (actual time=8.911..8.911 rows=45669 loops=1)
Index Cond: ((project_id IS NULL) AND (visibility_level = ANY ('{10,20}'::integer[])) AND (visibility_level = 20))
Buffers: shared hit=222
-> Bitmap Index Scan on index_snippets_on_author_id (cost=0.00..1.94 rows=3 width=0) (actual time=0.141..0.142 rows=0 loops=1)
Index Cond: (author_id = 4092171)
Buffers: shared hit=2 read=1
I/O Timings: read=0.035
-> Nested Loop (cost=1.28..63710.34 rows=1795 width=1080) (actual time=0.178..256.755 rows=4386 loops=1)
Buffers: shared hit=99907 read=277
I/O Timings: read=5.823
-> Nested Loop (cost=0.85..62317.06 rows=2236 width=1084) (actual time=0.126..183.700 rows=4559 loops=1)
Buffers: shared hit=81643 read=277
I/O Timings: read=5.823
-> Index Scan using index_snippets_on_visibility_level on snippets snippets_1 (cost=0.42..37304.30 rows=16283 width=1080) (actual time=0.037..91.027 rows=53072 loops=1)
Index Cond: (visibility_level = 20)
Buffers: shared hit=52242
-> Index Only Scan using index_projects_on_id_partial_for_visibility on projects (cost=0.43..1.53 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=53072)
Index Cond: (id = snippets_1.project_id)
Heap Fetches: 2902
Buffers: shared hit=29401 read=277
I/O Timings: read=5.823
-> Index Scan using index_project_features_on_project_id on project_features (cost=0.43..0.61 rows=1 width=4) (actual time=0.015..0.016 rows=1 loops=4559)
Index Cond: (project_id = projects.id)
Filter: (snippets_access_level = ANY ('{20,30}'::integer[]))
Rows Removed by Filter: 0
Buffers: shared hit=18264
-> Nested Loop (cost=44.07..387.45 rows=1 width=1080) (actual time=8.335..61.760 rows=362 loops=1)
Buffers: shared hit=15267 read=180
I/O Timings: read=19.575
-> Nested Loop (cost=43.63..386.97 rows=1 width=1088) (actual time=8.313..59.885 rows=362 loops=1)
Buffers: shared hit=13819 read=180
I/O Timings: read=19.575
-> Nested Loop (cost=43.21..329.28 rows=127 width=8) (actual time=3.738..34.837 rows=1846 loops=1)
Buffers: shared hit=8079 read=12
I/O Timings: read=2.295
-> HashAggregate (cost=42.78..44.05 rows=127 width=4) (actual time=3.695..4.565 rows=1846 loops=1)
Group Key: project_authorizations.project_id
Buffers: shared hit=68 read=12
I/O Timings: read=2.295
-> Index Only Scan using index_project_authorizations_on_user_id_project_id_access_level on project_authorizations (cost=0.56..42.46 rows=127 width=4) (actual time=0.085..3.270 rows=1846 loops=1)
Index Cond: (user_id = 4092171)
Heap Fetches: 145
Buffers: shared hit=68 read=12
I/O Timings: read=2.295
-> Index Only Scan using projects_pkey on projects projects_1 (cost=0.43..2.24 rows=1 width=4) (actual time=0.015..0.016 rows=1 loops=1846)
Index Cond: (id = project_authorizations.project_id)
Heap Fetches: 702
Buffers: shared hit=8011
-> Index Scan using index_snippets_on_project_id_and_visibility_level on snippets snippets_2 (cost=0.42..0.44 rows=1 width=1080) (actual time=0.013..0.013 rows=0 loops=1846)
Index Cond: ((project_id = projects_1.id) AND (visibility_level = 20))
Buffers: shared hit=5740 read=168
I/O Timings: read=17.280
-> Index Scan using index_project_features_on_project_id on project_features project_features_1 (cost=0.43..0.48 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=362)
Index Cond: (project_id = projects_1.id)
Filter: (snippets_access_level = ANY ('{20,30,10}'::integer[]))
Buffers: shared hit=1448
Planning time: 20.592 ms
Execution time: 14021.044 ms
After
SELECT "snippets".* FROM "snippets" WHERE "snippets"."project_id" IS NULL AND "snippets"."visibility_level" = 20 order by created_at desc
--- QUERY PLAN ---
Sort (cost=35256.72..35347.86 rows=36458 width=1080) (actual time=174.320..197.876 rows=45568 loops=1)
Sort Key: created_at DESC
Sort Method: quicksort Memory: 69900kB
Buffers: shared hit=27722
-> Index Scan using index_snippets_on_project_id_and_visibility_level on snippets (cost=0.42..32494.30 rows=36458 width=1080) (actual time=0.091..110.618 rows=45568 loops=1)
Index Cond: ((project_id IS NULL) AND (visibility_level = 20))
Buffers: shared hit=27719
Planning time: 5.619 ms
Execution time: 205.457 ms
I will run the SQL explain
on staging again once the index has been deployed.
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides - [-] Separation of EE specific content
Availability and Testing
Edited by 🤖 GitLab Bot 🤖