Add project_id, id indexes to merge_requests & issues tables
What does this MR do?
This MR adds:
- (target_project_id, id) index on merge_requests
- (project_id, id) index on issues
Why?
While working on !67150 (merged), it was observed (!67150 (comment 643170505)) that the following queries are not performant:
project.merge_requests.where.not(id: [1, 2, 3]).find_each { ... }
SELECT "merge_requests".* FROM "merge_requests" WHERE "merge_requests"."target_project_id" = 1 AND "merge_requests"."id" NOT IN (1, 2, 3) ORDER BY "merge_requests"."id" ASC LIMIT 1000
project.issues.where.not(id: [1, 2, 3]).find_each { ... }
SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = 1 AND "issues"."id" NOT IN (1, 2, 3) ORDER BY "issues"."id" ASC LIMIT 1000
This is mainly due to the ORDER BY "issues"."id" ASC LIMIT 1000 added by find_each (and the equivalent ordering on "merge_requests"."id"), not the where.not clause. To resolve this performance issue and allow iterating over batches of MRs/issues, this MR adds 2 new indexes.
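To illustrate why a composite index helps with this query shape, here is a minimal pure-Ruby sketch. It models a (project_id, id) index as a sorted array of pairs and walks it in id order, one batch at a time. The names INDEX, next_batch and each_batch_ids and the sample data are hypothetical; this is an in-memory analogy under those assumptions, not how PostgreSQL or find_each is implemented.

```ruby
# In-memory analogy of a composite (project_id, id) index: entries are kept
# sorted by project_id first, then by id. Data and names are hypothetical.
INDEX = [
  [1, 4], [1, 9], [1, 12], [1, 15], [1, 21],
  [2, 3], [2, 7],
  [3, 1]
].sort.freeze

# Return up to `limit` ids of `project_id` greater than `after_id`, ascending.
# Because entries are sorted by (project_id, id), a binary search finds the
# starting position and the scan stops once `limit` ids are collected.
def next_batch(project_id, after_id:, limit:, excluded_ids: [])
  start = INDEX.bsearch_index { |entry| (entry <=> [project_id, after_id]) > 0 }
  return [] if start.nil?

  ids = []
  INDEX.drop(start).each do |pid, id|
    break if pid != project_id || ids.size >= limit
    next if excluded_ids.include?(id) # plays the role of the NOT IN filter

    ids << id
  end
  ids
end

# Walk one project's ids batch by batch, resuming after the last id seen.
# Starting from 0 assumes ids are positive.
def each_batch_ids(project_id, batch_size:, excluded_ids: [])
  last_id = 0
  loop do
    batch = next_batch(project_id, after_id: last_id, limit: batch_size,
                                   excluded_ids: excluded_ids)
    break if batch.empty?

    yield batch
    last_id = batch.last
  end
end

each_batch_ids(1, batch_size: 2, excluded_ids: [9]) { |ids| p ids }
```

Without an index ordered this way, the database has to either sort all of a project's rows by id or walk an id-ordered index and discard rows from other projects before it can return the first 1000.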
Merge requests index creation ran for 31 minutes in database-lab (https://gitlab.slack.com/archives/CLJMDRD8C/p1628161279130000)
Issues index creation ran for 25 minutes in database-lab (https://gitlab.slack.com/archives/CLJMDRD8C/p1628159490116400)
Mentions #332630 (closed)
Migration output & execution plans
db/migrate/20210805103231_add_index_merge_requests_on_target_project_id_and_id.rb
Up
== 20210805103231 AddIndexMergeRequestsOnTargetProjectIdAndId: migrating ======
-- transaction_open?()
-> 0.0000s
-- index_exists?(:merge_requests, [:target_project_id, :id], {:name=>"index_merge_requests_on_target_project_id_and_id", :algorithm=>:concurrently})
-> 0.0172s
-- execute("SET statement_timeout TO 0")
-> 0.0008s
-- add_index(:merge_requests, [:target_project_id, :id], {:name=>"index_merge_requests_on_target_project_id_and_id", :algorithm=>:concurrently})
-> 0.0437s
-- execute("RESET ALL")
-> 0.0006s
== 20210805103231 AddIndexMergeRequestsOnTargetProjectIdAndId: migrated (0.0647s)
Down
== 20210805103231 AddIndexMergeRequestsOnTargetProjectIdAndId: reverting ======
-- transaction_open?()
-> 0.0000s
-- indexes(:merge_requests)
-> 0.0187s
-- execute("SET statement_timeout TO 0")
-> 0.0011s
-- remove_index(:merge_requests, {:algorithm=>:concurrently, :name=>"index_merge_requests_on_target_project_id_and_id"})
-> 0.0067s
-- execute("RESET ALL")
-> 0.0012s
== 20210805103231 AddIndexMergeRequestsOnTargetProjectIdAndId: reverted (0.0379s)
Execution Plan
- First run - 2.8 seconds - https://gitlab.slack.com/archives/CLJMDRD8C/p1628166958141800
Limit (cost=0.57..1234.47 rows=1000 width=764) (actual time=0.541..2883.509 rows=1000 loops=1)
Buffers: shared hit=12 read=995 dirtied=5
I/O Timings: read=2863.823 write=0.000
-> Index Scan using index_merge_requests_on_target_project_id_and_id on public.merge_requests (cost=0.57..90189.21 rows=73092 width=764) (actual time=0.538..2882.724 rows=1000 loops=1)
Index Cond: (merge_requests.target_project_id = 278964)
Filter: (merge_requests.id <> ALL ('{1,2,3}'::integer[]))
Rows Removed by Filter: 0
Buffers: shared hit=12 read=995 dirtied=5
I/O Timings: read=2863.823 write=0.000
- Second run - 3 ms - https://gitlab.slack.com/archives/CLJMDRD8C/p1628167045144500
Limit (cost=0.57..1230.72 rows=1000 width=764) (actual time=0.041..2.953 rows=1000 loops=1)
Buffers: shared hit=1007
I/O Timings: read=0.000 write=0.000
-> Index Scan using index_merge_requests_on_target_project_id_and_id on public.merge_requests (cost=0.57..89915.11 rows=73092 width=764) (actual time=0.039..2.819 rows=1000 loops=1)
Index Cond: (merge_requests.target_project_id = 278964)
Buffers: shared hit=1007
I/O Timings: read=0.000 write=0.000
db/migrate/20210805102538_add_index_issues_on_project_id_and_id.rb
Up
== 20210805102538 AddIndexIssuesOnProjectIdAndId: migrating ===================
-- transaction_open?()
-> 0.0000s
-- index_exists?(:issues, [:project_id, :id], {:name=>"index_issues_on_project_id_and_id", :algorithm=>:concurrently})
-> 0.0201s
-- execute("SET statement_timeout TO 0")
-> 0.0007s
-- add_index(:issues, [:project_id, :id], {:name=>"index_issues_on_project_id_and_id", :algorithm=>:concurrently})
-> 0.0719s
-- execute("RESET ALL")
-> 0.0017s
== 20210805102538 AddIndexIssuesOnProjectIdAndId: migrated (0.0977s) ==========
Down
== 20210805102538 AddIndexIssuesOnProjectIdAndId: reverting ===================
-- transaction_open?()
-> 0.0000s
-- indexes(:issues)
-> 0.0150s
-- execute("SET statement_timeout TO 0")
-> 0.0019s
-- remove_index(:issues, {:algorithm=>:concurrently, :name=>"index_issues_on_project_id_and_id"})
-> 0.0048s
-- execute("RESET ALL")
-> 0.0006s
== 20210805102538 AddIndexIssuesOnProjectIdAndId: reverted (0.0259s) ==========
Execution Plan
- First run - 2.8 seconds - https://gitlab.slack.com/archives/CLJMDRD8C/p1628167174150600
Limit (cost=0.57..1310.60 rows=1000 width=1326) (actual time=22.856..2832.857 rows=1000 loops=1)
Buffers: shared hit=10 read=981 dirtied=2
I/O Timings: read=2816.829 write=0.000
-> Index Scan using index_issues_on_project_id_and_id on public.issues (cost=0.57..124509.92 rows=95043 width=1326) (actual time=22.853..2831.966 rows=1000 loops=1)
Index Cond: (issues.project_id = 278964)
Filter: (issues.id <> ALL ('{1,2,3}'::integer[]))
Rows Removed by Filter: 0
Buffers: shared hit=10 read=981 dirtied=2
I/O Timings: read=2816.829 write=0.000
- Second run - 3 ms - https://gitlab.slack.com/archives/CLJMDRD8C/p1628167268153100
Limit (cost=0.57..1310.60 rows=1000 width=1326) (actual time=0.042..2.450 rows=1000 loops=1)
Buffers: shared hit=991
I/O Timings: read=0.000 write=0.000
-> Index Scan using index_issues_on_project_id_and_id on public.issues (cost=0.57..124509.92 rows=95043 width=1326) (actual time=0.039..2.309 rows=1000 loops=1)
Index Cond: (issues.project_id = 278964)
Filter: (issues.id <> ALL ('{1,2,3}'::integer[]))
Rows Removed by Filter: 0
Buffers: shared hit=991
I/O Timings: read=0.000 write=0.000
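The gap between the first and second runs above is consistent with a cold versus warm buffer cache. As a rough check, this sketch computes the share of each first run spent on I/O, using only the actual time, Buffers and I/O Timings figures copied from the plans. The FIRST_RUNS constant and the io_share helper are hypothetical names used for illustration.

```ruby
# Figures copied from the first-run plans above (times in milliseconds).
FIRST_RUNS = {
  "merge_requests" => { total_ms: 2883.509, io_read_ms: 2863.823, buffers_read: 995 },
  "issues"         => { total_ms: 2832.857, io_read_ms: 2816.829, buffers_read: 981 }
}.freeze

# Percentage of the total execution time spent reading buffers from disk.
def io_share(run)
  (run[:io_read_ms] / run[:total_ms] * 100).round(1)
end

FIRST_RUNS.each do |table, run|
  per_buffer_ms = (run[:io_read_ms] / run[:buffers_read]).round(2)
  puts "#{table}: #{io_share(run)}% of time in I/O, ~#{per_buffer_ms} ms per buffer read"
end
```

For both tables, over 99% of the first-run time is disk reads. The second runs report only shared buffer hits and no reads, which matches their roughly 3 ms execution time.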
Screenshots or Screencasts (strongly suggested)
How to set up and validate locally (strongly suggested)
Does this MR meet the acceptance criteria?
Conformity
- I have included changelog trailers, or none are needed. (Does this MR need a changelog?)
- I have added/updated documentation, or it's not needed. (Is documentation required?)
- I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?)
- I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?)
- I have self-reviewed this MR per code review guidelines.
- This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines)
- I have followed the style guides.
- This change is backwards compatible across updates, or this does not apply.
Availability and Testing
- I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.)
- I have tested this MR in all supported browsers, or it's not needed.
- I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.
Security
Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.
- Label as security and @ mention @gitlab-com/gl-security/appsec
- The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- Security reports checked/validated by a reviewer from the AppSec team