Remove ci_builds join from Security::Finding.by_build_ids
What does this MR do and why?
This MR removes a join to the ci_builds
table from security_scans
because it doesn't appear to be needed.
We only use the primary key from ci_builds
in our query, so we can query build_id
directly on security_scans
.
This query is a problem for two Engineering Allocation issues I'm working on: https://gitlab.com/gitlab-org/gitlab/-/issues/24644 (In progress: !71342 (comment 689569275)) and #340256 (closed)
It'll also be a ~"sharding-blocker", either for &6379 (closed) or &6373 (closed). Either way it seems like a nontrivial ~performance improvement as well.
Current Query
scope :by_build_ids, -> (build_ids) { joins(scan: :build).where(ci_builds: { id: build_ids }) }
SELECT "security_findings".* FROM "security_findings" INNER JOIN "security_scans" ON "security_scans"."id" = "security_findings"."scan_id" INNER JOIN "ci_builds" ON "ci_builds"."id" = "security_scans"."build_id" AND "ci_builds"."type" = 'Ci::Build' WHERE "ci_builds"."id" IN (1, 2, 3)
Execution plan: https://gitlab.slack.com/archives/CLJMDRD8C/p1633126596130300
Nested Loop (cost=1.71..2040.23 rows=1 width=106) (actual time=5.960..5.962 rows=0 loops=1)
Buffers: shared hit=24 read=4
I/O Timings: read=5.876 write=0.000
-> Nested Loop (cost=1.14..19.75 rows=1 width=8) (actual time=5.960..5.961 rows=0 loops=1)
Buffers: shared hit=24 read=4
I/O Timings: read=5.876 write=0.000
-> Index Scan using ci_builds_pkey on public.ci_builds (cost=0.58..8.99 rows=3 width=8) (actual time=5.931..5.940 rows=3 loops=1)
Index Cond: (ci_builds.id = ANY ('{1,2,3}'::bigint[]))
Filter: ((ci_builds.type)::text = 'Ci::Build'::text)
Rows Removed by Filter: 0
Buffers: shared hit=12 read=4
I/O Timings: read=5.876 write=0.000
-> Index Scan using idx_security_scans_on_build_and_scan_type on public.security_scans (cost=0.56..3.58 rows=1 width=16) (actual time=0.005..0.005 rows=0 loops=3)
Index Cond: (security_scans.build_id = ci_builds.id)
Buffers: shared hit=12
I/O Timings: read=0.000 write=0.000
-> Index Scan using index_security_findings_on_scan_id_and_position on public.security_findings (cost=0.57..1995.17 rows=2531 width=106) (actual time=0.000..0.000 rows=0 loops=0)
Index Cond: (security_findings.scan_id = security_scans.id)
I/O Timings: read=0.000 write=0.000
New Query
scope :by_build_ids, -> (build_ids) { joins(:scan).where(security_scans: { build_id: build_ids }) }
SELECT "security_findings".* FROM "security_findings" INNER JOIN "security_scans" ON "security_scans"."id" = "security_findings"."scan_id" WHERE "security_scans"."build_id" IN (1, 2, 3)
Execution plan: https://gitlab.slack.com/archives/CLJMDRD8C/p1633126613133300
Nested Loop (cost=1.13..7506.35 rows=92 width=106) (actual time=0.053..0.054 rows=0 loops=1)
Buffers: shared hit=15
I/O Timings: read=0.000 write=0.000
-> Index Scan using idx_security_scans_on_build_and_scan_type on public.security_scans (cost=0.56..10.00 rows=3 width=8) (actual time=0.051..0.052 rows=0 loops=1)
Index Cond: (security_scans.build_id = ANY ('{1,2,3}'::bigint[]))
Buffers: shared hit=15
I/O Timings: read=0.000 write=0.000
-> Index Scan using index_security_findings_on_scan_id_and_position on public.security_findings (cost=0.57..2473.48 rows=2531 width=106) (actual time=0.000..0.000 rows=0 loops=0)
Index Cond: (security_findings.scan_id = security_scans.id)
I/O Timings: read=0.000 write=0.000
How to set up and validate locally
This is more of a validate-in-database-lab kind of thing, since the production database is where the effect of this will be seen.
I haven't changed any specs, so there shouldn't be any behavioral differences.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.