Skip to content

Add compound index for `vulnerabilities` table on `project_id` and `id`

What does this MR do and why?

Related to #350915 (closed), and !78960 (merged).

Database review

This MR introduces a new compound index for the vulnerabilities table on project_id and id columns to speed up the SQL queries generated by the EachBatch module.

rake db:migrate:up
== 20220125230538 AddCompoundIndexOnProjectIdAndIdForVulnerabilities: migrating
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:vulnerabilities, [:project_id, :id], {:name=>"index_vulnerabilities_on_project_id_and_id", :algorithm=>:concurrently})
   -> 0.0057s
-- add_index(:vulnerabilities, [:project_id, :id], {:name=>"index_vulnerabilities_on_project_id_and_id", :algorithm=>:concurrently})
   -> 0.0047s
== 20220125230538 AddCompoundIndexOnProjectIdAndIdForVulnerabilities: migrated (0.0116s)
rake db:migrate:down
== 20220125230538 AddCompoundIndexOnProjectIdAndIdForVulnerabilities: reverting
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:vulnerabilities, [:project_id, :id], {:name=>"index_vulnerabilities_on_project_id_and_id", :algorithm=>:concurrently})
   -> 0.0064s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- remove_index(:vulnerabilities, {:name=>"index_vulnerabilities_on_project_id_and_id", :algorithm=>:concurrently, :column=>[:project_id, :id]})
   -> 0.0080s
-- execute("RESET statement_timeout")
   -> 0.0005s
== 20220125230538 AddCompoundIndexOnProjectIdAndIdForVulnerabilities: reverted (0.0188s)

Example queries used by the EachBatch module

Get the first vulnerability ID
SELECT
    "vulnerabilities"."id"
FROM
    "vulnerabilities"
WHERE
    "vulnerabilities"."project_id" = 278964
ORDER BY
    "vulnerabilities"."id" ASC
LIMIT 1

Before the index;

https://gitlab.slack.com/archives/CLJMDRD8C/p1643153060294400

Limit  (cost=0.44..29.35 rows=1 width=8) (actual time=4.948..4.950 rows=1 loops=1)
   Buffers: shared read=4
   I/O Timings: read=4.866 write=0.000
   ->  Index Scan using vulnerabilities_pkey on public.vulnerabilities  (cost=0.44..2381387.85 rows=82370 width=8) (actual time=4.945..4.946 rows=1 loops=1)
         Filter: (vulnerabilities.project_id = 278964)
         Rows Removed by Filter: 0
         Buffers: shared read=4
         I/O Timings: read=4.866 write=0.000

After the index;

https://gitlab.slack.com/archives/CLJMDRD8C/p1643153457300500

Limit  (cost=0.56..0.67 rows=1 width=8) (actual time=0.188..0.189 rows=1 loops=1)
   Buffers: shared hit=4 read=4
   I/O Timings: read=0.071 write=0.000
   ->  Index Only Scan using index_vulnerabilities_on_project_id_and_id on public.vulnerabilities  (cost=0.56..8741.89 rows=82009 width=8) (actual time=0.187..0.187 rows=1 loops=1)
         Index Cond: (vulnerabilities.project_id = 278964)
         Heap Fetches: 0
         Buffers: shared hit=4 read=4
         I/O Timings: read=0.071 write=0.000
Get the IDs of the next 1000 vulnerabilities
SELECT
    "vulnerabilities"."id"
FROM
    "vulnerabilities"
WHERE
    "vulnerabilities"."project_id" = 278964
    AND "vulnerabilities"."id" >= 1
ORDER BY
    "vulnerabilities"."id" ASC
LIMIT 1 OFFSET 1000

Before the index;

https://gitlab.slack.com/archives/CLJMDRD8C/p1643153157296900

Limit  (cost=28288.53..28315.81 rows=1 width=8) (actual time=104755.558..105109.059 rows=1 loops=1)
   Buffers: shared hit=1729611 read=379726 dirtied=15139 written=385
   I/O Timings: read=259374.607 write=18.595
   ->  Gather Merge  (cost=1000.46..2248718.36 rows=82370 width=8) (actual time=25.852..105108.855 rows=1001 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=1729611 read=379726 dirtied=15139 written=385
         I/O Timings: read=259374.607 write=18.595
         ->  Parallel Index Scan using vulnerabilities_pkey on public.vulnerabilities  (cost=0.44..2238210.80 rows=34321 width=8) (actual time=7.356..88541.727 rows=336 loops=3)
               Index Cond: (vulnerabilities.id >= 1)
               Filter: (vulnerabilities.project_id = 278964)
               Rows Removed by Filter: 980598
               Buffers: shared hit=1729611 read=379726 dirtied=15139 written=385
               I/O Timings: read=259374.607 write=18.595

After the index;

https://gitlab.slack.com/archives/CLJMDRD8C/p1643153483303000

Limit  (cost=109.65..109.76 rows=1 width=8) (actual time=0.787..0.788 rows=1 loops=1)
   Buffers: shared hit=205 read=8
   I/O Timings: read=0.188 write=0.000
   ->  Index Only Scan using index_vulnerabilities_on_project_id_and_id on public.vulnerabilities  (cost=0.56..8946.91 rows=82009 width=8) (actual time=0.063..0.732 rows=1001 loops=1)
         Index Cond: ((vulnerabilities.project_id = 278964) AND (vulnerabilities.id >= 1))
         Heap Fetches: 32
         Buffers: shared hit=205 read=8
         I/O Timings: read=0.188 write=0.000

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Mehmet Emin INAC

Merge request reports

Loading