Sort vulnerability list by primary identifier
What does this MR do?
This MR allows users to sort the vulnerability list by the Primary Identifier.
Related to #258842 (closed)
Database Review
This MR introduces one migration to create 3 indices to speed up the sorting by primary_identifier relation of vulnerability_occurrences by just doing Index Only Scan
to avoid reading from disk.
rake db:migrate:up
== 20201207173651 AddPrimaryIdentifierSortingIndices: migrating ===============
-- transaction_open?()
-> 0.0000s
-- index_exists?(:vulnerability_occurrences, [:primary_identifier_id, :vulnerability_id], {:name=>"index_vulnerability_occurrences_on_identifier_and_vulnerability", :algorithm=>:concurrently})
-> 0.0037s
-- add_index(:vulnerability_occurrences, [:primary_identifier_id, :vulnerability_id], {:name=>"index_vulnerability_occurrences_on_identifier_and_vulnerability", :algorithm=>:concurrently})
-> 0.0044s
-- transaction_open?()
-> 0.0000s
-- index_exists?(:vulnerability_identifiers, [:project_id, :external_id, :id], {:order=>{:external_id=>:asc, :id=>:desc}, :name=>"index_vulnerability_identifiers_sorted_asc_by_external_id", :algorithm=>:concurrently})
-> 0.0011s
-- add_index(:vulnerability_identifiers, [:project_id, :external_id, :id], {:order=>{:external_id=>:asc, :id=>:desc}, :name=>"index_vulnerability_identifiers_sorted_asc_by_external_id", :algorithm=>:concurrently})
-> 0.0020s
-- transaction_open?()
-> 0.0000s
-- index_exists?(:vulnerability_identifiers, [:project_id, :external_id, :id], {:order=>{:external_id=>:desc, :id=>:desc}, :name=>"index_vulnerability_identifiers_sorted_desc_by_external_id", :algorithm=>:concurrently})
-> 0.0012s
-- add_index(:vulnerability_identifiers, [:project_id, :external_id, :id], {:order=>{:external_id=>:desc, :id=>:desc}, :name=>"index_vulnerability_identifiers_sorted_desc_by_external_id", :algorithm=>:concurrently})
-> 0.0028s
== 20201207173651 AddPrimaryIdentifierSortingIndices: migrated (0.0169s) ======
rake db:rollback
== 20201207173651 AddPrimaryIdentifierSortingIndices: reverting ===============
-- transaction_open?()
-> 0.0000s
-- indexes(:vulnerability_occurrences)
-> 0.0041s
-- remove_index(:vulnerability_occurrences, {:algorithm=>:concurrently, :name=>"index_vulnerability_occurrences_on_identifier_and_vulnerability"})
-> 0.0031s
-- transaction_open?()
-> 0.0000s
-- indexes(:vulnerability_identifiers)
-> 0.0014s
-- remove_index(:vulnerability_identifiers, {:algorithm=>:concurrently, :name=>"index_vulnerability_identifiers_sorted_asc_by_external_id"})
-> 0.0010s
-- transaction_open?()
-> 0.0000s
-- indexes(:vulnerability_identifiers)
-> 0.0012s
-- remove_index(:vulnerability_identifiers, {:algorithm=>:concurrently, :name=>"index_vulnerability_identifiers_sorted_desc_by_external_id"})
-> 0.0010s
== 20201207173651 AddPrimaryIdentifierSortingIndices: reverted (0.0136s) ======
An example query which will utilize these new indices;
SELECT
"vulnerabilities".*
FROM "vulnerabilities"
INNER JOIN "vulnerability_occurrences" ON "vulnerability_occurrences"."vulnerability_id" = "vulnerabilities"."id"
INNER JOIN "vulnerability_identifiers" ON "vulnerability_identifiers"."id" = "vulnerability_occurrences"."primary_identifier_id"
WHERE
"vulnerabilities"."project_id" = 278964 AND
"vulnerability_identifiers"."project_id" = "vulnerabilities"."project_id" AND -- This is necessary to avoid scanning the whole index
"vulnerabilities"."state" IN (1, 4)
ORDER BY
"vulnerability_identifiers"."external_id" DESC,
"vulnerability_identifiers"."id" DESC
LIMIT 20
And the query plan for this;
Limit (cost=1.29..459.50 rows=4 width=304) (actual time=0.089..0.281 rows=20 loops=1)
Buffers: shared hit=108
-> Nested Loop (cost=1.29..459.50 rows=4 width=304) (actual time=0.088..0.275 rows=20 loops=1)
Buffers: shared hit=108
-> Nested Loop (cost=0.86..299.13 rows=328 width=36) (actual time=0.062..0.085 rows=20 loops=1)
Buffers: shared hit=28
-> Index Only Scan using test_index_ordered_5 on public.vulnerability_identifiers (cost=0.43..5.62 rows=66 width=28) (actual time=0.022..0.023 rows=2 loops=1)
Index Cond: (vulnerability_identifiers.project_id = 278964)
Heap Fetches: 0
Buffers: shared hit=4
-> Index Only Scan using test_index_ordered_4 on public.vulnerability_occurrences (cost=0.43..3.85 rows=60 width=16) (actual time=0.017..0.026 rows=10 loops=2)
Index Cond: (vulnerability_occurrences.primary_identifier_id = vulnerability_identifiers.id)
Heap Fetches: 0
Buffers: shared hit=24
-> Index Scan using vulnerabilities_pkey on public.vulnerabilities (cost=0.43..0.49 rows=1 width=280) (actual time=0.008..0.008 rows=1 loops=20)
Index Cond: (vulnerabilities.id = vulnerability_occurrences.vulnerability_id)
Filter: ((vulnerabilities.state = ANY ('{1,4}'::integer[])) AND (vulnerabilities.project_id = 278964))
Rows Removed by Filter: 0
Buffers: shared hit=80
https://explain.depesz.com/s/r8wp
Without the indices in place, it takes around ~400ms to run the above query with the following query plan;
Limit (cost=3946.70..3946.71 rows=4 width=304) (actual time=358.328..358.335 rows=20 loops=1)
Buffers: shared hit=317269
-> Sort (cost=3946.70..3946.71 rows=4 width=304) (actual time=358.326..358.330 rows=20 loops=1)
Sort Key: vulnerability_identifiers.external_id DESC, vulnerability_identifiers.id DESC
Sort Method: top-N heapsort Memory: 34kB
Buffers: shared hit=317269
-> Nested Loop (cost=1.29..3946.66 rows=4 width=304) (actual time=0.077..314.453 rows=58546 loops=1)
Buffers: shared hit=317269
-> Nested Loop (cost=0.86..3786.79 rows=327 width=36) (actual time=0.056..89.295 rows=64637 loops=1)
Buffers: shared hit=63196
-> Index Scan using index_vulnerability_identifiers_on_project_id_and_fingerprint on public.vulnerability_identifiers (cost=0.43..72.32 rows=65 width=28) (actual time=0.011..0.272 rows=256 loops=1)
Index Cond: (vulnerability_identifiers.project_id = 278964)
Buffers: shared hit=259
-> Index Scan using index_vulnerability_occurrences_on_primary_identifier_id on public.vulnerability_occurrences (cost=0.43..56.55 rows=60 width=16) (actual time=0.005..0.287 rows=252 loops=256)
Index Cond: (vulnerability_occurrences.primary_identifier_id = vulnerability_identifiers.id)
Buffers: shared hit=62937
-> Index Scan using vulnerabilities_pkey on public.vulnerabilities (cost=0.43..0.49 rows=1 width=280) (actual time=0.003..0.003 rows=1 loops=64637)
Index Cond: (vulnerabilities.id = vulnerability_occurrences.vulnerability_id)
Filter: ((vulnerabilities.state = ANY ('{1,4}'::integer[])) AND (vulnerabilities.project_id = 278964))
Rows Removed by Filter: 0
Buffers: shared hit=254073
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry - [-] Documentation (if required)
-
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
- [-] Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention
@gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by Mehmet Emin INAC