Create migration to schedule re-population of historical stats
What does this MR do?
This MR introduces a post-deployment migration to schedule re-population of the historical vulnerability statistics.
Related to #244380 (closed)
Database migration
Post-deployment migration
The following migration schedules the background jobs;
rake db:migrate:up
== 20201119092319 ScheduleRepopulateHistoricalVulnerabilityStatistics: migrating
== 20201119092319 ScheduleRepopulateHistoricalVulnerabilityStatistics: migrated (0.0622s)
rake db:migrate:down
== 20201119092319 ScheduleRepopulateHistoricalVulnerabilityStatistics: reverting
== 20201119092319 ScheduleRepopulateHistoricalVulnerabilityStatistics: reverted (0.0000s)
Query & timing analysis
Post-deployment migration
The post-deployment migration will schedule background jobs for all the projects which have vulnerabilities by using the following query;
SELECT
"project_settings"."project_id"
FROM
"project_settings"
WHERE (has_vulnerabilities IS TRUE)
AND "project_settings"."project_id" >= 1
ORDER BY
"project_settings"."project_id" ASC
LIMIT $1 OFFSET $2
Which has the following query & execution plan(https://explain.depesz.com/s/4rzh);
Limit (cost=0.31..1.64 rows=50 width=4) (actual time=0.240..3.403 rows=50 loops=1)
Buffers: shared hit=4 read=6 dirtied=2
I/O Timings: read=3.336
-> Index Only Scan using index_project_settings_on_project_id_partially on public.project_settings (cost=0.29..39.88 rows=1491 width=4) (actual time=0.228..3.394 rows=51 loops=1)
Index Cond: (project_settings.project_id >= 1)
Heap Fetches: 2
Buffers: shared hit=4 read=6 dirtied=2
I/O Timings: read=3.336
As of writing this, there are 10590 projects to be migrated which makes it around ~212 iterations to be run. Each iteration should take around 30ms(25ms query time + 5ms scheduling time) which makes it a total of ~7s.
Background jobs
As already mentioned in the above section, there will be around ~212 background jobs each of which will populate the historical vulnerability statistics for 50 projects for the past 365 days. The background job has already been implemented before and used twice in the past(!40766 (merged)). Therefore the timing analysis will be based on the previous ones.
In total, we will have ~212 jobs to be executed with 5 minutes of delay between each which makes it in total ~18h until all the jobs get finished.
The delay of 5 minutes between each job is chosen based on the following estimations;
- Fetching the statistics for a single project
The query to fetch the statistics for a single project for the 10 days window is as following;
SELECT DATE(calendar.entry) AS day,
severity,
COUNT(*)
FROM generate_series(
DATE '2020-07-20',
DATE '2020-07-29',
INTERVAL '1 day'
) as calendar(entry)
INNER JOIN vulnerabilities ON vulnerabilities.created_at <= calendar.entry
WHERE "vulnerabilities"."project_id" = 278964
AND (
(
vulnerabilities.dismissed_at IS NULL
OR vulnerabilities.dismissed_at > calendar.entry
)
AND (
vulnerabilities.resolved_at IS NULL
OR vulnerabilities.resolved_at > calendar.entry
)
)
GROUP BY "day",
"severity";
https://explain.dalibo.com/plan/FLW (~55ms)
- Inserting the statistics for a single project in one query
The following query inserts the statistics for a single project for the past 365 days;
INSERT INTO "vulnerability_historical_statistics" (
"letter_grade",
"created_at",
"updated_at",
"project_id",
"total",
"critical",
"high",
"medium",
"low",
"unknown",
"info",
"date"
)
VALUES (
1,
'2020-07-30 07:23:26.813637',
'2020-07-30 07:23:26.813637',
10,
1,
0,
0,
0,
1,
0,
0,
'2020-05-01'
),
...,
(
1,
'2020-07-30 07:23:26.813637',
'2020-07-30 07:23:26.813637',
10,
91,
0,
0,
0,
91,
0,
0,
'2020-07-30'
) ON CONFLICT DO NOTHING
RETURNING "id"
https://explain.depesz.com/s/Eael (~350ms)
Based on these numbers, the total can be calculated with the following formula;
ETA = X1 * 50
- X1 = Time spent on one project = X2 + X3
-
X2 = Time to fetch all statistics for a single project = X21 * 38
- X21 = Time to fetch statistics for a single project for 10 days window = 55ms
- X3 = Time to insert all statistics for a single project = 350ms
ETA = ((55ms * 38) + 350ms) * 50 = 122s
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry - [-] Documentation (if required)
-
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides - [-] Separation of EE specific content
Availability and Testing
- [-] Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention
@gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team