Group Security Dashboard shows incorrect vulnerability counts
Summary
We are evaluating vulnerabilities in our analyzer projects and totals on the dashboard don't line up with the project counts (19 vs. ~100 critical shown below).
Steps to reproduce
- Navigate to a project level security dashboard (example)
- Notice that the count # in the
Vulnerabilities over time
is much lower than the counts per project inProject security status
(expand theF
grade)
Example Project
https://gitlab.com/groups/gitlab-org/security-products/analyzers/-/security/dashboard
What is the current bug behavior?
Timeline counts are much lower than the actual project counts.
What is the expected correct behavior?
Timeline counts should match project counts.
Relevant logs and/or screenshots
Output of checks
Results of GitLab environment info
Expand for output related to GitLab environment info
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:env:info`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)
Results of GitLab application Check
Expand for output related to the GitLab application check
(For installations with omnibus-gitlab package run and paste the output of:
sudo gitlab-rake gitlab:check SANITIZE=true
)(For installations from source run and paste the output of:
sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true
)(we will only investigate if the tests are passing)
Possible fixes
In https://gitlab.slack.com/archives/CV09DAXEW/p1598556709070100 it was determined that this aggregate method may be the cause https://gitlab.com/gitlab-org/gitlab/-/blob/8d0dff57d77883aad64ec4e3a3814f602ec6d89b/ee/app/models/vulnerabilities/historical_statistic.rb#L28-34
Per @minac:
The above logic assumes that we have historical statistic entries in the table for all the projects for each day which might not correct based on the implementation. If there is no record for a date then the previous date entry should be used in the sum.
Implementation plan
The root cause of this issue might be the query timeout exceptions we get in Vulnerabilities::Statistics::ScheduleWorker#L19
.
When we get an exception in this worker, the historical vulnerability statistics will not be generated for that day which causes wrong results to be calculated while aggregating the historical statistics for a group as there are gaps in the time series.
-
-
Find a better performance query to prevent having timeout issues or introduce has_vulnerabilities
flag for projects and get the list of projects with the following query;
SELECT * FROM projects WHERE has_vulnerabilities IS TRUE
-
-
-
Write a migration to populate missing historical vulnerability statistics
-
-
- [-]
Since the daily adjustment worker can fail for various reasons, it's better to have a more robust way of aggregating the values so maybe we can change the query with something like proposed in the #244380 (comment 406700787)
- [-]