Drop Vulnerabilities without `finding_id`
What does this MR do and why?
tl;dr rows in vulnerabilities
without finding_id
indicate invalid data and we need to remove them
When working on Proper 1:1 relationship between Vulnerabilities... (&11030 - closed) we had to backfill the finding_id
data based on our vulnerability_ocurrences
table. After the process finished we realized we have exactly 6300 row for which finding_id
could not be populated. This indicates invalid data and we need to remove those rows before validating the finding_id
foreign key. The latest 10 invalid records were created back in April 2022 so I'm sure we're no longer creating invalid records
SQL
gitlabhq_production=> SELECT created_at FROM vulnerabilities WHERE finding_id IS NULL ORDER BY created_at DESC LIMIT 10;
created_at
-------------------------------
2022-04-20 20:34:08.328467+00
2022-04-20 20:33:43.124563+00
2022-04-20 20:33:41.319695+00
2022-04-20 20:33:33.750897+00
2022-04-20 20:33:06.104165+00
2022-04-20 20:32:21.725177+00
2022-04-20 20:32:14.516929+00
2022-04-20 20:32:07.394365+00
2022-04-20 20:31:40.921779+00
2022-04-20 20:31:38.651027+00
To paint a complete picture – we actually have far more orphaned rows in vulnerability_occurrences
table:
SQL
gitlabhq_production=> SELECT COUNT(*) FROM vulnerability_occurrences WHERE vulnerability_id IS NULL;
count
---------
1305804
(1 row)
However it's okay to keep them there for now. We will drop them once the migration to 1:1 relationship is complete.
Related to #436418 (closed)