Implement retention period for `security_findings` records
What does this MR do and why?
This MR implements the retention period for the records stored in the security_findings
table. Currently, we are creating ~50K security_scans
records per day that's why I choose iterating over 100K records per day to have a safety buffer just in case there is a spike for a day.
This buffer will also be used to remove the records created before we implement the retention period(Though I will probably go ahead and create another MR to purge them as it will probably take quite long for this cronjob to catch up with all the records).
Related to Implement retention period for Security::Findin... (#351524 - closed).
Database review
In this MR, we are iterating through records to prevent loading 100K records into memory at once by using the ActiveRecord::Batches::BatchEnumerator#find_in_batches
method. EachBatch
can not be used as it does not work with the general limiting.
The query to iterate through all the records is;
SELECT
"security_scans".*
FROM
"security_scans"
WHERE ("security_scans"."status" = 1
OR "security_scans"."status" = 5)
AND (created_at < '2021-12-09 19:22:48.869139')
ORDER BY
"security_scans"."id" ASC
LIMIT 1000
And for the next iterations;
SELECT
"security_scans".*
FROM
"security_scans"
WHERE ("security_scans"."status" = 1
OR "security_scans"."status" = 5)
AND (created_at < '2021-12-09 19:24:18.772434')
AND "security_scans"."id" >...
ORDER BY
"security_scans"."id" ASC
LIMIT 1000
Execution plan(https://gitlab.slack.com/archives/CLJMDRD8C/p1646854171268589);
Limit (cost=0.44..98.80 rows=1000 width=58) (actual time=0.033..0.961 rows=1000 loops=1)
Buffers: shared hit=100
I/O Timings: read=0.000 write=0.000
-> Index Scan using security_scans_pkey on public.security_scans (cost=0.44..1343516.65 rows=13659487 width=58) (actual time=0.032..0.796 rows=1000 loops=1)
Filter: ((security_scans.created_at < '2021-12-09 19:22:48.869139+00'::timestamp with time zone) AND ((security_scans.status = 1) OR (security_scans.status = 5)))
Rows Removed by Filter: 0
Buffers: shared hit=100
I/O Timings: read=0.000 write=0.000
The above plan looks quite good and didn't improve any with introducing the following partial index but maybe this is related to the data cardinality and we may need to introduce this index anyway;
CREATE INDEX stale_security_scans ON security_scans USING btree (created_at, id)
WHERE (security_scans.status = 1 OR security_scans.status = 5);
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.