Resolve "Clean up orphaned remediations"
What does this MR do and why?
This adds a basic sidekiq job the deletes Vulnerabilities::Remediation
s that do not have any findings
The process for removing remediations
in a pipeline leaves these
remediations
in the database without any findings
associated with
them. As a result, these remediations become orphaned in the database.
It was discovered that ~40% of all the remediations
in the production
database are actually orphaned.
SQL
from `Vulnerabilities::Remediation.where.missing(:findings).to_sql`
SELECT
'vulnerability_remediations'.*
FROM
'vulnerability_remediations'
LEFT OUTER JOIN 'vulnerability_findings_remediations' ON 'vulnerability_findings_remediations'.'vulnerability_remediation_id' = 'vulnerability_remediations'.'id'
LEFT OUTER JOIN 'vulnerability_occurrences' ON 'vulnerability_occurrences'.'id' = 'vulnerability_findings_remediations'.'vulnerability_occurrence_id'
WHERE
'vulnerability_occurrences'.'id' IS NULL
a single batch in `each_batch`
SELECT
vulnerability_remediations.*
FROM
vulnerability_remediations
LEFT JOIN vulnerability_findings_remediations ON vulnerability_findings_remediations.vulnerability_remediation_id = vulnerability_remediations.id
LEFT JOIN vulnerability_occurrences ON vulnerability_occurrences.id = vulnerability_findings_remediations.vulnerability_occurrence_id
WHERE
vulnerability_occurrences.id IS NULL AND
vulnerability_remediations.id >= 16045 AND
vulnerability_remediations.id < 17045;
EXPLAIN
explain from a batch in each_batch
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/20374/commands/66553
Time: 4.774 ms
- planning: 1.768 ms
- execution: 3.006 ms
- I/O read: 0.000 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 1303 (~10.20 MiB) from the buffer pool
- reads: 0 from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
How to set up and validate locally
-
in a separate terminal window, tail the sidekiq logs
tail -f log/sidekiq*| grep --line-buffered 'Vulnerabilities::OrphanedRemediationsCleanupWorker' | jq -cR 'fromjson?'|jq
-
use
FactoryBot
to create a bunch of remediations# with_findings FactoryBot.create_list(:vulnerabilities_remediation, 4, findings: FactoryBot.create_list(:vulnerabilities_finding, 1));nil # without_findings FactoryBot.create_list(:vulnerabilities_remediation, 4, findings: [], project: Project.first);nil Vulnerabilities::Remediation.count # => 8
-
modify the cron schedule to run every minute
diff --git a/config/initializers/1_settings.rb b/config/initializers/1_settings.rb index befbe06033b8..6e66be06b9ad 100644 --- a/config/initializers/1_settings.rb +++ b/config/initializers/1_settings.rb @@ -812 +812 @@ - Settings.cron_jobs['vulnerability_orphaned_remediations_cleanup_worker']['cron'] ||= '15 3 */3 * *' + Settings.cron_jobs['vulnerability_orphaned_remediations_cleanup_worker']['cron'] ||= '*/1 * * * *'
-
gdk restart
-
wait for the job to run, you should see logs like the following:
Click to expand
{ "severity": "INFO", "time": "2023-07-11T22:21:02.288Z", "retry": 0, "queue": "default", "backtrace": true, "version": 0, "queue_namespace": "cronjob", "args": [], "class": "Vulnerabilities::OrphanedRemediationsCleanupWorker", "jid": "82ac3276385963276c6719f2", "created_at": "2023-07-11T22:21:02.274Z", "meta.caller_id": "Cronjob", "correlation_id": "3587ca2199e15da531670dbc5b468fb5", "meta.root_caller_id": "Cronjob", "meta.feature_category": "vulnerability_management", "worker_data_consistency": "always", "idempotency_key": "resque:gitlab:duplicate:default:e28051869e793708bf6d7f453756748b3a630753ba8c4a71b23a17d4b48c80bb", "size_limiter": "validated", "enqueued_at": "2023-07-11T22:21:02.275Z", "job_size_bytes": 2, "pid": 482581, "message": "Vulnerabilities::OrphanedRemediationsCleanupWorker JID-82ac3276385963276c6719f2: start", "job_status": "start", "scheduling_latency_s": 0.013081 } { "severity": "INFO", "time": "2023-07-11T22:21:02.405Z", "retry": 0, "queue": "default", "backtrace": true, "version": 0, "queue_namespace": "cronjob", "args": [], "class": "Vulnerabilities::OrphanedRemediationsCleanupWorker", "jid": "82ac3276385963276c6719f2", "created_at": "2023-07-11T22:21:02.274Z", "meta.caller_id": "Cronjob", "correlation_id": "3587ca2199e15da531670dbc5b468fb5", "meta.root_caller_id": "Cronjob", "meta.feature_category": "vulnerability_management", "worker_data_consistency": "always", "idempotency_key": "resque:gitlab:duplicate:default:e28051869e793708bf6d7f453756748b3a630753ba8c4a71b23a17d4b48c80bb", "size_limiter": "validated", "enqueued_at": "2023-07-11T22:21:02.275Z", "job_size_bytes": 2, "pid": 482581, "message": "Vulnerabilities::OrphanedRemediationsCleanupWorker JID-82ac3276385963276c6719f2: done: 0.117322 sec", "job_status": "done", "scheduling_latency_s": 0.013081, "redis_calls": 2, "redis_duration_s": 0.00131, "redis_read_bytes": 2, "redis_write_bytes": 186, "redis_queues_calls": 2, "redis_queues_duration_s": 0.00131, "redis_queues_read_bytes": 2, "redis_queues_write_bytes": 186, "db_count": 1, "db_write_count": 1, "db_cached_count": 0, "db_replica_count": 0, "db_primary_count": 1, "db_main_count": 1, "db_ci_count": 0, "db_main_replica_count": 0, "db_ci_replica_count": 0, "db_replica_cached_count": 0, "db_primary_cached_count": 0, "db_main_cached_count": 0, "db_ci_cached_count": 0, "db_main_replica_cached_count": 0, "db_ci_replica_cached_count": 0, "db_replica_wal_count": 0, "db_primary_wal_count": 0, "db_main_wal_count": 0, "db_ci_wal_count": 0, "db_main_replica_wal_count": 0, "db_ci_replica_wal_count": 0, "db_replica_wal_cached_count": 0, "db_primary_wal_cached_count": 0, "db_main_wal_cached_count": 0, "db_ci_wal_cached_count": 0, "db_main_replica_wal_cached_count": 0, "db_ci_replica_wal_cached_count": 0, "db_replica_duration_s": 0, "db_primary_duration_s": 0.003, "db_main_duration_s": 0.003, "db_ci_duration_s": 0, "db_main_replica_duration_s": 0, "db_ci_replica_duration_s": 0, "cpu_s": 0.003803, "worker_id": "sidekiq_0", "rate_limiting_gates": [], "duration_s": 0.117322, "completed_at": "2023-07-11T22:21:02.405Z", "load_balancing_strategy": "primary", "db_duration_s": 0.001666, "urgency": "low", "target_duration_s": 300 } { "severity": "INFO", "time": "2023-07-11T22:22:05.451Z", "retry": 0, "queue": "default", "backtrace": true, "version": 0, "queue_namespace": "cronjob", "args": [], "class": "Vulnerabilities::OrphanedRemediationsCleanupWorker", "jid": "f48afdaa29493f6851f60bfd", "created_at": "2023-07-11T22:22:05.438Z", "meta.caller_id": "Cronjob", "correlation_id": "f2f5b759df6e60f76add1a3c29c41c48", "meta.root_caller_id": "Cronjob", "meta.feature_category": "vulnerability_management", "worker_data_consistency": "always", "idempotency_key": "resque:gitlab:duplicate:default:e28051869e793708bf6d7f453756748b3a630753ba8c4a71b23a17d4b48c80bb", "size_limiter": "validated", "enqueued_at": "2023-07-11T22:22:05.439Z", "job_size_bytes": 2, "pid": 482581, "message": "Vulnerabilities::OrphanedRemediationsCleanupWorker JID-f48afdaa29493f6851f60bfd: start", "job_status": "start", "scheduling_latency_s": 0.012039 }
-
see if all the remediations you created are deleted
Vulnerabilities::Remediation.count # => 4
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #378269 (closed)