Skip to content

Resolve "Clean up orphaned remediations"

Michael Becker requested to merge 378269-clean-up-orphaned-remediations into master

What does this MR do and why?

This adds a basic sidekiq job the deletes Vulnerabilities::Remediations that do not have any findings

The process for removing remediations in a pipeline leaves these remediations in the database without any findings associated with them. As a result, these remediations become orphaned in the database.

It was discovered that ~40% of all the remediations in the production database are actually orphaned.

SQL

from `Vulnerabilities::Remediation.where.missing(:findings).to_sql`
SELECT

    'vulnerability_remediations'.*

FROM

    'vulnerability_remediations'

    LEFT OUTER JOIN 'vulnerability_findings_remediations' ON 'vulnerability_findings_remediations'.'vulnerability_remediation_id' = 'vulnerability_remediations'.'id'

    LEFT OUTER JOIN 'vulnerability_occurrences' ON 'vulnerability_occurrences'.'id' = 'vulnerability_findings_remediations'.'vulnerability_occurrence_id'

WHERE

    'vulnerability_occurrences'.'id' IS NULL
a single batch in `each_batch`
SELECT

    vulnerability_remediations.*

FROM

    vulnerability_remediations

    LEFT JOIN vulnerability_findings_remediations ON vulnerability_findings_remediations.vulnerability_remediation_id = vulnerability_remediations.id

    LEFT JOIN vulnerability_occurrences ON vulnerability_occurrences.id = vulnerability_findings_remediations.vulnerability_occurrence_id

WHERE

    vulnerability_occurrences.id IS NULL AND

    vulnerability_remediations.id >= 16045 AND

    vulnerability_remediations.id < 17045;

EXPLAIN

explain from a batch in each_batch

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/20374/commands/66553

Time: 4.774 ms
  - planning: 1.768 ms
  - execution: 3.006 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 1303 (~10.20 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

How to set up and validate locally

  1. in a separate terminal window, tail the sidekiq logs

     tail -f log/sidekiq*| grep --line-buffered 'Vulnerabilities::OrphanedRemediationsCleanupWorker' | jq -cR 'fromjson?'|jq
  2. use FactoryBot to create a bunch of remediations

    # with_findings
    FactoryBot.create_list(:vulnerabilities_remediation, 4, findings: FactoryBot.create_list(:vulnerabilities_finding, 1));nil
    # without_findings
    FactoryBot.create_list(:vulnerabilities_remediation, 4, findings: [], project: Project.first);nil
    
    Vulnerabilities::Remediation.count
    
    # => 8
  3. modify the cron schedule to run every minute

    diff --git a/config/initializers/1_settings.rb b/config/initializers/1_settings.rb
    index befbe06033b8..6e66be06b9ad 100644
    --- a/config/initializers/1_settings.rb
    +++ b/config/initializers/1_settings.rb
    @@ -812 +812 @@
    -  Settings.cron_jobs['vulnerability_orphaned_remediations_cleanup_worker']['cron'] ||= '15 3 */3 * *'
    +  Settings.cron_jobs['vulnerability_orphaned_remediations_cleanup_worker']['cron'] ||= '*/1 * * * *'
  4. gdk restart

  5. wait for the job to run, you should see logs like the following:

    Click to expand
    {
      "severity": "INFO",
      "time": "2023-07-11T22:21:02.288Z",
      "retry": 0,
      "queue": "default",
      "backtrace": true,
      "version": 0,
      "queue_namespace": "cronjob",
      "args": [],
      "class": "Vulnerabilities::OrphanedRemediationsCleanupWorker",
      "jid": "82ac3276385963276c6719f2",
      "created_at": "2023-07-11T22:21:02.274Z",
      "meta.caller_id": "Cronjob",
      "correlation_id": "3587ca2199e15da531670dbc5b468fb5",
      "meta.root_caller_id": "Cronjob",
      "meta.feature_category": "vulnerability_management",
      "worker_data_consistency": "always",
      "idempotency_key": "resque:gitlab:duplicate:default:e28051869e793708bf6d7f453756748b3a630753ba8c4a71b23a17d4b48c80bb",
      "size_limiter": "validated",
      "enqueued_at": "2023-07-11T22:21:02.275Z",
      "job_size_bytes": 2,
      "pid": 482581,
      "message": "Vulnerabilities::OrphanedRemediationsCleanupWorker JID-82ac3276385963276c6719f2: start",
      "job_status": "start",
      "scheduling_latency_s": 0.013081
    }
    {
      "severity": "INFO",
      "time": "2023-07-11T22:21:02.405Z",
      "retry": 0,
      "queue": "default",
      "backtrace": true,
      "version": 0,
      "queue_namespace": "cronjob",
      "args": [],
      "class": "Vulnerabilities::OrphanedRemediationsCleanupWorker",
      "jid": "82ac3276385963276c6719f2",
      "created_at": "2023-07-11T22:21:02.274Z",
      "meta.caller_id": "Cronjob",
      "correlation_id": "3587ca2199e15da531670dbc5b468fb5",
      "meta.root_caller_id": "Cronjob",
      "meta.feature_category": "vulnerability_management",
      "worker_data_consistency": "always",
      "idempotency_key": "resque:gitlab:duplicate:default:e28051869e793708bf6d7f453756748b3a630753ba8c4a71b23a17d4b48c80bb",
      "size_limiter": "validated",
      "enqueued_at": "2023-07-11T22:21:02.275Z",
      "job_size_bytes": 2,
      "pid": 482581,
      "message": "Vulnerabilities::OrphanedRemediationsCleanupWorker JID-82ac3276385963276c6719f2: done: 0.117322 sec",
      "job_status": "done",
      "scheduling_latency_s": 0.013081,
      "redis_calls": 2,
      "redis_duration_s": 0.00131,
      "redis_read_bytes": 2,
      "redis_write_bytes": 186,
      "redis_queues_calls": 2,
      "redis_queues_duration_s": 0.00131,
      "redis_queues_read_bytes": 2,
      "redis_queues_write_bytes": 186,
      "db_count": 1,
      "db_write_count": 1,
      "db_cached_count": 0,
      "db_replica_count": 0,
      "db_primary_count": 1,
      "db_main_count": 1,
      "db_ci_count": 0,
      "db_main_replica_count": 0,
      "db_ci_replica_count": 0,
      "db_replica_cached_count": 0,
      "db_primary_cached_count": 0,
      "db_main_cached_count": 0,
      "db_ci_cached_count": 0,
      "db_main_replica_cached_count": 0,
      "db_ci_replica_cached_count": 0,
      "db_replica_wal_count": 0,
      "db_primary_wal_count": 0,
      "db_main_wal_count": 0,
      "db_ci_wal_count": 0,
      "db_main_replica_wal_count": 0,
      "db_ci_replica_wal_count": 0,
      "db_replica_wal_cached_count": 0,
      "db_primary_wal_cached_count": 0,
      "db_main_wal_cached_count": 0,
      "db_ci_wal_cached_count": 0,
      "db_main_replica_wal_cached_count": 0,
      "db_ci_replica_wal_cached_count": 0,
      "db_replica_duration_s": 0,
      "db_primary_duration_s": 0.003,
      "db_main_duration_s": 0.003,
      "db_ci_duration_s": 0,
      "db_main_replica_duration_s": 0,
      "db_ci_replica_duration_s": 0,
      "cpu_s": 0.003803,
      "worker_id": "sidekiq_0",
      "rate_limiting_gates": [],
      "duration_s": 0.117322,
      "completed_at": "2023-07-11T22:21:02.405Z",
      "load_balancing_strategy": "primary",
      "db_duration_s": 0.001666,
      "urgency": "low",
      "target_duration_s": 300
    }
    {
      "severity": "INFO",
      "time": "2023-07-11T22:22:05.451Z",
      "retry": 0,
      "queue": "default",
      "backtrace": true,
      "version": 0,
      "queue_namespace": "cronjob",
      "args": [],
      "class": "Vulnerabilities::OrphanedRemediationsCleanupWorker",
      "jid": "f48afdaa29493f6851f60bfd",
      "created_at": "2023-07-11T22:22:05.438Z",
      "meta.caller_id": "Cronjob",
      "correlation_id": "f2f5b759df6e60f76add1a3c29c41c48",
      "meta.root_caller_id": "Cronjob",
      "meta.feature_category": "vulnerability_management",
      "worker_data_consistency": "always",
      "idempotency_key": "resque:gitlab:duplicate:default:e28051869e793708bf6d7f453756748b3a630753ba8c4a71b23a17d4b48c80bb",
      "size_limiter": "validated",
      "enqueued_at": "2023-07-11T22:22:05.439Z",
      "job_size_bytes": 2,
      "pid": 482581,
      "message": "Vulnerabilities::OrphanedRemediationsCleanupWorker JID-f48afdaa29493f6851f60bfd: start",
      "job_status": "start",
      "scheduling_latency_s": 0.012039
    }
  6. see if all the remediations you created are deleted

        Vulnerabilities::Remediation.count
    
        # => 4

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #378269 (closed)

Edited by Michael Becker

Merge request reports

Loading