Change ci_runner_versions_reconciliation_worker job to run daily

Pedro Pombeiro requested to merge pedropombeiro/358406/change-to-@hourly into master

What does this MR do and why?

This MR changes the ci_runner_versions_reconciliation_worker cron job to run daily, at a random time of day. The change is needed because the current schedule causes the cached data in Gitlab::Ci::RunnerReleases#releases to be refreshed hourly, at 20 minutes past the hour. Since there can be hundreds of thousands of self-managed GitLab instances, all refreshing around the same time, this can put a heavy load on the GitLab Releases endpoint.
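
To illustrate why a random delay spreads the load, here is a minimal, self-contained sketch. The names `SECONDS_PER_DAY`, `random_delay`, and `hourly_histogram` are hypothetical and exist only for this example; the worker itself uses `ActiveSupport::Duration::SECONDS_PER_DAY`, as shown in the patch further down.

```ruby
# Illustrative sketch only; these names are hypothetical and not part of GitLab.
SECONDS_PER_DAY = 86_400

# Pick a delay (in seconds) somewhere within the next 24 hours.
def random_delay(rng = Random.new)
  rng.rand(0..SECONDS_PER_DAY)
end

# Simulate a number of instances and count how many would run in each hour of the day.
def hourly_histogram(instances, rng = Random.new)
  counts = Array.new(24, 0)
  instances.times do
    # A delay of exactly SECONDS_PER_DAY would map to hour 24, so clamp it to 23.
    hour = [random_delay(rng) / 3600, 23].min
    counts[hour] += 1
  end
  counts
end
```

Calling `hourly_histogram(100_000)` should return 24 buckets of roughly similar size, instead of every instance landing on the same minute.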

Part of Recalculate ci_runner_versions.status when requ... (#368702 - closed)

How to set up and validate locally

In review app

  1. Go to https://gitlab-review-pedropombe-sa8d70.gitlab-review.app/admin/background_jobs
  2. Enqueue the ci_runner_versions_reconciliation_worker job
  3. Go to the Scheduled tab
  4. There should be a cronjob:ci_runners_reconcile_existing_runner_versions_cron job scheduled to run at some point within the next 24 hours, with the argument false

With GDK

To avoid waiting up to 24 hours, we can shorten both schedules. The patch below makes the cron job fire every 5 minutes and caps the random delay at 60 seconds:

Patch
diff --git a/app/workers/ci/runners/reconcile_existing_runner_versions_cron_worker.rb b/app/workers/ci/runners/reconcile_existing_runner_versions_cron_worker.rb
index 283284cb197..c66fc467c7b 100644
--- a/app/workers/ci/runners/reconcile_existing_runner_versions_cron_worker.rb
+++ b/app/workers/ci/runners/reconcile_existing_runner_versions_cron_worker.rb
@@ -19,7 +19,7 @@ def perform(cronjob_scheduled = true)
         if cronjob_scheduled
           # Introduce some randomness across the day so that instances don't all hit the GitLab Releases API
           # around the same time of day
-          period = rand(0..ActiveSupport::Duration::SECONDS_PER_DAY)
+          period = rand(0..60)
           self.class.perform_in(period, false)
 
           Sidekiq.logger.info(
diff --git a/config/initializers/1_settings.rb b/config/initializers/1_settings.rb
index 2306fccd7f6..173651e8735 100644
--- a/config/initializers/1_settings.rb
+++ b/config/initializers/1_settings.rb
@@ -631,7 +631,7 @@
 Settings.cron_jobs['loose_foreign_keys_cleanup_worker']['cron'] ||= '*/1 * * * *'
 Settings.cron_jobs['loose_foreign_keys_cleanup_worker']['job_class'] = 'LooseForeignKeys::CleanupWorker'
 Settings.cron_jobs['ci_runner_versions_reconciliation_worker'] ||= Settingslogic.new({})
-Settings.cron_jobs['ci_runner_versions_reconciliation_worker']['cron'] ||= '@daily'
+Settings.cron_jobs['ci_runner_versions_reconciliation_worker']['cron'] ||= '*/5 * * * *'
 Settings.cron_jobs['ci_runner_versions_reconciliation_worker']['job_class'] = 'Ci::Runners::ReconcileExistingRunnerVersionsCronWorker'
 
 Gitlab.ee do

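Looking at log/sidekiq.log with the patch applied (the leading number on each entry appears to be its line number in the log file):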

939724:{"severity":"INFO","time":"2022-07-26T13:02:28.279Z","retry":0,"queue":"default","backtrace":true,"version":0,"queue_namespace":"cronjob","args":[],"class":"Ci::Runners::ReconcileExistingRunnerVersionsCronWorker","jid":"dd10d462c0a92d28ea424132","created_at":"2022-07-26T13:02:28.279Z","meta.caller_id":"Cronjob","correlation_id":"b0673a2a3a9bc1ad9941926208b0e273","meta.root_caller_id":"Cronjob","meta.feature_category":"runner_fleet","worker_data_consistency":"sticky","wal_locations":{},"wal_location_source":"primary","idempotency_key":"resque:gitlab:duplicate:default:62f0a5c94ca1c535f52ef159df6e37feda642bf46c4bd7d257bc0d6ab03084c3","size_limiter":"validated","enqueued_at":"2022-07-26T13:02:28.279Z","job_size_bytes":2,"pid":70419,"message":"Ci::Runners::ReconcileExistingRunnerVersionsCronWorker JID-dd10d462c0a92d28ea424132: start","job_status":"start","scheduling_latency_s":0.000267}
939742:{"severity":"INFO","time":"2022-07-26T13:02:29.963Z","class":"Ci::Runners::ReconcileExistingRunnerVersionsCronWorker","message":"rescheduled job for 2022-07-26 13:03:18 UTC","retry":0}
939743:{"severity":"INFO","time":"2022-07-26T13:02:29.964Z","retry":0,"queue":"default","backtrace":true,"version":0,"queue_namespace":"cronjob","args":[],"class":"Ci::Runners::ReconcileExistingRunnerVersionsCronWorker","jid":"dd10d462c0a92d28ea424132","created_at":"2022-07-26T13:02:28.279Z","meta.caller_id":"Cronjob","correlation_id":"b0673a2a3a9bc1ad9941926208b0e273","meta.root_caller_id":"Cronjob","meta.feature_category":"runner_fleet","worker_data_consistency":"sticky","wal_locations":{},"wal_location_source":"primary","idempotency_key":"resque:gitlab:duplicate:default:62f0a5c94ca1c535f52ef159df6e37feda642bf46c4bd7d257bc0d6ab03084c3","size_limiter":"validated","enqueued_at":"2022-07-26T13:02:28.279Z","job_size_bytes":2,"pid":70419,"message":"Ci::Runners::ReconcileExistingRunnerVersionsCronWorker JID-dd10d462c0a92d28ea424132: done: 1.684508 sec","job_status":"done","scheduling_latency_s":0.000267,"redis_calls":1,"redis_duration_s":0.000125,"redis_read_bytes":11,"redis_write_bytes":955,"redis_queues_calls":1,"redis_queues_duration_s":0.000125,"redis_queues_read_bytes":11,"redis_queues_write_bytes":955,"db_count":0,"db_write_count":0,"db_cached_count":0,"db_replica_count":0,"db_primary_count":0,"db_main_count":0,"db_ci_count":0,"db_main_replica_count":0,"db_ci_replica_count":0,"db_replica_cached_count":0,"db_primary_cached_count":0,"db_main_cached_count":0,"db_ci_cached_count":0,"db_main_replica_cached_count":0,"db_ci_replica_cached_count":0,"db_replica_wal_count":0,"db_primary_wal_count":0,"db_main_wal_count":0,"db_ci_wal_count":0,"db_main_replica_wal_count":0,"db_ci_replica_wal_count":0,"db_replica_wal_cached_count":0,"db_primary_wal_cached_count":0,"db_main_wal_cached_count":0,"db_ci_wal_cached_count":0,"db_main_replica_wal_cached_count":0,"db_ci_replica_wal_cached_count":0,"db_replica_duration_s":0.0,"db_primary_duration_s":0.0,"db_main_duration_s":0.0,"db_ci_duration_s":0.0,"db_main_replica_duration_s":0.0,"db_ci_replica_duration_s":0.0,"cpu_s":0.002382,"worker_id":"sidekiq_0","rate_limiting_gates":[],"duration_s":1.684508,"completed_at":"2022-07-26T13:02:29.964Z","load_balancing_strategy":"primary_no_wal","db_duration_s":0.0}
939791:{"severity":"INFO","time":"2022-07-26T13:03:26.256Z","retry":0,"queue":"default","backtrace":true,"version":0,"queue_namespace":"cronjob","class":"Ci::Runners::ReconcileExistingRunnerVersionsCronWorker","args":["[FILTERED]"],"jid":"96067894edbedade7c575c81","created_at":"2022-07-26T13:02:29.962Z","meta.caller_id":"Cronjob","correlation_id":"00e22e2bb8650fe8646a0f507304cea8","meta.root_caller_id":"Cronjob","meta.feature_category":"runner_fleet","meta.client_id":"ip/","worker_data_consistency":"sticky","wal_locations":{},"wal_location_source":"primary","size_limiter":"validated","scheduled_at":"2022-07-26T13:03:18.961Z","idempotency_key":"resque:gitlab:duplicate:default:f18c01846766ac372063e2051ad0a7345de1e07a3b58f1189f319053e7c09936","enqueued_at":"2022-07-26T13:03:26.255Z","job_size_bytes":7,"pid":70419,"message":"Ci::Runners::ReconcileExistingRunnerVersionsCronWorker JID-96067894edbedade7c575c81: start","job_status":"start","scheduling_latency_s":0.00052,"enqueue_latency_s":7.293876}
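  1. In 939724, the job is started by cron. Its args are empty, so cronjob_scheduled defaults to true.
  2. In 939742, the started job reschedules itself for a random later time (2022-07-26 13:03:18 UTC, which is within the patched 60-second window). The worker then exits immediately in 939743.
  3. In 939791, the rescheduled job finally calls the service, as the cronjob_scheduled argument is now passed as false (shown as [FILTERED] in the log).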
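
The two-phase flow in these log entries can be sketched without Sidekiq as follows. This is a simplified stand-in, not the actual worker: `TwoPhaseWorker`, `MAX_DELAY`, `scheduled`, and `service_calls` are hypothetical names, and the private `perform_in` here only records the request instead of enqueuing a real job the way Sidekiq's `perform_in` does.

```ruby
# Hypothetical stand-in for the worker; not the GitLab implementation.
class TwoPhaseWorker
  MAX_DELAY = 60 # seconds, matching the local patch; the MR itself uses a full day

  attr_reader :scheduled, :service_calls

  def initialize(rng: Random.new)
    @rng = rng
    @scheduled = []    # [delay, args] pairs that a real job queue would hold
    @service_calls = 0 # how many times the "real work" ran
  end

  def perform(cronjob_scheduled = true)
    if cronjob_scheduled
      # Phase 1 (started by cron): only reschedule with a random delay, then return.
      delay = @rng.rand(0..MAX_DELAY)
      perform_in(delay, false)
    else
      # Phase 2 (the delayed run): do the actual work.
      @service_calls += 1
    end
  end

  private

  # Records the request; Sidekiq would enqueue the job to run after `delay` seconds.
  def perform_in(delay, *args)
    @scheduled << [delay, args]
  end
end
```

In this sketch, calling `perform` with no arguments adds one entry to `scheduled` and leaves `service_calls` at 0. Replaying that entry with `perform(false)` is what increments `service_calls`, mirroring steps 1 to 3 above.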

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Pedro Pombeiro