Add scheduled sync background worker for package metadata
Problem to solve
Package metadata from the external license database must be ingested periodically and in the background.
The PackageMetadata::SyncService is in charge of connecting to the data source, finding the last synced position, and fetching the data. The worker's job is to invoke this service.
Proposal
Add a background worker that the instance triggers periodically. The worker invokes PackageMetadata::SyncService.
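A minimal sketch of what such a worker could look like. It assumes the service exposes a class-level `execute` method, which this issue does not confirm; the service body and its `runs` counter are placeholders. `ApplicationWorker` and `CronjobQueue` are stubbed as empty modules only so the snippet runs outside the GitLab codebase.

```ruby
# Stand-ins so the sketch runs outside the GitLab codebase; inside GitLab
# these concerns already exist and should not be redefined.
module ApplicationWorker; end
module CronjobQueue; end

module PackageMetadata
  # Placeholder for the real service. The `execute` class method and the
  # `runs` counter are assumptions made for this sketch.
  class SyncService
    class << self
      attr_reader :runs

      def execute
        @runs = (@runs || 0) + 1
      end
    end
  end

  class SyncWorker
    include ApplicationWorker
    include CronjobQueue

    # No arguments are assumed here: the service is expected to find the
    # last synced position on its own.
    def perform
      SyncService.execute
    end
  end
end
```

Keeping `perform` this thin leaves connection handling, position tracking, and fetching inside the service, as described in the problem statement.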
Implementation Plan
- add PackageMetadata::SyncWorker under ee/app/workers
- invoke worker via cron job
  - include CronjobQueue (example cronjob workers: https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/workers/active_user_count_threshold_worker.rb or https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/workers/adjourned_projects_deletion_cron_worker.rb)
  - update settings to add the cronjob https://gitlab.com/gitlab-org/gitlab/-/blob/master/config/initializers/1_settings.rb#L666
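The settings step could look roughly like the sketch below. The key name `package_metadata_sync_worker` is hypothetical, and the three-line shape is an assumption based on how cron entries are typically declared in that initializer. `Settings` is replaced by a small stand-in so the snippet runs on its own.

```ruby
# Stand-in for GitLab's Settings object so the sketch runs on its own.
Settings = Struct.new(:cron_jobs).new({})

# Hypothetical key name; the real entry may be named differently.
Settings.cron_jobs['package_metadata_sync_worker'] ||= {}
# '0 * * * *' fires at minute 0 of every hour. `||=` keeps any schedule
# that was already configured.
Settings.cron_jobs['package_metadata_sync_worker']['cron'] ||= '0 * * * *'
# The job class is always set, so the entry points at the new worker.
Settings.cron_jobs['package_metadata_sync_worker']['job_class'] = 'PackageMetadata::SyncWorker'
```

Using `||=` for the schedule but plain assignment for the job class means an administrator can change how often the worker runs, but not which worker the entry invokes.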
Sync interval
Since PackageMetadata::SyncService can skip already consumed data and the data import upserts duplicates, running the sync often is safe, so the interval can be short enough to catch updates promptly. As a first iteration, an hourly interval can be used.
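To illustrate why frequent runs are harmless, here is a toy in-memory model of the two properties mentioned above. It is not the real service: `FakeSync`, the array-based feed, and the sample package data are all hypothetical and exist only for this sketch.

```ruby
# Hypothetical model: `feed` stands in for the external license database,
# `store` for the local table, `checkpoint` for the last synced position.
class FakeSync
  attr_reader :store, :checkpoint

  def initialize(feed)
    @feed = feed
    @store = {}
    @checkpoint = 0
  end

  # Processes only entries past the checkpoint, upserts them by package
  # name, and returns how many entries were processed.
  def execute
    fresh = @feed.drop(@checkpoint)
    fresh.each { |name, license| @store[name] = license } # overwrite on key conflict
    @checkpoint += fresh.size
    fresh.size
  end
end

feed = [['rails', 'MIT'], ['nokogiri', 'MIT']]
sync = FakeSync.new(feed)

sync.execute             # first run consumes both entries
sync.execute             # second run starts at the checkpoint and finds nothing new
feed << ['rails', 'MIT'] # a duplicate row arrives upstream
sync.execute             # the duplicate is upserted, not stored a second time
```

Because a run with no new data does no work and a duplicate row overwrites the existing record, shortening the interval only costs the overhead of the extra invocations.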
Edited by Oscar Tovar