Improve observability of the Enqueuer worker
☎ Context
We're currently implementing a data migration on the Container Registry. This migration is going to be driven by the Rails backend.
At the core of the Rails part lies the Enqueuer worker. Its responsibility is to find the next eligible image repository to migrate and call the Container Registry to start or retry the migration.
The migration orchestration (Rails and Container Registry) is gated behind a feature flag and we're currently testing the whole process on staging.
Preliminary tests revealed that the Enqueuer job doesn't log enough information. We need more information on the done message:
- when a guard is triggered
- when an error occurs
- when the picked container repository fails additional checks.
This is issue #356042 (closed).
🤔 What does this MR do and why?
- Add more logs in the Enqueuer job
- Push down error handling so that we have a more precise message
- Update the related specs
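To illustrate the idea behind these changes, here is a minimal, self-contained sketch rather than the actual worker code. `EnqueuerSketch`, `log_extra`, the `registry_call` lambda, and the `error` key are hypothetical names introduced for illustration only; the other keys mirror the log examples given later in this description. Each guard records its outcome before returning, and the rescue sits next to the call that can fail so that the logged message names the exact step.

```ruby
# Hypothetical, simplified stand-in for the Enqueuer worker.
# Class, method, and parameter names are assumptions, not the MR's code.
class EnqueuerSketch
  attr_reader :done_metadata

  def initialize(migration_enabled:, capacity:, running: 0, repository: nil,
                 registry_call: ->(_repository) { true })
    @migration_enabled = migration_enabled
    @capacity = capacity
    @running = running
    @repository = repository
    @registry_call = registry_call
    @done_metadata = {}
  end

  def perform
    return false unless migration_enabled?
    return false unless below_capacity?
    return false unless @repository

    log_repository
    start_import
  end

  private

  # Stand-in for a helper that attaches key/value pairs to the done message.
  def log_extra(key, value)
    @done_metadata[key] = value
  end

  # Guard: log the flag state so a skipped run is visible in the logs.
  def migration_enabled?
    log_extra(:migration_enabled, @migration_enabled)
    @migration_enabled
  end

  # Guard: log both the setting and the outcome of the capacity check.
  def below_capacity?
    below = @running < @capacity
    log_extra(:max_capacity_setting, @capacity)
    log_extra(:below_capacity, below)
    below
  end

  def log_repository
    log_extra(:import_type, 'next')
    log_extra(:container_repository_id, @repository[:id])
    log_extra(:container_repository_path, @repository[:path])
  end

  # Error handling is pushed down to the call that can fail.
  def start_import
    @registry_call.call(@repository)
    true
  rescue StandardError => e
    log_extra(:error, "start import failed: #{e.message}") # hypothetical key
    false
  end
end
```

With `capacity: 0`, for example, `perform` stops at the capacity guard and `done_metadata` still contains `max_capacity_setting` and `below_capacity`, which is the kind of information the preliminary tests were missing.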
No changelog added because, as stated in the Context above, this worker is gated behind multiple feature flags and, for now, it is only enabled on demand on staging when we test the migration.
🖼 Screenshots or screen recordings
n/a
🎬 How to set up and validate locally
Follow !78613 (merged) and you should see the additional logs in the background job logs.
Examples:
- With feature flag container_registry_migration_phase2_enabled disabled, we get:
"extra.container_registry_migration_enqueuer_worker.migration_enabled": false
- With no capacity, we get:
"extra.container_registry_migration_enqueuer_worker.max_capacity_setting": 0, "extra.container_registry_migration_enqueuer_worker.below_capacity": false,
- Handling the next repository, we get:
"extra.container_registry_migration_enqueuer_worker.import_type": "next", "extra.container_registry_migration_enqueuer_worker.container_repository_id": 18, "extra.container_registry_migration_enqueuer_worker.container_repository_path": "gitlab-org/gitlab-test/test_image_11",
- Execution that is triggered too soon, we get:
"extra.container_registry_migration_enqueuer_worker.waiting_time_passed": false, "extra.container_registry_migration_enqueuer_worker.current_waiting_time_setting": 3600,
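When validating locally, it can help to pull only these keys out of a log line. The following is a small sketch, assuming each background job log line is a single JSON object; the `enqueuer_extras` helper and the sample line are hypothetical and only reuse the key prefix shown in the examples above.

```ruby
require 'json'

# Key prefix taken from the examples above.
PREFIX = 'extra.container_registry_migration_enqueuer_worker.'

# Returns only the Enqueuer-specific fields of one JSON log line,
# with the common prefix stripped from each key.
def enqueuer_extras(json_line)
  JSON.parse(json_line)
      .select { |key, _value| key.start_with?(PREFIX) }
      .transform_keys { |key| key.delete_prefix(PREFIX) }
end

# Illustrative sample line, not real log output.
sample = {
  'severity' => 'INFO',
  "#{PREFIX}waiting_time_passed" => false,
  "#{PREFIX}current_waiting_time_setting" => 3600
}.to_json

p enqueuer_extras(sample)
```

A line without any matching key yields an empty hash, so the helper can be applied to every line of a log file and the empty results discarded.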
💈 MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.