Tag some workers that we don't run on GitLab.com
https://gitlab.com/gitlab-com/runbooks/-/blob/master/rules-jsonnet/temp-ignored-gprd-queue-list.libsonnet defines an alert for a list of queues that shouldn't run on GitLab.com. These queues aren't used there, so by not listening to them we save the Sidekiq Redis a bit of work. The initial configuration change for this was made in https://ops.gitlab.net/gitlab-cookbooks/chef-repo/-/merge_requests/2948.
However, in gitlab-com/gl-infra/k8s-workloads/gitlab-com!394 we started listening to these queues on k8s, which undid that work. Because the queues aren't actually used, the alert didn't fire.
This adds a tag to those workers so that we can use tags!=exclude_from_kubernetes,exclude_from_gitlab_com in our k8s catchall shard to exclude them.
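As a rough illustration of what tag-based exclusion means for the catchall shard, here is a minimal Python sketch. It is a simplified model, not GitLab's actual selector implementation: the `Worker` class, the `catchall_queues` helper, the untagged example queue, and the specific tag assigned to each worker are all hypothetical. Only the two tag names and the first two queue names come from this description.

```python
from dataclasses import dataclass, field

# Tag names taken from this MR description.
EXCLUDED_TAGS = {"exclude_from_kubernetes", "exclude_from_gitlab_com"}


@dataclass(frozen=True)
class Worker:
    """Hypothetical stand-in for a Sidekiq worker's queue and tags."""
    queue: str
    tags: frozenset = field(default_factory=frozenset)


def catchall_queues(workers, excluded_tags=EXCLUDED_TAGS):
    """Return the queues of workers that carry none of the excluded tags."""
    return [w.queue for w in workers if not (w.tags & excluded_tags)]


workers = [
    # Tag assignments here are illustrative only.
    Worker("geo:geo_event", frozenset({"exclude_from_gitlab_com"})),
    Worker("hashed_storage:hashed_storage_migrator",
           frozenset({"exclude_from_gitlab_com"})),
    # Hypothetical untagged worker that the catchall shard should keep.
    Worker("example:hypothetical_untagged_worker"),
]

print(catchall_queues(workers))
```

In this model, a worker is dropped as soon as it carries any one of the excluded tags, which is the behaviour we want from the catchall shard.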
The resulting list isn't exactly the same as the one in the alert. The differences come from some Geo workers that no longer exist and some that we've added since:
--- expected 2021-05-07 12:34:12.000000000 +0100
+++ actual 2021-05-07 12:44:39.000000000 +0100
@@ -6,7 +6,6 @@
cronjob:geo_container_repository_sync_dispatch
cronjob:geo_file_download_dispatch
cronjob:geo_metrics_update
-cronjob:geo_migrated_local_files_clean_up
cronjob:geo_prune_event_log
cronjob:geo_repository_sync
cronjob:geo_repository_verification_primary_batch
@@ -16,11 +15,15 @@
cronjob:geo_scheduler_primary_per_shard_scheduler
cronjob:geo_scheduler_secondary_per_shard_scheduler
cronjob:geo_secondary_registry_consistency
+cronjob:geo_secondary_usage_data_cron
+cronjob:geo_sync_timeout_cron
+cronjob:geo_verification_cron
geo:geo_batch_project_registry
geo:geo_batch_project_registry_scheduler
geo:geo_container_repository_sync
geo:geo_design_repository_shard_sync
geo:geo_design_repository_sync
+geo:geo_destroy
geo:geo_event
geo:geo_file_download
geo:geo_file_registry_removal
@@ -36,10 +39,13 @@
geo:geo_repository_verification_primary_shard
geo:geo_repository_verification_primary_single
geo:geo_repository_verification_secondary_single
+geo:geo_reverification_batch
geo:geo_scheduler_primary_scheduler
geo:geo_scheduler_scheduler
geo:geo_scheduler_secondary_scheduler
-geo:geo_secondary_repository_backfill
+geo:geo_verification
+geo:geo_verification_batch
+geo:geo_verification_timeout
hashed_storage:hashed_storage_migrator
hashed_storage:hashed_storage_project_migrate
hashed_storage:hashed_storage_project_rollback
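A comparison like the one above can be reproduced with a simple set difference. The following Python sketch assumes the two lists are available as plain lists of queue names. The `compare_queue_lists` helper is hypothetical, and the sample data is only a small subset of the queues shown in the diff.

```python
def compare_queue_lists(expected, actual):
    """Return (only_in_expected, only_in_actual), each sorted by name."""
    expected_set, actual_set = set(expected), set(actual)
    return sorted(expected_set - actual_set), sorted(actual_set - expected_set)


# Subset of the alert's list ("expected" in the diff above).
expected = [
    "cronjob:geo_metrics_update",
    "cronjob:geo_migrated_local_files_clean_up",
    "geo:geo_event",
    "geo:geo_secondary_repository_backfill",
]

# Subset of the tagged workers ("actual" in the diff above).
actual = [
    "cronjob:geo_metrics_update",
    "cronjob:geo_verification_cron",
    "geo:geo_destroy",
    "geo:geo_event",
]

only_in_alert, only_tagged = compare_queue_lists(expected, actual)
print("no longer exist:", only_in_alert)
print("added since:", only_tagged)
```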
The only Geo worker that runs on GitLab.com is not affected by this change: https://thanos-query.ops.gitlab.net/graph?g0.range_input=1w&g0.max_source_resolution=0s&g0.expr=sum%20by%20(queue)%20(gitlab_background_jobs%3Aqueue%3Aops%3Arate_5m%7Benvironment%3D%22gprd%22%2C%20feature_category%3D%22geo_replication%22%7D)&g0.tab=0