Review-apps - Count k8s resources and measure GCP quotas
What does this MR do and why?
Closes https://gitlab.com/gitlab-org/quality/engineering-productivity-infrastructure/-/issues/37
- Adds a job to count k8s resources and alert when they exceed a threshold. The job fails if:
  - there are more than 3000 Kubernetes services;
  - there are more than 200 review apps deployed;
  - the number of deployed review-apps and the number of k8s namespaces differ by more than 30.
- Adds a job to measure GCP quotas. The job fails if the usage of any quota is above 80%.
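The k8s resource-count thresholds listed above can be sketched in Ruby as follows. This is an illustration, not the code in `k8s-resources-count-checks.sh`: the function and constant names are hypothetical, the counts are assumed to be collected elsewhere (for example with `kubectl`), and comparing the namespace/review-app difference as an absolute value is an assumption.

```ruby
# Hypothetical sketch; not the actual k8s-resources-count-checks.sh logic.
MAX_SERVICES = 3000      # fail above 3000 Kubernetes services
MAX_REVIEW_APPS = 200    # fail above 200 deployed review apps
MAX_NAMESPACE_DIFF = 30  # fail when namespaces and review apps differ by more than 30

# Returns the list of error messages for the given counts; an empty list
# means every check passed. How the counts are collected is out of scope.
def k8s_count_errors(services:, review_apps:, namespaces:)
  errors = []

  if services > MAX_SERVICES
    errors << "Number of k8s services is above #{MAX_SERVICES} (#{services} services)"
  end

  if review_apps > MAX_REVIEW_APPS
    errors << "Number of deployed review-apps is above #{MAX_REVIEW_APPS} (#{review_apps} review-apps)"
  end

  # Assumption: the difference is compared as an absolute value.
  difference = (namespaces - review_apps).abs
  if difference > MAX_NAMESPACE_DIFF
    errors << "Difference between namespaces and deployed review-apps is above #{MAX_NAMESPACE_DIFF} " \
              "(#{namespaces} namespaces and #{review_apps} review-apps)"
  end

  errors
end
```

For example, with the 385 namespaces and 92 review-apps from the local run below, `k8s_count_errors(services: 100, review_apps: 92, namespaces: 385)` returns a single namespace-difference error, since 385 - 92 = 293 > 30 (the service count of 100 is a made-up placeholder).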
Slack alerting will be handled in https://gitlab.com/gitlab-org/quality/engineering-productivity-infrastructure/-/issues/38
What does it look like?
K8s resources count:
Job:
https://gitlab.com/gitlab-org/gitlab/-/jobs/3361121443
Local output:
$ ./scripts/review_apps/k8s-resources-count-checks.sh
❌ [ERROR] Difference between namespaces and deployed review-apps is above 30 (385 namespaces and 92 review-apps)
Quotas:
Job (threshold at 80%):
https://gitlab.com/gitlab-org/gitlab/-/jobs/3361101996
Local output (when changing the threshold to 5%):
$ ruby scripts/review_apps/gcp-quotas-checks.rb
Checking regional quotas:
Checking quota CPUS...❌ CPUS is above the 5.0% threshold! (current value: 0.08)
Checking quota DISKS_TOTAL_GB...❌ DISKS_TOTAL_GB is above the 5.0% threshold! (current value: 0.053076171875)
Checking quota STATIC_ADDRESSES...✅
Checking quota IN_USE_ADDRESSES...✅
Checking quota SSD_TOTAL_GB...✅
Checking quota INSTANCE_TEMPLATES...✅
Checking quota LOCAL_SSD_TOTAL_GB...✅
Checking quota INSTANCE_GROUPS...✅
Checking quota INSTANCE_GROUP_MANAGERS...✅
Checking quota INSTANCES...✅
Checking quota AUTOSCALERS...✅
Checking quota REGIONAL_AUTOSCALERS...✅
Checking quota REGIONAL_INSTANCE_GROUP_MANAGERS...✅
Checking quota TARGET_TCP_PROXIES...✅
Checking quota PREEMPTIBLE_CPUS...✅
Checking quota NVIDIA_K80_GPUS...✅
Checking quota COMMITTED_CPUS...✅
Checking quota COMMITTED_LOCAL_SSD_TOTAL_GB...✅
Checking quota COMMITMENTS...✅
Checking quota NETWORK_ENDPOINT_GROUPS...✅
Checking quota INTERNAL_ADDRESSES...✅
Checking quota NVIDIA_P100_GPUS...✅
Checking quota PREEMPTIBLE_LOCAL_SSD_GB...✅
Checking quota PREEMPTIBLE_NVIDIA_K80_GPUS...✅
Checking quota PREEMPTIBLE_NVIDIA_P100_GPUS...✅
Checking quota NVIDIA_P100_VWS_GPUS...✅
Checking quota NVIDIA_V100_GPUS...✅
Checking quota NVIDIA_P4_GPUS...✅
Checking quota NVIDIA_P4_VWS_GPUS...✅
Checking quota NODE_GROUPS...✅
Checking quota NODE_TEMPLATES...✅
Checking quota PREEMPTIBLE_NVIDIA_V100_GPUS...✅
Checking quota PREEMPTIBLE_NVIDIA_P4_GPUS...✅
Checking quota PREEMPTIBLE_NVIDIA_P100_VWS_GPUS...✅
Checking quota PREEMPTIBLE_NVIDIA_P4_VWS_GPUS...✅
Checking quota INTERCONNECT_ATTACHMENTS_PER_REGION...✅
Checking quota INTERCONNECT_ATTACHMENTS_TOTAL_MBPS...✅
Checking quota RESOURCE_POLICIES...✅
Checking quota IN_USE_SNAPSHOT_SCHEDULES...✅
Checking quota NVIDIA_T4_GPUS...✅
Checking quota NVIDIA_T4_VWS_GPUS...✅
Checking quota PREEMPTIBLE_NVIDIA_T4_GPUS...✅
Checking quota PREEMPTIBLE_NVIDIA_T4_VWS_GPUS...✅
Checking quota IN_USE_BACKUP_SCHEDULES...✅
Checking quota PUBLIC_DELEGATED_PREFIXES...✅
Checking quota COMMITTED_NVIDIA_K80_GPUS...✅
Checking quota COMMITTED_NVIDIA_P100_GPUS...✅
Checking quota COMMITTED_NVIDIA_P4_GPUS...✅
Checking quota COMMITTED_NVIDIA_V100_GPUS...✅
Checking quota COMMITTED_NVIDIA_T4_GPUS...✅
Checking quota C2_CPUS...✅
Checking quota N2_CPUS...✅
Checking quota COMMITTED_N2_CPUS...✅
Checking quota COMMITTED_C2_CPUS...✅
Checking quota RESERVATIONS...✅
Checking quota COMMITTED_LICENSES...✅
Checking quota N2D_CPUS...✅
Checking quota COMMITTED_N2D_CPUS...✅
Checking quota SERVICE_ATTACHMENTS...✅
Checking quota STATIC_BYOIP_ADDRESSES...✅
Checking quota AFFINITY_GROUPS...✅
Checking quota NVIDIA_A100_GPUS...✅
Checking quota PREEMPTIBLE_NVIDIA_A100_GPUS...✅
Checking quota COMMITTED_NVIDIA_A100_GPUS...✅
Checking quota M1_CPUS...✅
Checking quota M2_CPUS...✅
Checking quota A2_CPUS...✅
Checking quota COMMITTED_A2_CPUS...✅
Checking quota COMMITTED_MEMORY_OPTIMIZED_CPUS...✅
Checking quota NETWORK_FIREWALL_POLICIES...✅
Checking quota PSC_INTERNAL_LB_FORWARDING_RULES...✅
Checking quota EXTERNAL_NETWORK_LB_FORWARDING_RULES...❌ EXTERNAL_NETWORK_LB_FORWARDING_RULES is above the 5.0% threshold! (current value: 0.174)
Checking quota EXTERNAL_PROTOCOL_FORWARDING_RULES...✅
Checking quota PD_EXTREME_TOTAL_PROVISIONED_IOPS...✅
Checking quota E2_CPUS...✅
Checking quota COMMITTED_E2_CPUS...✅
Checking quota EXTERNAL_MANAGED_FORWARDING_RULES...✅
Checking quota C2D_CPUS...✅
Checking quota COMMITTED_C2D_CPUS...✅
Checking quota N2A_CPUS...✅
Checking quota SECURITY_POLICIES_PER_REGION...✅
Checking quota SECURITY_POLICY_RULES_PER_REGION...✅
Checking quota T2D_CPUS...✅
Checking quota COMMITTED_T2D_CPUS...✅
Checking quota T2A_CPUS...✅
Checking quota M3_CPUS...✅
Checking quota COMMITTED_M3_CPUS...✅
Checking quota NVIDIA_A100_80GB_GPUS...✅
Checking quota PREEMPTIBLE_NVIDIA_A100_80GB_GPUS...✅
Checking quota COMMITTED_NVIDIA_A100_80GB_GPUS...✅
Checking project-wide quotas:
Checking quota SNAPSHOTS...✅
Checking quota NETWORKS...✅
Checking quota FIREWALLS...❌ FIREWALLS is above the 5.0% threshold! (current value: 0.206)
Checking quota IMAGES...✅
Checking quota STATIC_ADDRESSES...✅
Checking quota ROUTES...✅
Checking quota FORWARDING_RULES...✅
Checking quota TARGET_POOLS...❌ TARGET_POOLS is above the 5.0% threshold! (current value: 0.0976)
Checking quota HEALTH_CHECKS...❌ HEALTH_CHECKS is above the 5.0% threshold! (current value: 0.1016)
Checking quota IN_USE_ADDRESSES...✅
Checking quota TARGET_INSTANCES...✅
Checking quota TARGET_HTTP_PROXIES...✅
Checking quota URL_MAPS...✅
Checking quota BACKEND_SERVICES...✅
Checking quota INSTANCE_TEMPLATES...✅
Checking quota TARGET_VPN_GATEWAYS...✅
Checking quota VPN_TUNNELS...✅
Checking quota BACKEND_BUCKETS...✅
Checking quota ROUTERS...✅
Checking quota TARGET_SSL_PROXIES...✅
Checking quota TARGET_HTTPS_PROXIES...✅
Checking quota SSL_CERTIFICATES...✅
Checking quota SUBNETWORKS...✅
Checking quota TARGET_TCP_PROXIES...✅
Checking quota SECURITY_POLICIES...✅
Checking quota SECURITY_POLICY_RULES...✅
Checking quota XPN_SERVICE_PROJECTS...✅
Checking quota PACKET_MIRRORINGS...✅
Checking quota NETWORK_ENDPOINT_GROUPS...✅
Checking quota INTERCONNECTS...✅
Checking quota GLOBAL_INTERNAL_ADDRESSES...✅
Checking quota VPN_GATEWAYS...✅
Checking quota MACHINE_IMAGES...✅
Checking quota SECURITY_POLICY_CEVAL_RULES...✅
Checking quota EXTERNAL_VPN_GATEWAYS...✅
Checking quota PUBLIC_ADVERTISED_PREFIXES...✅
Checking quota PUBLIC_DELEGATED_PREFIXES...✅
Checking quota STATIC_BYOIP_ADDRESSES...✅
Checking quota NETWORK_FIREWALL_POLICIES...✅
Checking quota INTERNAL_TRAFFIC_DIRECTOR_FORWARDING_RULES...✅
Checking quota GLOBAL_EXTERNAL_MANAGED_FORWARDING_RULES...✅
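The per-quota check in the output above comes down to comparing `usage / limit` against the threshold; the printed "current value" appears to be that ratio (for example, 0.08 means 8% of the CPUS quota is in use). The Ruby sketch below is hypothetical and not the code in `gcp-quotas-checks.rb`: it assumes quota entries shaped like the Compute Engine `Quota` resource (`metric`, `limit`, `usage`), as found in the `quotas` array of `gcloud compute regions describe REGION --format=json` or `gcloud compute project-info describe --format=json`, and it skips quotas whose limit is 0.

```ruby
require "json"

# Hypothetical sketch; not the actual gcp-quotas-checks.rb implementation.
DEFAULT_THRESHOLD = 0.8 # fail when usage is above 80% of the limit

# Extracts the `quotas` array from JSON such as the output of
# `gcloud compute regions describe REGION --format=json` (assumed shape).
def quotas_from_json(json)
  JSON.parse(json).fetch("quotas")
end

# Returns [metric, ratio] pairs for every quota whose usage/limit ratio is
# above the threshold. Quotas with a zero limit are skipped (no division by 0).
def quotas_above_threshold(quotas, threshold: DEFAULT_THRESHOLD)
  quotas.filter_map do |quota|
    limit = quota.fetch("limit").to_f
    next if limit.zero?

    ratio = quota.fetch("usage").to_f / limit
    [quota.fetch("metric"), ratio] if ratio > threshold
  end
end

# Prints one line per quota in the same style as the log above and returns
# true when every quota is within the threshold.
def report_quotas(quotas, threshold: DEFAULT_THRESHOLD)
  failing = quotas_above_threshold(quotas, threshold: threshold).to_h
  percent = (threshold * 100).round(2)

  quotas.each do |quota|
    metric = quota.fetch("metric")
    print "Checking quota #{metric}..."
    if failing.key?(metric)
      puts "❌ #{metric} is above the #{percent}% threshold! (current value: #{failing[metric]})"
    else
      puts "✅"
    end
  end

  failing.empty?
end
```

A job built on this sketch could exit with a non-zero status when `report_quotas` returns false, which is what makes the CI job fail.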
How to set up and validate locally
./scripts/review_apps/k8s-resources-count-checks.sh
ruby scripts/review_apps/gcp-quotas-checks.rb
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Edited by David Dieulivol