Review-apps - Count k8s resources and measure GCP quotas

David Dieulivol requested to merge 37-count_k8s_resources_and_quotas into master

What does this MR do and why?

Closes https://gitlab.com/gitlab-org/quality/engineering-productivity-infrastructure/-/issues/37

  • Adds a job that counts k8s resources and fails when a count exceeds its threshold.
    • The job fails if there are more than 3000 Kubernetes services.
    • The job fails if there are more than 200 review apps deployed.
    • The job fails if the number of k8s namespaces and the number of deployed review apps differ by more than 30.
  • Adds a job that measures GCP quotas. The job fails if usage of any quota is above 80% of its limit.
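
The three count checks listed above can be sketched as below. This is an illustrative Ruby sketch, not the code of `k8s-resources-count-checks.sh` (which is a shell script). The constant names, the `k8s_count_errors` helper, the use of an absolute difference, and the wording of the first two messages are assumptions. The services count in the example call is made up, while the namespace and review-app counts are taken from the local output shown further down.

```ruby
# Thresholds as stated in the MR description. The constant names are hypothetical.
MAX_SERVICES = 3000
MAX_REVIEW_APPS = 200
MAX_NAMESPACE_DRIFT = 30

# Returns a list of error messages. An empty list means every check passed.
# Assumption: the namespace/review-app difference is compared as an absolute value.
def k8s_count_errors(services:, review_apps:, namespaces:)
  errors = []
  errors << "Services count is above #{MAX_SERVICES} (#{services})" if services > MAX_SERVICES
  errors << "Review-apps count is above #{MAX_REVIEW_APPS} (#{review_apps})" if review_apps > MAX_REVIEW_APPS

  drift = (namespaces - review_apps).abs
  if drift > MAX_NAMESPACE_DRIFT
    errors << "Difference between namespaces and deployed review-apps is above #{MAX_NAMESPACE_DRIFT} " \
              "(#{namespaces} namespaces and #{review_apps} review-apps)"
  end

  errors
end

# Illustrative call: 1200 services is an invented value.
errors = k8s_count_errors(services: 1200, review_apps: 92, namespaces: 385)
errors.each { |message| puts "[ERROR] #{message}" }
# A CI job would typically exit with a non-zero status here when errors is non-empty.
```

Returning the messages instead of exiting inside the helper keeps the checks easy to test; the caller decides how to fail the job.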

The alerting in Slack will be handled in https://gitlab.com/gitlab-org/quality/engineering-productivity-infrastructure/-/issues/38

What does it look like?

K8s resources count:

Job:

https://gitlab.com/gitlab-org/gitlab/-/jobs/3361121443

Local output:

$ ./scripts/review_apps/k8s-resources-count-checks.sh
[ERROR] Difference between namespaces and deployed review-apps is above 30 (385 namespaces and 92 review-apps)

GCP quotas:

Job (threshold at 80%):

https://gitlab.com/gitlab-org/gitlab/-/jobs/3361101996

Local output (with the threshold temporarily lowered to 5% to trigger failures):

$ ruby scripts/review_apps/gcp-quotas-checks.rb
Checking regional quotas:
Checking quota CPUS...❌ CPUS is above the 5.0% threshold! (current value: 0.08)
Checking quota DISKS_TOTAL_GB...❌ DISKS_TOTAL_GB is above the 5.0% threshold! (current value: 0.053076171875)
Checking quota STATIC_ADDRESSES...✅
Checking quota IN_USE_ADDRESSES...✅
Checking quota SSD_TOTAL_GB...✅
Checking quota INSTANCE_TEMPLATES...✅
Checking quota LOCAL_SSD_TOTAL_GB...✅
Checking quota INSTANCE_GROUPS...✅
Checking quota INSTANCE_GROUP_MANAGERS...✅
Checking quota INSTANCES...✅
Checking quota AUTOSCALERS...✅
Checking quota REGIONAL_AUTOSCALERS...✅
Checking quota REGIONAL_INSTANCE_GROUP_MANAGERS...✅
Checking quota TARGET_TCP_PROXIES...✅
Checking quota PREEMPTIBLE_CPUS...✅
Checking quota NVIDIA_K80_GPUS...✅
Checking quota COMMITTED_CPUS...✅
Checking quota COMMITTED_LOCAL_SSD_TOTAL_GB...✅
Checking quota COMMITMENTS...✅
Checking quota NETWORK_ENDPOINT_GROUPS...✅
Checking quota INTERNAL_ADDRESSES...✅
Checking quota NVIDIA_P100_GPUS...✅
Checking quota PREEMPTIBLE_LOCAL_SSD_GB...✅
Checking quota PREEMPTIBLE_NVIDIA_K80_GPUS...✅
Checking quota PREEMPTIBLE_NVIDIA_P100_GPUS...✅
Checking quota NVIDIA_P100_VWS_GPUS...✅
Checking quota NVIDIA_V100_GPUS...✅
Checking quota NVIDIA_P4_GPUS...✅
Checking quota NVIDIA_P4_VWS_GPUS...✅
Checking quota NODE_GROUPS...✅
Checking quota NODE_TEMPLATES...✅
Checking quota PREEMPTIBLE_NVIDIA_V100_GPUS...✅
Checking quota PREEMPTIBLE_NVIDIA_P4_GPUS...✅
Checking quota PREEMPTIBLE_NVIDIA_P100_VWS_GPUS...✅
Checking quota PREEMPTIBLE_NVIDIA_P4_VWS_GPUS...✅
Checking quota INTERCONNECT_ATTACHMENTS_PER_REGION...✅
Checking quota INTERCONNECT_ATTACHMENTS_TOTAL_MBPS...✅
Checking quota RESOURCE_POLICIES...✅
Checking quota IN_USE_SNAPSHOT_SCHEDULES...✅
Checking quota NVIDIA_T4_GPUS...✅
Checking quota NVIDIA_T4_VWS_GPUS...✅
Checking quota PREEMPTIBLE_NVIDIA_T4_GPUS...✅
Checking quota PREEMPTIBLE_NVIDIA_T4_VWS_GPUS...✅
Checking quota IN_USE_BACKUP_SCHEDULES...✅
Checking quota PUBLIC_DELEGATED_PREFIXES...✅
Checking quota COMMITTED_NVIDIA_K80_GPUS...✅
Checking quota COMMITTED_NVIDIA_P100_GPUS...✅
Checking quota COMMITTED_NVIDIA_P4_GPUS...✅
Checking quota COMMITTED_NVIDIA_V100_GPUS...✅
Checking quota COMMITTED_NVIDIA_T4_GPUS...✅
Checking quota C2_CPUS...✅
Checking quota N2_CPUS...✅
Checking quota COMMITTED_N2_CPUS...✅
Checking quota COMMITTED_C2_CPUS...✅
Checking quota RESERVATIONS...✅
Checking quota COMMITTED_LICENSES...✅
Checking quota N2D_CPUS...✅
Checking quota COMMITTED_N2D_CPUS...✅
Checking quota SERVICE_ATTACHMENTS...✅
Checking quota STATIC_BYOIP_ADDRESSES...✅
Checking quota AFFINITY_GROUPS...✅
Checking quota NVIDIA_A100_GPUS...✅
Checking quota PREEMPTIBLE_NVIDIA_A100_GPUS...✅
Checking quota COMMITTED_NVIDIA_A100_GPUS...✅
Checking quota M1_CPUS...✅
Checking quota M2_CPUS...✅
Checking quota A2_CPUS...✅
Checking quota COMMITTED_A2_CPUS...✅
Checking quota COMMITTED_MEMORY_OPTIMIZED_CPUS...✅
Checking quota NETWORK_FIREWALL_POLICIES...✅
Checking quota PSC_INTERNAL_LB_FORWARDING_RULES...✅
Checking quota EXTERNAL_NETWORK_LB_FORWARDING_RULES...❌ EXTERNAL_NETWORK_LB_FORWARDING_RULES is above the 5.0% threshold! (current value: 0.174)
Checking quota EXTERNAL_PROTOCOL_FORWARDING_RULES...✅
Checking quota PD_EXTREME_TOTAL_PROVISIONED_IOPS...✅
Checking quota E2_CPUS...✅
Checking quota COMMITTED_E2_CPUS...✅
Checking quota EXTERNAL_MANAGED_FORWARDING_RULES...✅
Checking quota C2D_CPUS...✅
Checking quota COMMITTED_C2D_CPUS...✅
Checking quota N2A_CPUS...✅
Checking quota SECURITY_POLICIES_PER_REGION...✅
Checking quota SECURITY_POLICY_RULES_PER_REGION...✅
Checking quota T2D_CPUS...✅
Checking quota COMMITTED_T2D_CPUS...✅
Checking quota T2A_CPUS...✅
Checking quota M3_CPUS...✅
Checking quota COMMITTED_M3_CPUS...✅
Checking quota NVIDIA_A100_80GB_GPUS...✅
Checking quota PREEMPTIBLE_NVIDIA_A100_80GB_GPUS...✅
Checking quota COMMITTED_NVIDIA_A100_80GB_GPUS...✅

Checking project-wide quotas:
Checking quota SNAPSHOTS...✅
Checking quota NETWORKS...✅
Checking quota FIREWALLS...❌ FIREWALLS is above the 5.0% threshold! (current value: 0.206)
Checking quota IMAGES...✅
Checking quota STATIC_ADDRESSES...✅
Checking quota ROUTES...✅
Checking quota FORWARDING_RULES...✅
Checking quota TARGET_POOLS...❌ TARGET_POOLS is above the 5.0% threshold! (current value: 0.0976)
Checking quota HEALTH_CHECKS...❌ HEALTH_CHECKS is above the 5.0% threshold! (current value: 0.1016)
Checking quota IN_USE_ADDRESSES...✅
Checking quota TARGET_INSTANCES...✅
Checking quota TARGET_HTTP_PROXIES...✅
Checking quota URL_MAPS...✅
Checking quota BACKEND_SERVICES...✅
Checking quota INSTANCE_TEMPLATES...✅
Checking quota TARGET_VPN_GATEWAYS...✅
Checking quota VPN_TUNNELS...✅
Checking quota BACKEND_BUCKETS...✅
Checking quota ROUTERS...✅
Checking quota TARGET_SSL_PROXIES...✅
Checking quota TARGET_HTTPS_PROXIES...✅
Checking quota SSL_CERTIFICATES...✅
Checking quota SUBNETWORKS...✅
Checking quota TARGET_TCP_PROXIES...✅
Checking quota SECURITY_POLICIES...✅
Checking quota SECURITY_POLICY_RULES...✅
Checking quota XPN_SERVICE_PROJECTS...✅
Checking quota PACKET_MIRRORINGS...✅
Checking quota NETWORK_ENDPOINT_GROUPS...✅
Checking quota INTERCONNECTS...✅
Checking quota GLOBAL_INTERNAL_ADDRESSES...✅
Checking quota VPN_GATEWAYS...✅
Checking quota MACHINE_IMAGES...✅
Checking quota SECURITY_POLICY_CEVAL_RULES...✅
Checking quota EXTERNAL_VPN_GATEWAYS...✅
Checking quota PUBLIC_ADVERTISED_PREFIXES...✅
Checking quota PUBLIC_DELEGATED_PREFIXES...✅
Checking quota STATIC_BYOIP_ADDRESSES...✅
Checking quota NETWORK_FIREWALL_POLICIES...✅
Checking quota INTERNAL_TRAFFIC_DIRECTOR_FORWARDING_RULES...✅
Checking quota GLOBAL_EXTERNAL_MANAGED_FORWARDING_RULES...✅
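
The per-quota check behind this output can be sketched as follows. This is a minimal Ruby sketch under stated assumptions, not the code of `gcp-quotas-checks.rb`: the `QUOTA_MAX_THRESHOLD` constant, the `quota_ratio` and `check_quotas` helpers, and the sample data are hypothetical. It assumes each quota is a hash with `metric`, `usage`, and `limit` keys, and that the "current value" printed above is the ratio `usage / limit` (so 0.08 means 8%).

```ruby
# Assumption: 80% threshold as stated in the MR description. The constant name is hypothetical.
QUOTA_MAX_THRESHOLD = 0.8

# Returns usage as a fraction of the limit. A zero limit is treated as 0.0 usage.
def quota_ratio(quota)
  limit = quota.fetch("limit").to_f
  return 0.0 if limit.zero?

  quota.fetch("usage").to_f / limit
end

# Prints one line per quota and returns the metric names that exceed the threshold.
def check_quotas(quotas, threshold: QUOTA_MAX_THRESHOLD)
  exceeded = quotas.select do |quota|
    metric = quota.fetch("metric")
    ratio = quota_ratio(quota)
    over = ratio > threshold
    status = over ? "❌ #{metric} is above the #{threshold * 100}% threshold! (current value: #{ratio})" : "✅"
    puts "Checking quota #{metric}...#{status}"
    over
  end

  exceeded.map { |quota| quota.fetch("metric") }
end

# Invented sample data, for illustration only.
sample_quotas = [
  { "metric" => "CPUS", "usage" => 8, "limit" => 100 },
  { "metric" => "SNAPSHOTS", "usage" => 0, "limit" => 0 }
]

exceeded = check_quotas(sample_quotas, threshold: 0.05)
# A CI job would typically exit with a non-zero status here when exceeded is non-empty.
```

Passing the threshold as a keyword argument makes it easy to lower it temporarily, as was done for the local run above.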

How to set up and validate locally

  1. ./scripts/review_apps/k8s-resources-count-checks.sh
  2. ruby scripts/review_apps/gcp-quotas-checks.rb

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by David Dieulivol
