Upgrade fleeting/taskscaler: fixes reservation/unavailability instance churn
## What does this MR do?
Similar to !4818 (merged), a few more cases where the calculation for required instances was incorrect when handling reservations and instances being deleted.
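To make the class of bug concrete, here is a minimal sketch (not taskscaler's actual code; the function name and parameters are hypothetical) of the kind of required-instance calculation this MR corrects: capacity on instances that are being deleted must not count as available, and reserved capacity must still be satisfied, otherwise the scaler under- or over-provisions.

```go
package main

import "fmt"

// requiredInstances is a hypothetical illustration of the calculation:
// how many new instances are needed to cover idle and reserved capacity,
// once capacity on deleting instances is excluded.
func requiredInstances(idle, reserved, capacityPerInstance, available, deleting int) int {
	// capacity still usable once deleting instances are excluded
	usable := available - deleting*capacityPerInstance
	needed := idle + reserved
	if usable >= needed {
		return 0
	}
	// round up to whole instances
	return (needed - usable + capacityPerInstance - 1) / capacityPerInstance
}

func main() {
	// 1 idle slot wanted, no reservations, capacity 1 per instance,
	// 1 slot nominally available but that instance is being deleted:
	// one replacement instance is required.
	fmt.Println(requiredInstances(1, 0, 1, 1, 1)) // prints 1

	// Same demand, but 2 slots available and nothing deleting:
	// no new instance needed.
	fmt.Println(requiredInstances(1, 0, 1, 2, 0)) // prints 0
}
```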
## Why was this MR needed?
When `idle_scale` was > 0, it was occasionally hard to completely remove instances that were no longer required: a new instance was often created in place of the instance being removed.
## What's the best way to test this MR?
taskscaler has updated tests for this: gitlab-org/fleeting/taskscaler!52 (merged)
Manual QA:
```toml
concurrent = 4

[[runners]]
  url = "https://gitlab.com"
  executor = "docker-autoscaler"

  [runners.docker]
    image = "busybox:latest"

  [runners.autoscaler]
    capacity_per_instance = 1
    max_use_count = 100
    max_instances = 5
    plugin = "aws"

    [runners.autoscaler.plugin_config]
      name = "linux-test"
      region = "us-west-2"

    [runners.autoscaler.connector_config]
      username = "ec2-user"
      key_path = "key.pem"
      use_static_credentials = true
      keepalive = "0s"
      timeout = "10m0s"
      use_external_addr = true

    [[runners.autoscaler.policy]]
      idle_count = 1
      idle_time = "5m0s"
      scale_factor = 0.0
      scale_factor_limit = 0
```
- Wait for the idle instance to come up.
- Create a job.
- Wait for a new idle instance to come up.
- Observe the second idle instance attempting removal after the job finishes.
To trigger the old behaviour before this fix, you might need to keep retrying until you hit a situation where `UnavailableCapacity` is positive, which prevented scaling down and brought up a new instance.
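The churn scenario above can be sketched as follows. This is a hypothetical illustration, not taskscaler's actual scale-up logic: the function names and the exact formula are assumptions, but they show why counting unavailable (draining) capacity against the pool makes a healthy pool look short and spawns a spurious replacement.

```go
package main

import "fmt"

// scaleUpNeededBuggy illustrates the pre-fix behaviour (hypothetical):
// subtracting unavailable capacity makes an instance that is merely
// being removed look like missing capacity, triggering a replacement.
func scaleUpNeededBuggy(idle, availableCapacity, unavailableCapacity int) bool {
	return availableCapacity-unavailableCapacity < idle
}

// scaleUpNeededFixed illustrates the intent after the fix (hypothetical):
// draining instances no longer count against available capacity.
func scaleUpNeededFixed(idle, availableCapacity int) bool {
	return availableCapacity < idle
}

func main() {
	// idle_count = 1, one instance still available, one instance draining:
	fmt.Println(scaleUpNeededBuggy(1, 1, 1)) // prints true: spurious scale-up, churn
	fmt.Println(scaleUpNeededFixed(1, 1))    // prints false: no replacement needed
}
```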