Prevent new autoscaler thrashing instances
What does this MR do?
The change first uses taskscaler.Reserve() to reserve capacity and, once we have a job, calls taskscaler.Acquire() in the wrapped executor returned by the executor provider. This ensures that when idle scaling is enabled, we only accept jobs once we've confirmed there's capacity and have reserved it.
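The reserve-then-acquire ordering can be illustrated with a toy model. This is only a sketch: the `pool` type, its `Reserve`, `Unreserve` and `Acquire` methods, and the `requestJob` helper are hypothetical stand-ins written for this example, not the real taskscaler API or the runner's executor provider.

```go
package main

import (
	"errors"
	"fmt"
)

// pool is a toy capacity tracker. It is NOT the taskscaler API; the names
// and signatures here are assumptions made for illustration only.
type pool struct {
	free     int // slots that are neither reserved nor in use
	reserved int // slots held for a job that has not arrived yet
	inUse    int // slots running a job
}

var (
	errNoCapacity    = errors.New("no capacity available")
	errNoReservation = errors.New("no reservation to convert")
)

// Reserve holds a slot without marking it as used.
func (p *pool) Reserve() error {
	if p.free == 0 {
		return errNoCapacity
	}
	p.free--
	p.reserved++
	return nil
}

// Unreserve returns a reserved slot to the free set.
func (p *pool) Unreserve() {
	if p.reserved > 0 {
		p.reserved--
		p.free++
	}
}

// Acquire converts an existing reservation into a slot that is in use.
func (p *pool) Acquire() error {
	if p.reserved == 0 {
		return errNoReservation
	}
	p.reserved--
	p.inUse++
	return nil
}

// requestJob reserves capacity before asking for a job. If no job arrives,
// the reservation is dropped and the slot is never marked as used.
func requestJob(p *pool, fetch func() (string, bool)) (string, error) {
	if err := p.Reserve(); err != nil {
		return "", err
	}
	job, ok := fetch()
	if !ok {
		p.Unreserve()
		return "", nil
	}
	if err := p.Acquire(); err != nil {
		return "", err
	}
	return job, nil
}

func main() {
	p := &pool{free: 1}

	// An idle poll: capacity is reserved, then handed back untouched.
	_, _ = requestJob(p, func() (string, bool) { return "", false })
	fmt.Printf("after idle poll: %+v\n", *p)

	// A poll that returns a job: the reservation becomes an acquisition.
	job, _ := requestJob(p, func() (string, bool) { return "build", true })
	fmt.Printf("after job %q: %+v\n", job, *p)
}
```

In this model an idle poll only moves a slot between the free and reserved counts, so nothing is ever recorded as a use unless a job actually arrives.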
Why was this MR needed?
Previously, we acquired capacity and then released it if there was no job. This thrashed VMs when taskscaler was configured to remove capacity after it had been used.
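A rough simulation shows why the earlier flow could churn instances. It assumes that an acquire followed by a release counted against an instance's use budget even when no job ran, which is how the problem is described above. The `instance` type and `simulate` function are hypothetical and exist only for this sketch.

```go
package main

import "fmt"

// instance is a toy VM with a use budget, loosely modelled on max_use_count.
type instance struct {
	uses    int
	maxUses int
}

// simulate replays a sequence of polls (true means a job was available) and
// returns how many instances had to be replaced. When countIdlePolls is true,
// every poll consumes a use (the assumed old behaviour); when false, only
// polls that yield a job do.
func simulate(polls []bool, maxUses int, countIdlePolls bool) int {
	replaced := 0
	inst := &instance{maxUses: maxUses}
	for _, hasJob := range polls {
		if hasJob || countIdlePolls {
			inst.uses++
		}
		if inst.uses >= inst.maxUses {
			// The use budget is spent: retire the instance and start a new one.
			replaced++
			inst = &instance{maxUses: maxUses}
		}
	}
	return replaced
}

func main() {
	// Six polls, of which only two return a job.
	polls := []bool{false, false, true, false, false, true}

	fmt.Println("acquire on every poll:", simulate(polls, 2, true))  // 3 replacements
	fmt.Println("acquire only with job:", simulate(polls, 2, false)) // 1 replacement
}
```

With a budget of two uses per instance, counting idle polls replaces three instances for two jobs, while counting only real jobs replaces one.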
What's the best way to test this MR?
Test with various scaling rules, for example:
[runners.autoscaler]
capacity_per_instance = 2
max_use_count = 2
max_instances = 5
plugin = "fleeting-plugin-aws"
[[runners.autoscaler.policy]]
idle_count = 4
idle_time = "20m"
[runners.autoscaler.connector_config]
username = "ubuntu"
timeout = "10m"
[runners.autoscaler.plugin_config]
name = "<autoscaler name>"
region = "us-west-2"
What are the relevant issue numbers?
Closes #29431