Implement rate-limited scale up
Implements both:
- Rate limiting the number of "scale up" requests we can issue to the cloud provider. Note that this is separate from API request limits: a "scale up" request is our request for how much compute we want the cloud provider to provision, which can be made in a single API request.
- Exponential backoff of scale up requests if the preparation of instances is failing.
The default is 100 instances per second, with a burst limit that defaults to `max_instances`
(which, if zero, defaults to the rate limit).
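As a minimal sketch of how those defaults could be resolved (the function and field names here are assumptions for illustration, not the actual implementation):

```go
package main

import "fmt"

// scaleUpLimits is a hypothetical holder for the documented limits.
type scaleUpLimits struct {
	RatePerSecond int // scale-up requests allowed per second
	Burst         int // maximum burst of requests
}

// resolveLimits applies the documented defaults: 100 instances per
// second, with a burst limit defaulting to maxInstances, or to the
// rate itself when maxInstances is zero.
func resolveLimits(ratePerSecond, maxInstances int) scaleUpLimits {
	if ratePerSecond <= 0 {
		ratePerSecond = 100 // default rate
	}
	burst := maxInstances
	if burst == 0 {
		burst = ratePerSecond // zero max_instances: burst equals the rate
	}
	return scaleUpLimits{RatePerSecond: ratePerSecond, Burst: burst}
}

func main() {
	fmt.Println(resolveLimits(0, 0))   // all defaults
	fmt.Println(resolveLimits(0, 500)) // burst taken from max_instances
}
```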
Exponential backoff delays instance creation by anywhere from 10 seconds to 20 minutes. Bursting is reset after a delay (so that you can't immediately burst after the backoff).
The minimum delay starts fairly high because we want this to be noticed: creating instances that frequently fail can quickly add up cost-wise.
Because we often issue a batch request for instances when scaling, it's possible that within one batch some instances become ready successfully while others fail.
When this occurs:
- Multiple `Failure()`s count towards the amount of time we add to the exponential backoff.
- A call to `Success()` only resets the failure count after the exponential backoff delay has elapsed. This is to prevent instances that succeeded within a batch from invalidating the failures.
Closes #26
Edited by Arran Walker