Implement rate-limited scale up

Arran Walker requested to merge ajwalker/ratelimit-scale-up into main

Implements both:

  • Rate limiting the number of "scale up" requests we can make to the cloud provider. Note that this is separate from API requests: a scale up request is the amount of compute we ask the cloud provider to provide, which can be done in one API request.
  • Exponential backoff of scale up requests if the preparation of instances is failing.

The default rate is 100 instances per second, with a burst limit that defaults to max_instances (or, if max_instances is zero, to the rate limit).

Exponential backoff will delay instances from being created anywhere from 10 seconds to 20 minutes. Bursting is reset after a delay (so that you can't immediately burst after the backoff).

The minimum delay starts fairly high because we want this to be noticed: repeatedly creating instances that fail can quickly add up cost-wise.

Because scaling often issues a single batch request for several instances, it's possible that some instances in a batch become ready while others fail.

When this occurs:

  • Multiple Failure()s count towards the amount of time we add to the exponential backoff.
  • A call to Success() only resets the failure count after the exponential backoff delay. This is to prevent instances that succeeded within a batch from invalidating any failure.

Closes #26 (closed)

Edited by Arran Walker
