maintenance: Introduce rate limiting for `OptimizeRepo`

With the current implementation of OptimizeRepo, we're trying to do as
much work as possible in the timeframe dictated by the maintenance
schedule. This has proven to be problematic even in times of reduced
load on deployments because we were essentially DoS'ing ourselves with
so many requests that it caused alerts to trigger. On staging systems,
we see more than 200 calls per second for the OptimizeRepository RPC.
Even if those jobs don't need to do any heavy repacking, they still cause
heavy read-load on Gitaly nodes just to determine that nothing needs to
be done.

As a first step towards improving the situation, this commit introduces
rate limiting to the maintenance job. Instead of going as quickly as we
can, we limit requests to one per second.

Of course this means that we're now able to optimize fewer repositories
than we previously did. But this is still much better than driving a DoS
against ourselves and waking up SREs on weekends. Depending on the
typical load we'll see with this rate limiting, we may be able to enable
the maintenance task 24/7.