Create a simple scheduling worker that would assign added namespaces to the existing nodes based on namespace_statistics
Problem to solve
Some Zoekt nodes may run out of memory when repositories are not assigned evenly across the different nodes.
Proposal
We can write a simple cron scheduling worker to assign the repositories (namespaces) evenly across the different nodes.
- Iterate over each `Search::Zoekt::EnabledNamespace` record, ordered by `id`, that doesn't have a join record in `Search::Zoekt::Index`.
- Pick the node with the most free storage.
- Check the storage requirements for the namespace. For example, if it's 100GiB, we take 300GiB (x3). The storage requirement can be found in the `NamespaceStatistics`.
- See if assigning this namespace to the node keeps the node under the watermark limit (80%) of storage utilization.
- If yes, then create a record for `Search::Zoekt::Index`.
- If not, make a log entry, and repeat the steps for the next namespace.
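
The steps above could be sketched roughly as follows. This is a minimal, plain-Ruby illustration and not the actual worker: `Node`, `Namespace`, `assign_namespaces`, and every field name are hypothetical stand-ins for the real models, and sizes are assumed to be in bytes. It also assumes that the space reserved for a newly assigned namespace counts against the node right away, so later picks in the same run see the updated free storage.

```ruby
# Plain-Ruby stand-ins for the real records; all names here are assumptions.
Node = Struct.new(:id, :total_bytes, :used_bytes) do
  def free_bytes
    total_bytes - used_bytes
  end
end

Namespace = Struct.new(:id, :repository_size_bytes)

GIB = 1024**3
STORAGE_MULTIPLIER = 3 # a 100GiB namespace reserves 300GiB
WATERMARK = 0.8        # keep each node under 80% storage utilization

# Returns [assignments, skipped]: assignments maps namespace id => node id,
# skipped lists the ids of namespaces that did not fit on the freest node.
def assign_namespaces(namespaces, nodes)
  assignments = {}
  skipped = []

  namespaces.sort_by(&:id).each do |namespace|
    node = nodes.max_by(&:free_bytes) # node with the most free storage
    break if node.nil?                # no nodes available at all

    required = namespace.repository_size_bytes * STORAGE_MULTIPLIER
    projected = node.used_bytes + required

    if projected < node.total_bytes * WATERMARK
      node.used_bytes = projected     # reserve the space for later iterations
      assignments[namespace.id] = node.id
    else
      skipped << namespace.id         # the real worker would log this instead
    end
  end

  [assignments, skipped]
end

nodes = [Node.new(1, 1000 * GIB, 600 * GIB), Node.new(2, 1000 * GIB, 200 * GIB)]
namespaces = [
  Namespace.new(1, 100 * GIB),
  Namespace.new(2, 50 * GIB),
  Namespace.new(3, 200 * GIB),
  Namespace.new(4, 10 * GIB)
]

assignments, skipped = assign_namespaces(namespaces, nodes)
```

In this sketch a namespace that does not fit on the freest node is skipped rather than tried on other nodes, which matches the steps as written: the node with the most free storage is the only candidate.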
We can schedule this cron worker every 10 minutes. Implement the feature behind an ops feature flag.
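
As a rough illustration of the gating, the worker could return early when the flag is off. This is a self-contained sketch only: the flag name `zoekt_namespace_scheduling`, the `SchedulingWorker` class, and the hash-based flag lookup are hypothetical stand-ins, not the application's real feature-flag or cron configuration API. The cron expression is standard cron syntax for "every 10 minutes".

```ruby
CRON_SCHEDULE = '*/10 * * * *' # standard cron syntax: every 10 minutes

class SchedulingWorker
  FLAG = :zoekt_namespace_scheduling # hypothetical ops flag name

  # flags is a plain hash standing in for a real feature-flag lookup.
  def initialize(flags)
    @flags = flags
  end

  # Returns :skipped when the flag is off; otherwise runs the given
  # scheduling block (if any) and returns :scheduled.
  def perform
    return :skipped unless @flags.fetch(FLAG, false)

    yield if block_given?
    :scheduled
  end
end

runs = 0
disabled_result = SchedulingWorker.new({ zoekt_namespace_scheduling: false }).perform { runs += 1 }
enabled_result = SchedulingWorker.new({ zoekt_namespace_scheduling: true }).perform { runs += 1 }
```

Treating a missing flag as disabled means the scheduling logic stays off until the flag is explicitly turned on.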
Edited by Ravi Kumar