Make Geo::RepositorySyncWorker and Geo::FileDownloadDispatchWorker max_capacity configurable
Per https://gitlab.com/gitlab-org/gitlab-ee/issues/3453#note_41323317
A critical number for controlling the amount of strain a Geo secondary places on its primary is Geo::RepositorySyncWorker#max_capacity.
This value (and the equivalents for the other subclasses of Geo::BaseSchedulerWorker) determines how many concurrent requests the secondary will make to the primary, particularly in the context of backfilling repositories that don't yet exist on the secondary.
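The following minimal sketch shows what a max_capacity value controls. It assumes that each scheduled job makes one request to the primary. CapacityLimitedScheduler and next_batch are hypothetical names, not the real Geo::BaseSchedulerWorker code.

```ruby
# Hypothetical sketch, not the real Geo::BaseSchedulerWorker code: shows how a
# max_capacity value caps the number of sync jobs running at the same time.
class CapacityLimitedScheduler
  attr_reader :max_capacity

  def initialize(max_capacity)
    @max_capacity = max_capacity
  end

  # Returns the pending IDs that may be started now. Jobs already running
  # count against max_capacity, so the total never exceeds it.
  def next_batch(pending_ids, running_count)
    free_slots = [max_capacity - running_count, 0].max
    pending_ids.first(free_slots)
  end
end

scheduler = CapacityLimitedScheduler.new(25)
scheduler.next_batch((1..100).to_a, 20).size # => 5
scheduler.next_batch((1..100).to_a, 30)      # => []
```

With 20 jobs already in flight, only 5 more may start. Lowering max_capacity therefore directly lowers the peak number of concurrent requests hitting the primary.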
We should make the value configurable. As a first iteration, we can simply add columns to the geo_nodes table so the value can be tuned per secondary.
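Here is a sketch of how a worker might read its limit from a per-node row and fall back to today's hardcoded 25 when the column is unset. The column names repos_max_capacity and files_max_capacity, the GeoNodeRow struct and the configured_max_capacity helper are all hypothetical and are not an existing schema or API.

```ruby
# Hypothetical stand-in for a geo_nodes row; the capacity column names are
# assumptions for illustration only.
GeoNodeRow = Struct.new(:url, :repos_max_capacity, :files_max_capacity, keyword_init: true)

# The current hardcoded limit, used when a column is NULL.
DEFAULT_MAX_CAPACITY = 25

# Returns the configured capacity for the given column, or the default when
# the column has not been set for this secondary.
def configured_max_capacity(node, column)
  value = node.public_send(column)
  value.nil? ? DEFAULT_MAX_CAPACITY : value
end

node = GeoNodeRow.new(url: "https://secondary.example.com/",
                      repos_max_capacity: 10, files_max_capacity: nil)
configured_max_capacity(node, :repos_max_capacity) # => 10
configured_max_capacity(node, :files_max_capacity) # => 25
```

Separate columns let the repository and file-download schedulers be tuned independently, which matches the two workers named in the title.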
(overly complicated stretch stuff follows)
Currently it's set to a hardcoded limit of 25, but since network architectures and capacities vary so widely, we should consider making this number configurable or even adaptive.
Since the Geo secondary has a read-only database, I think it makes sense to turn this into a column in the geo_nodes table and manage it from the primary. In particular, if you have more than one secondary, you might want to prioritize backfilling one over another, so it makes sense to manage these numbers centrally.
Perhaps we can manage this more abstractly. If the primary declares its available capacity as a concrete number (either through configuration or introspection), then each secondary can be assigned a proportion of that capacity, perhaps using beautiful sliders. The sum of the proportions doesn't have to reach 100%, but it must not exceed 100%.
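The proportional scheme can be sketched as follows, assuming integer percentage shares keyed by secondary name. CapacityAllocation and its methods are hypothetical. Each secondary's budget is rounded down, so the budgets can never add up to more than the primary's declared capacity.

```ruby
# Hypothetical sketch of dividing a primary's declared capacity among
# secondaries by percentage share. Not existing GitLab code.
class CapacityAllocation
  class OverAllocatedError < StandardError; end

  def initialize(primary_capacity, shares)
    if shares.values.any?(&:negative?)
      raise ArgumentError, "shares must not be negative"
    end

    total = shares.values.sum
    if total > 100
      raise OverAllocatedError, "shares sum to #{total}%, which exceeds 100%"
    end

    @primary_capacity = primary_capacity
    @shares = shares
  end

  # Concurrent-request budget for one secondary. Integer division rounds down,
  # so the sum over all secondaries stays within primary_capacity.
  def max_capacity_for(secondary)
    (@primary_capacity * @shares.fetch(secondary, 0)) / 100
  end
end

allocation = CapacityAllocation.new(60, { "eu" => 50, "us" => 30 })
allocation.max_capacity_for("eu") # => 30
allocation.max_capacity_for("us") # => 18
```

Here the shares total 80%, so 12 units of the primary's capacity stay unallocated. That is allowed. Asking for, say, 70% plus 40% is rejected at construction time.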