nodes: Set connection backoff MaxDelay to 1 second

Will Chandler (ex-GitLab) requested to merge wc/praefect-connection-backoff into master

gRPC clients use an exponential backoff strategy for re-establishing connections, meaning that the longer a connection has been in a bad state the greater the delay before the client will make its next connection attempt. This is useful in scenarios where a very large number of clients could trigger a thundering herd effect on a server as it returns to service.

In a Gitaly Cluster, this means that when a Gitaly node has been down for some time and the connection backoff has grown large, Praefect may wait up to 120 seconds before trying to reconnect. This causes Gitaly nodes to remain unavailable longer than necessary.

The problems that gRPC's default exponential backoff behavior addresses do not apply in this scenario: we will always have a small number of clients (Praefect nodes), and the volume of traffic from health checks is dwarfed by normal production load.

To resolve this, set the maximum backoff delay to one second.

Fixes #3923 (closed)
