Don't limit number of Gitaly client keepalives
What does this MR do and why?
Long-running RPCs, such as ForkRepository, may take several hours to complete. While Sidekiq waits for the RPC to complete, it should send keepalive pings to Gitaly/Praefect to prevent load balancers from killing the connection. However, the default value for GRPC_ARG_HTTP2_MAX_PINGS_WITHOUT_DATA is only 2, with pings sent at 5-minute intervals.
As a result, Sidekiq will only send keepalives for the first 5 minutes, then leave the connection idle for up to 6 hours, putting long-running RPCs at risk of failure.
This MR sets GRPC_ARG_HTTP2_MAX_PINGS_WITHOUT_DATA to 0, so Sidekiq can send an unlimited number of keepalives on RPCs in an idle state. Note that pings are still sent at 5-minute intervals with this change.
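As a rough illustration of the setting described above, the sketch below builds a hash of gRPC channel arguments in Ruby. The helper name gitaly_channel_args and the constant KEEPALIVE_INTERVAL_MS are hypothetical and not taken from this MR's diff. The key 'grpc.http2.max_pings_without_data' is the string form of GRPC_ARG_HTTP2_MAX_PINGS_WITHOUT_DATA, and the 5-minute interval is assumed from the description rather than read from the code.

```ruby
# Hypothetical helper: builds channel arguments that a Ruby gRPC client
# could pass when creating a stub. Only the hash is built here; no
# connection is opened and the grpc gem is not required.

# 5 minutes in milliseconds (assumed from the MR description).
KEEPALIVE_INTERVAL_MS = 5 * 60 * 1000

def gitaly_channel_args(keepalive_ms: KEEPALIVE_INTERVAL_MS)
  {
    # How often the client sends a keepalive ping.
    'grpc.keepalive_time_ms' => keepalive_ms,
    # 0 removes the cap on pings sent while no data frames are in flight,
    # so keepalives continue for the whole lifetime of an idle RPC.
    'grpc.http2.max_pings_without_data' => 0
  }
end
```

With the grpc gem, a hash like this is typically passed through the channel_args: keyword when a client stub is constructed. How GitLab wires it into its Gitaly client is not shown here.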
Screenshots or screen recordings
Before
Sidekiq sends one ping at 13:53:48, then leaves the connection idle. 30 minutes later, HAProxy kills the connection.
After
Sidekiq sends a ping every 5 minutes.
How to set up and validate locally
Example below:
- Set up a 3k reference environment with a Gitaly Cluster
- Ensure HAProxy timeout for Praefect is 30 minutes
- Import a large repo into the instance, such as Chromium or LLVM
- Fork the repo. At 35 minutes, the fork will fail when HAProxy kills the idle connection
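
The 35-minute figure in the steps above can be reproduced with a small back-of-the-envelope model. This is only a sketch: the function idle_kill_minute is hypothetical, it assumes the RPC starts at minute 0 with no other traffic on the connection, and it assumes the proxy's idle timer restarts on every ping. The max_pings: 1 value mirrors the single ping observed in the "Before" capture, not a documented gRPC behaviour.

```ruby
# Hypothetical model: returns the minute at which an idle-timeout proxy
# would drop the connection, or nil if keepalives keep it open forever.
# max_pings: nil means an unlimited number of pings.
def idle_kill_minute(ping_interval_min:, proxy_timeout_min:, max_pings: nil)
  # Pings that arrive too late never reset the proxy's idle timer.
  return proxy_timeout_min if ping_interval_min >= proxy_timeout_min
  # Unlimited pings keep resetting the timer, so the proxy never fires.
  return nil if max_pings.nil?

  # The timer restarts at the last ping and then runs out.
  last_ping_min = max_pings * ping_interval_min
  last_ping_min + proxy_timeout_min
end

# Before this MR: one ping at minute 5, then a 30-minute idle timeout.
puts idle_kill_minute(ping_interval_min: 5, proxy_timeout_min: 30, max_pings: 1) # prints 35
```

Calling the function with max_pings: nil and the same interval and timeout returns nil, which corresponds to the "After" behaviour where the connection is never idle long enough to be killed.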
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.