Configure retries for read-only Gitaly RPCs
What does this MR do and why?
Currently our Ruby Gitaly gRPC client automatically performs only 'transparent' retries, where the request has reached gRPC's internal load balancer but has not gone onto the wire. However, gRPC can retry requests in a larger number of scenarios when configured to do so.
Add a retryPolicy to Gitaly's service_config to allow retries for any
read-only RPC that fails with an UNAVAILABLE status code. This status
code is used exclusively to indicate a connection/network failure. We
allow up to two additional requests to be sent, at 250ms and 500ms
intervals. The overall request deadline is still honored as well. This
allows the client to handle momentary service interruptions without
bubbling errors up to users.
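As a rough illustration of the shape such a policy could take, here is a minimal sketch that builds a gRPC service config in Ruby. The constant and method names are hypothetical, the single RPC entry is only an example, and the maxBackoff value is an assumption chosen to match the 250ms and 500ms intervals described above. It is not necessarily the configuration this MR ships.

```ruby
require 'json'

# Hypothetical, hand-picked example entry; the real list is longer.
READ_ONLY_RPCS = [
  { service: 'gitaly.CommitService', method: 'FindCommit' }
].freeze

# Builds a service config hash with a retryPolicy for the given RPCs.
def retry_service_config(rpcs = READ_ONLY_RPCS)
  {
    methodConfig: [
      {
        name: rpcs,
        retryPolicy: {
          maxAttempts: 3,            # original request plus two retries
          initialBackoff: '0.25s',   # first retry after roughly 250ms
          maxBackoff: '0.5s',        # assumption: cap at the 500ms interval
          backoffMultiplier: 2,      # 250ms doubles to 500ms
          retryableStatusCodes: ['UNAVAILABLE']
        }
      }
    ]
  }
end

# gRPC expects the service config as a JSON string.
SERVICE_CONFIG_JSON = JSON.generate(retry_service_config)
```

The resulting JSON string would typically be passed to the client through the grpc.service_config channel argument. Note that gRPC applies jitter to retry backoff, so the observed delays are approximate.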
Gitaly sets a MethodOption named op_type on RPCs to indicate which
ones will modify the repository. This is accessible to Golang clients,
but with Ruby we are unfortunately forced to manually list RPCs known to
be read-only as the Ruby protobuf implementation does not support
accessing MethodOptions. gRPC-Core issue #1198 is open to track
adding this feature.
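One way to keep such a manual list manageable is a small map from service to method names, expanded into the name entries a gRPC methodConfig expects. The sketch below assumes that approach. The constant and helper names are hypothetical, and the RPCs shown are illustrative examples that have not been checked against their op_type here.

```ruby
# Hypothetical hand-maintained map of services to RPCs believed to be read-only.
READ_ONLY_METHODS = {
  'gitaly.CommitService' => %w[FindCommit CommitIsAncestor],
  'gitaly.RefService' => %w[FindDefaultBranchName]
}.freeze

# Expands the map into { service:, method: } entries for a methodConfig "name" list.
def method_names(map)
  map.flat_map do |service, methods|
    methods.map { |rpc| { service: service, method: rpc } }
  end
end
```

Keeping the map in one place means a newly added read-only RPC only needs a single line to become eligible for retries.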
How to set up and validate locally
- Stop your GDK's Gitaly with gdk stop gitaly
- Load a Rails console session and execute a FindCommit with Project.last.repository.commit('HEAD')
- Immediately start Gitaly with gdk start gitaly. If Gitaly starts fast enough, the request will succeed with output like
  => #<Commit id:603975295665c2601289682bd3eefe92da22f848 i-user-0-1696879720/lab-coat@603975295665c2601289682bd3eefe92da22f848>
  - If you have trouble getting the timing right, increasing maxAttempts and maxBackoff may help.
Note that if you are using Praefect, it adds additional delay. Stopping only Praefect will be easier.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.