Return ResourceExhausted instead of Internal for Spawn token timeout
For #5096 (closed)
This MR returns ResourceExhausted instead of Internal for Spawn token timeout. The official gRPC documentation (https://grpc.github.io/grpc/core/md_doc_statuscodes.html) clearly distinguishes between different status codes, in particular:
- RESOURCE_EXHAUSTED: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space.
- INTERNAL: Internal errors. This means that some invariants expected by the underlying system have been broken. This error code is reserved for serious errors.
The spawn token system, developed by our team, manages the underlying fork/exec operations. When a process cannot be spawned because no spawn token is available, that is a resource issue, which matches the definition of the ResourceExhausted error code. Errors of this kind are common in such situations, which makes them an expected outcome. The Internal error code should be reserved for unexpected failures instead.
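The idea can be sketched with a counting semaphore that gives up after a timeout. This is a minimal illustration, not Gitaly's actual implementation: `spawnTokens`, `statusError`, `acquire`, `codeOf`, and `demo` are hypothetical names, and the `Code` type is a local stand-in for `google.golang.org/grpc/codes` (it reuses the standard numeric values 8 and 13) so the example runs with the standard library only.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Code is a local stand-in for a gRPC status code (assumption for this sketch).
type Code int

const (
	OK                Code = 0
	ResourceExhausted Code = 8
	Internal          Code = 13
)

// statusError pairs a code with a message, loosely mimicking a gRPC status error.
type statusError struct {
	code Code
	msg  string
}

func (e *statusError) Error() string { return fmt.Sprintf("code %d: %s", e.code, e.msg) }

// spawnTokens is a counting semaphore; its capacity plays the role of MaxParallel.
type spawnTokens struct {
	slots   chan struct{}
	timeout time.Duration
}

func newSpawnTokens(maxParallel int, timeout time.Duration) *spawnTokens {
	return &spawnTokens{slots: make(chan struct{}, maxParallel), timeout: timeout}
}

// acquire waits up to s.timeout for a token. A timeout is reported as
// ResourceExhausted rather than Internal, because running out of tokens
// is an expected condition under load.
func (s *spawnTokens) acquire(ctx context.Context) (release func(), err error) {
	timer := time.NewTimer(s.timeout)
	defer timer.Stop()

	select {
	case s.slots <- struct{}{}:
		return func() { <-s.slots }, nil
	case <-timer.C:
		return nil, &statusError{code: ResourceExhausted, msg: "timed out waiting for a spawn token"}
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

// codeOf extracts the code from an error, defaulting to Internal for unknown errors.
func codeOf(err error) Code {
	if err == nil {
		return OK
	}
	var se *statusError
	if errors.As(err, &se) {
		return se.code
	}
	return Internal
}

// demo holds the only token and then tries to acquire a second one.
func demo() (first, second error) {
	tokens := newSpawnTokens(1, 10*time.Millisecond)
	release, first := tokens.acquire(context.Background())
	if first == nil {
		defer release()
	}
	_, second = tokens.acquire(context.Background())
	return first, second
}

func main() {
	first, second := demo()
	fmt.Println("first acquire error:", first)
	fmt.Println("second acquire code:", codeOf(second))
}
```

With a single token and a short timeout, the second acquisition cannot succeed while the first token is held, so it fails with the ResourceExhausted code instead of an opaque internal error.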
One more thing. In Allow Gitaly to push back on traffic surges (&7891 - closed), I'm currently implementing a pushback feature for clients that encounter specific error codes. Once these errors are converted to ResourceExhausted, clients will automatically and transparently retry with exponential backoff. This should ultimately benefit the system.
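As a rough illustration of the client-side behaviour described above, the sketch below retries only on ResourceExhausted and doubles the delay between attempts. It is an assumption-laden example, not the pushback feature itself: `retryWithBackoff`, `flakyCall`, `simulate`, and the local `Code` type are hypothetical, and the attempt limit and base delay are arbitrary.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Code is a local stand-in for a gRPC status code (assumption for this sketch).
type Code int

const (
	ResourceExhausted Code = 8
	Internal          Code = 13
)

type statusError struct {
	code Code
	msg  string
}

func (e *statusError) Error() string { return fmt.Sprintf("code %d: %s", e.code, e.msg) }

// retryWithBackoff calls fn until it succeeds, returns a non-retryable error,
// or maxAttempts is reached. Only ResourceExhausted is treated as retryable.
func retryWithBackoff(maxAttempts int, base time.Duration, fn func() error) (attempts int, err error) {
	delay := base
	for attempts = 1; ; attempts++ {
		err = fn()

		var se *statusError
		retryable := errors.As(err, &se) && se.code == ResourceExhausted
		if err == nil || !retryable || attempts == maxAttempts {
			return attempts, err
		}

		time.Sleep(delay)
		delay *= 2 // exponential backoff
	}
}

// flakyCall returns a function that fails `failures` times with the given
// code and succeeds afterwards.
func flakyCall(failures int, code Code) func() error {
	remaining := failures
	return func() error {
		if remaining > 0 {
			remaining--
			return &statusError{code: code, msg: "simulated failure"}
		}
		return nil
	}
}

// simulate runs a flaky call with up to 5 attempts and a 1ms base delay.
func simulate(failures int, code Code) (int, error) {
	return retryWithBackoff(5, time.Millisecond, flakyCall(failures, code))
}

func main() {
	attempts, err := simulate(2, ResourceExhausted)
	fmt.Println("ResourceExhausted:", attempts, "attempts, error:", err)

	attempts, err = simulate(2, Internal)
	fmt.Println("Internal:", attempts, "attempts, error:", err)
}
```

The contrast between the two calls is the point: a ResourceExhausted failure is retried until it clears, while an Internal failure is surfaced immediately, which is why the choice of status code matters to clients.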
How would you be able to verify this change?
- Set the spawn token environment variables to extremely low values: GITALY_COMMAND_SPAWN_TIMEOUT to 100ms, MaxParallel = 1.
- Stress-test Gitaly via the API until an error is returned.
Before | After |
---|---|
Gitaly returns the Internal response code | Gitaly returns the ResourceExhausted response code |
gRPC logs don't include the spawn token error, although the final response error includes a portion of it. There is another dedicated log line for this kind of error | The dedicated log line is removed. It's now merged into gRPC logs |
API returns 503 Unavailable. It's due to Gitaly returning the Internal code. | API returns 429 status with a nice message. It's the consequence of Propagate Gitaly ResourceExhausted errors to cl... (gitlab!119054 - merged) |