Reduce risk of AI service calls blocking Puma threads
In #418203 (closed) we found that a performance degradation in the model gateway puts the entire server at risk of responding slowly to all other requests. This is especially problematic for self-managed deployments, since an incident in the model gateway (run by us) will "trickle down" into every customer deployment that uses code suggestions.
We should look into solutions for improving fault tolerance in case of degraded AI gateway performance. We have discussed two options so far:
- Setting stricter timeouts when making HTTP calls into the model gateway: !126852 (closed) (see the timeout sketch after this list)
- Rewriting the API endpoint to use Workhorse `send-url` instead. This would offload the HTTP request to Workhorse so that the Puma worker can return early and process other requests: !126957 (merged) (see the second sketch below)
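For the timeout option, the core change is to bound every phase of the HTTP call so that a degraded gateway releases the Puma thread quickly instead of holding it for the duration of an incident. A minimal sketch using Ruby's standard `Net::HTTP`; the gateway URL, payload, and timeout values here are illustrative assumptions, not the values chosen in !126852:

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical gateway URL and payload, for illustration only.
uri = URI("https://model-gateway.example.com/v2/code/completions")
payload = { prompt: "def hello" }

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = uri.scheme == "https"
http.open_timeout  = 1 # seconds allowed to establish the connection
http.read_timeout  = 5 # seconds allowed to wait for the response
http.write_timeout = 1 # seconds allowed to send the request body

begin
  http.post(uri.path, payload.to_json, "Content-Type" => "application/json")
rescue Net::OpenTimeout, Net::ReadTimeout, Net::WriteTimeout
  # Fail fast: the Puma thread is released after at most a few seconds
  # rather than hanging on the degraded gateway, and the caller can
  # surface an error to the client.
  nil
end
```

The trade-off is that slow-but-successful completions get cut off too, so the values have to be tuned against real gateway latency rather than picked arbitrarily.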
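For the Workhorse option, the controller responds immediately with a special header instead of proxying the gateway response through Ruby; Workhorse intercepts that header, performs the upstream request itself, and streams the body to the client. A sketch of the mechanism, where the controller name, route, and exact parameter fields are assumptions based on Workhorse's `send-url` protocol:

```ruby
require "base64"
require "json"

# Hypothetical controller; the real endpoint lives in the code
# suggestions API. Workhorse recognises the Gitlab-Workhorse-Send-Data
# header with a "send-url:" prefix and takes over the upstream call.
class CodeSuggestionsController < ApplicationController
  GATEWAY_URL = "https://model-gateway.example.com/v2/code/completions" # assumed URL

  def completions
    send_params = { "URL" => GATEWAY_URL } # field name assumed from send-url params
    encoded = Base64.urlsafe_encode64(JSON.generate(send_params))

    # Returning here frees the Puma thread; Workhorse streams the
    # gateway response to the client on its own.
    response.headers["Gitlab-Workhorse-Send-Data"] = "send-url:#{encoded}"
    head :ok
  end
end
```

The key property is that a slow gateway then ties up a Workhorse connection rather than a scarce Puma thread, so other Rails requests keep being served during a gateway incident.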