Reduce risk of AI service calls blocking Puma threads
In #418203 (closed) we found that a performance degradation in the model gateway puts the entire server at risk of responding slowly to all other requests. This is especially problematic for self-managed deployments, since an incident in the model gateway (run by us) will "trickle down" into every customer deployment that uses code suggestions.
We should look into solutions for improving fault tolerance in case of degraded AI gateway performance. We have discussed two options so far:
- Setting stricter timeouts when making HTTP calls into the model gateway: !126852 (closed) (see the timeout sketch after this list)
- Rewriting the API endpoint to use Workhorse `send-url` instead. This would offload the HTTP request to Workhorse so that the Puma worker can return early and process other requests: !126957 (merged) (see the second sketch below)
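For the timeout option, the core change is to bound every phase of the HTTP call so that a degraded gateway releases the Puma thread quickly instead of holding it for the duration of an incident. A minimal sketch using Ruby's standard `Net::HTTP`; the gateway URL, payload, and timeout values here are illustrative assumptions, not the values chosen in !126852:

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical gateway URL and payload, for illustration only.
uri = URI("https://model-gateway.example.com/v2/code/completions")
payload = { prompt: "def hello" }

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = uri.scheme == "https"
http.open_timeout  = 1 # seconds allowed to establish the connection
http.read_timeout  = 5 # seconds allowed to wait for the response
http.write_timeout = 1 # seconds allowed to send the request body

begin
  http.post(uri.path, payload.to_json, "Content-Type" => "application/json")
rescue Net::OpenTimeout, Net::ReadTimeout, Net::WriteTimeout
  # Fail fast: the Puma thread is released after at most a few seconds
  # rather than hanging on the degraded gateway, and the caller can
  # surface an error to the client.
  nil
end
```

The trade-off is that slow-but-successful completions get cut off too, so the values have to be tuned against real gateway latency rather than picked arbitrarily.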
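For the Workhorse option, the controller responds immediately with a special header instead of proxying the gateway response through Ruby; Workhorse intercepts that header, performs the upstream request itself, and streams the body to the client. A sketch of the mechanism, where the controller name, route, and exact parameter fields are assumptions based on Workhorse's `send-url` protocol:

```ruby
require "base64"
require "json"

# Hypothetical controller; the real endpoint lives in the code
# suggestions API. Workhorse recognises the Gitlab-Workhorse-Send-Data
# header with a "send-url:" prefix and takes over the upstream call.
class CodeSuggestionsController < ApplicationController
  GATEWAY_URL = "https://model-gateway.example.com/v2/code/completions" # assumed URL

  def completions
    send_params = { "URL" => GATEWAY_URL } # field name assumed from send-url params
    encoded = Base64.urlsafe_encode64(JSON.generate(send_params))

    # Returning here frees the Puma thread; Workhorse streams the
    # gateway response to the client on its own.
    response.headers["Gitlab-Workhorse-Send-Data"] = "send-url:#{encoded}"
    head :ok
  end
end
```

The key property is that a slow gateway then ties up a Workhorse connection rather than a scarce Puma thread, so other Rails requests keep being served during a gateway incident.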