Enable Private Cloud-hosted Models as Self-Managed Custom Models
Overview
Enterprise customers want to use the same type of model (Vertex or Anthropic) that powers a .com feature, but deployed in their own cloud space on GCP, Azure, or AWS.
This would require adding configuration for self-hosted models that picks up the .com prompts and points them at the same type of model in a private cluster.
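As a rough illustration of what such a configuration entry might look like, here is a hypothetical sketch; the `self_hosted_models` key, field names, and values are all assumptions for illustration, not an existing setting:

```yaml
# Hypothetical sketch -- key and field names are illustrative only.
self_hosted_models:
  - feature: code_completion      # the .com feature whose prompts we reuse
    provider: vertex_ai           # cloud where the customer deployed the model
    model: codestral@2405         # served model name in the private cluster
    project: my-gcp-project       # customer's own GCP project (example value)
    location: us-central1
```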
PoC
Basically, the functionality we want to support here is already achievable with litellm-proxy, as long as we support the prompt for the specified model. For example:
- Deploy the model in Model Garden (https://cloud.google.com/model-garden?hl=en), for example Codestral.
- Configure the litellm proxy, something like:

```yaml
model_list:
  - model_name: codestral
    litellm_params:
      model: vertex_ai/codestral@2405
      vertex_ai_project: idrozdov-caf8e304
      vertex_ai_location: us-central1
```

and start it with:

```shell
litellm --config config.yaml --detailed_debug
```
Now, when we request the codestral model via the proxy:

```shell
curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{ "model": "codestral", "messages": [{ "role": "user", "content": "Hello!" }] }'
```

we receive a response.
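The same request can be built programmatically. The sketch below only constructs the OpenAI-compatible chat-completions request from the curl example above; the URL and payload mirror that example, and the `build_chat_request` helper is illustrative, not existing code:

```python
import json

def build_chat_request(model, user_content, base_url="http://localhost:4000"):
    """Build the OpenAI-compatible chat-completions request the proxy expects.

    The proxy resolves the friendly name (e.g. "codestral") to the
    provider-specific model (vertex_ai/codestral@2405) via its config.
    """
    url = f"{base_url}/v1/chat/completions"
    headers = {"Content-Type": "application/json"}
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    return url, headers, json.dumps(payload)

url, headers, body = build_chat_request("codestral", "Hello!")
```

Any HTTP client (e.g. `requests.post(url, headers=headers, data=body)`) can then send it, since the proxy behaves like any OpenAI-compatible endpoint.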
- Configure a self-hosted model as:
- Now the feature is powered by the cloud-hosted model
Limitations
- We want to reuse the Anthropic prompts that we already have in order to support Anthropic on Vertex
- We don't want customers to have to set up a litellm proxy themselves
Proposal
After trying to configure the proxy, it seems that specifying the provider is not enough; we also need to know which name identifies a particular model on the provider's side. For example, we allow setting Codestral as a model and send `codestral` as the model name; however, Vertex AI expects `codestral@2405`. So we either need to resolve this naming internally (for example, we know that for `vertex_ai/codestral` we need to send `vertex_ai/codestral@2405`), or we can let the customer specify all of this themselves. We could have two fields (`Provider` for `vertex_ai` and `Served model name` for `codestral@2405`), or a single field where the customer specifies everything in this format: https://docs.litellm.ai/docs/providers/vertex#vertex_ai-route.
That would also solve the problem that was once mentioned in Slack: a customer wants to specify the served model name because their server expects a name different from the one that we're sending.
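The internal-resolution option could be sketched roughly as below. The mapping table and helper are hypothetical, not existing code; they only illustrate translating the name we store into the name the provider serves, with a fallback that also covers customers whose server expects a custom served name:

```python
# Hypothetical mapping from the name we store to the name the provider
# actually serves. In the single-field variant, the customer would supply
# the right-hand side directly (e.g. "vertex_ai/codestral@2405").
SERVED_NAME_BY_MODEL = {
    "vertex_ai/codestral": "vertex_ai/codestral@2405",
}

def resolve_served_name(provider, model_name):
    """Return the provider-qualified model name to send upstream.

    Falls back to "provider/model_name" when no explicit override exists,
    so a customer-supplied served name passes through unchanged.
    """
    key = f"{provider}/{model_name}"
    return SERVED_NAME_BY_MODEL.get(key, key)
```

For example, `resolve_served_name("vertex_ai", "codestral")` yields `vertex_ai/codestral@2405`, while an unmapped name is passed through as-is.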