# Create a `/chat` endpoint that gitlab-rails will use for chat (using Anthropic)
## Proposal
In https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/312#api-for-chat-endpoint we outlined an API endpoint for Duo Chat, including the expected request and response payloads. This is the implementation issue for that proposal. We will create a new endpoint, `/chat`, that gitlab-rails will call. The endpoint will send the prompt to the Anthropic model provider and return the chat response.

In a first iteration we will not support streaming; this will be done in a follow-up issue: #318 (closed)
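To make the non-streaming flow concrete, here is a minimal sketch of how the gateway could assemble a request for Anthropic's text-completion API. The helper name, default model, and token limit are assumptions for illustration, not the gateway's actual code; the payload fields follow Anthropic's `/v1/complete` request format.

```python
# Illustrative sketch only: builds the JSON payload the gateway would send to
# Anthropic's /v1/complete API for a single, non-streaming chat turn.
# build_anthropic_payload and its defaults are hypothetical names.

def build_anthropic_payload(prompt: str, model: str = "claude-2.0",
                            max_tokens: int = 1024) -> dict:
    """Wrap a Rails-constructed prompt in Anthropic's completion format."""
    return {
        "model": model,
        # Anthropic's completion API expects Human/Assistant turn markers.
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "max_tokens_to_sample": max_tokens,
        "stream": False,  # streaming is deferred to the follow-up issue
    }

payload = build_anthropic_payload("How do I revert a commit?")
```

The gateway would POST this payload to Anthropic and relay the completion text back to gitlab-rails in the `/chat` response.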
## Approach
Originally discussed here.
- We will pursue an approach that combines endpoint versioning with schema definitions rather than dropping versioning entirely. The shape of the payload accepted by a given endpoint version will be defined in Python data structures via pydantic, which we can use to emit JSON schema files.
- We will provide a Chat-specific endpoint such as `/v1/chat` for now. It may be too early to define generic stable endpoints like `/v1/prompt` (and similarly generic schemas) that could potentially serve many use cases. Arguments against this were:
  - For some use cases, we run post-processing logic in the AI gateway specific to that use case, which would not be captured by a generic endpoint.
  - Prompts are constructed in Rails, and prompts are very specific to the model they were built for. While this proposal does allow specifying the model in the payload, we may need to use a prompt registry to re-map prompts in case we drop support for a particular model, and it is still unclear how that will work. We do not want to wait for this decision since that would block the Chat release.
- We will work with @tle_gitlab to decide how exactly versioning for endpoints is encoded in the app, since Tan was the original author of the current approach, including the directory structure used.
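The versioned-schema idea above can be sketched with pydantic. The class and field names below are assumptions for illustration, not the agreed payload shape:

```python
# Hypothetical v1 payload definition; field names are illustrative only.
from pydantic import BaseModel


class ChatRequestV1(BaseModel):
    """Possible shape of the /v1/chat request payload."""
    prompt: str
    model: str = "claude-2.0"


# pydantic can emit a JSON schema for this payload version, which could be
# published so clients like gitlab-rails can validate requests against it.
schema = ChatRequestV1.schema()
```

Because each endpoint version owns its own pydantic model, the emitted JSON schema files document exactly what that version accepts, and breaking payload changes become a new versioned model rather than a silent change.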
## Links / references
Edited by Matthias Käppler