Support streaming in Chat API (with LangChain)
What does this merge request do and why?
This MR adds streaming support to the `/v1/chat/agent` endpoint.
We use LangChain for both the JSON response and the streaming response. This lets us take advantage of the Python / LangChain community and its battle-tested tooling.
See LCEL for more of LangChain's abstract interfaces.
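As a sketch of what this enables (illustrative only, not the actual code in this MR; it assumes `ANTHROPIC_API_KEY` is set in the environment), a single LCEL chain can serve both response modes:

```python
# Illustrative LCEL sketch: one chain definition, two consumption modes.
# The prompt template and the Anthropic wrapper below are assumptions,
# not the code added by this MR.
from langchain_community.chat_models import ChatAnthropic
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([("human", "{content}")])
chain = prompt | ChatAnthropic(model_name="claude-2.0") | StrOutputParser()


async def handle(content: str, stream: bool):
    if stream:
        # Streaming path: return an async iterator of text chunks,
        # which the endpoint can forward as a chunked HTTP response.
        return chain.astream({"content": content})
    # Non-streaming path: wait for the full completion and return it as JSON.
    return await chain.ainvoke({"content": content})
```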
This also fixes the technical debt where chat content was passed to `model.generate` as `prefix` and `_suffix`, which was misleading.
- GitLab-Rails counter-part: Ai Gateway client for Duo Chat (gitlab-org/gitlab!138274 - merged)
- Related Build a client for AI Gateway to connect duo chat (gitlab-org/gitlab#431563 - closed)
How to set up and validate locally
- Run AI Gateway: `poetry run ai_gateway`
With streaming:
```shell
shinya@shinya-XPS-15-9530:~/gitlab-development-kit$ curl -v -N -X 'POST' \
'http://0.0.0.0:5052/v1/chat/agent' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"prompt_components": [
{
"type": "string",
"metadata": {
"source": "string",
"version": "string"
},
"payload": {
"content": "string",
"provider": "anthropic",
"model": "claude-2.0"
}
}
],
"stream": "True"
}'
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying 0.0.0.0:5052...
* Connected to 0.0.0.0 (127.0.0.1) port 5052
> POST /v1/chat/agent HTTP/1.1
> Host: 0.0.0.0:5052
> User-Agent: curl/8.4.0
> accept: application/json
> Content-Type: application/json
> Content-Length: 292
>
< HTTP/1.1 200 OK
< date: Fri, 01 Dec 2023 04:56:31 GMT
< server: uvicorn
< x-process-time: 0.12322370000038063
< x-request-id: 5fc8362aa2464326ac25e05a76651e00
< transfer-encoding: chunked
<
I apologize, I should not have made assumptions about your preferences. Let's move our conversation in a more positive direction.* Connection #0 to host 0.0.0.0 left intact
shinya@shinya-XPS-15-9530:~/gitlab-development-kit$
```
(Notice the `transfer-encoding: chunked` response header.)
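For illustration only (not part of this MR), a minimal Python client that consumes the chunked response could look like the following; the payload mirrors the curl call above:

```python
import requests  # assumed client-side dependency; not part of this MR

payload = {
    "prompt_components": [
        {
            "type": "string",
            "metadata": {"source": "string", "version": "string"},
            "payload": {
                "content": "string",
                "provider": "anthropic",
                "model": "claude-2.0",
            },
        }
    ],
    "stream": True,
}

# stream=True keeps the connection open so chunks can be read as they arrive.
with requests.post(
    "http://0.0.0.0:5052/v1/chat/agent", json=payload, stream=True, timeout=60
) as resp:
    resp.raise_for_status()
    for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)
```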
Without streaming:
```shell
shinya@shinya-XPS-15-9530:~/gitlab-development-kit$ curl -v -X 'POST' \
'http://0.0.0.0:5052/v1/chat/agent' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"prompt_components": [
{
"type": "string",
"metadata": {
"source": "string",
"version": "string"
},
"payload": {
"content": "string",
"provider": "anthropic",
"model": "claude-2.0"
}
}
],
"stream": false
}'
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying 0.0.0.0:5052...
* Connected to 0.0.0.0 (127.0.0.1) port 5052
> POST /v1/chat/agent HTTP/1.1
> Host: 0.0.0.0:5052
> User-Agent: curl/8.4.0
> accept: application/json
> Content-Type: application/json
> Content-Length: 291
>
< HTTP/1.1 200 OK
< date: Fri, 01 Dec 2023 04:57:18 GMT
< server: uvicorn
< content-length: 394
< content-type: application/json
< x-process-time: 3.4186516540012235
< x-request-id: 739997b544df40649c2f96e922557f25
<
* Connection #0 to host 0.0.0.0 left intact
{"response":" I'm afraid I don't have enough context to determine if that statement is racist or not. Making broad generalizations about groups of people based on race is generally not advised. Perhaps it would be better to judge people as individuals based on their character and actions rather than their race.","metadata":{"provider":"anthropic","model":"claude-2.0","timestamp":1701406642}}shinya@shinya-XPS-15-9530:~/gitlab-development-kit$
```
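Similarly, a sketch of a non-streaming client that parses the single JSON document (again illustrative, not code from this MR):

```python
import requests  # assumed client-side dependency; not part of this MR

payload = {
    "prompt_components": [
        {
            "type": "string",
            "metadata": {"source": "string", "version": "string"},
            "payload": {
                "content": "string",
                "provider": "anthropic",
                "model": "claude-2.0",
            },
        }
    ],
    "stream": False,
}

resp = requests.post("http://0.0.0.0:5052/v1/chat/agent", json=payload, timeout=60)
resp.raise_for_status()
body = resp.json()
print(body["response"])              # full completion text
print(body["metadata"]["provider"])  # e.g. "anthropic"
```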
Merge request checklist
- Tests added for new functionality. If not, please raise an issue to follow up.
- Documentation added/updated, if needed.