fix: do not use codestral in asia-* locations
What does this merge request do and why?
We introduced Codestral in feat: add Completions agent for codestral on ve... (!1172 - merged). However, post-deployment latency testing showed a significant difference in latency for requests coming from APAC locations:
Requests coming from APAC
Model | AI Gateway Location | Model Location | P50 | P95 | P99 |
---|---|---|---|---|---|
Code Gecko | asia-northeast1 | asia-northeast1 | 566.3 | 676.05 | 715.67 |
Codestral | asia-northeast1 | us-central1 | 1668.76 | 2034.84 | 2072.16 |
Requests coming from USA
Model | AI Gateway Location | Model Location | P50 | P95 | P99 |
---|---|---|---|---|---|
Code Gecko | us-east4 | us-east4 | 530.68 | 607.27 | 618.34 |
Codestral | us-east4 | us-central1 | 737.23 | 1057.63 | 2902.27 |
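To compare the two tables directly, the short sketch below computes how much slower Codestral is than Code Gecko at each percentile, using only the figures reported above. The dictionary layout and the `slowdown` function name are illustrative and not part of the AI Gateway codebase.

```python
# Latency figures copied from the tables above (units as reported there).
LATENCIES = {
    "APAC": {
        "Code Gecko": {"P50": 566.3, "P95": 676.05, "P99": 715.67},
        "Codestral": {"P50": 1668.76, "P95": 2034.84, "P99": 2072.16},
    },
    "USA": {
        "Code Gecko": {"P50": 530.68, "P95": 607.27, "P99": 618.34},
        "Codestral": {"P50": 737.23, "P95": 1057.63, "P99": 2902.27},
    },
}


def slowdown(region: str) -> dict:
    """Return Codestral latency divided by Code Gecko latency, per percentile."""
    gecko = LATENCIES[region]["Code Gecko"]
    codestral = LATENCIES[region]["Codestral"]
    return {p: round(codestral[p] / gecko[p], 2) for p in gecko}


if __name__ == "__main__":
    for region in LATENCIES:
        print(region, slowdown(region))
```

At the median, Codestral is roughly three times slower than Code Gecko for APAC requests, compared with under one and a half times for USA requests.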
We are currently in the process of requesting support for Codestral in asia-* locations (see gitlab-org/gitlab#485915 (closed)).
In the meantime, we would like to continue the rollout to internal users. However, the difference in APAC-based latencies is prohibitively large and could impede productivity for internal users. To continue the rollout, we think it's best to disable Codestral if the AIGW instance is running in an asia-* location.
We cannot do this check in the Rails monolith since AIGW and Rails are not hosted in the same GCP instance.
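A minimal sketch of the kind of location check described above, assuming the gateway knows its own GCP location as a string. The names `CODESTRAL_MODEL`, `FALLBACK_MODEL`, `is_asia_location`, and `resolve_model` are hypothetical and are not taken from the AI Gateway code; only the model names and the asia-* prefix come from this description.

```python
# Hypothetical names; only the model identifiers and the "asia-" prefix
# come from the merge request description.
CODESTRAL_MODEL = "codestral@2405"
FALLBACK_MODEL = "code-gecko@002"


def is_asia_location(location: str) -> bool:
    """Return True for GCP locations such as asia-northeast1."""
    return location.startswith("asia-")


def resolve_model(requested_model: str, location: str) -> str:
    """Fall back to Code Gecko when Codestral is requested from an asia-* location."""
    if requested_model == CODESTRAL_MODEL and is_asia_location(location):
        return FALLBACK_MODEL
    return requested_model
```

For example, `resolve_model("codestral@2405", "asia-northeast1")` returns `"code-gecko@002"`, while the same request from `us-central1` keeps `"codestral@2405"`.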
How to set up and validate locally
Not in Asia
- The default GCP location of your local AIGW instance should be us-central1. See the ai_gateway/config.py -> _build_location function.
- Send the following request:
curl "http://gdk.test:5052/v2/code/completions" \
  -X POST \
  --header "Content-Type: application/json" \
  --data "{\"current_file\":{\"content_below_cursor\":\"end\",\"file_name\":\"hello.rb\",\"language_identifier\":\"go\",\"content_above_cursor\":\"def hello_\"},\"stream\":false,\"prompt_version\":1,\"model_provider\":\"vertex-ai\",\"model_name\":\"codestral@2405\"}" \
  | json_pp -json_opt pretty,canonical
- You should get a response indicating that the provider/model used is vertex_ai/codestral@2405:
{
  "choices" : [...],
  ...other response fields...
  "model" : {
    "engine" : "vertex-ai",
    "lang" : "ruby",
    "name" : "vertex_ai/codestral@2405",
    ...,
  },
  ...
}
In Asia
- Simulate running your local AIGW in asia-northeast1 by changing the value returned by the _build_location function in ai_gateway/config.py.
- Send the following request:
curl "http://gdk.test:5052/v2/code/completions" \
  -X POST \
  --header "Content-Type: application/json" \
  --data "{\"current_file\":{\"content_below_cursor\":\"end\",\"file_name\":\"hello.rb\",\"language_identifier\":\"go\",\"content_above_cursor\":\"def hello_\"},\"stream\":false,\"prompt_version\":1,\"model_provider\":\"vertex-ai\",\"model_name\":\"codestral@2405\"}" \
  | json_pp -json_opt pretty,canonical
- You should get a response indicating that the provider/model used is code-gecko@002:
{
  "choices" : [...],
  ...other response fields...
  "model" : {
    "engine" : "vertex-ai",
    "lang" : "ruby",
    "name" : "code-gecko@002",
    ...
  },
  ...
}
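The same validation can be scripted instead of reading the curl output by eye. The sketch below rebuilds the request body from the curl command above and extracts the model name from the response; the function names are illustrative, and it assumes the response contains the "model" -> "name" field shown in the sample responses.

```python
import json
import urllib.request

# Endpoint taken from the curl command above.
URL = "http://gdk.test:5052/v2/code/completions"


def build_payload() -> dict:
    """Request body equivalent to the --data argument of the curl command."""
    return {
        "current_file": {
            "content_below_cursor": "end",
            "file_name": "hello.rb",
            "language_identifier": "go",
            "content_above_cursor": "def hello_",
        },
        "stream": False,
        "prompt_version": 1,
        "model_provider": "vertex-ai",
        "model_name": "codestral@2405",
    }


def model_name(response_body: dict) -> str:
    """Extract the model name reported in the completion response."""
    return response_body["model"]["name"]


def request_completion(url: str = URL) -> dict:
    """POST the payload to a running local AIGW and return the parsed JSON."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a local AIGW running, `model_name(request_completion())` should return vertex_ai/codestral@2405 in the default location and code-gecko@002 when the location is simulated as asia-northeast1.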
Merge request checklist
- Tests added for new functionality. If not, please raise an issue to follow up.
- Documentation added/updated, if needed.
Closes #635 (closed)