fix: do not use codestral in asia-* locations

Pam Artiaga requested to merge 635-use-code-gecko-in-apac into main

What does this merge request do and why?

We introduced Codestral in feat: add Completions agent for codestral on ve... (!1172 - merged). However, post-deployment latency testing showed a significant difference in latency for requests coming from APAC locations, where the AI Gateway runs in asia-northeast1 but Codestral is served from us-central1:

Requests coming from APAC

Model       AI Gateway Location  Model Location   P50      P95      P99
Code Gecko  asia-northeast1      asia-northeast1  566.3    676.05   715.67
Codestral   asia-northeast1      us-central1      1668.76  2034.84  2072.16

Requests coming from USA

Model       AI Gateway Location  Model Location  P50     P95      P99
Code Gecko  us-east4             us-east4        530.68  607.27   618.34
Codestral   us-east4             us-central1     737.23  1057.63  2902.27
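
To put the gap in relative terms, the P50 figures from the two tables can be compared directly. The following is a small sketch of that arithmetic; the helper name `slowdown` is illustrative only, and the values are copied from the tables in whatever unit they were measured in.

```python
# P50 latencies copied from the tables above (unit as reported there).
P50 = {
    "apac": {"code_gecko": 566.3, "codestral": 1668.76},
    "usa": {"code_gecko": 530.68, "codestral": 737.23},
}


def slowdown(region: str) -> float:
    """Return how many times slower Codestral is than Code Gecko at P50."""
    figures = P50[region]
    return round(figures["codestral"] / figures["code_gecko"], 2)


for region in P50:
    print(region, slowdown(region))
```

At the median, Codestral is roughly 2.95 times slower than Code Gecko for APAC requests, compared with roughly 1.39 times for USA requests.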

We are currently in the process of requesting support for Codestral in asia-* locations (see gitlab-org/gitlab#485915 (closed)).

In the meantime, we would like to continue the rollout to internal users. However, the latency difference for APAC-based requests is prohibitively large and could impede productivity for internal users. To continue with the rollout, we think it's best to disable Codestral when the AIGW instance is running in an asia-* location and use Code Gecko instead.

We cannot do this check in the Rails monolith since AIGW and Rails are not hosted in the same GCP instance.
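
The location check described above could look roughly like the sketch below. This is not the code in this merge request: the function and constant names are hypothetical, and the only details taken from the description are the asia-* prefix, the requested model name codestral@2405, and the fallback model code-gecko@002.

```python
# Hypothetical names throughout; a sketch of the idea, not the AIGW implementation.
CODESTRAL = "codestral@2405"
CODE_GECKO = "code-gecko@002"


def is_asia_location(location: str) -> bool:
    """Return True for GCP locations such as asia-northeast1."""
    return location.startswith("asia-")


def resolve_model(requested_model: str, location: str) -> str:
    """Fall back to Code Gecko when Codestral is requested on an asia-* instance."""
    if requested_model == CODESTRAL and is_asia_location(location):
        return CODE_GECKO
    return requested_model


print(resolve_model(CODESTRAL, "asia-northeast1"))
print(resolve_model(CODESTRAL, "us-central1"))
```

Keeping the decision in one small function on the AIGW side means the Rails monolith can keep sending the same request regardless of where the gateway instance runs.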

How to set up and validate locally

Not in Asia

  1. The default GCP location of your local AIGW instance should be us-central1. See the _build_location function in ai_gateway/config.py.

  2. Send the following request:

    curl "http://gdk.test:5052/v2/code/completions" \
    -X POST \
    --header "Content-Type: application/json" \
    --data "{\"current_file\":{\"content_below_cursor\":\"end\",\"file_name\":\"hello.rb\",\"language_identifier\":\"go\",\"content_above_cursor\":\"def hello_\"},\"stream\":false,\"prompt_version\":1,\"model_provider\":\"vertex-ai\",\"model_name\":\"codestral@2405\"}" \
    | json_pp -json_opt pretty,canonical

  3. You should get a response indicating that the provider/model used is vertex_ai/codestral@2405:

      {
         "choices" : [...],
         ...other response fields...
         "model" : {
            "engine" : "vertex-ai",
            "lang" : "ruby",
            "name" : "vertex_ai/codestral@2405",
            ...,
         },
         ...
      }

In Asia

  1. Simulate an asia-northeast1 GCP location for your local AIGW instance by changing the value returned by the _build_location function in ai_gateway/config.py.

  2. Send the following request:

    curl "http://gdk.test:5052/v2/code/completions" \
    -X POST \
    --header "Content-Type: application/json" \
    --data "{\"current_file\":{\"content_below_cursor\":\"end\",\"file_name\":\"hello.rb\",\"language_identifier\":\"go\",\"content_above_cursor\":\"def hello_\"},\"stream\":false,\"prompt_version\":1,\"model_provider\":\"vertex-ai\",\"model_name\":\"codestral@2405\"}" \
    | json_pp -json_opt pretty,canonical

  3. You should get a response indicating that the provider/model used is code-gecko@002:

      {
         "choices" : [...],
         ...other response fields...
         "model" : {
            "engine" : "vertex-ai",
            "lang" : "ruby",
            "name" : "code-gecko@002",
            ...
         },
         ...
      }
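
For repeated checks, the same request can be scripted instead of reading the curl output by eye. The sketch below uses only the Python standard library; the function names are illustrative, while the URL and payload are copied from the curl commands above. It assumes a local AIGW is reachable at gdk.test:5052.

```python
import json
import urllib.request

URL = "http://gdk.test:5052/v2/code/completions"


def build_payload() -> dict:
    """Return the same request body used in the curl commands above."""
    return {
        "current_file": {
            "content_below_cursor": "end",
            "file_name": "hello.rb",
            "language_identifier": "go",
            "content_above_cursor": "def hello_",
        },
        "stream": False,
        "prompt_version": 1,
        "model_provider": "vertex-ai",
        "model_name": "codestral@2405",
    }


def model_name(response: dict) -> str:
    """Extract the model name reported in a completions response."""
    return response["model"]["name"]


def fetch_model_name(url: str = URL) -> str:
    """POST the payload to the local AIGW and return the model it reports."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return model_name(json.load(reply))
```

Calling fetch_model_name() should return vertex_ai/codestral@2405 with the default location and code-gecko@002 when the location is simulated as asia-northeast1.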

Merge request checklist

  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.

Closes #635 (closed)

Edited by Pam Artiaga