feat: support model_provider argument in code suggestions evaluation

Pam Artiaga requested to merge pam/update-ai-gateway-client into main

What does this merge request do and why?

  • Adds a model_provider argument to the code suggestions evaluation. The value is used by the AI Gateway client/source.
  • Changes the default intent value to completion. I expect we will mostly be testing completion (especially for evaluations against AI Gateway).
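
As a rough illustration of the two changes above, here is a standalone argparse sketch. It is not the eli5 implementation: the parser name, the help text, and the `--intent` flag name are assumptions made for this example, and only `--source`, `--model-name`, and `--model-provider` appear in the command in this merge request.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real eli5 CLI may be structured differently.
    parser = argparse.ArgumentParser(prog="evaluate-sketch")
    parser.add_argument("--source", default="ai-gateway")
    parser.add_argument("--model-name", required=True)
    # New optional argument: which provider serves the model.
    parser.add_argument(
        "--model-provider",
        default=None,
        help="Provider forwarded to the AI Gateway client (e.g. vertex-ai)",
    )
    # Assumed flag name; the default mirrors the new default intent value.
    parser.add_argument("--intent", default="completion")
    return parser


args = build_parser().parse_args(
    ["--model-name=code-gecko@002", "--model-provider=vertex-ai"]
)
print(args.model_provider, args.intent)
```

When no intent is passed, the sketch falls back to completion, which is the behavior described in the second bullet.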

How to set up and validate locally

Run the following:

poetry run eli5 code-suggestions evaluate \
--dataset="code-suggestions-input-testcases-v1" \
--source=ai-gateway \
--experiment-prefix=aigw-codegecko \
--model-name=code-gecko@002 \
--model-provider=vertex-ai \
--rate-limit=29 \
--limit=1 \
--evaluate-with-llm

The evaluation should run successfully.

In your AIGW logs, you should see a request that was served by vertex-ai/code-gecko@002 (line breaks added for readability):

2024-08-21_06:25:59.76118 gitlab-ai-gateway     : 2024-08-21 14:25:59 [info     ] 172.16.123.1:49819 - 
"POST /v2/code/completions HTTP/1.1" 200 blocked=False client_ip=172.16.123.1 client_port=49819 
correlation_id=3d3f475d27f7413f80b568da2c0b7ec5 cpu_s=0.010920999999996184 
duration_request=-1 duration_s=0.6571560000011232 
editor_lang=None experiments=[{'name': 'exp_truncate_suffix', 'variant': 0}] 
gitlab_duo_seat_count=None gitlab_global_user_id=None gitlab_host_name=None gitlab_instance_id=None 
gitlab_language_server_version=None gitlab_realm=None gitlab_saas_duo_pro_namespace_ids=None gitlab_saas_namespace_ids=None 
gitlab_version=None http_version=1.1 inference_duration_s=0.6488190839954768 lang=php 

# these are the relevant details to watch for
# model_engine should match the --model-provider value
meta.feature_category=code_suggestions method=POST 
model_engine=vertex-ai model_name=code-gecko@002 
model_output_length=3 model_output_length_stripped=3 
model_output_score=-1.942774772644043 
path=/v2/code/completions 

# other details
post_processing_duration_s=0.0006997079981374554 
prompt_length=110 prompt_length_stripped=94 prompt_symbols={} 
status_code=200 suffix_length=0 
url=http://gdk.test:5052/v2/code/completions 
user_agent=python-requests/2.32.3
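
If you would rather check the log line programmatically than by eye, a minimal sketch along these lines may help. It assumes the log line keeps the key=value layout shown above; the function names are made up for this example and are not part of the AI Gateway or eli5.

```python
import re

# Matches key=value pairs. Values stop at the next whitespace, so
# structured values such as experiments=[...] are only partially captured.
FIELD_RE = re.compile(r"(\w[\w.]*)=(\S+)")


def parse_log_fields(line: str) -> dict[str, str]:
    """Return the key=value fields found in one AIGW log line."""
    return dict(FIELD_RE.findall(line))


def request_used_model(line: str, provider: str, model_name: str) -> bool:
    """Check that model_engine and model_name match the expected values."""
    fields = parse_log_fields(line)
    return (
        fields.get("model_engine") == provider
        and fields.get("model_name") == model_name
    )


# Shortened copy of the log line above, joined back into a single line.
sample = (
    '"POST /v2/code/completions HTTP/1.1" 200 blocked=False '
    "meta.feature_category=code_suggestions method=POST "
    "model_engine=vertex-ai model_name=code-gecko@002 "
    "path=/v2/code/completions status_code=200"
)

print(request_used_model(sample, "vertex-ai", "code-gecko@002"))
```

The check only compares the two fields that this merge request affects; everything else in the line is ignored.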

An example result of the above evaluation command is available as a LangSmith experiment result.

Merge request checklist

  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.
Edited by Pam Artiaga