Duo Chat: Change prompt to ignore docs when not useful

Bruno Cardoso requested to merge bc/duo-chat-investigate-codegen-and-docs into master

What does this MR do and why?

This MR improves Duo Chat's performance on code generation questions for which the GitlabDocumentation tool happened to be selected. The prompt change allows the LLM to ignore the content retrieved from the GitLab docs when it is not useful, so that DirectAnswer can try to answer the question instead.

Follow-up on gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/ai-experiments#17 (comment 1862475183).

Results on the sampled code generation dataset

[Result images not preserved. Image labels as extracted: current with claude-3 with fix]

How to set up and validate locally

Config file used:
{
  "beam_config": {
    "pipeline_options": {
      "runner": "DirectRunner",
      "project": "dev-ai-research-0e2f8974",
      "region": "us-central1",
      "temp_location": "gs://prompt-library/tmp/",
      "save_main_session": false
    }
  },
  "input_bq_table": "dev-ai-research-0e2f8974.duo_chat.sampled_code_generation_v1",
  "output_sinks": [
    {
      "type": "local",
      "path": "data/output_claude3_mbpp_fix_codegen",
      "prefix": "experiment"
    }
  ],
  "throttle_sec": 1,
  "batch_size": 10,
  "input_adapter": "mbpp",
  "eval_setup": {
    "answering_models": [
      {
        "name": "duo-chat",
        "parameters": {
          "base_url": "http://127.0.0.1:3000"
        },
        "prompt_template_config": {
          "templates": [
            {
              "name": "empty",
              "template_path": "data/prompts/duo_chat/answering/empty.txt.example"
            }
          ]
        }
      },
      {
        "name": "human",
        "prompt_template_config": {
          "templates": [
            {
              "name": "human",
              "template_path": "data/prompts/duo_chat/answering/empty.txt.example"
            }
          ]
        }
      }
    ],
    "metrics": [
      {
        "metric": "similarity_score"
      }
    ]
  }
}
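
Before starting a run, it can help to confirm what the config will actually evaluate. The following is a minimal sketch, not part of prompt-library: `summarize_eval_config` and `EXAMPLE_CONFIG` are hypothetical names, and the embedded JSON is a trimmed copy of the config above that keeps only the fields the check reads.

```python
import json

# Trimmed copy of the config above (beam_config, throttle_sec and batch_size omitted).
EXAMPLE_CONFIG = """
{
  "input_bq_table": "dev-ai-research-0e2f8974.duo_chat.sampled_code_generation_v1",
  "output_sinks": [
    {"type": "local", "path": "data/output_claude3_mbpp_fix_codegen", "prefix": "experiment"}
  ],
  "input_adapter": "mbpp",
  "eval_setup": {
    "answering_models": [
      {
        "name": "duo-chat",
        "parameters": {"base_url": "http://127.0.0.1:3000"},
        "prompt_template_config": {
          "templates": [
            {"name": "empty", "template_path": "data/prompts/duo_chat/answering/empty.txt.example"}
          ]
        }
      },
      {
        "name": "human",
        "prompt_template_config": {
          "templates": [
            {"name": "human", "template_path": "data/prompts/duo_chat/answering/empty.txt.example"}
          ]
        }
      }
    ],
    "metrics": [{"metric": "similarity_score"}]
  }
}
"""


def summarize_eval_config(config: dict) -> dict:
    """Hypothetical helper: collect the fields that determine what a run evaluates."""
    eval_setup = config["eval_setup"]
    models = eval_setup["answering_models"]
    return {
        "input_table": config["input_bq_table"],
        "input_adapter": config["input_adapter"],
        "models": [m["name"] for m in models],
        # Only models that declare a base_url (here, the local Duo Chat instance).
        "base_urls": {
            m["name"]: m["parameters"]["base_url"]
            for m in models
            if "base_url" in m.get("parameters", {})
        },
        # Distinct prompt template files referenced by any answering model.
        "template_paths": sorted(
            {
                t["template_path"]
                for m in models
                for t in m["prompt_template_config"]["templates"]
            }
        ),
        "metrics": [m["metric"] for m in eval_setup["metrics"]],
        "local_outputs": [
            s["path"] for s in config["output_sinks"] if s["type"] == "local"
        ],
    }


if __name__ == "__main__":
    print(json.dumps(summarize_eval_config(json.loads(EXAMPLE_CONFIG)), indent=2))
```

The summary makes it easy to see that the `duo-chat` model points at a local instance on `http://127.0.0.1:3000`, so that instance needs to be running with this branch checked out for the results to reflect the prompt change.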

For how to run the Duo Chat evaluation with this config, refer to https://gitlab.com/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library/-/blob/main/doc/how-to/run_duo_chat_eval.md.

Edited by Bruno Cardoso