refactor: encapsulate judges for DuoChat (!745) · Merge requests · GitLab.org / ModelOps / AI Model Validation and Research / AI Evaluation / Prompt Library

Hongtao Yang requested to merge encapsulate-judge into main Sep 15, 2024

What does this merge request do and why?

Encapsulates the judges for DuoChat. The plan is to do the same for the other use cases.

Benefits:

All logic for a judge can be found in one class, in promptlib/duo_chat/judges.py.
The configuration of each predefined judge (and its different versions) can be easily inspected in promptlib/duo_chat/config.py.
Judge logic and prompt templating logic are decoupled from Beam. This improves the developer experience and makes adding new judges easier. For an example of adding a new judge for DuoChat doc, see !746 (merged); notice how clear it is.
Users only need to choose one from a pool of predefined judges; they no longer need to build all the models and prompt templates themselves. This removes a lot of user friction.
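
To illustrate the idea behind these benefits, here is a minimal sketch of what an encapsulated judge and a pool of predefined judges could look like. It is not the code from this merge request: `JudgeConfig`, `PREDEFINED_JUDGES`, `Judge`, `get_judge`, the prompt text, and the `call_model` callback are all hypothetical names chosen for illustration. Only the judge name `independent_llm_judge_generic` and the model name `claude-3-5-sonnet` are taken from the validation config in this description.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class JudgeConfig:
    """Hypothetical settings for one predefined judge version."""
    name: str
    default_model: str
    prompt_template: str


# Hypothetical pool of predefined judges (the role config.py plays in the MR).
PREDEFINED_JUDGES: Dict[str, JudgeConfig] = {
    "independent_llm_judge_generic": JudgeConfig(
        name="independent_llm_judge_generic",
        default_model="claude-3-5-sonnet",
        prompt_template="Question: {question}\nAnswer: {answer}\nRate the answer.",
    ),
}


class Judge:
    """Hypothetical judge: owns prompt building and the model call, no Beam code."""

    def __init__(self, config: JudgeConfig, model_name: str,
                 call_model: Callable[[str, str], str]):
        self.config = config
        self.model_name = model_name
        self._call_model = call_model  # injected so the judge is testable offline

    def build_prompt(self, question: str, answer: str) -> str:
        return self.config.prompt_template.format(question=question, answer=answer)

    def evaluate(self, question: str, answer: str) -> str:
        return self._call_model(self.model_name, self.build_prompt(question, answer))


def get_judge(name: str, call_model: Callable[[str, str], str],
              model_name: Optional[str] = None) -> Judge:
    """Look up a predefined judge by name, optionally overriding its model."""
    if name not in PREDEFINED_JUDGES:
        raise ValueError(f"Unknown judge {name!r}; choose from {sorted(PREDEFINED_JUDGES)}")
    config = PREDEFINED_JUDGES[name]
    return Judge(config, model_name or config.default_model, call_model)
```

With a layout like this, a pipeline step would only need the judge's name (and optionally a model name) to obtain a ready-to-use judge, which is the kind of friction reduction described above.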

How to set up and validate locally

Run the pipeline with this config:


{
  "beam_config": {
    "pipeline_options": {
      "runner": "DirectRunner",
      "project": "dev-ai-research-0e2f8974",
      "region": "us-central1",
      "temp_location": "gs://prompt-library/tmp/",
      "save_main_session": false
    }
  },
  "input_source": {
    "type": "bigquery",
    "path": "dev-ai-research-0e2f8974.duo_chat.issue_epic_staging_v1"
  },
  "output_sinks": [
    {
      "type": "bigquery",
      "path": "dev-ai-research-0e2f8974.duo_chat_experiments",
      "prefix": "test_prebuilt_judge"
    }
  ],
  "throttle_sec": 1,
  "batch_size": 16,
  "eval_setup": {
    "answering_models": [
      {
        "name": "duo-chat",
        "parameters": {
          "base_url": "https://staging.gitlab.com"
        },
        "prompt_template_config": {
          "templates": [
            {
              "name": "empty",
              "template_path": "data/prompts/duo_chat/answering/empty.txt.example"
            }
          ]
        }
      },
      {
        "name": "gpt-4o-mini",
        "prompt_template_config": {
          "templates": [
            {
              "name": "claude-3-sonnet",
              "template_path": "data/prompts/duo_chat/answering/code-explanation-simple.txt.example"
            }
          ]
        }
      }
    ],
    "metrics": [
      {
        "name": "similarity_score"
      },
      {
        "name": "independent_llm_judge_generic",
        "model": {
          "name": "claude-3-5-sonnet"
        }
      }
    ]
  }
}
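
Before running the pipeline, it can help to double-check which answering models and metrics a config file selects. The following is a small standalone sketch for that purpose, assuming the config is saved as a JSON file with the structure shown above. The helper names `load_config` and `summarize_eval_setup` are hypothetical and are not part of the Prompt Library.

```python
import json
from typing import Any, Dict, List


def load_config(path: str) -> Dict[str, Any]:
    """Read a pipeline config from a JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def summarize_eval_setup(config: Dict[str, Any]) -> Dict[str, List[str]]:
    """List answering model names and metric names found under eval_setup."""
    setup = config["eval_setup"]
    models = [model["name"] for model in setup.get("answering_models", [])]
    metrics = []
    for metric in setup.get("metrics", []):
        # A metric may carry an optional judge model, as in the config above.
        judge_model = metric.get("model", {}).get("name")
        if judge_model:
            metrics.append(f'{metric["name"]} ({judge_model})')
        else:
            metrics.append(metric["name"])
    return {"answering_models": models, "metrics": metrics}
```

For the config above, the summary would list `duo-chat` and `gpt-4o-mini` as answering models, and `similarity_score` plus `independent_llm_judge_generic (claude-3-5-sonnet)` as metrics.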

Merge request checklist

I've run the affected pipeline(s) to validate that nothing is broken.
Tests added for new functionality. If not, please raise an issue to follow up.
Documentation added/updated, if needed.

Edited Sep 27, 2024 by Hongtao Yang
