Implement a CLI command to evaluate Duo Workflow fix-broken-pipeline with LLM judge (!89) · Merge requests · GitLab.org / AI Powered / ELI5

Alexander Chueshev requested to merge ac/duo-workflow-llm-judge into main Aug 09, 2024

What does this merge request do and why?

This MR implements a Command Line Interface (CLI) command to evaluate the Duo Workflow "fix-broken-pipeline" using an LLM judge. This implementation serves as a foundation for improving evaluation approaches in the ELI5 project.

Note: This MR requires the work done in gitlab-org/duo-workflow/duo-workflow-service!35 (closed).
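
To make the shape of the new command concrete, here is a minimal sketch of how its arguments could be parsed. It uses Python's standard `argparse` purely for illustration; the CLI framework ELI5 actually uses is not stated in this MR. The command and option names come from the validation steps in this description. The `predictions_path` name, the reading of the positional argument as the location of the predictions, and marking `--dataset` as required are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the command shape used in this MR (sketch only)."""
    parser = argparse.ArgumentParser(prog="eli5")
    groups = parser.add_subparsers(dest="group", required=True)

    # Command group: `eli5 duo-workflow ...`
    duo_workflow = groups.add_parser("duo-workflow")
    commands = duo_workflow.add_subparsers(dest="command", required=True)

    # Command: `eli5 duo-workflow evaluate-fix-broken-pipeline ...`
    evaluate = commands.add_parser("evaluate-fix-broken-pipeline")
    # Assumption: the positional argument points at the predictions to be judged.
    evaluate.add_argument("predictions_path")
    # Assumption: --dataset names the reference dataset and is mandatory.
    evaluate.add_argument("--dataset", required=True)
    return parser


# Parse the same arguments used in the "Run evaluation" step of this description.
args = build_parser().parse_args(
    [
        "duo-workflow",
        "evaluate-fix-broken-pipeline",
        "datasets/duo_workflow/fix-broken-pipeline-v1",
        "--dataset=duo_workflow.fix-broken-pipeline.1",
    ]
)
print(args.command, args.predictions_path, args.dataset)
```

Nesting one sub-parser inside another is what allows `--help` to work both at the `duo-workflow` level and at the `evaluate-fix-broken-pipeline` level.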

How to set up and validate locally

Check out this merge request's branch.
Update your .env file (you can skip DEEPSEEK_API_TOKEN and MISTRAL_API_KEY).
Install dependencies.
```
poetry install
```

Run help.

```
poetry run eli5 duo-workflow --help
poetry run eli5 duo-workflow evaluate-fix-broken-pipeline --help
```

Run evaluation.

```
poetry run eli5 duo-workflow evaluate-fix-broken-pipeline datasets/duo_workflow/fix-broken-pipeline-v1 --dataset=duo_workflow.fix-broken-pipeline.1
```

Note: This command accepts predictions generated outside of the ELI5 project (see gitlab-org/duo-workflow/duo-workflow-service!35 (closed)). We use the datasets/duo_workflow/fix-broken-pipeline-v1 dataset for demonstration purposes only. All evaluation scores should be 1 as we are comparing the dataset against itself.
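
The self-comparison check described in the note can be sketched as follows. This is an illustration under stated assumptions, not the judge added in this MR: `stub_judge` is a hypothetical stand-in that replaces the LLM call with a whitespace-insensitive equality test, and the example fixes are made up. It only shows why feeding the references in as predictions should yield a score of 1 for every example.

```python
from statistics import mean
from typing import Callable, Sequence


def stub_judge(prediction: str, reference: str) -> float:
    """Hypothetical stand-in for an LLM judge.

    Returns 1.0 when the two texts match after whitespace normalization, else 0.0.
    """
    return 1.0 if prediction.split() == reference.split() else 0.0


def evaluate(
    predictions: Sequence[str],
    references: Sequence[str],
    judge: Callable[[str, str], float],
) -> list[float]:
    """Score each prediction against the reference at the same position."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    return [judge(pred, ref) for pred, ref in zip(predictions, references)]


# Made-up reference fixes, used only to exercise the sketch.
references = [
    "Pin the failing dependency in requirements.txt",
    "Add the missing `stage: test` key to the job",
]

# Sanity check: using the references as predictions should give a perfect score.
scores = evaluate(references, references, stub_judge)
print(scores, mean(scores))
```

A run like this that produces anything below 1 for identical inputs would point to a problem in the evaluation plumbing rather than in the predictions.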

Merge request checklist

Tests added for new functionality. If not, please raise an issue to follow up.
Documentation added/updated, if needed.
