Run SWE benchmark as part of ELI5 (!205)

Alexander Chueshev requested to merge ac/run-swe-eli5 into main Oct 18, 2024

What does this merge request do and why?

This MR adds support for ELI5 to run the SWE benchmark as part of the duo-workflow evaluate swe command.

Ref: #43

Check the existing command ELI5 provides:

poetry run eli5 duo-workflow evaluate swe --help

Run SWE benchmark with custom evaluators:

poetry run eli5 duo-workflow evaluate swe results.jsonl --split=base --run-swe-benchmark

Note:

DW results that can be used to check this MR: results.jsonl
Additional instructions for Mac M1 users: #43 (comment 2164440717)
If you experience issues with the Docker-related Python code, try updating your DOCKER_HOST environment variable. For example, DOCKER_HOST=unix:///Users/ac/.colima/default/docker.sock.
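
The DOCKER_HOST hint above could be applied along these lines before running the benchmark. This is only a sketch: the helper name docker_host_for_socket is hypothetical, and the socket location assumes a default Colima profile under your own home directory, generalizing the /Users/ac/... path in the example. Adjust it to wherever your Docker socket actually lives.

```shell
# Hypothetical helper: build a DOCKER_HOST value from a Unix socket path.
docker_host_for_socket() {
  printf 'unix://%s\n' "$1"
}

# Assumption: Colima's default profile keeps its socket under $HOME/.colima/default/.
COLIMA_SOCK="$HOME/.colima/default/docker.sock"

# Only set DOCKER_HOST when it is not already set and the socket exists.
if [ -z "${DOCKER_HOST:-}" ] && [ -S "$COLIMA_SOCK" ]; then
  DOCKER_HOST="$(docker_host_for_socket "$COLIMA_SOCK")"
  export DOCKER_HOST
fi

# Show the value that later commands in this shell will see.
echo "DOCKER_HOST=${DOCKER_HOST:-unset}"
```

Run this in the same shell session as the poetry run eli5 command so that the exported variable is visible to it. An existing DOCKER_HOST value is left untouched.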

Edited Oct 18, 2024 by Alexander Chueshev