Run SWE benchmark as part of ELI5

Alexander Chueshev requested to merge ac/run-swe-eli5 into main

What does this merge request do and why?

This MR adds support for ELI5 to run the SWE benchmark as part of the duo-workflow evaluate swe command.

Ref: #43

How to set up and validate locally

  1. Check out this merge request's branch.
  2. Update your .env file.
  3. Install dependencies.
    poetry install --with swebench
  4. Check the existing command ELI5 provides:
    poetry run eli5 duo-workflow evaluate swe --help
  5. Run the SWE benchmark with custom evaluators:
    poetry run eli5 duo-workflow evaluate swe results.jsonl --split=base --run-swe-benchmark

Note:

  • DW results that can be used to check this MR: results.jsonl
  • Additional instructions for Mac M1 users - #43 (comment 2164440717)
  • If you experience issues with Docker-related Python code, try updating your DOCKER_HOST environment variable. For example, DOCKER_HOST=unix:///Users/ac/.colima/default/docker.sock.
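
The DOCKER_HOST hint above can be sketched as a small shell snippet. This is only an illustration: it assumes Colima's default profile and that its socket lives under your home directory (the path in the note is the author's own machine), and the `sock` variable name is hypothetical. Adjust the path for other Docker setups.

```shell
# Assumption: Colima default profile; the socket path mirrors the example in the note.
sock="${HOME}/.colima/default/docker.sock"

if [ -S "$sock" ]; then
  # Point Docker clients (including Docker-related Python code) at the Colima socket.
  export DOCKER_HOST="unix://${sock}"
  echo "DOCKER_HOST set to ${DOCKER_HOST}"
else
  # Leave the environment untouched if the socket is not where we assumed.
  echo "No socket found at ${sock}; DOCKER_HOST left unchanged" >&2
fi
```

Run this in the same shell session before the `poetry run eli5 duo-workflow evaluate swe ...` command so that the variable is inherited by the benchmark process.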

Merge request checklist

  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.
Edited by Alexander Chueshev
