Collection and Tracking of Evaluation Outputs

Problem Statement

Each evaluation run is subject to some randomness, so a single run is not a reliable measure on its own. We currently do not collect and track evaluation outputs separately, which makes it difficult to aggregate runs over time, gain useful insights, or track progress.

Exit Criteria

A system that collects evaluation outputs and tracks them separately has been implemented, and it provides useful insights and progress indicators.
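
As a rough illustration of what such a system could look like, the sketch below appends each evaluation output to a JSON Lines file and then aggregates a metric across runs. Everything in it is an assumption made for illustration: the `EvalRecord` fields, the function names, and the choice of a flat file as storage are hypothetical and not taken from this issue.

```python
import json
import statistics
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class EvalRecord:
    """One evaluation output. All field names are hypothetical."""
    run_id: str      # identifies a single evaluation run
    timestamp: str   # ISO 8601 string recording when the run happened
    metric: str      # name of the measured quantity, e.g. "accuracy"
    value: float     # score observed in this run


def append_record(path: Path, record: EvalRecord) -> None:
    """Append one record as a JSON line so every run is stored separately."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")


def load_records(path: Path) -> list[EvalRecord]:
    """Read back all stored records, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [EvalRecord(**json.loads(line)) for line in f if line.strip()]


def summarize(records: list[EvalRecord], metric: str) -> dict:
    """Aggregate one metric across runs to smooth out per-run randomness."""
    values = [r.value for r in records if r.metric == metric]
    if not values:
        raise ValueError(f"no records found for metric {metric!r}")
    return {
        "runs": len(values),
        "mean": statistics.mean(values),
        # Sample standard deviation needs at least two runs.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }
```

Reporting the mean together with the spread across runs is one way to tell a real change from run-to-run noise. A real implementation would likely need a proper datastore and a dashboard on top of it.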

Edited Oct 27, 2023 by David O'Regan