Add duo chat dataset with issue and epic resources
What does this merge request do and why?
This MR uploads the Duo Chat dataset with issues and epics (https://gitlab.com/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/datasets/-/commit/92c9532597217cfeb31100394edb515598fc3f1f#863878c628166a74b7260245615fbf87da138344) to LangSmith. The dataset was accidentally removed from the dataset repo: https://gitlab.com/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/datasets.
Please note that this MR slightly updates the schema without affecting our RoR Rake task:
- We merge the context (resource dump) with the queries.
- We now store the context (resource dump) as a dictionary instead of a string. This gives us more flexibility to apply different eval strategies later.
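Here is a minimal sketch of the string-to-dictionary change. It assumes a hypothetical record layout: the keys `question` and `context` and the sample values are illustrative, not the actual resources_v1 fields. `upgrade_record` is likewise a hypothetical helper, not part of ELI5 or the Rake task. The example shows why a dictionary is easier to work with, because an evaluator can read individual fields without re-parsing a string.

```python
import json

# Hypothetical old-style record: the resource dump is a serialized JSON string.
old_record = {
    "question": "Summarize this issue",
    "context": '{"type": "issue", "title": "Example issue", "description": "Example description"}',
}

# Hypothetical new-style record: the resource dump is a dictionary stored
# in the same record as the query.
new_record = {
    "question": "Summarize this issue",
    "context": {"type": "issue", "title": "Example issue", "description": "Example description"},
}


def upgrade_record(record: dict) -> dict:
    """Return a copy of `record` whose context is a dict (parsing it if it is a JSON string)."""
    context = record["context"]
    if isinstance(context, str):
        context = json.loads(context)
    return {**record, "context": context}


# With a dict, individual fields are directly accessible.
print(upgrade_record(old_record) == new_record)  # True
print(upgrade_record(old_record)["context"]["type"])  # issue
```

Already-upgraded records pass through unchanged, so a helper like this is safe to run over a mixed file.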
Here is the uploaded dataset - https://smith.langchain.com/o/477de7ad-583e-47b6-a1c4-c4a0300e7aca/datasets/f0f7c18a-a282-465b-8f16-d5b763365ec4?tab=2&paginationState=%7B%22pageIndex%22%3A0%2C%22pageSize%22%3A10%7D
How to set up and validate locally
- Check out this merge request's branch.
- Set the env variables in .env.
- Install dependencies:
poetry install
- Check the existing commands ELI5 provides:
poetry run eli5 datasets --help
- Run:
poetry run eli5 datasets create duo_chat.cot_qa_resources.1 datasets/duo_chat/resources_v1/ad308c3710ad469faeb587aa94ba0876.jsonl
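Before running the create command, it can help to sanity-check the JSONL file locally. This is a minimal sketch that uses only the standard library. It assumes each line is a JSON object with a `context` key holding a dictionary, which follows the schema change described above but is not taken from the ELI5 code. The `check_jsonl` helper and the `check_jsonl.py` file name are hypothetical.

```python
import json
import sys
from pathlib import Path


def check_jsonl(path: Path, context_key: str = "context") -> int:
    """Return the number of records, raising ValueError on the first malformed line.

    Assumes (hypothetically) that every record stores its resource dump
    under `context_key` as a JSON object.
    """
    count = 0
    with path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines, e.g. a trailing newline
            # json.JSONDecodeError is a subclass of ValueError.
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError(f"line {lineno}: expected a JSON object")
            if not isinstance(record.get(context_key), dict):
                raise ValueError(f"line {lineno}: {context_key!r} is not a dictionary")
            count += 1
    return count


if __name__ == "__main__":
    # Usage: python check_jsonl.py path/to/file.jsonl [more files...]
    for arg in sys.argv[1:]:
        print(f"{arg}: {check_jsonl(Path(arg))} records")
```

The check only validates structure locally and does not contact LangSmith, so it can run before any env variables are set.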
Related to #25
Merge request checklist
- Tests added for new functionality. If not, please raise an issue to follow up.
- Documentation added/updated, if needed.