
Add duo chat dataset with issue and epic resources

Alexander Chueshev requested to merge ac/dataset-duo-chat-resources into main

What does this merge request do and why?

This MR uploads the Duo Chat dataset with issues and epics (https://gitlab.com/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/datasets/-/commit/92c9532597217cfeb31100394edb515598fc3f1f#863878c628166a74b7260245615fbf87da138344) to LangSmith. The dataset had been accidentally removed from the dataset repo: https://gitlab.com/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/datasets.

Please note that this MR slightly updates the schema without affecting our RoR Rake task:

  • We merge the context (resource dump) with the queries.
  • We now store the context (resource dump) as a dictionary instead of a string, which gives us more flexibility to apply different eval strategies later.
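To illustrate the two schema changes, here is a minimal sketch of how a query record and its resource dump could be combined into one record with a dictionary context. The field names (question, context, type, title) are hypothetical, since this MR does not spell out the actual JSONL keys, and this is not the code the MR uses.

```python
import json
from typing import Any


def merge_record(query: dict[str, Any], resource_dump: str) -> dict[str, Any]:
    """Merge a query with its resource context, parsing the dump into a dict.

    Assumption: the old layout kept the resource dump as a JSON string next to
    the query; the key name "context" is hypothetical, not the real schema.
    """
    context = json.loads(resource_dump)
    if not isinstance(context, dict):
        # The new schema expects a dictionary, so reject plain strings or lists.
        raise ValueError("resource dump must decode to a JSON object")
    # Any existing "context" field on the query is replaced by the parsed dict.
    return {**query, "context": context}


def to_jsonl_line(record: dict[str, Any]) -> str:
    """Serialize one merged record as a single JSONL line."""
    return json.dumps(record, ensure_ascii=False)
```

For example, merge_record({"question": "What is this epic about?"}, '{"type": "epic", "title": "Improve onboarding"}') yields a single record whose context is a nested dictionary rather than an opaque string, so an evaluator can pick out individual resource fields.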

Here is the uploaded dataset - https://smith.langchain.com/o/477de7ad-583e-47b6-a1c4-c4a0300e7aca/datasets/f0f7c18a-a282-465b-8f16-d5b763365ec4?tab=2&paginationState=%7B%22pageIndex%22%3A0%2C%22pageSize%22%3A10%7D

How to set up and validate locally

  1. Check out this merge request's branch.
  2. Set the environment variables in .env.
  3. Install dependencies.
    poetry install
  4. Check the existing commands ELI5 provides:
    poetry run eli5 datasets --help
  5. Run:
    poetry run eli5 datasets create duo_chat.cot_qa_resources.1 datasets/duo_chat/resources_v1/ad308c3710ad469faeb587aa94ba0876.jsonl
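Before running the last step, a quick local sanity check can catch malformed lines in the JSONL file. This is a minimal sketch, not part of ELI5: it assumes each line is a JSON object with its resource dump under a hypothetical context key (the actual key is not stated in this MR) and reports any line that is not valid JSON, is not an object, or whose context is not a dictionary.

```python
import json
from pathlib import Path


def validate_jsonl(path: Path, context_key: str = "context") -> list[str]:
    """Return a list of problems found; an empty list means every line passed.

    Assumption: context_key defaults to a hypothetical "context" field.
    """
    problems: list[str] = []
    with path.open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(record, dict):
                problems.append(f"line {lineno}: record is not a JSON object")
            elif not isinstance(record.get(context_key), dict):
                # Under the updated schema the context should be a dict, not a string.
                problems.append(f"line {lineno}: '{context_key}' is missing or not a dict")
    return problems
```

Calling validate_jsonl(Path("datasets/duo_chat/resources_v1/ad308c3710ad469faeb587aa94ba0876.jsonl")) returns an empty list if every line passes; otherwise each entry names the offending line number.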

Related to #25 (closed)

Merge request checklist

  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.
