Clean up duo-chat eval prompt by removing examples
## What does this merge request do and why?
This MR cleans up the duo-chat eval prompt by removing all examples, because we've seen prompt contamination with the few-shot prompts: #543

It also upgrades the evaluating model from text-bison to claude-3.5.
Partially resolves #543
## How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
## Merge request checklist
- [ ] I've run the affected pipeline(s) to validate that nothing is broken.
- [ ] Tests added for new functionality. If not, please raise an issue to follow up.
- [ ] Documentation added/updated, if needed.