Update Duo Code Review prompt based on internal feedback and model evaluation
Context/goal
We recently changed the model and prompt, and set up local model evaluation.
Based on internal feedback on these changes, we should test further prompt changes, validate each one with the local model evaluation, and ship those that improve the evaluation score.
Implementation plan
Prompt changes can be validated using the setup described in #469095 (comment 2032882886).
Ideas for prompt changes can be sourced from:
- Testing the feature and observing pitfalls.
- The existing list of ideas, based on the old model and prompt, in Improve prompt for Duo Code Review (&14143).
- Feedback from internal users of the feature: #463039 (closed).
Note
This issue is intentionally open-ended: we do not yet know how many prompt changes will make sense, only that we should expect to make some. Extra capacity has been allocated to provide a good buffer for this work.
Edited by François Rosé (OOO, back 2024-10-29)