Fix bug in #git_diff_prefix regex
What does this MR do and why?
While extracting regexes into feature-specific modules, I noticed that
we had a divergence between use cases when it came to diff headers, so I isolated #git_diff_prefix
and
#generate_commit_message_git_diff_prefix
to complete the refactor,
suspecting that one or the other had a bug. Sure enough,
#git_diff_prefix
wasn't able to capture some git headers, while
generate_commit_message_git_diff_prefix
was more flexible.
This sort of bug is exactly the reason why refactoring the regexes and unifying them into a single location makes more sense than each individual class having their own declaration.
This bug in the regex was very much a P4; it wasn't causing any user-facing impact, nor was it substantially impacting the results, it was merely causing some minor inefficiency in AI API requests, as we strip the diff prefix string to save tokens/bandwidth used in the API call.
Example showing previous regex not performing as expected:
[1] pry(#<Llm::MergeRequests::SummarizeDiffService>)> diff.diff
=> "@@ -0,0 +1 @@\n+this branch is used to test Repository#merged_to_root_ref?\n"
[2] pry(#<Llm::MergeRequests::SummarizeDiffService>)> diff.diff.sub(Gitlab::Regex.git_diff_prefix, "")
=> "@@ -0,0 +1 @@\n+this branch is used to test Repository#merged_to_root_ref?\n"
[3] pry(#<Llm::MergeRequests::SummarizeDiffService>)> diff.diff.sub(Gitlab::Regex.generate_commit_message_git_diff_prefix, "")
=> "\n+this branch is used to test Repository#merged_to_root_ref?\n"
[4] pry(#<Llm::MergeRequests::SummarizeDiffService>)>