De-duplicate binary lookups for identical commit diff paths
What does this MR do?
In https://gitlab.com/gitlab-org/gitlab/-/issues/326316 we are looking to improve performance of the commit endpoint, which renders diffs for potentially many files. One offender we found was a function that probes into the git blob data to decide whether the file is binary or not. This function takes about 50% of CPU time spent in diff file rendering. We are already looking to cache these calls in !60128 (merged) and are also considering to push this detection down into Gitaly in #329203, however:
- the fix in !60128 (merged) will only help when this redis backed cache is warm; initial rendering time for a commit diff is not helped by it
- extracting this logic from Rails is a larger effort and unclear if or when it will happen
As a short-term mitigation we can try to at least skip unnecessary lookups, which this MR tries to accomplish.
This optimization here reduces the number of times we need to call into this function by leveraging the fact that when computing diffs, we need to fetch all files twice using different revisions. Since the chance that a given file changes its type ("binary-ness") across two revisions is remote at best (though not impossible), we cache this lookup by path in-memory now so that for subsequent gitaly fetches of the same file, we can re-use the previous lookup result.
We need to understand that this MR accepts a trade-off between a slim chance of being incorrect for a large gain in performance. For instance, should a file change its type from e.g. text to PNG or vice versa across two revisions, we will have cached the wrong result. However, this would be immediately fixed with a page reload because the results are only cached per-request (per iteration of a gitaly response even.)
Results
As expected, this cuts call count and CPU time spent in this detection in half, from ~50% to ~25% of all CPU time spent in rendering diffs:
Before
See https://gitlab.com/gitlab-org/gitlab/-/issues/326316#note_558912202
After
Does this MR meet the acceptance criteria?
Conformity
-
📋 Does this MR need a changelog?-
I have included a changelog entry. -
I have not included a changelog entry because _____.
-
-
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. - [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Related to #326316