Identify Full Diff Files
What does this MR do?
For #17531 (closed).
We need to be able to store "reviews" at a frozen point in time for a given file in a given MR.
See the designs on the linked issue, plus this initial review, plus this UX thread.
A "point in time" in git
terms is essentially at a commit, which is why we're hashing both the file_identifier_hash
and the content_sha
to get the ID for each diff file.
New Behavior
Full diff files (that is: diff files loaded for display via the diffs endpoint diffs_batch.json
) are assigned a unique identifier.
Branching Logic
Our central diff file processor (prepareDiffData
) is used to prepare both the diff files and the diff metadata. Unfortunately, the "files" present in the diff metadata are not really complete "files" and so can't actually be uniquely identified.
To avoid errors or assigning non-unique IDs, if prepareDiffData
is called with metadata, it skips assigning IDs.
Notably, the files processed when the metadata is loaded are never set to state.
Files processed during a normal diff load (e.g., not metadata), are set to state, so we can conveniently assume that diff files in state have unique identifiers.
This is just a flag passed by the caller action, so other potential branches can use this flag in the future.
Known Issues
Because we have to bail out on identifying files when they're metadata files, we can't make the blanket statement that all diff files have these IDs.
Similarly, because we can't guarantee every file has an ID assigned (e.g., if they don't have a content_sha
), we also need to double-check that it exists before depending on it.
On another similar note, because we don't currently have a way of identifying a file based on the last time it changed, these IDs change with every commit and become slightly less useful.
Further detail
If we had a way to identify a file based on which commits it changes in, the ID would be consistent until the file was modified which is probably more useful. Example:
- File A is modified in commit 1
- File A is assigned ID
123abc
- The branch has more commits added, commits 2, 3, 4, 5, but none touch File A
- In each of the MR versions (whether they're pushed individually or all at once), File A continues to have ID
123abc
, because none of the later commits modified it. - Commit 6 - which modifies File A - is pushed to the branch
- File A is assigned a new ID:
234def
In this example, an unchanged file retains a consistent ID until it's modified. For most use-cases, this is ideal.
For situations where the ID must be truly unique, the file ID can be combined with the head_sha
to create something unique in every commit in the MR.
None of these known issues break this feature, but they will be something that we have to address in the future.
Most notably:
we don't currently have a way of identifying a file based on the last time it changed
If we implement a more robust identifying algorithm, we may invalidate many of the old-style IDs and - as one example - inadvertently clear any in-progress reviews.
Alternatively, we could implement an "upgrade migration" which checks the old style and upgrades to the new style dynamically, but this is a lot of extra work.
Review Recommendations and Overview
I recommend taking a look at the commits from oldest to newest, but not reviewing there.
The work is broken up into 4 commits:
- The oldest commit implements the basics of the new behavior: it creates an ID for diff files.
- The second commit updates the preparation code to skip that ID in the case of metadata, and - in the bulk of the changes for this MR - updates the call signature to use an object instead of ordered parameters.
- The third commit updates the test suite for this utility file to:
- Test the newly added ID and - critically - make sure it's not added for metadata files
- Use a getter for the local test diff files since the previous version maintained mutated state between test runs.
- The most recent commit adds the additional check to exclude the ID if the file doesn't have a
content_sha
- The additional test for this feature is also included in this commit
Timeline
Merge Request | Description |
---|---|
!49173 (merged) !49174 (merged) !49190 (merged) !49208 (merged) |
Various cleanup merge requests in preparation for this feature. |
We're here |
Identify diff files using the content_sha
|
Switch identification to use blob.id
|
|
Add tooling to add/remove reviews from LocalStorage to accommodate the file review feature | |
Add UI to review files |
Screenshots (strongly suggested)
N/A, this is all developer-facing work
Does this MR meet the acceptance criteria?
Conformity
- [-] Changelog entry
- [-] Documentation (if required)
-
Code review guidelines -
Merge request performance guidelines -
Style guides - [-] Database guides
- [-] Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers - [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention
@gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team