
Add a UUID to each Diff File when the raw data is processed

What does this MR do?

For #33867.

Diff files don't have a unique identifier. This MR adds one on the front end, derived from a combination of values that, taken together, are always unique.

  • file.blob.id = SHA1 of the git blob. Sometimes unique, but not reliably so (in one MR with 6 files, it was duplicated once)
  • file.diff_refs.{base,start,head}_sha = base_sha and start_sha are often identical across many MRs; head_sha is unique to a given MR, but is shared by every file in that MR
  • file.file_identifier_hash = SHA1 of ${file_path}-${new}-${deleted}-${renamed}. Should be unique within a given MR, but is not unique across a project or across MR versions
  • file.blob.mode = Never unique; it is just the file mode number

All six of these are used to get a unique ID for a diff file. By combining blob.id and {base,start,head}_sha we should be able to roughly pinpoint the commit and source file we're dealing with. By combining file_identifier_hash and blob.mode we should be able to identify a certain iteration of that source file in that commit.

Together, file_identifier_hash and blob.mode identify a diff file uniquely within a single MR.
Together, blob.id and diff_refs.{base,start,head}_sha identify a given source file across any MR.

Taken together, those two combinations uniquely identify any diff file across any MR.
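
As a rough illustration of the combination described above, the six values can be joined into one composite key. This is only a sketch and not the code in this MR: the function name `diffFileCompositeKey`, the `-` separator, and the sample values are all hypothetical, and the real change may derive the UUID from these values differently.

```javascript
// Hypothetical helper (not the MR's implementation): joins the six values
// listed above into a single composite key for a diff file.
function diffFileCompositeKey(file) {
  const { base_sha, start_sha, head_sha } = file.diff_refs;

  return [
    file.blob.id, // which source blob
    base_sha, // together with start_sha and head_sha: which MR version
    start_sha,
    head_sha,
    file.file_identifier_hash, // which file entry within the MR
    file.blob.mode, // file mode number
  ].join('-');
}

// Made-up sample data: two diff files that share blob.id and diff_refs
// but have different file_identifier_hash values.
const diffRefs = { base_sha: 'aaa111', start_sha: 'aaa111', head_sha: 'bbb222' };
const fileA = {
  blob: { id: 'c0ffee', mode: '100644' },
  diff_refs: diffRefs,
  file_identifier_hash: '1f2e3d',
};
const fileB = { ...fileA, file_identifier_hash: '4a5b6c' };

console.log(diffFileCompositeKey(fileA) !== diffFileCompositeKey(fileB)); // true
```

Even though the two sample files share a blob ID and the same diff refs, the differing file_identifier_hash keeps their keys distinct, which is the property the combination is meant to provide.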

Screenshots

N/A, all ~backstage

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team
