Skip to content

Remove LegacyDiffNote usage for Github import

What does this MR do and why?

MR covers

Update

The new issue was created to remove LegacyDiffNote usage GitHub Import: Switch from LegacyDiffNote to Di... (#411918), the original issue will be handled separately by UI changes.

Details below

For the testing purpose, I was using rspec/rspec-core and rspec/rspec-mocks repos.

GitHub Importer already supports note creation with DiffNote, but only notes with suggestions. It was implemented in the scope of GithubImport: Enable users to import modern diff notes

During implementation couple of issues appeared after switching to DiffNote (with an existing implementation for notes with suggestions)

Problems are described in the comment here, the next items are about how it was fixed and why:

Summary

The main issue is with importing Comments that are referencing commits that no longer exist (were overridden by force-pushes). To show diff on UI for DiffNote two commits (head and base) are compared. But since the commit no longer exist on Github (but GitHub api returns it's sha for comment to specify at what point the comment was left to which code is related) there is nothing to compare to get the diff and to create NoteDiffFile. There are a couple of places where logic depends on commits comprising. And as it is stated here

And last, if this is about GitHub missing the commits... well, nothing much we can do in that case

There is a couple of scenarios of how to proceed with it:

  1. Additionally fetch missing commits (it will work for garbage-collected commits that are still present in GitHub), so the issue will still take a place.
  2. Store diff hunk in st_diff (as it is done for LegacyDiffNote) and use it for diff_file creation when the commit is not present.
  3. Do not import comments if the commit does not exist (but it will be a breaking change since it was imported with LegacyDiffNote before).

So option 2 was chosen. For me, it looks like the existing logic for handling DiffNote creation was designed for the creation of new notes on GitLab and is not well-suitable for importing. The changes are not related to GitHub Import only so can affect existing logic for DiffNote creation.

Below are the issues that were found during implementation:

Callbacks were not skipped when importing? and validation failed due to empty end_line

For some comments import fails with ["Line code can't be blank", "Line code must be a valid line code", "Position is incomplete"] and fall back to LegacyDiffNote creation. end_line and position usage is described in the comment:

the end_line feeds the diff_position method and sets the note.position that is sent to the note create service.

Callbacks were not skipped since DiffNote has missed importing: true attribute on creation. Also, GitHub not always returning a value for end_line, sometimes it returns original_end_line, but we cannot rely on it as well. It turns out, that we cannot rely on it's value even if it is present in the response JSON.

For example, https://api.github.com/repos/rspec/rspec-mocks/pulls/comments/1781980 comment has the wrong end_line value, if try to navigate to the UI part to see this comment https://github.com/rspec/rspec-mocks/pull/183#discussion_r1781980 it is not shown there. With the use of LegasyDiffNote, such notes are shown correctly (for legacy line is parsed from diff_hunk which is always present).

Or this one https://api.github.com/repos/rspec/rspec-mocks/pulls/comments/3402456 has line_code 240 but on UI it is shown under 90.

So to make it work with DiffNote the line will be taken from diff_hunk.

GitHub API returns side RIGHT by default

DiffNote requires position creation. The side was used to build params for position creation and to detect whether it was an addition or deletion. The problem is that for some comments (usually the older once as per my investigation, but I might be wrong) GitHub API returns RIGHT even if it was a deletion.

For example https://api.github.com/repos/rspec/rspec-mocks/pulls/comments/4124549 side RIGHT but it is deletion.

To fix this issue, the side will be also taken from the parsed diff_hunk

The wrong commit was taken for comparing

On position creation, the MR head_sha (that is actually the last commit) was passed instead of original_commit_id (which is a commit to which the comment is referring) from GitHub response to compare. It is applied to the commits that are still can be found in the git history.

GitLab fetches note diff files from the git if it was not found in the MR (if it is not the latest commit)

For example https://api.github.com/repos/rspec/rspec-mocks/pulls/comments/1781955 no original_commit can be found in git, PR doesn’t have this commit in git history. There is no way to detect whether it is outdated and force-pushed based on the GitHub API response.

It doesn't work for outdated and for commits that were erased by force-pushed commits. For the commits that were erased by force pushes there is no way to get it from git.

So to fix this issue the create_diff_file callback will be skipped. Instead in the Notes::CreateService if the diff note file was not found it will be created based on the info from GitHub (diff_hash that contains line_code, diff_hunk etc needed for note diff file creation)

After review it turns out that we cannot fix it just touching github import related files only by skipping callback etc (for more details discussion link)

So instead, if the commit no longer exists and was not imported the diff hunk is saved in the DB and is used for diff file creation instead. It is still not a perfect solution though (see comment here)

Known issues

This is an issue that is applicable for both LegacyDiffNote and DiffNote and doesn't depend on the switching to DiffNote usage GitHub Import: Document limitation that old dis... (#386806 - closed)

Another one NoMethodError: undefined method `new_pos' for n... (#334801 - closed), on the GitHub side it is possible to leave a comment on file (not on a particular line), while GitLab doesn't support such notes. There is an MR for the fix for LegacyDiffNote to not fail the import. Similar fix is applied to DiffNotes, but it just partially fixed the problem (to prevent 500 error during import). For mo details !107940 (comment 1405872662)

Performance impact

Import of rspec/rspec-mocks that has 2233 pulls comments:

Using LegacyDiffNote Screenshot_2023-01-26_at_11.50.31
Using DiffNote Screenshot_2023-01-26_at_11.32.26

Screenshots or screen recordings

When GitHub api returns wrong end_line and it cannot be found on ui, related to Callbacks were not skipped when importing? and validation failed due to empty end_line

GitHub url to comment https://github.com/rspec/rspec-mocks/pull/183#discussion_r1781980 but it cannot be found there due to wrong line code returned by API

With LegacyDiffNote:

Screenshot_2023-01-25_at_14.27.06

With DiffNote:

Screenshot_2023-01-25_at_14.28.40

When force pushed and original commit no longer exists, related to The wrong commit was taken for comparing

GitHub url to comment https://github.com/rspec/rspec-mocks/pull/183#discussion_r1781955

With LegacyDiffNote

Screenshot_2023-01-25_at_14.31.05

With DiffNote

Screenshot_2023-01-25_at_14.30.24

How to set up and validate locally

  1. Enable the FF to import with DiffNote
    Feature.enable(:remove_legacy_diff_note_on_github_import)
  2. Initiate GitHub repository import via API or UI.
  3. Visit recently imported project, check notes in MR, all of them should have resolve button visible.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Kristina Doskich

Merge request reports

Loading