Remove LegacyDiffNote usage for Github import
What does this MR do and why?
❗ Update
The new issue was created to remove LegacyDiffNote usage GitHub Import: Switch from LegacyDiffNote to Di... (#411918), the original issue will be handled separately by UI changes.
⤵
Details below For the testing purpose, I was using rspec/rspec-core
and rspec/rspec-mocks
repos.
GitHub Importer already supports note creation with DiffNote
, but only notes with suggestions. It was implemented in the scope of GithubImport: Enable users to import modern diff notes
During implementation couple of issues appeared after switching to DiffNote
(with an existing implementation for notes with suggestions)
Problems are described in the comment here, the next items are about how it was fixed and why:
Summary
The main issue is with importing Comments that are referencing commits that no longer exist (were overridden by force-pushes).
To show diff on UI for DiffNote
two commits (head and base) are compared. But since the commit no longer exist on Github (but GitHub api returns it's sha for comment to specify at what point the comment was left to which code is related) there is nothing to compare to get the diff and to create NoteDiffFile
. There are a couple of places where logic depends on commits comprising. And as it is stated here
And last, if this is about GitHub missing the commits... well, nothing much we can do in that case
There is a couple of scenarios of how to proceed with it:
- Additionally fetch missing commits (it will work for garbage-collected commits that are still present in GitHub), so the issue will still take a place.
- Store diff hunk in
st_diff
(as it is done forLegacyDiffNote
) and use it for diff_file creation when the commit is not present. - Do not import comments if the commit does not exist (but it will be a breaking change since it was imported with
LegacyDiffNote
before).
So option 2 was chosen. For me, it looks like the existing logic for handling DiffNote creation was designed for the creation of new notes on GitLab and is not well-suitable for importing. The changes are not related to GitHub Import only so can affect existing logic for DiffNote creation.
Below are the issues that were found during implementation:
Callbacks were not skipped when importing? and validation failed due to empty end_line
For some comments import fails with ["Line code can't be blank", "Line code must be a valid line code", "Position is incomplete"]
and fall back to LegacyDiffNote creation. end_line
and position
usage is described in the comment:
the
end_line
feeds the diff_position method and sets the note.position that is sent to the note create service.
Callbacks were not skipped since DiffNote
has missed importing: true
attribute on creation.
Also, GitHub not always returning a value for end_line
, sometimes it returns original_end_line
, but we cannot rely on it as well. It turns out, that we cannot rely on it's value even if it is present in the response JSON.
For example, https://api.github.com/repos/rspec/rspec-mocks/pulls/comments/1781980 comment has the wrong end_line
value, if try to navigate to the UI part to see this comment https://github.com/rspec/rspec-mocks/pull/183#discussion_r1781980 it is not shown there. With the use of LegasyDiffNote
, such notes are shown correctly (for legacy line is parsed from diff_hunk
which is always present).
Or this one https://api.github.com/repos/rspec/rspec-mocks/pulls/comments/3402456 has line_code
240 but on UI it is shown under 90.
So to make it work with DiffNote
the line will be taken from diff_hunk
.
GitHub API returns side RIGHT by default
DiffNote requires position creation. The side
was used to build params for position creation and to detect whether it was an addition or deletion. The problem is that for some comments (usually the older once as per my investigation, but I might be wrong) GitHub API returns RIGHT
even if it was a deletion.
For example https://api.github.com/repos/rspec/rspec-mocks/pulls/comments/4124549 side RIGHT but it is deletion.
To fix this issue, the side will be also taken from the parsed diff_hunk
The wrong commit was taken for comparing
On position creation, the MR head_sha
(that is actually the last commit) was passed instead of original_commit_id
(which is a commit to which the comment is referring) from GitHub response to compare. It is applied to the commits that are still can be found in the git history.
GitLab fetches note diff files from the git if it was not found in the MR (if it is not the latest commit)
For example https://api.github.com/repos/rspec/rspec-mocks/pulls/comments/1781955 no original_commit can be found in git, PR doesn’t have this commit in git history. There is no way to detect whether it is outdated and force-pushed based on the GitHub API response.
It doesn't work for outdated and for commits that were erased by force-pushed commits. For the commits that were erased by force pushes there is no way to get it from git.
So to fix this issue the create_diff_file callback will be skipped. Instead in the Notes::CreateService
if the diff note file was not found it will be created based on the info from GitHub (diff_hash
that contains line_code, diff_hunk etc needed for note diff file creation)
After review it turns out that we cannot fix it just touching github import related files only by skipping callback etc (for more details discussion link)
So instead, if the commit no longer exists and was not imported the diff hunk is saved in the DB and is used for diff file creation instead. It is still not a perfect solution though (see comment here)
Known issues
This is an issue that is applicable for both LegacyDiffNote
and DiffNote
and doesn't depend on the switching to DiffNote
usage GitHub Import: Document limitation that old dis... (#386806 - closed)
Another one NoMethodError: undefined method `new_pos' for n... (#334801 - closed), on the GitHub side it is possible to leave a comment on file (not on a particular line), while GitLab doesn't support such notes. There is an MR for the fix for LegacyDiffNote to not fail the import. Similar fix is applied to DiffNotes, but it just partially fixed the problem (to prevent 500 error during import). For mo details !107940 (comment 1405872662)
Performance impact
Import of rspec/rspec-mocks
that has 2233 pulls comments:
Screenshots or screen recordings
When GitHub api returns wrong end_line and it cannot be found on ui, related to Callbacks were not skipped when importing? and validation failed due to empty end_line
GitHub url to comment https://github.com/rspec/rspec-mocks/pull/183#discussion_r1781980 but it cannot be found there due to wrong line code returned by API
With LegacyDiffNote:
With DiffNote:
When force pushed and original commit no longer exists, related to The wrong commit was taken for comparing
GitHub url to comment https://github.com/rspec/rspec-mocks/pull/183#discussion_r1781955
With LegacyDiffNote
With DiffNote
How to set up and validate locally
- Enable the FF to import with DiffNote
Feature.enable(:remove_legacy_diff_note_on_github_import)
- Initiate GitHub repository import via API or UI.
- Visit recently imported project, check notes in MR, all of them should have resolve button visible.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.