GithubImporter: Optimize Pull Request Review Importer [RUN ALL RSPEC] [RUN AS-IF-FOSS]
What does this MR do?
Problem
The GitHub API does not provide a way to fetch all the pull request reviews of a project (repo), as it does for comments. Instead, we have to fetch the reviews per pull request.
For this reason, the Gitlab::GithubImport::Importer::PullRequestsReviewsImporter¹ has to iterate over the imported pull requests and request the reviews for each one, which might span more than one page.
If the importer hits a rate limit, the process restarts, and the imported pull requests are skipped², but the importer goes over all the review pages again.
In other words, for projects with a large number of pull requests and a large number of reviews per pull request, we might end up with duplicated reviews and unnecessary API requests, which leads to longer import times.
Proposed solution
- To avoid duplicated reviews, cache the review IDs in addition to the pull request IDs, and skip the reviews that were already processed.
- To avoid unnecessary API requests, use the PageCounter to only request pages that weren't yet imported.
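
The two points above can be sketched in plain Ruby. This is a minimal in-memory illustration, not the code in this MR: `FakeGithubClient`, `RateLimitError`, and `ReviewsImporterSketch` are hypothetical names, and the `Set` and `Hash` stand in for whatever cache the real importer and PageCounter use. It assumes review IDs are unique across pull requests.

```ruby
require 'set'

# Hypothetical error used to simulate hitting the GitHub rate limit.
class RateLimitError < StandardError; end

# Hypothetical stand-in for the GitHub API: serves review IDs page by page
# and fails exactly once, on the request number given by fail_on_request.
class FakeGithubClient
  attr_reader :requests

  def initialize(pages_by_pull_request, fail_on_request: nil)
    @pages = pages_by_pull_request
    @fail_on_request = fail_on_request
    @requests = 0
  end

  # Returns the review IDs on the given 1-based page, or [] past the last page.
  def reviews(pull_request_id, page:)
    @requests += 1
    raise RateLimitError if @requests == @fail_on_request

    @pages.fetch(pull_request_id)[page - 1] || []
  end
end

# Sketch of an importer that remembers finished pull requests, processed
# review IDs, and the last fully imported page of each pull request.
class ReviewsImporterSketch
  attr_reader :imported

  def initialize(client)
    @client = client
    @imported = []                 # every review written by the importer
    @done_pull_requests = Set.new  # pull requests whose reviews are complete
    @seen_review_ids = Set.new     # review IDs that were already processed
    @last_page = Hash.new(0)       # last fully imported page per pull request
  end

  def execute(pull_request_ids)
    pull_request_ids.each do |pr_id|
      next if @done_pull_requests.include?(pr_id)

      # Resume after the last page that was fully imported.
      page = @last_page[pr_id] + 1

      loop do
        reviews = @client.reviews(pr_id, page: page)
        break if reviews.empty?

        reviews.each do |review_id|
          # Set#add? returns nil when the ID is already present, so a
          # review seen on an earlier run is not imported twice.
          @imported << review_id if @seen_review_ids.add?(review_id)
        end

        @last_page[pr_id] = page
        page += 1
      end

      @done_pull_requests << pr_id
    end
  end
end

client = FakeGithubClient.new({ 1 => [[101, 102], [103]], 2 => [[201]] }, fail_on_request: 2)
importer = ReviewsImporterSketch.new(client)

begin
  importer.execute([1, 2])
rescue RateLimitError
  importer.execute([1, 2]) # restart once the rate limit has reset
end

puts importer.imported.inspect # [101, 102, 103, 201]
puts client.requests           # 6
```

In this run the rate limit is hit while requesting page 2 of pull request 1. After the restart, the sketch resumes at page 2 instead of page 1, so the first page is not requested again. The set of review IDs is a second guard: if a page does get requested twice, its reviews are skipped rather than imported again.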
Reference
- !48632 (merged) - First version of the Pull Requests reviews importer
- !60668 (merged) - Skip already imported Pull Requests reviews
- Related to: #330783 (closed)
Screenshots (strongly suggested)
Does this MR meet the acceptance criteria?
Conformity
- I have included a changelog entry, or it's not needed. (Does this MR need a changelog?)
- I have added/updated documentation, or it's not needed. (Is documentation required?)
- I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?)
- I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?)
- I have self-reviewed this MR per code review guidelines.
- This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines)
- I have followed the style guides.
Availability and Testing
- I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.)
- I have tested this MR in all supported browsers, or it's not needed.
- I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.
Security
Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.
- Label as security and @ mention @gitlab-com/gl-security/appsec
- The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- Security reports checked/validated by a reviewer from the AppSec team