Draft: Use ETAG in UserFinder in GitHub Import
What does this MR do and why?
Update the UserFinder class to use ETags to reduce the number of requests to the GitHub API.
This change updates the class to store the ETag response header when the user has no public email configured, so that the import reaches the API rate limit less often. When the user does not have a public email, we fetch the user details every 15 minutes instead of every 24 hours. GitHub recommends using ETags because a request does not count against the rate limit when the resource has not been modified.
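The conditional-request flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual UserFinder code: the `EtagUserLookup` class, its `Response` struct, the injected `http` callable, and the in-memory Hash cache are all hypothetical names chosen for the example. The real importer would use its own HTTP client and a shared cache.

```ruby
# Hypothetical sketch of ETag-based caching; not the real UserFinder.
class EtagUserLookup
  # Minimal stand-in for an HTTP response (assumption for this example).
  Response = Struct.new(:status, :etag, :body)

  # http:  any callable taking (username, headers) and returning a Response.
  # cache: a Hash-like store keyed by username.
  def initialize(http:, cache: {})
    @http = http
    @cache = cache
  end

  def fetch(username)
    cached = @cache[username]

    # Send the stored ETag back so the server can answer 304 Not Modified.
    headers = cached ? { 'If-None-Match' => cached[:etag] } : {}
    response = @http.call(username, headers)

    # 304 means the resource is unchanged, so reuse the cached body.
    return cached[:body] if cached && response.status == 304

    # Otherwise remember the new ETag and body for the next lookup.
    @cache[username] = { etag: response.etag, body: response.body } if response.etag
    response.body
  end
end
```

With this shape, the first lookup for a user stores the ETag, and later lookups send it in `If-None-Match`. When the server replies with 304, the cached details are returned without a full response being consumed.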
Related to: #416308 (closed)
How to set up and validate locally
One way to check that the cache works is to analyze all requests made to GitHub during an import. To do that, log each request by adding a log call to the Octokit middleware, as in the example below:
diff --git a/lib/gitlab/octokit/middleware.rb b/lib/gitlab/octokit/middleware.rb
index f944f9827a32..e59b27ba42a8 100644
--- a/lib/gitlab/octokit/middleware.rb
+++ b/lib/gitlab/octokit/middleware.rb
@@ -8,6 +8,8 @@ def initialize(app)
end
def call(env)
+ Gitlab::Import::Logger.info(message: 'GitHub API request', url: env[:url])
+
Gitlab::UrlBlocker.validate!(env[:url],
schemes: %w[http https],
allow_localhost: allow_local_requests?,
Then import a large GitHub project, for example with the command below, which imports rspec-core:
curl --location 'http://gdk.test:3000/api/v4/import/github' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer GDK_ACCESS_TOKEN' \
--data '{
"personal_access_token": "GITHUB_ACCESS_TOKEN",
"repo_id": "238983",
"target_namespace": "root",
"new_name": "rspec-core"
}'
Then check the log for duplicated requests, for example with the command below:
grep "https://api.github.com/users/" log/importer.log | jq .url | sort | uniq -c | sort -h
Note: a few requests may be duplicated, because multiple workers can request the user details before the cache is saved. However, we shouldn't see many duplicated requests.
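If `jq` is not available, the same count can be approximated in Ruby. This is a sketch that assumes each line of the log is a JSON object with a `url` key, as the `jq .url` filter above implies; `count_user_requests` is a hypothetical helper name, not part of the codebase.

```ruby
require 'json'

# Counts how often each GitHub user URL appears in JSON-formatted log lines.
# Returns [url, count] pairs sorted by ascending count.
def count_user_requests(lines)
  urls = lines.filter_map do |line|
    # Only consider requests for user details.
    next unless line.include?('https://api.github.com/users/')

    JSON.parse(line)['url']
  rescue JSON::ParserError
    # Skip lines that are not valid JSON.
    nil
  end

  urls.tally.sort_by { |_url, count| count }
end
```

It could be called as `count_user_requests(File.foreach('log/importer.log'))`, and each resulting pair printed to spot URLs with unusually high counts.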
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.