Adjust Bitbucket Cloud PR importer to be resumable
Please check this box if this contribution uses AI-generated content (including content generated by GitLab Duo features) as outlined in the GitLab DCO & CLA
What does this MR do and why?
This MR adjusts the Bitbucket Cloud importer to be resumable. When a "stage worker" is interrupted, it should ideally resume from the last page instead of starting over. Usually, we record the last page number and then restart from that page. However, the Bitbucket Cloud documentation states:
However, clients are not expected to construct URLs themselves by manipulating the page number query parameter. Instead, the response contains a link to the next page. This link should be treated as an opaque location that is not to be constructed by clients or even assumed to be predictable. The only contract around the next link is that it will return the next chunk of results.
It is important to realize that Bitbucket support both list-based pagination and iterator-based pagination. List-based pagination assumes that the collection is a discrete, immutable, consistently ordered, finite array of objects with a fixed size. Clients navigate a list-based collection by requesting offset-based chunks. In Bitbucket Cloud, list-based responses include the optional size, page, and previous element. The next and previous links typically resemble something like /foo/bar?page=4.
Therefore, our pagination should depend on the "next" URL instead of the traditional page number.
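The resume strategy can be illustrated with a minimal, self-contained Ruby sketch. It is not the importer's code: FAKE_API, PageCursor, and each_page are hypothetical stand-ins for the Bitbucket API, the persisted page keyset (the debug patch below reads it via page_keyset.current), and the client's page iteration. The point it shows is that the cursor stores the opaque next URL returned by the API and never constructs a page number.

```ruby
# Hypothetical stand-in for the Bitbucket API: each opaque URL maps to a
# page of values plus the URL of the next page (nil on the last page).
FAKE_API = {
  'start'      => { values: [1, 2], next: 'opaque-abc' },
  'opaque-abc' => { values: [3, 4], next: 'opaque-xyz' },
  'opaque-xyz' => { values: [5],    next: nil }
}.freeze

# Hypothetical in-memory cursor. A real importer would persist this
# (for example in Redis) so it survives a worker restart.
class PageCursor
  attr_reader :current

  def initialize(current = nil)
    @current = current
  end

  def set(url)
    @current = url
  end
end

# Yields each page's values, starting from the saved URL (or the first page).
# The cursor only advances after a page has been yielded without error,
# so an interruption while handling a page means that page is retried.
def each_page(cursor)
  url = cursor.current || 'start'
  while url
    page = FAKE_API.fetch(url)
    yield page[:values]
    url = page[:next]
    cursor.set(url) if url # never store a constructed page number
  end
end

cursor = PageCursor.new
imported = []
pages_seen = 0

begin
  each_page(cursor) do |values|
    pages_seen += 1
    raise 'purposely interrupt' if pages_seen == 2 # mirrors the debug patch
    imported.concat(values)
  end
rescue RuntimeError
  # The worker died; the cursor still points at the interrupted page.
end

# Resume: a fresh run continues from the saved opaque URL.
each_page(cursor) { |values| imported.concat(values) }
```

Because the cursor is advanced only after a page is handled, resuming can re-process part of a page, so the per-item work has to tolerate being retried.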
Changelog: performance
Technical Decisions
- This MR only handles the Bitbucket pull request importer; other importers will be handled in separate MRs.
- ParallelScheduling does not define def execute yet, as doing so might cause too many changes. Note that many "importer workers" include ParallelScheduling where it shouldn't be needed.
- ParallelScheduling behaves slightly differently from its GitHub importer counterpart: it defines representation_type as a string instead of a class. The reason is to minimize changes, as this MR reuses the current Bitbucket::Page, which "automatically" converts the items into representation objects.
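A string type is sufficient when the page wrapper resolves the representation class itself. The sketch below is hypothetical: Representation::PullRequest, Page, and representation_class are illustrative names, not the actual Bitbucket::Page implementation, and the camel-casing is done in plain Ruby rather than with ActiveSupport.

```ruby
# Hypothetical representation object built from a raw API hash.
module Representation
  class PullRequest
    attr_reader :id, :title

    def initialize(raw)
      @id = raw['id']
      @title = raw['title']
    end
  end
end

# Hypothetical page wrapper: receives the representation type as a string
# and converts every raw item into an instance of the matching class.
class Page
  attr_reader :items, :next_url

  def initialize(raw, representation_type)
    @next_url = raw['next']
    klass = representation_class(representation_type)
    @items = raw.fetch('values', []).map { |item| klass.new(item) }
  end

  private

  # 'pull_request' -> Representation::PullRequest
  def representation_class(type)
    name = type.to_s.split('_').map(&:capitalize).join
    Representation.const_get(name, false)
  end
end

raw = { 'values' => [{ 'id' => 20, 'title' => 'dummy 20' }], 'next' => 'opaque-xyz' }
page = Page.new(raw, 'pull_request')
page.items.first.title # => "dummy 20"
```

Passing a string keeps the scheduling code decoupled from the representation classes; the trade-off is that a typo in the type only fails at runtime, when const_get raises NameError.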
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other merge requests.
Before | After |
---|---|
Successfully imported MR: | (screenshot) |
Interruption happened after MR title "dummy 20": | (screenshot) |
Upon resume, MR title "dummy 20" is retried, then the import continues with MR title "dummy 21": | (screenshot) |
How to set up and validate locally
- Set up Bitbucket Cloud by following this guide.
- In the Rails console, enable the feature flag:
Feature.enable(:bitbucket_import_resumable_worker)
- Patch the code to simulate an interruption:
diff --git a/lib/gitlab/bitbucket_import/parallel_scheduling.rb b/lib/gitlab/bitbucket_import/parallel_scheduling.rb
index 8f03bf1db1cd..927a7c62e034 100644
--- a/lib/gitlab/bitbucket_import/parallel_scheduling.rb
+++ b/lib/gitlab/bitbucket_import/parallel_scheduling.rb
@@ -42,6 +42,14 @@ def each_object_to_import
options = collection_options.merge(representation_type: representation_type, next_url: page_keyset.current)
client.each_page(collection_method, repo, options) do |page|
+ log_info(message: page.inspect)
+ Gitlab::Redis::SharedState.with do |redis|
+ temp_key = 'test-bitbucket-pr'
+ temp_counter = redis.incr(temp_key)
+ redis.expire(temp_key, 5.minutes)
+ raise "purposely interrupt" if temp_counter == 2
+ end
+
page.items.each do |object|
job_waiter.jobs_remaining = Gitlab::Cache::Import::Caching.increment(job_waiter_remaining_cache_key)
- Visit the Bitbucket Cloud import page at http://127.0.0.1:3000/import/bitbucket/status
- Click the Import button.
- Tail the log file:
tail -f log/importer.log
Related to #466231 (closed)