Adjust Bitbucket Cloud PR importer to be resumable (!156797) · Merge requests · GitLab.org / GitLab

Ivan Sebastian requested to merge ivantedja/gitlab:466231-resumable-bitbucket-import into master Jun 19, 2024


What does this MR do and why?

Adjust the Bitbucket Cloud importer to be resumable. When "stage workers" are interrupted, we should ideally resume from the last page instead of starting over. Usually, we record the last "page number" and then restart from that page. However, the Bitbucket Cloud documentation states:

However, clients are not expected to construct URLs themselves by manipulating the page number query parameter. Instead, the response contains a link to the next page. This link should be treated as an opaque location that is not to be constructed by clients or even assumed to be predictable. The only contract around the next link is that it will return the next chunk of results.

It is important to realize that Bitbucket supports both list-based pagination and iterator-based pagination. List-based pagination assumes that the collection is a discrete, immutable, consistently ordered, finite array of objects with a fixed size. Clients navigate a list-based collection by requesting offset-based chunks. In Bitbucket Cloud, list-based responses include the optional size, page, and previous element. The next and previous links typically resemble something like /foo/bar?page=4.

Our pagination is therefore supposed to depend on the "next URL" instead of the traditional "page number".
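To illustrate the idea, here is a minimal sketch of resuming from a saved "next URL". It is not the code in this MR: `FAKE_PAGES`, `NextUrlStore`, `each_item`, and `run_demo` are hypothetical names, and the in-memory store stands in for whatever persistent storage the importer uses. The URL of the page being processed is saved before its items are handled, so an interrupted run retries that page on resume.

```ruby
# Hypothetical stand-in for a paginated API: each opaque "URL" maps to a page
# with some items and a link to the next page (nil on the last page).
FAKE_PAGES = {
  "start"    => { items: ["dummy 1", "dummy 2"], next: "opaque-a" },
  "opaque-a" => { items: ["dummy 3", "dummy 4"], next: "opaque-b" },
  "opaque-b" => { items: ["dummy 5"], next: nil }
}.freeze

# Hypothetical in-memory replacement for a persistent store that remembers
# the URL of the page currently being processed.
class NextUrlStore
  attr_reader :current

  def initialize
    @current = nil
  end

  def set(url)
    @current = url
  end
end

# Walks the pages by following the `next` link only; no page numbers are
# constructed. The current URL is saved before its items are yielded.
def each_item(store, first_url: "start")
  url = store.current || first_url

  while url
    store.set(url)
    page = FAKE_PAGES.fetch(url)
    page[:items].each { |item| yield item }
    url = page[:next]
  end
end

# Simulates one interrupted run followed by a resumed run.
def run_demo
  store = NextUrlStore.new
  seen = []

  begin
    each_item(store) do |item|
      raise "purposely interrupt" if item == "dummy 4"

      seen << item
    end
  rescue RuntimeError
    # Resume: starts again from the saved page, so "dummy 3" is retried.
    each_item(store) { |item| seen << item }
  end

  [seen, store.current]
end

items, last_url = run_demo
puts items.inspect
puts last_url
```

Because the interrupted page is processed again, items on that page are seen twice (here `dummy 3`), so the per-item work needs to be safe to repeat.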

Changelog: performance

Technical Decisions

This MR only handles the Bitbucket PR importer; other importers will be handled in separate MRs.
ParallelScheduling does not define def execute yet, as that might cause too many changes. Note that many "importer workers" include ParallelScheduling where it should not be needed.
ParallelScheduling behaves slightly differently from the GitHub one. It defines a string for representation_type instead of a class. The reason is to minimize changes, as this MR reuses the current Bitbucket::Page, which "automatically" converts the items into representation objects.
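The last point can be pictured with a small sketch. This is an assumption about the general pattern, not the actual Bitbucket::Page implementation: `SketchPage`, `REPRESENTATIONS`, and the two representation structs are hypothetical names. The caller passes only a type name, and the page wrapper turns raw items into representation objects.

```ruby
# Hypothetical representation classes; the names are illustrative only.
PullRequestRepresentation = Struct.new(:raw) do
  def title
    raw["title"]
  end
end

IssueRepresentation = Struct.new(:raw) do
  def title
    raw["title"]
  end
end

# Hypothetical lookup from a type name to the class that wraps raw items.
REPRESENTATIONS = {
  "pull_request" => PullRequestRepresentation,
  "issue" => IssueRepresentation
}.freeze

# Hypothetical page wrapper: it receives raw hashes plus a type name and
# converts each hash into a representation object when `items` is read.
class SketchPage
  def initialize(raw_items, representation_type)
    @raw_items = raw_items
    # `fetch` raises KeyError for an unknown type instead of returning nil.
    @klass = REPRESENTATIONS.fetch(representation_type.to_s)
  end

  def items
    @raw_items.map { |raw| @klass.new(raw) }
  end
end

page = SketchPage.new([{ "title" => "dummy 20" }], "pull_request")
puts page.items.first.title
```

With this shape, the scheduling code does not need to know the representation class; it only forwards the type name to whatever builds the page.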

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

Before	After
	Successfully imported MR:

	Interruption happened after MR title `dummy 20`:

	Upon resume, retrying MR title `dummy 20`. Then continuing to MR title `dummy 21`:

How to set up and validate locally

Set up Bitbucket Cloud following this guide.

In the Rails console, enable the feature flag:

Feature.enable(:bitbucket_import_resumable_worker)

Patch the code to add an interruption:

diff --git a/lib/gitlab/bitbucket_import/parallel_scheduling.rb b/lib/gitlab/bitbucket_import/parallel_scheduling.rb
index 8f03bf1db1cd..927a7c62e034 100644
--- a/lib/gitlab/bitbucket_import/parallel_scheduling.rb
+++ b/lib/gitlab/bitbucket_import/parallel_scheduling.rb
@@ -42,6 +42,14 @@ def each_object_to_import
         options = collection_options.merge(representation_type: representation_type, next_url: page_keyset.current)
 
         client.each_page(collection_method, repo, options) do |page|
+          log_info(message: page.inspect)
+          Gitlab::Redis::SharedState.with do |redis|
+            temp_key = 'test-bitbucket-pr'
+            temp_counter = redis.incr(temp_key)
+            redis.expire(temp_key, 5.minutes)
+            raise "purposely interrupt" if temp_counter == 2
+          end
+
           page.items.each do |object|
             job_waiter.jobs_remaining = Gitlab::Cache::Import::Caching.increment(job_waiter_remaining_cache_key)

Visit the Bitbucket Cloud import page: http://127.0.0.1:3000/import/bitbucket/status
Click the Import button.
Tail the log file: tail -f log/importer.log

Related to #466231 (closed)

Edited Jun 25, 2024 by Rodrigo Tomonari
