Fix object deduplication on Geo first sync

Catalin Irimie requested to merge cat-geo-obj-deduplication-first-sync into master

What does this MR do and why?

For forks of large projects, if the project does not exist when the first sync happens, or if the sync fails for any reason, we did not trigger object deduplication. This change ensures that @new_repository is set whenever a repository is created (through ensure_repository), even if the fetch fails.
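
To illustrate the idea, here is a minimal, self-contained sketch. It is not the code in this MR: apart from `ensure_repository` and `@new_repository`, every class and method name is hypothetical. It only shows why setting the flag at repository creation time lets deduplication run even when the fetch raises.

```ruby
# Illustrative sketch only. Apart from ensure_repository and @new_repository,
# every name here is hypothetical and not taken from the GitLab codebase.
class FirstSyncSketch
  attr_reader :deduplicated, :error

  def initialize(repository_exists:, fetch_fails:)
    @repository_exists = repository_exists
    @fetch_fails = fetch_fails
    @new_repository = false
    @deduplicated = false
  end

  def sync
    ensure_repository
    fetch
  rescue StandardError => e
    @error = e # the sync failed, but the ensure clause below still runs
  ensure
    # Deduplication depends only on the flag, not on the fetch succeeding.
    deduplicate if @new_repository
  end

  private

  def ensure_repository
    return if @repository_exists

    @repository_exists = true
    @new_repository = true # set at creation time, before any fetch happens
  end

  def fetch
    raise "fetch failed" if @fetch_fails
  end

  def deduplicate
    @deduplicated = true
  end
end

failed_first_sync = FirstSyncSketch.new(repository_exists: false, fetch_fails: true)
failed_first_sync.sync
puts failed_first_sync.deduplicated
```

In this sketch a sync against an already existing repository never sets the flag, so deduplication is not triggered again on later syncs.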

Related to #343245 (closed).

How to set up and validate locally

  1. Have two GDK instances set up with Geo (a primary and a secondary)

  2. Add a time delay for forking on the primary, for example:

    diff --git a/app/workers/repository_fork_worker.rb b/app/workers/repository_fork_worker.rb
    index 5ec9ceaf004..5fdefb5d52e 100644
    --- a/app/workers/repository_fork_worker.rb
    +++ b/app/workers/repository_fork_worker.rb
    @@ -12,6 +12,7 @@ class RepositoryForkWorker # rubocop:disable Scalability/IdempotentWorker
       feature_category :source_code_management
     
       def perform(*args)
    +    sleep(180)
         target_project_id = args.shift
         target_project = Project.find(target_project_id)
     
  3. Check out this branch on the secondary, then run gdk restart

  4. Create a new public or internal repository (not private, as deduplication doesn't happen for private repos) and observe that it syncs successfully

  5. Fork this project into a new public repository and observe that the objects/info/alternates file gets created on disk at the path for this repo on the secondary (find the path in the admin area, or with Project.last.disk_path)
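
The check in step 5 can also be scripted. The helper below is a hypothetical convenience, not part of this MR: it assumes you already know the full on-disk path of the forked repository on the secondary, and it relies only on the standard Git layout in which objects/info/alternates lists one object directory per line.

```ruby
# Hypothetical helper for local validation; not part of this MR.
# repo_path is assumed to be the full on-disk path of the bare repository.
def alternates_for(repo_path)
  file = File.join(repo_path, "objects", "info", "alternates")
  return [] unless File.exist?(file)

  # Each non-empty line names an object directory the repository borrows from.
  File.readlines(file, chomp: true).reject(&:empty?)
end
```

An empty array means the file is missing or empty, so deduplication has not happened for that repository; a non-empty array lists the object directories the fork borrows from.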

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Catalin Irimie
