Creating New Nested Repositories During Import Creates Parent Repositories Marked as "native"
Context
During import, we will create new repositories in the database, into which the filesystem metadata can be imported. If the repository is nested, as most are, we will also create each of its parent repositories before creating the new repository.
Problem
If a child repository is migrated before a parent repository which also contains images, the parent repository will be created in the database and marked native. The parent repository would later appear as already migrated to the import handler logic since it will be marked native. Additionally, the phase 2 routing logic would begin routing requests to the parent repository through the database, although it's still only present on the filesystem, causing a split brain.
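The failure mode can be reproduced with a toy model. The sketch below is not the registry's code: `memStore` and `repo` are hypothetical in-memory stand-ins for the database, and the lowercase `createOrFindByPath` only imitates the behavior described above (every missing ancestor is created and defaults to native).

```go
package main

import (
	"fmt"
	"strings"
)

// repo is a toy stand-in for a repository row (hypothetical, not the real model).
type repo struct {
	path   string
	status string
}

// memStore is a hypothetical in-memory substitute for the metadata database.
type memStore map[string]*repo

// createOrFindByPath imitates the behavior described above: every missing
// ancestor of path is created along the way and defaults to "native".
func (s memStore) createOrFindByPath(path string) *repo {
	segments := strings.Split(path, "/")
	for i := 1; i <= len(segments); i++ {
		p := strings.Join(segments[:i], "/")
		if _, ok := s[p]; !ok {
			s[p] = &repo{path: p, status: "native"}
		}
	}
	return s[path]
}

func main() {
	db := memStore{}
	// Import the child first. "group/project" also holds images, but they
	// still exist only on the filesystem at this point.
	db.createOrFindByPath("group/project/child")
	// The parent now looks as if it had already been migrated.
	fmt.Println(db["group/project"].status) // prints "native"
}
```

Anything that inspects only the status, such as the import handler or the phase 2 router, cannot tell this incidentally created parent apart from a repository that was actually migrated.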
Solutions
Manual Parent Repository Creation
Within the import handler logic, we could avoid using CreateOrFindByPath, and use CreateOrFind to create each repository in order, setting the status appropriately if that parent repository does not exist.
Pros
- should prevent race conditions with the phase 2 routing logic
- This change would be accomplished by simply composing existing methods within the import handler logic
Cons
- the handlers rely completely on integration tests, which are less well suited to validating more subtle changes in behavior than unit tests
- we would be duplicating logic already implemented in the RepositoryStore to find and create the parent repositories
- if these parent repositories are not already present on the old storage prefix, we must always update their status to native once they receive a write request
- further complicates phase 2 routing logic
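A minimal sketch of the manual approach, under stated assumptions: `memStore`, `createOrFind`, and `importRepository` are hypothetical stand-ins and do not reproduce the real RepositoryStore signatures, and the status values "placeholder" and "importing" are invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// repo and memStore are hypothetical stand-ins for the database layer.
type repo struct {
	path   string
	status string
}

type memStore map[string]*repo

// createOrFind creates a single repository with the given status if it is
// missing. It never touches ancestors and never overwrites an existing status.
func (s memStore) createOrFind(path, status string) *repo {
	if r, ok := s[path]; ok {
		return r
	}
	r := &repo{path: path, status: status}
	s[path] = r
	return r
}

// ancestorsAndSelf returns "a", "a/b", "a/b/c" for the path "a/b/c".
func ancestorsAndSelf(path string) []string {
	segments := strings.Split(path, "/")
	paths := make([]string, 0, len(segments))
	for i := 1; i <= len(segments); i++ {
		paths = append(paths, strings.Join(segments[:i], "/"))
	}
	return paths
}

// importRepository creates missing parents top down with parentStatus,
// then creates the target repository with targetStatus.
func importRepository(s memStore, path, parentStatus, targetStatus string) *repo {
	paths := ancestorsAndSelf(path)
	for _, p := range paths[:len(paths)-1] {
		s.createOrFind(p, parentStatus)
	}
	return s.createOrFind(path, targetStatus)
}

func main() {
	db := memStore{}
	// Both status values are assumptions made for this sketch.
	importRepository(db, "group/project/child", "placeholder", "importing")
	for _, p := range ancestorsAndSelf("group/project/child") {
		fmt.Println(p, db[p].status)
	}
}
```

The sketch also makes the first con visible: walking the path and creating each ancestor is the same work the store already does internally.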
Top Down Import
The import workers should only queue repositories whose parents have been imported.
Pros
- the current default logic for top level repositories works appropriately for this scenario.
Cons
- requires an external entity to correctly manage internal registry states
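One way the external entity could enforce this ordering is to sort the queue by path depth, so that every parent precedes its children. This is only a sketch: `topDownOrder` is a hypothetical helper, and it assumes the full list of repository paths is known up front.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// topDownOrder returns a copy of paths sorted by nesting depth, so that every
// repository appears after all of its parents. The sort is stable, so
// repositories at the same depth keep their original relative order.
func topDownOrder(paths []string) []string {
	ordered := append([]string(nil), paths...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return strings.Count(ordered[i], "/") < strings.Count(ordered[j], "/")
	})
	return ordered
}

func main() {
	queue := []string{"group/project/child", "group", "group/project", "other"}
	for _, p := range topDownOrder(queue) {
		fmt.Println(p)
	}
}
```

Ordering alone does not guarantee that a parent import has finished before a child starts, so the workers would still need to wait on the parent's completion.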
Distinct Migration Status Value for Automatically Created Parent Repositories
Instead of defaulting to native, repositories which are created incidentally when creating a child repository will get a distinct migration status, indicating that they have not been explicitly created.
Pros
- small change
- should prevent race conditions with the phase 2 routing logic
- easier to achieve a high degree of confidence with tests
- handler logic needs to know far less about repository creation
Cons
- further complicates phase 2 routing logic
- if these parent repositories are not already present on the old storage prefix, we must always update their status to native once they receive a write request, so that the phase 2 routing logic and import handler can handle these repositories efficiently and correctly
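The extra branch this adds to routing can be sketched as follows. The status name `auto_created` and the helpers `servedFromDatabase` and `onWrite` are assumptions made for illustration; only the native status comes from the text above.

```go
package main

import "fmt"

const (
	statusNative      = "native"       // named in this document
	statusAutoCreated = "auto_created" // hypothetical name for the distinct status
)

// repo is a toy stand-in for a repository row.
type repo struct {
	path   string
	status string
}

// servedFromDatabase reports whether routing may send requests for r through
// the database. Incidentally created parents are excluded.
func servedFromDatabase(r *repo) bool {
	return r != nil && r.status == statusNative
}

// onWrite promotes an incidentally created parent to native when it receives
// a write and has no data on the old storage prefix.
func onWrite(r *repo, onOldPrefix bool) {
	if r.status == statusAutoCreated && !onOldPrefix {
		r.status = statusNative
	}
}

func main() {
	parent := &repo{path: "group/project", status: statusAutoCreated}
	fmt.Println(servedFromDatabase(parent)) // prints "false"

	onWrite(parent, false) // first write, nothing on the old prefix
	fmt.Println(servedFromDatabase(parent)) // prints "true"
}
```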
Stop recording intermediate repositories and delete existing empty ones
Status
We went with the "Stop recording intermediate repositories and delete existing empty ones" approach. All steps were completed except the cleanup. This will be done in #625.