Add sharding key for packages_dependencies table
Description
Several approaches for handling the packages_dependencies table when an organization is moved were discussed here. The conclusion was that we should add a project_id column to the packages_dependencies table and use it as the sharding key, at the cost of duplicating rows.
Currently, rows in the packages_dependencies table are shared between packages. This needs to change so that dependencies are scoped to a project and shared only within that project.
Additionally, the sharding key needs to be set for the packages_dependencies table.
Update https://gitlab.com/gitlab-org/gitlab/-/blob/master/db/docs/packages_dependencies.yml to have:
    allow_cross_foreign_keys:
      - gitlab_main_clusterwide
    sharding_key:
      project_id: projects
Important: All sharding keys must be non-nullable or have a NOT NULL check constraint.
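As a rough illustration of the rule above, the following plain-Ruby sketch parses a copy of the dictionary entry and reports sharding key columns that are still nullable. The `DICTIONARY_ENTRY` constant, both helper methods, and the column-nullability hash are hypothetical names introduced for this example; they are not part of GitLab's tooling.

```ruby
require "yaml"

# Hypothetical copy of the dictionary entry; only the keys named in this issue are shown.
DICTIONARY_ENTRY = <<~ENTRY
  allow_cross_foreign_keys:
    - gitlab_main_clusterwide
  sharding_key:
    project_id: projects
ENTRY

# Returns a hash of sharding key column => referenced table.
def sharding_key(entry_yaml)
  YAML.safe_load(entry_yaml).fetch("sharding_key", {})
end

# column_nullability is an assumed hash of column name => nullable? (true/false).
# Columns missing from the hash are treated as nullable, to stay on the safe side.
def nullable_sharding_columns(entry_yaml, column_nullability)
  sharding_key(entry_yaml).keys.select do |column|
    column_nullability.fetch(column, true)
  end
end
```

While project_id is still nullable (before the backfill finishes), `nullable_sharding_columns` would list it, which is why the dictionary update belongs at the end of the rollout.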
Approximate implementation plan
MR 1
- Add the project_id column to the packages_dependencies table.
- Change the uniqueness validation for the Packages::Dependency model and the unique database indexes.
  - Change the existing validation to run only if project_id isn't present.
  - Add a new uniqueness validation for name, scoped to %i[version_pattern project_id], if project_id is present.
  - Delete the existing unique index index_packages_dependencies_on_name_and_version_pattern.
  - Add a new unique index on name, version_pattern WHERE project_id IS NULL.
  - Add a new unique index on name, version_pattern, project_id WHERE project_id IS NOT NULL.
- Change the Packages::CreateDependencyService.
  - Set project_id along with the other attributes when creating a new dependency.
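The combined effect of the two partial unique indexes in MR 1 can be modelled in plain Ruby. This is only an in-memory sketch of the intended uniqueness rule, not the actual model or migration code; `DependencyRow`, `uniqueness_key`, and `insert_allowed?` are hypothetical names.

```ruby
# Hypothetical in-memory stand-in for a packages_dependencies row.
DependencyRow = Struct.new(:name, :version_pattern, :project_id, keyword_init: true)

# Mirrors the two partial unique indexes:
#   (name, version_pattern)             WHERE project_id IS NULL
#   (name, version_pattern, project_id) WHERE project_id IS NOT NULL
def uniqueness_key(row)
  if row.project_id.nil?
    [:unscoped, row.name, row.version_pattern]
  else
    [:scoped, row.name, row.version_pattern, row.project_id]
  end
end

# True when inserting candidate would not violate either index.
def insert_allowed?(existing_rows, candidate)
  existing_rows.none? { |row| uniqueness_key(row) == uniqueness_key(candidate) }
end

existing = [
  DependencyRow.new(name: "rack", version_pattern: "~> 2.0", project_id: nil),
  DependencyRow.new(name: "rack", version_pattern: "~> 2.0", project_id: 1)
]

# The same name and version_pattern may be duplicated for a different project.
other_project = DependencyRow.new(name: "rack", version_pattern: "~> 2.0", project_id: 2)
# A second row for project 1 would collide with the scoped index.
same_project = DependencyRow.new(name: "rack", version_pattern: "~> 2.0", project_id: 1)
```

Under this model, rows without a project_id keep the old global uniqueness, while rows with a project_id are unique only within their project. That lets old and new rows coexist until the backfill completes.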
MR 2
- Add a background migration to backfill project_id for existing rows and to create new entries in the packages_dependencies table for each unique name, version_pattern, project_id combination. Here is a draft of the migration, which most likely should be optimized to use bulk operations:

        Packages::DependencyLink.each_batch do |batch|
          batch.find_each do |dependency_link|
            dependency = dependency_link.dependency
            project_id = dependency_link.project_id

            # Nothing to do if the dependency already belongs to this project.
            next if dependency.project_id == project_id

            if dependency.project_id
              # The row is owned by another project: find or create this
              # project's copy and repoint the link to it.
              new_dependency = Packages::Dependency.find_or_create_by!(
                project_id: project_id,
                name: dependency.name,
                version_pattern: dependency.version_pattern
              )
              dependency_link.update!(dependency_id: new_dependency.id)
            else
              # The first project to reach an unscoped row claims it.
              dependency.update!(project_id: project_id)
            end
          end
        end
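To make the intended outcome of the backfill easier to reason about, here is a self-contained, in-memory simulation of the same idea in plain Ruby. It does not touch the database and is not the migration itself; `BackfillDependency`, `BackfillLink`, and `simulate_backfill!` are hypothetical names used only for this sketch.

```ruby
# Hypothetical in-memory stand-ins for the two tables.
BackfillDependency = Struct.new(:id, :name, :version_pattern, :project_id, keyword_init: true)
BackfillLink = Struct.new(:id, :dependency_id, :project_id, keyword_init: true)

# Assigns a project_id to every dependency and repoints links so that each
# link references a dependency owned by the link's own project.
def simulate_backfill!(dependencies, links)
  by_id = dependencies.to_h { |dependency| [dependency.id, dependency] }
  next_id = (by_id.keys.max || 0) + 1

  # Index of rows that are already scoped: [name, version_pattern, project_id] => row.
  scoped = {}
  dependencies.each do |dependency|
    next if dependency.project_id.nil?
    scoped[[dependency.name, dependency.version_pattern, dependency.project_id]] = dependency
  end

  links.each do |link|
    dependency = by_id.fetch(link.dependency_id)
    next if dependency.project_id == link.project_id

    key = [dependency.name, dependency.version_pattern, link.project_id]
    target = scoped[key]

    if target.nil?
      if dependency.project_id.nil?
        # The first project to reach an unscoped row claims it.
        dependency.project_id = link.project_id
        target = dependency
      else
        # The row belongs to another project: duplicate it for this project.
        target = BackfillDependency.new(id: next_id, name: dependency.name,
                                        version_pattern: dependency.version_pattern,
                                        project_id: link.project_id)
        next_id += 1
        dependencies << target
        by_id[target.id] = target
      end
      scoped[key] = target
    end

    link.dependency_id = target.id
  end

  dependencies
end
```

With one shared dependency linked from two projects, the simulation leaves the original row with the first project and creates one copy for the second. This duplication is the cost the description refers to.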
MR 3
- Change the Packages::CreateDependencyService.
  - Adjust the logic to re-use dependencies of the same project.
- Change the project_id column of the packages_dependencies table to be NOT NULL (docs).
- Remove the old UNIQUE index on name, version_pattern WHERE project_id IS NULL.
- Update packages_dependencies.yml to reference project_id as a sharding key.
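The re-use rule for MR 3 can be sketched as a find-or-create lookup keyed by project. The class below is an in-memory illustration only; `ProjectScopedDependencyStore` and its methods are hypothetical and do not reflect the real Packages::CreateDependencyService interface.

```ruby
# Hypothetical in-memory store illustrating per-project re-use of dependencies.
class ProjectScopedDependencyStore
  def initialize
    @rows = {}
    @next_id = 1
  end

  # Returns the existing row for this project, or creates one.
  # project_id is mandatory, matching the NOT NULL requirement on the sharding key.
  def find_or_create(project_id:, name:, version_pattern:)
    raise ArgumentError, "project_id is required" if project_id.nil?

    key = [project_id, name, version_pattern]
    @rows[key] ||= begin
      row = { id: @next_id, project_id: project_id, name: name, version_pattern: version_pattern }
      @next_id += 1
      row
    end
  end

  def size
    @rows.size
  end
end
```

Two packages in the same project that depend on the same name and version_pattern would share one row here, while the same dependency in another project gets its own row.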