Set sharding keys for feature category `importers` tables
About
As part of Cells preparation, all tables need to have a "sharding key" defined.
This issue was created from !152751 (merged), where we set the `sharding_key_issue_url` for some feature category `importers` tables to point to this issue, as a temporary step to allow us to schedule the work into a milestone.
If we have questions or concerns, we can reach out to #g_tenant-scale.
The below description was copied from !152751 (merged).
Task
Sharding keys need to be set for the tables:
- `bulk_imports`
  - Sharding keys: TBD
  - Note: It's currently classified as `gitlab_main_clusterwide` but we should change it to `gitlab_main_cell` and determine what sharding keys to use. There's a separate issue to work on this: #499829
- `bulk_import_exports`
  - Sharding keys: `group_id`/`project_id`
  - MR: !168411 (merged)
  - Directly linked to `group_id`/`project_id` already
- `bulk_import_batch_trackers`
  - Sharding keys: `namespace_id`/`project_id`
  - MR: !168390
  - Relation to sharding keys: `bulk_import_trackers` => `bulk_import_entities` => `namespace_id`/`project_id`
- `bulk_import_configurations`
  - Note: No sharding since it's using `gitlab_main_clusterwide`
  - MR: !168386
- `bulk_import_entities`
  - Note: Still under discussion. Likely `project_id` and `namespace_id`, but records are created before these values get set, which complicates things
- `bulk_import_export_batches`
  - Sharding keys: `group_id`/`project_id`
  - MR: !168387
  - Relation to sharding keys: `bulk_import_exports` => `project_id`/`group_id`
- `bulk_import_export_uploads`
  - Sharding keys: `group_id`/`project_id`
  - MR: !168388
  - Relation to sharding keys: `bulk_import_exports` => `project_id`/`group_id`
- `bulk_import_failures`
  - Sharding keys: `namespace_id`/`project_id`
  - MR: !168389
  - Relation to sharding keys: `bulk_import_entities` => `namespace_id`/`project_id`
- `bulk_import_trackers`
  - Sharding keys: `namespace_id`/`project_id`
  - MR: !168385
  - Relation to sharding keys: `bulk_import_entities` => `namespace_id`/`project_id`
- `import_export_uploads`
  - Sharding keys: `group_id`/`project_id` (Already exist in the table)
  - MR: !168392 (merged)
- `import_failures`
  - Sharding keys: `group_id`/`project_id` (Already exist in the table)
  - Note: Some rows have neither set, and are linked to the user instead, so we need a fallback option
  - MR: !168393
- `project_import_data`
  - Sharding key: `project_id` (Already exists in the table)
  - MR: !168391 (merged)
This involves choosing one of the following, based on the intended behaviour of the table:
- The table is not cell-local
  - Set `gitlab_schema` to `gitlab_main_clusterwide`.
- The table is cell-local and requires a sharding key
  - Set `gitlab_schema` to `gitlab_main_cell`.
  - Add a `sharding_key` or `desired_sharding_key` configuration. If the configuration is known but the chosen key doesn't yet meet not-null and foreign key requirements, you can add an exception to `allowed_to_be_missing_not_null` or `allowed_to_be_missing_foreign_key` to get the pipeline passing. Please link to a follow-up issue in a code comment next to the exception.
  - You may also need to set `allow_cross_joins`, `allow_cross_transactions` and `allow_cross_foreign_keys` if changing the schema causes pipeline failures. See `db/docs/epics.yml` for an example.
- The table is cell-local and does not require a sharding key
  - Set `gitlab_schema` to `gitlab_main_cell`, and
  - Set `exempt_from_sharding` to `true`.
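The options above can be sketched as `db/docs/*.yml` dictionary fragments. This is a minimal illustration, not the content of the merged MRs: it assumes the usual dictionary layout and leaves out the other fields a real file carries (classes, description, milestone and so on). The referenced table names, the `export_id` foreign key, the `export` association and `example_table` are assumptions made for illustration, so check them against the linked documentation and the actual schema.

```yaml
---
# Option 1: the table is not cell-local.
table_name: bulk_import_configurations
gitlab_schema: gitlab_main_clusterwide

---
# Option 2a: cell-local, and the key columns already exist on the table.
table_name: bulk_import_exports
gitlab_schema: gitlab_main_cell
sharding_key:
  # column: table it references (assumed mapping)
  project_id: projects
  group_id: namespaces

---
# Option 2b: cell-local, but the key must first be backfilled from a parent table.
table_name: bulk_import_export_batches
gitlab_schema: gitlab_main_cell
desired_sharding_key:
  project_id:
    references: projects
    backfill_via:
      parent:
        foreign_key: export_id        # assumed column name
        table: bulk_import_exports
        sharding_key: project_id
        belongs_to: export            # assumed association name

---
# Option 3: cell-local, no sharding key required.
table_name: example_table             # hypothetical table
gitlab_schema: gitlab_main_cell
exempt_from_sharding: true
```

Option 2b mirrors the "Relation to sharding keys" notes in the task list: the child table takes its key from the parent it already points to (`bulk_import_exports` => `project_id`/`group_id`). A second entry for `group_id` would presumably follow the same shape.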
Documentation
Edited by Keeyan Nejad