`internal_ids` for pipelines causes CrossDatabaseModificationAcrossUnsupportedTablesError
Per https://docs.gitlab.com/ee/development/database/multiple_databases.html#removing-cross-database-transactions we do not allow you to write to 2 different databases within the context of a single transaction. When creating pipelines we're opening a transaction and writing to internal_ids
as well as other CI tables which violates this rule:
$ bin/rspec ./spec/lib/gitlab/ci/pipeline/seed/build_spec.rb
Failures:
1) Gitlab::Ci::Pipeline::Seed::Build#to_resource can not be persisted without explicit assignment
Failure/Error: raise CrossDatabaseModificationAcrossUnsupportedTablesError, message
Gitlab::Database::QueryAnalyzers::PreventCrossDatabaseModification::CrossDatabaseModificationAcrossUnsupportedTablesError:
Cross-database data modification of 'gitlab_main, gitlab_ci' were detected within a transaction modifying the 'internal_ids, ci_pipelines' tables.Please refer to https://docs.gitlab.com/ee/development/database/multiple_databases.html#removing-cross-database-transactions for details on how to resolve this exception.
For now we have added this violation to the allowlist https://gitlab.com/gitlab-org/gitlab/-/blob/master/spec/support/database/cross-database-modification-allowlist.yml .
This internal_ids
is actually the cause for quite a few of the violations in the allowlist.
Possible solutions
Possibly we need to consider this internal_ids
table shared. Below is some more detail and there are some discussion threads on this issue already.
More context
The internal_ids
table is used by many models to generate scoped, auto-incremented ids. The ci_pipelines
uses the internal_ids
table to populate the ci_pipelines.iid
column.
Idea: Keep internal_ids
on both the main and the CI database servers
When migrating to the new database server, all tables will be replicated via streaming replication, so the internal_ids
table will be available on the CI database. After the cutover, the table can stay there as is. With a data migration, we can clean up the unwanted records that are not associated with ci_pipelines
. Models that are using internal_ids
can be found here: Enums::InternalId.usage_resources
.
The table:
Table "public.internal_ids"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------------+---------+-----------+----------+------------------------------------------+---------+--------------+-------------
id | bigint | | not null | nextval('internal_ids_id_seq'::regclass) | plain | |
project_id | integer | | | | plain | |
usage | integer | | not null | | plain | |
last_value | integer | | not null | | plain | |
namespace_id | integer | | | | plain | |
Indexes:
"internal_ids_pkey" PRIMARY KEY, btree (id)
"index_internal_ids_on_usage_and_namespace_id" UNIQUE, btree (usage, namespace_id) WHERE namespace_id IS NOT NULL
"index_internal_ids_on_usage_and_project_id" UNIQUE, btree (usage, project_id) WHERE project_id IS NOT NULL
"index_internal_ids_on_namespace_id" btree (namespace_id)
"index_internal_ids_on_project_id" btree (project_id)
Foreign-key constraints:
"fk_162941d509" FOREIGN KEY (namespace_id) REFERENCES namespaces(id) ON DELETE CASCADE
"fk_rails_f7d46b66c6" FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE
Access method: heap
Concern: the CI database will not know about the projects
and namespaces
tables, so the foreign keys need to be dropped. We'll need a way to keep this table consistent, for example cleaning up items with non-existing project_id
or namespace_id
periodically.