`internal_ids` for pipelines causes CrossDatabaseModificationAcrossUnsupportedTablesError

Per https://docs.gitlab.com/ee/development/database/multiple_databases.html#removing-cross-database-transactions we do not allow you to write to 2 different databases within the context of a single transaction. When creating pipelines we're opening a transaction and writing to internal_ids as well as other CI tables which violates this rule:

$ bin/rspec ./spec/lib/gitlab/ci/pipeline/seed/build_spec.rb                                                                     
                                                                                                                                                                                              
Failures:                                                                                                                                                                                                
                                                                                                                                                                                                         
  1) Gitlab::Ci::Pipeline::Seed::Build#to_resource can not be persisted without explicit assignment                                                                                                      
     Failure/Error: raise CrossDatabaseModificationAcrossUnsupportedTablesError, message                                                                                                                 
                           Gitlab::Database::QueryAnalyzers::PreventCrossDatabaseModification::CrossDatabaseModificationAcrossUnsupportedTablesError:                                                                          
       Cross-database data modification of 'gitlab_main, gitlab_ci' were detected within a transaction modifying the 'internal_ids, ci_pipelines' tables.Please refer to https://docs.gitlab.com/ee/development/database/multiple_databases.html#removing-cross-database-transactions for details on how to resolve this exception.

For now we have added this violation to the allowlist https://gitlab.com/gitlab-org/gitlab/-/blob/master/spec/support/database/cross-database-modification-allowlist.yml .

This internal_ids is actually the cause for quite a few of the violations in the allowlist.

Possible solutions

See https://docs.gitlab.com/ee/development/database/multiple_databases.html#removing-cross-database-transactions .

Possibly we need to consider this internal_ids table shared. Below is some more detail and there are some discussion threads on this issue already.

More context

The internal_ids table is used by many models to generate scoped, auto-incremented ids. The ci_pipelines uses the internal_ids table to populate the ci_pipelines.iid column.

Idea: Keep internal_ids on both the main and the CI database servers

When migrating to the new database server, all tables will be replicated via streaming replication, so the internal_ids table will be available on the CI database. After the cutover, the table can stay there as is. With a data migration, we can clean up the unwanted records that are not associated with ci_pipelines. Models that are using internal_ids can be found here: Enums::InternalId.usage_resources.

The table:

                                                   Table "public.internal_ids"
    Column    |  Type   | Collation | Nullable |                 Default                  | Storage | Stats target | Description
--------------+---------+-----------+----------+------------------------------------------+---------+--------------+-------------
 id           | bigint  |           | not null | nextval('internal_ids_id_seq'::regclass) | plain   |              |
 project_id   | integer |           |          |                                          | plain   |              |
 usage        | integer |           | not null |                                          | plain   |              |
 last_value   | integer |           | not null |                                          | plain   |              |
 namespace_id | integer |           |          |                                          | plain   |              |
Indexes:
    "internal_ids_pkey" PRIMARY KEY, btree (id)
    "index_internal_ids_on_usage_and_namespace_id" UNIQUE, btree (usage, namespace_id) WHERE namespace_id IS NOT NULL
    "index_internal_ids_on_usage_and_project_id" UNIQUE, btree (usage, project_id) WHERE project_id IS NOT NULL
    "index_internal_ids_on_namespace_id" btree (namespace_id)
    "index_internal_ids_on_project_id" btree (project_id)
Foreign-key constraints:
    "fk_162941d509" FOREIGN KEY (namespace_id) REFERENCES namespaces(id) ON DELETE CASCADE
    "fk_rails_f7d46b66c6" FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE
Access method: heap

Concern: the CI database will not know about the projects and namespaces tables, so the foreign keys need to be dropped. We'll need a way to keep this table consistent, for example cleaning up items with non-existing project_id or namespace_id periodically.

Edited Nov 11, 2021 by Dylan Griffith