Come up with an alternative to Postgres cascading deletes/nullify => Loose foreign keys
Related to #336417 (closed)
Problem statement
When eliminating foreign keys between the Main DB and CI DB tables, we'll need to maintain referential integrity on the application layer. Ensuring strong consistency is not optimal due to performance reasons. (2 phase commit)
Example: cleanup associated records with ActiveRecord
Note: we do this in the PoC MR.
class Project < ApplicationRecord
has_many :builds, class_name: 'Ci::Build', inverse_of: :project, dependent: :delete_all
end
# ...
project.destroy
The following example will not work as expected with two databases since the actual build record deletion happens outside of the scope of the transaction on the Main DB (where the projects
table is located). The cross_database_modification_check
utility helps to locate these cases.
- Transaction begins on the Main database.
- Delete the project record.
- Delete the associated
ci_builds
rows. - Transaction committed on the Main database.
The deletion in the ci_builds
table is actually not part of the transaction. If a rollback is initiated after the ci_builds
records are deleted, those rows will not be restored.
Wrapping the operation with an extra transaction on the CI DB would address the problem however, it may cause long-running transactions which could affect the availability of GitLab.
Current plan
Asynchronous data cleanup using Postgres triggers and a queuing table in Postgres that is picked up by a sidekiq cron worker.
You can see more details in the thread #338535 (comment 655802260) and other comments below that.
The current Draft MR is at !69165 (merged) .
Overall we are referring to this concept as "Loose foreign keys".
Previous Ideas: eventual consistency
Ideas
Option 1: Extend ActiveRecord
By extending ActiveRecord's dependent
option, we could remove the associated records in an after commit callback:
class Project < ApplicationRecord
has_many :builds, class_name: 'Ci::Build', inverse_of: :project, dependent: :delete_all_after_commit
end
In this case, the deletion happens after the main transaction is committed. At the point where the project row is already deleted.
Option 2: Extra module
The change above requires monkey-patching ActiveRecord
(contribute upstream later). We can achieve the same thing via a simple Module
:
class Project < ApplicationRecord
include EventuallyConsistentAssociation
has_many :builds, class_name: 'Ci::Build', inverse_of: :project
# Requires an association named `builds` to be present
cleanup_after_commit :builds, with: :delete # or :nullify
end
Both approaches would mitigate the "association removal inside the transaction" problem however, there is a chance where the cleanup would not finish, for example, the application crashes while deleting the rows. Keeping unused and inaccessible rows in the database costs us disk space and it can cause inconsistency issues later.
To address this problem, a periodical cleanup job is needed to detect orphaned records and delete them efficiently (batching). By using the dependent: :delete_all_after_commit
or cleanup_after_commit
methods, we can easily detect all ActiveRecord models that needs cleanup and we can build a generic cleanup runner.
# pseudo code
Project.each_batch do |projects|
min, max, count = projects.pluck('MIN(id), MAX(id), COUNT(id)').first
# Note: distinct will be slow, we'll need to use loose-index scan
build_project_id_count = Ci::Build.distinct(:project_id).where(id: min..max).count
if count != build_project_id_count
# cleanup is needed
end
end
Running such a job can be time-consuming so we need to ensure that we scan the minimum amount of rows. In case we want to be eventually consistent between projects
and ci_builds
we need about 2N index lookups, where the N is the number of projects in the GitLab instance.
Making the cleanup runner more efficient
Let's track the deleted projects and other referenced models from the main DB in a separated table: deleted_$table_name
. We create a table for each Model where we have FKs crossing the DB boundary.
Example: deleted_projects
When a project is deleted, during the transaction which deletes the Project record we insert a new row to deleted_projects
:
project_id |
---|
2 |
6 |
7 |
After the transaction commits, we trigger a background job. Or a scheduled job will pick up the deleted project id or ids.
For each id, the job will perform DELETE
statements in the CI DB. When all queries are finished, we can safely clean up the record from deleted_projects
. This approach eliminates the periodical table scanning. We might want to run it through from time to time to make sure we cleaned up everything.