Draft: POC Enforce Organization Isolation based on `organization_id` on every table (!129889)

Dylan Griffith requested to merge `add-organization-id-and-index-to-everything-and-constraints` into `master` on Aug 22, 2023

What does this MR do and why?

Related to #394800 (closed). This is a POC to evaluate whether we can enforce organization isolation using something like foreign keys. The idea is that every table has an `organization_id`, and every existing foreign key is converted into a composite foreign key that also includes `organization_id`. Both sides of every foreign key must then have the same `organization_id`.

For example:

```sql
ALTER TABLE ONLY issue_links
    ADD CONSTRAINT org_fk_c900194ff2 FOREIGN KEY (organization_id, source_id) REFERENCES issues(organization_id, id) ON UPDATE CASCADE ON DELETE CASCADE;

ALTER TABLE ONLY issue_links
    ADD CONSTRAINT org_fk_e71bb44f1f FOREIGN KEY (organization_id, target_id) REFERENCES issues(organization_id, id) ON UPDATE CASCADE ON DELETE CASCADE;
```

This would ensure that issue links only exist between issues belonging to the same organization.
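To make the effect of these composite keys concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for Postgres (an assumption made so it runs anywhere; SQLite's foreign-key semantics are close to, but not identical to, Postgres's). Only the table and column names come from the example above; `connect` and `try_link` are hypothetical helpers. Note that `issues` needs an extra `UNIQUE (organization_id, id)` before the composite keys can reference it.

```python
import sqlite3

# SQLite stands in for Postgres. Only the table/column names come from the MR;
# the helpers below are hypothetical.
SCHEMA = """
CREATE TABLE issues (
    id INTEGER PRIMARY KEY,
    organization_id INTEGER NOT NULL,
    -- Needed so the composite FKs have a unique target, even though
    -- id alone is already unique.
    UNIQUE (organization_id, id)
);
CREATE TABLE issue_links (
    id INTEGER PRIMARY KEY,
    organization_id INTEGER NOT NULL,
    source_id INTEGER NOT NULL,
    target_id INTEGER NOT NULL,
    FOREIGN KEY (organization_id, source_id) REFERENCES issues (organization_id, id)
        ON UPDATE CASCADE ON DELETE CASCADE,
    FOREIGN KEY (organization_id, target_id) REFERENCES issues (organization_id, id)
        ON UPDATE CASCADE ON DELETE CASCADE
);
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    conn.executescript(SCHEMA)
    # Issues 1 and 2 belong to organization 1; issue 3 belongs to organization 2.
    conn.executemany(
        "INSERT INTO issues (id, organization_id) VALUES (?, ?)",
        [(1, 1), (2, 1), (3, 2)],
    )
    return conn


def try_link(conn: sqlite3.Connection, organization_id: int, source_id: int, target_id: int) -> bool:
    """Insert a link; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO issue_links (organization_id, source_id, target_id) VALUES (?, ?, ?)",
            (organization_id, source_id, target_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = connect()
print(try_link(conn, 1, 1, 2))  # True: both issues are in organization 1
print(try_link(conn, 1, 1, 3))  # False: issue 3 is in organization 2
```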

After deeper investigation we realised we don't need full foreign keys, because they have two additional costs:

1. They require a unique index on the referenced columns, such as `issues(organization_id, id)`. Building hundreds of new indexes is expensive, and it is wasteful: each of these indexes would be unique anyway, because it includes the column (such as `issues.id`) that the original foreign key already references, which is unique on its own.
2. They validate all existing rows. All existing rows will have `organization_id = 1`, so it's wasteful to have Postgres validate hundreds of foreign keys that are definitely valid.

So we realised that we can implement the equivalent functionality we need with a Postgres trigger (foreign keys are implemented behind the scenes as a CONSTRAINT TRIGGER anyway).
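A minimal sketch of the trigger variant, again with SQLite standing in for Postgres. In Postgres this would be a trigger function; here the same check is written as SQLite `BEFORE` triggers that call `RAISE(ABORT, ...)`. The trigger names, error message, and helper functions are hypothetical. Only the ordinary single-column foreign keys are declared, so `issues` needs no `(organization_id, id)` unique index, and creating the triggers does not re-check existing rows.

```python
import sqlite3

# SQLite stand-in for Postgres. Trigger names/messages and helpers are hypothetical.
SCHEMA = """
CREATE TABLE issues (
    id INTEGER PRIMARY KEY,
    organization_id INTEGER NOT NULL
);
CREATE TABLE issue_links (
    id INTEGER PRIMARY KEY,
    organization_id INTEGER NOT NULL,
    source_id INTEGER NOT NULL REFERENCES issues (id) ON DELETE CASCADE,
    target_id INTEGER NOT NULL REFERENCES issues (id) ON DELETE CASCADE
);
"""

# Runs per row before the write and aborts the statement if either referenced
# issue has a different organization_id. IS NOT is SQLite's NULL-safe "not equal".
ORG_CHECK = """
CREATE TRIGGER issue_links_org_check_{suffix}
BEFORE {event} ON issue_links
FOR EACH ROW
BEGIN
    SELECT RAISE(ABORT, 'issue_links row references an issue in another organization')
    WHERE (SELECT organization_id FROM issues WHERE id = NEW.source_id) IS NOT NEW.organization_id
       OR (SELECT organization_id FROM issues WHERE id = NEW.target_id) IS NOT NEW.organization_id;
END;
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO issues (id, organization_id) VALUES (?, ?)",
        [(1, 1), (2, 1), (3, 2)],
    )
    # Creating a trigger does not scan rows that already exist.
    conn.executescript(ORG_CHECK.format(suffix="insert", event="INSERT"))
    conn.executescript(ORG_CHECK.format(
        suffix="update", event="UPDATE OF organization_id, source_id, target_id"))
    return conn


def run(conn: sqlite3.Connection, sql: str) -> bool:
    """Execute one statement; return False if a constraint or trigger rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


conn = connect()
print(run(conn, "INSERT INTO issue_links (organization_id, source_id, target_id) VALUES (1, 1, 2)"))  # True
print(run(conn, "INSERT INTO issue_links (organization_id, source_id, target_id) VALUES (1, 1, 3)"))  # False
print(run(conn, "UPDATE issue_links SET target_id = 3"))  # False
```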

TODO

- Add an `organization_id` column to every table
- Get the trigger approach working
- Implement a service that, given a single `organization_id`, finds all cross-organization data violations. It could use the foreign keys we used to create the triggers, and it could additionally use loose foreign keys.
   1. Run a periodic worker that iterates over every `organization_id` except 1 and logs violations. Organization 1 can be skipped because every violation involves two different organizations, so one that touches organization 1 is also found when the other organization is scanned (as long as the scan checks both sides of each foreign key).
   2. The service's results could also be shown on an organization page, so that users can see which data may not be working correctly.
   3. Could this service also be repurposed for "proposed" organization moves? For example, use the group transfer service and then detect all the violations. Could we do this inside a transaction?
- Test implementing the parent side of the foreign key constraint. It would make an UPDATE of `organization_id` fail while there are existing references to the row's primary key. We probably don't need `ON DELETE CASCADE`, because deletes are already covered by the original non-composite FK. We just need to block updating `organization_id` until the references have been updated. How do we solve this chicken-and-egg problem in a single transaction?
- Is this kind of tooling only necessary for foreign keys that could theoretically span multiple organizations? If specific foreign keys are known to imply data that belongs to the same namespace or project, then we can also be confident that both sides are in the same organization. Perhaps only a class of foreign keys, such as `issue_issue_links.source_id`/`target_id` and `merge_requests.source_project_id`/`target_project_id`, needs attention. Maybe the generalized rule is that only tables containing multiple foreign keys need to be considered: a table with a single foreign key is hierarchical, must "belong to" its parent record, and presumably couldn't logically have a parent in another organization.
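The violation-finding service could start as one join per known foreign key. This sketch assumes the foreign keys are available as simple metadata records; the `ForeignKey` and `Violation` types and the `find_violations`/`scan` functions are hypothetical, and SQLite stands in for Postgres. It checks both sides of each key, which is what lets the periodic scan skip the default organization 1.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignKey:
    """Hypothetical metadata for one (regular or loose) foreign key."""
    child_table: str
    child_column: str
    parent_table: str


@dataclass(frozen=True)
class Violation:
    child_table: str
    child_id: int
    child_column: str
    child_organization_id: int
    parent_organization_id: int


def find_violations(conn: sqlite3.Connection, foreign_keys: list[ForeignKey],
                    organization_id: int) -> list[Violation]:
    """Rows where organization_id is on either side of an FK and the sides differ."""
    violations = []
    for fk in foreign_keys:
        # Identifiers come from trusted schema metadata, never from user input.
        sql = (
            f"SELECT c.id, c.organization_id, p.organization_id "
            f"FROM {fk.child_table} AS c "
            f"JOIN {fk.parent_table} AS p ON p.id = c.{fk.child_column} "
            "WHERE ? IN (c.organization_id, p.organization_id) "
            "AND c.organization_id <> p.organization_id"
        )
        for child_id, child_org, parent_org in conn.execute(sql, (organization_id,)):
            violations.append(Violation(fk.child_table, child_id, fk.child_column, child_org, parent_org))
    return violations


def scan(conn: sqlite3.Connection, foreign_keys: list[ForeignKey],
         organization_ids: list[int], default_organization_id: int = 1) -> set[Violation]:
    """Periodic-worker sketch: skip the default organization, dedupe results."""
    found: set[Violation] = set()
    for org_id in organization_ids:
        if org_id != default_organization_id:
            found.update(find_violations(conn, foreign_keys, org_id))
    return found


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issues (id INTEGER PRIMARY KEY, organization_id INTEGER NOT NULL);
CREATE TABLE issue_links (id INTEGER PRIMARY KEY, organization_id INTEGER NOT NULL,
                          source_id INTEGER NOT NULL, target_id INTEGER NOT NULL);
INSERT INTO issues VALUES (1, 1), (2, 1), (3, 2);
-- Link 11 (organization 1) points at issue 3 (organization 2); no triggers here.
INSERT INTO issue_links VALUES (10, 1, 1, 2), (11, 1, 1, 3);
""")
FKS = [ForeignKey("issue_links", "source_id", "issues"),
       ForeignKey("issue_links", "target_id", "issues")]
print(scan(conn, FKS, [1, 2]))
```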
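The chicken-and-egg problem with a parent-side check can be reproduced with immediate triggers on both sides. In this SQLite sketch (trigger names hypothetical; only the child-side `UPDATE` trigger is shown, the matching `INSERT` trigger is omitted for brevity), moving two linked issues and their link to another organization fails whichever table is updated first, while an unreferenced issue moves freely.

```python
import sqlite3

# SQLite stand-in for Postgres; trigger names are hypothetical.
# SQLite triggers always fire immediately, which exposes the ordering problem.
SCHEMA = """
CREATE TABLE issues (id INTEGER PRIMARY KEY, organization_id INTEGER NOT NULL);
CREATE TABLE issue_links (
    id INTEGER PRIMARY KEY,
    organization_id INTEGER NOT NULL,
    source_id INTEGER NOT NULL REFERENCES issues (id) ON DELETE CASCADE,
    target_id INTEGER NOT NULL REFERENCES issues (id) ON DELETE CASCADE
);

-- Child side: a link may only point at issues in its own organization.
CREATE TRIGGER issue_links_org_check
BEFORE UPDATE OF organization_id, source_id, target_id ON issue_links
FOR EACH ROW
BEGIN
    SELECT RAISE(ABORT, 'link would cross organizations')
    WHERE (SELECT organization_id FROM issues WHERE id = NEW.source_id) IS NOT NEW.organization_id
       OR (SELECT organization_id FROM issues WHERE id = NEW.target_id) IS NOT NEW.organization_id;
END;

-- Parent side: block changing an issue's organization_id while links in a
-- different organization still reference it. Deletes are left to the plain FKs.
CREATE TRIGGER issues_org_change_guard
BEFORE UPDATE OF organization_id ON issues
FOR EACH ROW
BEGIN
    SELECT RAISE(ABORT, 'issue is still referenced from another organization')
    WHERE EXISTS (
        SELECT 1 FROM issue_links
        WHERE (source_id = NEW.id OR target_id = NEW.id)
          AND organization_id IS NOT NEW.organization_id
    );
END;

INSERT INTO issues VALUES (1, 1), (2, 1), (3, 1);
INSERT INTO issue_links VALUES (10, 1, 1, 2);
"""


def run(conn: sqlite3.Connection, sql: str) -> bool:
    """Execute one statement; return False if a trigger or constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

print(run(conn, "UPDATE issues SET organization_id = 2 WHERE id IN (1, 2)"))  # False: link 10 is still in org 1
print(run(conn, "UPDATE issue_links SET organization_id = 2 WHERE id = 10"))  # False: issues 1 and 2 are still in org 1
print(run(conn, "UPDATE issues SET organization_id = 2 WHERE id = 3"))        # True: nothing references issue 3
```

One direction that may be worth testing in Postgres is declaring both checks with `CREATE CONSTRAINT TRIGGER ... DEFERRABLE INITIALLY DEFERRED`, so they run at commit time after all rows in the move have been updated. SQLite cannot defer triggers, so that variant is not shown here.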

Statistics

|  | Storage added | How it was calculated |
|---|---|---|
| All new columns | `313 GiB` | 8 × `reltuples` of all tables, as it adds 8 bytes to every tuple in the DB |
| All new indexes | `1.8 TiB` | 2 × 3 × the value above. Each new index stores two 8-byte columns, and the factor of 3 is an observation that a few of our indexes are 2-3 times the size of the columns they store. Index size comes from bloat, the actual values, and the extra tuple data needed to maintain the index |
| Total added | `2.1 TiB = 11%` | `313 GiB` + `1.8 TiB` |
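The arithmetic behind the table can be reproduced from the `313 GiB` figure. This is a sketch of the calculation only: the constant and function names are ours, and the last line assumes the 11% is measured against the current database size, which the table does not state.

```python
GIB_PER_TIB = 1024
COLUMN_BYTES = 8       # one 8-byte organization_id per tuple
INDEX_COLUMNS = 2      # each new index stores two 8-byte columns
INDEX_SIZE_FACTOR = 3  # observed index size relative to the columns it stores


def storage_estimate(new_columns_gib: float) -> dict[str, float]:
    """Reproduce the table's estimate (in GiB) from the new-column total."""
    indexes_gib = INDEX_COLUMNS * INDEX_SIZE_FACTOR * new_columns_gib
    return {
        "columns": new_columns_gib,
        "indexes": indexes_gib,
        "total": new_columns_gib + indexes_gib,
    }


est = storage_estimate(313)
# 313 GiB at 8 bytes per tuple is the implied sum of reltuples across all tables.
tuples = 313 * 1024**3 / COLUMN_BYTES
print(f"tuples  ~ {tuples / 1e9:.1f} billion")              # tuples  ~ 42.0 billion
print(f"indexes = {est['indexes'] / GIB_PER_TIB:.2f} TiB")  # indexes = 1.83 TiB
print(f"total   = {est['total'] / GIB_PER_TIB:.2f} TiB")    # total   = 2.14 TiB
# Assumption: 11% is relative to the current database size.
print(f"implied DB size ~ {est['total'] / 0.11 / GIB_PER_TIB:.1f} TiB")  # implied DB size ~ 19.5 TiB
```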

Challenges

Possible alternative: all read queries include `WHERE organization_id`

- Downside: everything instance-wide, like periodic Sidekiq workers, would have to change
- We might also lose all index-only scans, because none of the existing indexes contain `organization_id`, so filtering on it would require fetching each matching row from the table
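A small illustration of the index-only scan concern, using SQLite's "covering index" plans as an analogue of Postgres index-only scans. The planners differ (Postgres also depends on the visibility map, for example), so this only shows the mechanism, not GitLab's actual plans. The `project_id`/`state` columns and the index name are hypothetical.

```python
import sqlite3

# SQLite analogue: a "COVERING INDEX" plan answers the query from the index alone.
# Columns other than id/organization_id, and the index name, are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issues (
    id INTEGER PRIMARY KEY,
    organization_id INTEGER NOT NULL,
    project_id INTEGER NOT NULL,
    state TEXT NOT NULL
);
-- An existing-style index that does not contain organization_id.
CREATE INDEX index_issues_on_project_id_and_state ON issues (project_id, state);
""")


def plan(sql: str) -> str:
    """Return SQLite's query-plan detail text for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Every referenced column is in the index (the rowid `id` is stored implicitly).
print(plan("SELECT id, state FROM issues WHERE project_id = 1"))
# The extra organization filter forces a table lookup for each matching row.
print(plan("SELECT id, state FROM issues WHERE project_id = 1 AND organization_id = 1"))
```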

MR acceptance checklist

I have evaluated the MR acceptance checklist for this MR.

Edited Sep 07, 2023 by Dylan Griffith
