Backend: Implement background syncing to ensure catalog_resources is synced with projects
Summary
In #427928 (closed), we added the denormalized columns `name` and `description` to `catalog_resources` in order to improve search query performance. In #429056 (closed), we will also denormalize `visibility_level`.
The problem with relying on model callbacks to keep the columns in sync is that callbacks can miss bulk updates or direct database updates that don't use ActiveModel. Since `visibility_level` is a critical column to keep in sync (it is tied to user permissions), we should implement a robust syncing process that involves a queue table, database triggers, and workers.
Further context:
Per !134708 (comment 1617360980) and #429056 (comment 1621366108):
I would suggest triggers be used for queueing updates to some queue table (similar to how `loose_foreign_keys_deleted_records` and `namespaces_sync_events` work). This can help with scaling characteristics as we can throttle the updates and do them in batches. But it also will benefit us in the long term for Category:Cell architecture because very likely this team will want to build this catalog as a global resource for searching that might need to be in a different database to `projects`. If they are in a different database then triggers can't write to the other table so the queue would be needed later.
If we need robust syncing of `visibility_level` I think we need it to be queued from DB triggers because we could miss updates. And like I suggested I'd recommend using a DB trigger which then writes to a queue (like loose foreign keys) instead of a DB trigger that just syncs the value if you plan for this catalog to be "instance wide" once we have cells architecture. If you write to both tables in the same transaction then you'll need to rewrite that soon as we start enforcing that you cannot write to both tables for cells soon.
Proposal
Implement a syncing process that involves a queue table of sync events that are processed by workers. Follow a similar approach to the one documented in Ci mirrored tables.
On project deletion:
When a record in `projects` is deleted, a cascading delete in the database removes the corresponding record in `catalog_resources`. For this reason, we don't need to add a sync event on project deletion.
Steps
1. Add migration to create a partitioned queue table.
CREATE TABLE p_catalog_resource_sync_events (
id bigint NOT NULL,
catalog_resource_id bigint NOT NULL,
project_id bigint NOT NULL,
partition_id bigint DEFAULT 1 NOT NULL,
status smallint DEFAULT 1 NOT NULL,
created_at timestamp with time zone DEFAULT now() NOT NULL,
updated_at timestamp with time zone DEFAULT now() NOT NULL
)
PARTITION BY LIST (partition_id);
- This table should be List partitioned as suggested in !137238 (comment 1665396792). A similar approach for reference: !125333 (diffs).
- The corresponding model should have a similar structure as Namespaces::SyncEvent (this is so we can leverage the existing Ci::ProcessSyncEventsService class).
2. Create model callback and PG trigger.
- Add a callback to `Ci::Catalog::Resource` to sync the columns on record creation.
- Add a PG trigger to insert a SyncEvent when the associated `projects` record of a catalog resource updates its name, description, or visibility_level column.
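The trigger condition in step 2 can be modeled in plain Ruby to make the intended behavior concrete. This is only a sketch, not the actual PL/pgSQL trigger: `TRACKED_COLUMNS`, `sync_event_needed?`, `enqueue_sync_event`, and the hash-based rows are hypothetical names introduced for illustration. It assumes an event is queued only when the project has a catalog resource and at least one tracked column actually changed.

```ruby
# Columns on projects that are denormalized onto catalog_resources.
TRACKED_COLUMNS = %i[name description visibility_level].freeze

# Hypothetical stand-in for the trigger's condition: compares the row before
# and after an UPDATE, much as a trigger would compare OLD and NEW.
def sync_event_needed?(old_row, new_row, has_catalog_resource:)
  return false unless has_catalog_resource

  TRACKED_COLUMNS.any? { |column| old_row[column] != new_row[column] }
end

# Hypothetical stand-in for the trigger body: appends a pending event to an
# in-memory queue instead of inserting into p_catalog_resource_sync_events.
def enqueue_sync_event(queue, old_row, new_row, catalog_resource_id:)
  has_resource = !catalog_resource_id.nil?
  return queue unless sync_event_needed?(old_row, new_row, has_catalog_resource: has_resource)

  queue << { catalog_resource_id: catalog_resource_id, project_id: new_row[:id], status: :pending }
end

queue = []
before = { id: 7, name: 'components', description: 'CI components', visibility_level: 0, archived: false }

# A visibility change is queued regardless of how the UPDATE was issued.
enqueue_sync_event(queue, before, before.merge(visibility_level: 20), catalog_resource_id: 3)
# A change to an untracked column is ignored.
enqueue_sync_event(queue, before, before.merge(archived: true), catalog_resource_id: 3)
# A project without a catalog resource is ignored.
enqueue_sync_event(queue, before, before.merge(name: 'renamed'), catalog_resource_id: nil)

puts queue.size
```

Because the check lives at the database level in the real design, it would also fire for bulk or direct updates that skip model callbacks, which is the gap the Summary describes.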
3. Create worker to process sync events.
This worker should follow a similar setup to Namespaces::ProcessSyncEventsWorker.
- `Ci::Catalog::Resources::ProcessSyncEventsWorker`:
  - Processes the records in p_catalog_resource_sync_events.
  - Enqueued on Project after_update if it has a catalog_resource association and there was a change saved to its name, description, or visibility_level attribute.
- The existing Ci::ProcessSyncEventsService will need to be updated to support a partitioned sync_event table.
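The batch processing described in step 3 can be sketched in plain Ruby. This is a simplified in-memory model, not the `Ci::ProcessSyncEventsService` implementation: `process_sync_events`, the `PENDING`/`PROCESSED` values, and the `batch_size` parameter are assumptions made for illustration (the migration only shows that `status` is a smallint defaulting to 1).

```ruby
# Assumed status values; only the default of 1 appears in the migration.
PENDING = 1
PROCESSED = 2

# Hypothetical stand-in for the worker's service: takes up to `batch_size`
# pending events, copies the tracked columns from each project to its catalog
# resource, and marks each handled event as processed.
def process_sync_events(events, projects, catalog_resources, batch_size: 2)
  batch = events.select { |event| event[:status] == PENDING }.first(batch_size)

  batch.each do |event|
    project = projects.fetch(event[:project_id])
    resource = catalog_resources.fetch(event[:catalog_resource_id])

    # Copy the current project values, not values captured at event time.
    resource.merge!(project.slice(:name, :description, :visibility_level))
    event[:status] = PROCESSED
  end

  batch.size
end

projects = { 7 => { name: 'renamed', description: 'CI components', visibility_level: 20 } }
catalog_resources = { 3 => { name: 'components', description: 'CI components', visibility_level: 0 } }
events = Array.new(3) { { catalog_resource_id: 3, project_id: 7, status: PENDING } }

# Each call handles one batch; a scheduled run would pick up whatever is left.
puts process_sync_events(events, projects, catalog_resources)
puts process_sync_events(events, projects, catalog_resources)
puts catalog_resources[3].inspect
```

Since each event only says "this project changed" and the current values are read at processing time, duplicate events for the same project are harmless in this model: reprocessing simply copies the same state again.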
4. Set the worker to be enqueued every N minutes.
Set up `Ci::Catalog::Resources::ProcessSyncEventsWorker` to be enqueued every minute. This allows us to regularly process any direct or bulk database updates that weren't made using the Project model.
MR Implementation
Description | MR |
---|---|
Steps 1-3: Background syncing of catalog_resources and pro... (!137238 - merged) -- Feature flag: ci_process_catalog_resource_sync_events, Roll out issue: #432963 (closed) | !137238 (merged) |
Step 4: Add cron to run catalog resources ProcessSyncEv... (!138865 - merged) | !138865 (merged) |