Eventual author consistency worker for CA
What does this MR do and why?
This MR adds a worker that ensures eventual consistency in the ClickHouse database for the Contribution Analytics feature. When a user is deleted from the PostgreSQL database, this worker ensures that the user's event records are also cleaned up in ClickHouse.
How does it work:
In ClickHouse, the event_authors table tracks the unique author ids (user ids) for the events table. The worker iterates over the event_authors table and checks whether each user still exists. If a user cannot be found in the PostgreSQL database, the worker deletes all events related to that user in ClickHouse.
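The check described above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the worker's actual code: the arrays and the set stand in for the ClickHouse tables and the PostgreSQL users table, and the method names and batch size are hypothetical.

```ruby
require 'set'

# Hypothetical helper: returns the author ids that no longer have a
# matching user. Ids are checked in batches, the way a worker might
# issue one PostgreSQL lookup per batch (assumption).
def missing_author_ids(author_ids, existing_user_ids, batch_size: 2)
  missing = []
  author_ids.each_slice(batch_size) do |batch|
    missing.concat(batch.reject { |id| existing_user_ids.include?(id) })
  end
  missing
end

# Hypothetical helper: removes every event whose author is missing.
# `events` stands in for the ClickHouse events table and
# `event_authors` for the event_authors table.
def cleanup_orphaned_events!(events, event_authors, existing_user_ids, batch_size: 2)
  orphaned = missing_author_ids(event_authors, existing_user_ids, batch_size: batch_size)
  events.reject! { |event| orphaned.include?(event[:author_id]) }
  orphaned
end

# Usage with in-memory stand-ins: user 10 has been deleted.
events = [
  { id: 1, author_id: 10 },
  { id: 2, author_id: 20 },
  { id: 3, author_id: 10 }
]
event_authors = [10, 20]
existing_user_ids = Set[20]

cleanup_orphaned_events!(events, event_authors, existing_user_ids)
```

Checking ids in batches keeps the number of lookups against PostgreSQL bounded, which matters when event_authors holds many ids.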
How to set up and validate locally
- Ensure that you're on a Premium plan
- Ensure ClickHouse is configured: https://docs.gitlab.com/ee/development/database/clickhouse/clickhouse_within_gitlab.html#gdk-setup
- Enable the sync feature flag:
Feature.enable(:event_sync_worker_for_click_house)
- If your GDK is seeded, you can sync the initial data to ClickHouse from the Rails console:
ClickHouse::EventsSyncWorker.new.perform
- Find a user that has events and delete it:
author_id = Event.pluck(:author_id).uniq.sort.last
User.where(id: author_id).delete_all
# Verify if we have some data in ClickHouse for this author
ClickHouse::Client.select("select * from events where author_id = #{author_id}", :main)
# Invoke the worker
ClickHouse::EventAuthorsConsistencyCronWorker.new.perform
- Verify that the data is gone from ClickHouse:
ClickHouse::Client.select("select * from events where author_id = #{author_id}", :main)
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #428260 (closed)
Edited by Adam Hegyi