Add soft delete option to ClickHouse events table
What does this MR do and why?
This MR adds soft delete capability to the ClickHouse events table by leveraging the is_deleted option for the ReplacingMergeTree engine: https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replacingmergetree#is_deleted
Reasoning: we don't have strong consistency between ClickHouse (CH) and PostgreSQL (PG). To ensure eventual consistency, we might periodically scan the events tables in CH and PG and delete from CH the rows that are missing (already deleted) in PG.
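A rough sketch of what such a reconciliation pass could look like, written as a single ClickHouse query. This is only an illustration, not something this MR implements: the connection parameters passed to the postgresql() table function are placeholders, and it assumes that both events tables share an id column, that the CH table has updated_at and deleted columns, and that all other CH columns have defaults.

```sql
-- Hypothetical reconciliation: write a "deleted" version for every row that
-- still exists in CH but is no longer present in PG.
INSERT INTO events (id, updated_at, deleted)
SELECT
    id,
    now64(6) AS updated_at, -- assumed to be newer than any stored updated_at
    1        AS deleted
FROM events FINAL
WHERE id NOT IN
(
    -- Placeholder host, database, user and password.
    SELECT id
    FROM postgresql('pg-host:5432', 'app_db', 'events', 'pg_user', 'pg_password')
);
```

On a large table the scan would more realistically run in batches over id ranges instead of comparing the full tables in one query.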
Deployment
The feature is not available on production yet; at the moment we're running experiments on staging (STG). Since we don't have a DB migration framework for CH yet, the schema changes will be applied by hand.
How to set up and validate locally
See the extended test case in the MR. How it works:
- You have a row in the events table with id=3.
- Insert a new "version" of the row with a higher updated_at timestamp and set the deleted column to 1.
- Running SELECT * FROM events FINAL ensures that the "deleted" rows are filtered out (this step normally happens asynchronously).
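The steps above can be tried by hand with a sketch like the following. The table definition is a trimmed-down assumption: only the id, updated_at and deleted columns come from this description, while the column types, the ORDER BY key and the timestamps are illustrative and may differ from the real events table.

```sql
-- Hypothetical minimal schema; the real events table has more columns.
CREATE TABLE events
(
    id         UInt64,
    updated_at DateTime64(6, 'UTC'),
    deleted    UInt8 DEFAULT 0
)
-- First argument is the version column, second is the is_deleted column.
ENGINE = ReplacingMergeTree(updated_at, deleted)
ORDER BY id;

-- 1. The existing row.
INSERT INTO events (id, updated_at, deleted)
VALUES (3, '2023-01-01 00:00:00', 0);

-- 2. A newer "version" of the same row, marked as deleted.
INSERT INTO events (id, updated_at, deleted)
VALUES (3, '2023-01-02 00:00:00', 1);

-- 3. FINAL resolves versions at query time, so id = 3 should not be returned.
SELECT * FROM events FINAL;

-- For comparison: without FINAL, both versions may still be visible
-- until a background merge has processed them.
SELECT * FROM events;
```

The is_deleted column has to be of type UInt8, and the new version needs a strictly higher updated_at than the row it replaces for the deletion to win.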
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.