Add audit_events partitioning migration
## What does this MR do?
This builds on top of !35981 (merged) and adds the migration for `audit_events` partitioning.

In detail, this performs the following steps:

- Create a new table `audit_events_part_5fc467ac26` with a nearly identical schema to `audit_events`. This is the partitioned table.
- Create a couple of partitions for it.
- Install a trigger that keeps the tables in sync; in particular, it mirrors inserts into `audit_events` to the newly created table.
- Schedule background migration jobs to copy data from `audit_events` to the partitioned table.
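The last step amounts to slicing the source table's `id` range into fixed-size batches and scheduling one copy job per batch. A minimal sketch of that slicing, with illustrative names (`background_job_batches` is not GitLab's actual helper):

```ruby
# Sketch only: split an id range into inclusive [start, stop] batches,
# one per background migration job. Names are illustrative.
def background_job_batches(min_id, max_id, batch_size)
  batches = []
  (min_id..max_id).step(batch_size) do |start_id|
    batches << [start_id, [start_id + batch_size - 1, max_id].min]
  end
  batches
end
```

Each scheduled job would then copy the rows whose `id` falls inside its range into the partitioned table.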
<details>
<summary>migration log</summary>

```
== 20200629125335 PartitionAuditEvents: migrating =============================
-- transaction_open?()
   -> 0.0000s
-- table_exists?("audit_events_part_5fc467ac26")
   -> 0.0004s
-- transaction()
-- execute("CREATE TABLE audit_events_part_5fc467ac26 (\n LIKE audit_events INCLUDING ALL EXCLUDING INDEXES,\n partition_key_bfab5f7bb7 timestamp without time zone NOT NULL,\n PRIMARY KEY (id, partition_key_bfab5f7bb7)\n) PARTITION BY RANGE (partition_key_bfab5f7bb7)\n")
   -> 0.0020s
-- remove_column("audit_events_part_5fc467ac26", "created_at")
   -> 0.0011s
-- rename_column("audit_events_part_5fc467ac26", "partition_key_bfab5f7bb7", "created_at")
   -> 0.0015s
-- change_column_default("audit_events_part_5fc467ac26", "id", nil)
   -> 0.0030s
-- change_column("audit_events_part_5fc467ac26", "id", :bigint)
   -> 0.0018s
   -> 0.0249s
-- table_exists?("gitlab_partitions_dynamic.audit_events_part_5fc467ac26_000000")
   -> 0.0008s
-- execute("CREATE TABLE gitlab_partitions_dynamic.audit_events_part_5fc467ac26_000000 PARTITION OF audit_events_part_5fc467ac26\nFOR VALUES FROM (MINVALUE) TO ('2020-07-01')\n")
   -> 0.0143s
-- table_exists?("gitlab_partitions_dynamic.audit_events_part_5fc467ac26_202007")
   -> 0.0004s
-- execute("CREATE TABLE gitlab_partitions_dynamic.audit_events_part_5fc467ac26_202007 PARTITION OF audit_events_part_5fc467ac26\nFOR VALUES FROM ('2020-07-01') TO ('2020-08-01')\n")
   -> 0.0216s
-- table_exists?("gitlab_partitions_dynamic.audit_events_part_5fc467ac26_202008")
   -> 0.0005s
-- execute("CREATE TABLE gitlab_partitions_dynamic.audit_events_part_5fc467ac26_202008 PARTITION OF audit_events_part_5fc467ac26\nFOR VALUES FROM ('2020-08-01') TO ('2020-09-01')\n")
   -> 0.0130s
-- execute("CREATE FUNCTION table_sync_function_2be879775d()\nRETURNS TRIGGER AS\n$$\nBEGIN\nIF (TG_OP = 'DELETE') THEN\n  DELETE FROM audit_events_part_5fc467ac26 where id = OLD.id;\nELSIF (TG_OP = 'UPDATE') THEN\n  UPDATE audit_events_part_5fc467ac26\n  SET author_id = NEW.author_id,\n    type = NEW.type,\n    entity_id = NEW.entity_id,\n    entity_type = NEW.entity_type,\n    details = NEW.details,\n    updated_at = NEW.updated_at,\n    ip_address = NEW.ip_address,\n    author_name = NEW.author_name,\n    created_at = NEW.created_at\n  WHERE audit_events_part_5fc467ac26.id = NEW.id;\nELSIF (TG_OP = 'INSERT') THEN\n  INSERT INTO audit_events_part_5fc467ac26 (id,\n    author_id,\n    type,\n    entity_id,\n    entity_type,\n    details,\n    updated_at,\n    ip_address,\n    author_name,\n    created_at)\n  VALUES (NEW.id,\n    NEW.author_id,\n    NEW.type,\n    NEW.entity_id,\n    NEW.entity_type,\n    NEW.details,\n    NEW.updated_at,\n    NEW.ip_address,\n    NEW.author_name,\n    NEW.created_at);\nEND IF;\nRETURN NULL;\n\nEND\n$$ LANGUAGE PLPGSQL\n")
   -> 0.0053s
-- execute("COMMENT ON FUNCTION table_sync_function_2be879775d IS 'Partitioning migration: table sync for audit_events table'")
   -> 0.0004s
-- execute("CREATE TRIGGER table_sync_trigger_ee39a25f9d\nAFTER INSERT OR UPDATE OR DELETE ON audit_events\nFOR EACH ROW\nEXECUTE PROCEDURE table_sync_function_2be879775d()\n")
   -> 0.0003s
== 20200629125335 PartitionAuditEvents: migrated (0.1913s) ====================
```

</details>
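The partition bounds created above follow a simple pattern: a catch-all partition from `MINVALUE` up to the first of the upcoming month, then one partition per month. A sketch of that computation (`monthly_partition_bounds` is an illustrative name, not GitLab's actual helper):

```ruby
require 'date'

# Sketch only: reproduce the partition bounds seen in the migration log.
# nil stands for MINVALUE; each pair is a half-open [from, to) date range.
def monthly_partition_bounds(today, months_ahead: 1)
  next_month = Date.new(today.year, today.month, 1) >> 1
  bounds = [[nil, next_month]] # catch-all partition for all older rows
  (0..months_ahead).each do |i|
    from = next_month >> i     # Date#>> shifts forward by whole months
    bounds << [from, from >> 1]
  end
  bounds
end
```

For the 2020-06-29 run this yields `MINVALUE` to `2020-07-01`, then `2020-07-01` to `2020-08-01` and `2020-08-01` to `2020-09-01`, matching the three partitions in the log.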
## Does this MR meet the acceptance criteria?

### Conformity

- Changelog entry
- [-] Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- [-] Separation of EE specific content
### Availability and Testing
There will be some overhead because of the double writes to both tables. A benchmark measured this at around 6% overhead in query timings. We don't expect a problem given the relatively low insert rate of about 10-20 inserts per second at peak times.
Reference: #223908 (comment 379438751)