Dedup issue_metrics table
What does this MR do?
This MR deduplicates the issue_metrics
table in a similar fashion like !29566 (merged)
The MR does the following:
- The application code ensures that
issue_metrics
records are created with the safefind_or_create
method to avoidRecordNotUnique
errors (if metrics records are created concurrently). - Add a new unique index with the highest
issue_id
which prevents further duplicates (very unlikely that we'll see duplicates). - Iterate over
issue_metrics
withEachBatch
and find duplicated entries. - Eliminate the duplicated rows by "merging" them.
- Add global unique index on
issue_id
and remove the tmp index created by step 2.
Migration
Up:
== 20210226141517 DedupIssueMetrics: migrating ================================
-- transaction_open?()
-> 0.0000s
-- index_exists?(:issue_metrics, :issue_id, {:where=>"id > 504", :unique=>true, :name=>"tmp_unique_issue_metrics_by_issue_id", :algorithm=>:concurrently})
-> 0.0020s
-- execute("SET statement_timeout TO 0")
-> 0.0004s
-- add_index(:issue_metrics, :issue_id, {:where=>"id > 504", :unique=>true, :name=>"tmp_unique_issue_metrics_by_issue_id", :algorithm=>:concurrently})
-> 0.0182s
-- execute("RESET ALL")
-> 0.0005s
-- transaction_open?()
-> 0.0000s
-- index_exists?(:issue_metrics, :issue_id, {:unique=>true, :name=>"index_unique_issue_metrics_issue_id", :algorithm=>:concurrently})
-> 0.0018s
-- add_index(:issue_metrics, :issue_id, {:unique=>true, :name=>"index_unique_issue_metrics_issue_id", :algorithm=>:concurrently})
-> 0.0035s
-- transaction_open?()
-> 0.0000s
-- indexes(:issue_metrics)
-> 0.0018s
-- remove_index(:issue_metrics, {:algorithm=>:concurrently, :name=>"tmp_unique_issue_metrics_by_issue_id"})
-> 0.0015s
-- transaction_open?()
-> 0.0000s
-- indexes(:issue_metrics)
-> 0.0014s
-- remove_index(:issue_metrics, {:algorithm=>:concurrently, :name=>"index_issue_metrics"})
-> 0.0018s
== 20210226141517 DedupIssueMetrics: migrated (0.0476s) =======================
Down:
== 20210226141517 DedupIssueMetrics: reverting ================================
-- transaction_open?()
-> 0.0000s
-- index_exists?(:issue_metrics, :issue_id, {:name=>"index_issue_metrics", :algorithm=>:concurrently})
-> 0.0028s
-- execute("SET statement_timeout TO 0")
-> 0.0005s
-- add_index(:issue_metrics, :issue_id, {:name=>"index_issue_metrics", :algorithm=>:concurrently})
-> 0.0184s
-- execute("RESET ALL")
-> 0.0016s
-- transaction_open?()
-> 0.0000s
-- indexes(:issue_metrics)
-> 0.0060s
-- current_schema()
-> 0.0005s
-- transaction_open?()
-> 0.0000s
-- indexes(:issue_metrics)
-> 0.0062s
-- remove_index(:issue_metrics, {:algorithm=>:concurrently, :name=>"index_unique_issue_metrics_issue_id"})
-> 0.0039s
== 20210226141517 DedupIssueMetrics: reverted (0.0498s) =======================
Database
Queries:
- Iteration query (using pkey and
EachBatch
): https://explain.depesz.com/s/waD8d - Duplication finder query: https://explain.depesz.com/s/eJKL
Timing calculation:
(32_000_000 rows / BATCH_SIZE_1000) * (2 queries, ~10ms) = < 6 minutes
The calculation does not contain the deduplication time. IMHO it's negligible since we don't have too many duplicated rows (6 rows).
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team
Related to #214456 (closed)
Edited by Adam Hegyi