BG migration for MR metrics target_project_id
What does this MR do?
This MR adds target_project_id
column to merge_request_metrics
and keeps it in sync with merge_requests.target_project_id
.
The goal of this denormalization is to get better query timings when querying the merged_at
column in merge_requests_metrics
table. Currently when querying merged_at
, the DB needs to process all MR records and join the metrics table for the given project/projects.
Efficient queries on MRs and the merged_at
column will be required for MR analytics (throughput) feature: #229045 (closed)
I tested the import / export feature. Importing an older GL export is working, the target_project_id
will be set automatically by the before_save
callback on MergeRequest::Metrics
. I didn't notice N+1 query on merge_requests (finding the target_project_id
), probably because we have the inverse_of
set up.
Inefficient query:
SELECT count("merge_requests".id)
FROM "merge_requests"
INNER JOIN merge_request_metrics ON merge_request_metrics.merge_request_id=merge_requests.id AND (merged_at > '2020-05-01 00:00') AND (merged_at < '2020-05-31 23:59')
WHERE merge_requests.target_project_id IN (278964, 7764)
ERROR: canceling statement due to statement timeout
New query:
SELECT count("merge_request_metrics".merge_request_id)
FROM "merge_requests"
INNER JOIN merge_request_metrics ON merge_request_metrics.merge_request_id=merge_requests.id AND (merged_at > '2020-05-01 00:00') AND (merged_at < '2020-05-31 23:59')
WHERE merge_request_metrics.target_project_id IN (278964, 7764)
Note: There is an index on merged_at
and target_project_id
, I'll add it in a follow up issue where the query is going to be implemented.
- Plan (uncached): https://explain.depesz.com/s/LFMo
- Plan (cached): https://explain.depesz.com/s/98vE
Joining the merge_requests
table will be required if we need extra filtering on the data (assignee, author, labels, etc.).
Migrations
Up
== 20200723125205 AddTargetProjectIdToMrMetrics: migrating ====================
-- add_column(:merge_request_metrics, :target_project_id, :integer)
-> 0.0008s
== 20200723125205 AddTargetProjectIdToMrMetrics: migrated (0.0045s) ===========
== 20200723128332 AddFkToMetricsTargetProjectId: migrating ====================
-- transaction_open?()
-> 0.0000s
-- index_exists?(:merge_request_metrics, :target_project_id, {:algorithm=>:concurrently})
-> 0.0031s
-- add_index(:merge_request_metrics, :target_project_id, {:algorithm=>:concurrently})
-> 0.0051s
-- transaction_open?()
-> 0.0000s
-- foreign_keys(:merge_request_metrics)
-> 0.0019s
-- execute("ALTER TABLE merge_request_metrics\nADD CONSTRAINT fk_56067dcb44\nFOREIGN KEY (target_project_id)\nREFERENCES projects (id)\nON DELETE CASCADE\nNOT VALID;\n")
-> 0.0023s
-- execute("ALTER TABLE merge_request_metrics VALIDATE CONSTRAINT fk_56067dcb44;")
-> 0.0026s
== 20200723128332 AddFkToMetricsTargetProjectId: migrated (0.0177s) ===========
== 20200723132258 ScheduleCopyOfMrTargetProjectIdToMrMetrics: migrating =======
== 20200723132258 ScheduleCopyOfMrTargetProjectIdToMrMetrics: migrated (0.0450s)
Down
== 20200723132258 ScheduleCopyOfMrTargetProjectIdToMrMetrics: reverting =======
== 20200723132258 ScheduleCopyOfMrTargetProjectIdToMrMetrics: reverted (0.0000s)
== 20200723128332 AddFkToMetricsTargetProjectId: reverting ====================
-- remove_foreign_key(:merge_request_metrics, {:column=>:target_project_id})
-> 0.0059s
-- transaction_open?()
-> 0.0000s
-- index_exists?(:merge_request_metrics, :target_project_id, {:algorithm=>:concurrently})
-> 0.0038s
-- remove_index(:merge_request_metrics, {:algorithm=>:concurrently, :column=>:target_project_id})
-> 0.0048s
== 20200723128332 AddFkToMetricsTargetProjectId: reverted (0.0148s) ===========
== 20200723125205 AddTargetProjectIdToMrMetrics: reverting ====================
-- remove_column(:merge_request_metrics, :target_project_id, :integer)
-> 0.0013s
== 20200723125205 AddTargetProjectIdToMrMetrics: reverted (0.0039s) ===========
BG migration
48M records, 5K batches. Estimated runtime: about 13 days
WITH merge_requests_batch AS
(SELECT id,
target_project_id
FROM merge_requests
WHERE id BETWEEN 27079 AND 33101)
UPDATE merge_request_metrics
SET target_project_id = merge_requests_batch.target_project_id
FROM merge_requests_batch
WHERE merge_request_metrics.merge_request_id=merge_requests_batch.id
The plan is from DB lab, most of the time is spent in IO.
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team