Fix count_distinct_merge_request_id_from_scan_finding_approval_merge_request_rules metric
What does this MR do and why?
In GitLab::Database::BatchCounter
we have unwanted_configuration?
method that is checking if parameters configured for given Instrumentation class are valid, in other case value for given instrumentation class will be invalid: -1
:
def unwanted_configuration?(finish, batch_size, start)
(@operation == :count && batch_size <= MIN_REQUIRED_BATCH_SIZE) ||
(@operation == :sum && batch_size < DEFAULT_SUM_BATCH_SIZE) ||
(finish - start) / batch_size >= MAX_ALLOWED_LOOPS ||
start >= finish
end
Let's see if current parameters are violating this check:
MIN_REQUIRED_BATCH_SIZE = 1_250
DEFAULT_SUM_BATCH_SIZE = 1_000
MAX_ALLOWED_LOOPS = 10_000
@operation = :count
batch_size = 10_000
start = ::ApprovalMergeRequestRule.minimum(:merge_request_id) # => 1
finish = ::ApprovalMergeRequestRule.maximum(:merge_request_id) # => ~190000000
(@operation == :count && batch_size <= MIN_REQUIRED_BATCH_SIZE) # => false
(@operation == :sum && batch_size < DEFAULT_SUM_BATCH_SIZE) # => false
start >= finish # => false
(finish - start) / batch_size >= MAX_ALLOWED_LOOPS # => (190_000_000 - 0) / 10_000 >= 10_000 => true
To fix this we need to change batch_size
parameter to make sure the last condition is not met, in this MR we are changing this to 50_000
.
explain SELECT COUNT(DISTINCT "approval_merge_request_rules"."merge_request_id") FROM "approval_merge_request_rules" WHERE "approval_merge_request_rules"."report_type" = 4 AND "approval_merge_request_rules"."merge_request_id" BETWEEN 188900000 AND 188950000;
With https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/13333/commands/46799 we have checked that it performs without any issues.
We need to check if this will perform well for monthly
metric as well:
explain SELECT COUNT(DISTINCT "approval_merge_request_rules"."merge_request_id") FROM "approval_merge_request_rules" WHERE "approval_merge_request_rules"."report_type" = 4 AND "approval_merge_request_rules"."merge_request_id" BETWEEN 188100000 AND 188150000 AND "approval_merge_request_rules"."created_at" BETWEEN '2022-10-16 01:10:49.374014' AND '2022-11-13 01:10:49.374149';
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/13333/commands/46800
Although it performs well we need to add index there to improve the query.
exec CREATE INDEX scan_finding_approval_mr_rule_index_merge_request_id_and_created_at ON approval_merge_request_rules USING btree (merge_request_id, created_at) WHERE (report_type = 4);
The query has been executed. Duration: 3.922 min (edited)
After this same query performs flawlessly: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/13333/commands/46802
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #377717 (closed)