WIP: Resolve "Optimize project counters related to the ::Project"
What does this MR do?
Optimize queries produced by:
protected_branches: distinct_count(::Project.with_protected_branches.where(time_period), :creator_id)
query with time constraint and batching
SELECT
COUNT(DISTINCT "projects"."creator_id")
FROM
"projects"
INNER JOIN "protected_branches" ON "protected_branches"."project_id" = "projects"."id"
WHERE
"projects"."created_at" BETWEEN '2020-02-10 16:04:08.028772'
AND '2020-03-09 16:04:08.028850'
AND "projects"."creator_id" BETWEEN 810000 AND 811250;
query with no time constraint and batching
SELECT
COUNT(DISTINCT "projects"."creator_id")
FROM
"projects"
INNER JOIN "protected_branches" ON "protected_branches"."project_id" = "projects"."id"
WHERE
"projects"."creator_id" BETWEEN 810000 AND 811250;
explain data from database-lab
-
creator_id
between810_000 AND 811_250
Before
- with time constraint - https://explain.depesz.com/s/KFnK
- no time constraint - https://explain.depesz.com/s/Ug7o
Observations
- filter applied when time condition with creator_id was performed (no index)
Index Cond: ((projects.creator_id >= 810000) AND (projects.creator_id <= 811250))
Filter: ((projects.created_at >= '2020-02-10 16:04:08.028772+00'::timestamp with time zone) AND (projects.created_at <= '2020-03-09 16:04:08.02885+00'::timestamp with time zone))
- combat this with adding index
- explain now shows index only condition hit(improvement)
- time constraint -
- no time constraint -
- MAX/MIN calculations
query
SELECT
MAX("projects"."creator_id")
FROM
"projects"
INNER JOIN "protected_branches" ON "protected_branches"."project_id" = "projects"."id"
WHERE
"projects"."created_at" BETWEEN '2020-02-13 14:57:26.678272'
AND '2020-03-12 14:57:26.678362'
- Explain results(within thresholds:
xxx
) -
Plan
- Add these indexes
After only index conditions are hit
- with time constraint -
- no time constraint -
Timing
After the index for batch counting takes xxx seconds pessimistic
- 5.5 million users,
- with 1_250 batch sizes
- 5.5M/1_250 = 4_400 loops
- Time: < xxx ( cold cache with no time constraint )
Migration output
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team
Related to #208887 (closed)
Edited by Doug Stull