Optimize mirrored project counters with pipelines enabled
What does this MR do?
Optimize query produced by https://gitlab.com/gitlab-org/gitlab/-/blob/908902d1d4f6fbcd7780150c780837fd4fa8b301/ee/lib/ee/gitlab/usage_data.rb#L303
projects_mirrored_with_pipelines_enabled: distinct_count(::Project.mirrored_with_enabled_pipelines.where(time_period), :creator_id),
query with time constraint and batching
SELECT
COUNT(DISTINCT "projects"."creator_id")
FROM
"projects"
INNER JOIN "project_features" ON "project_features"."project_id" = "projects"."id"
WHERE
"projects"."mirror" = TRUE
AND "projects"."mirror_trigger_builds" = TRUE
AND "project_features"."builds_access_level" = 20
AND "projects"."created_at" BETWEEN '2020-02-10 16:04:08.028772'
AND '2020-03-09 16:04:08.028850'
AND "projects"."creator_id" BETWEEN 810000 AND 811250;
query with no time constraint and batching
SELECT
COUNT(DISTINCT "projects"."creator_id")
FROM
"projects"
INNER JOIN "project_features" ON "project_features"."project_id" = "projects"."id"
WHERE
"projects"."mirror" = TRUE
AND "projects"."mirror_trigger_builds" = TRUE
AND "project_features"."builds_access_level" = 20
AND "projects"."creator_id" BETWEEN 810000 AND 811250;
explain data from database-lab
-
id
between1 AND 10_000
andcreator_id
between810_000 AND 811_250
Before
- with time constraint - https://explain.depesz.com/s/dSfl
- with no time constraint - https://explain.depesz.com/s/O1XM
Observations
- filter applied when
Index Cond: (project_features.project_id = projects.id)
Filter: (project_features.builds_access_level = 20)
- combat this with adding index
CREATE INDEX index_project_features_on_project_id_and_repository_access_level_20
ON project_features(project_id)
where project_features.builds_access_level = 20;
- explain now shows index only condition hit for that part, but still filters are applied in other conditions - https://explain.depesz.com/s/Omtr
Index Cond: (project_features.project_id = projects.id)
- Filter being applied on
creator_id
,created_at
,mirror
andmirror_trigger_builds
- Without time constraint
Index Cond: ((projects.creator_id >= 810000) AND (projects.creator_id <= 811250))
Filter: (projects.mirror AND projects.mirror_trigger_builds AND (projects.created_at >= '2020-02-10 16:04:08.028772+00'::timestamp with time zone) AND (projects.created_at <= '2020-03-09 16:04:08.02885+00'::timestamp with time zone))
- With time constraint
Index Cond: ((projects.created_at >= '2020-02-10 16:04:08.028772+00'::timestamp with time zone) AND (projects.created_at <= '2020-03-09 16:04:08.02885+00'::timestamp with time zone))
Filter: (projects.mirror AND projects.mirror_trigger_builds)
- combat this with adding index
CREATE INDEX index_projects_on_creator_id_and_mirror
ON projects(creator_id, created_at)
where projects.mirror = true and projects.mirror_trigger_builds = true;
- explain now shows index only condition hit for that part
- no time constraint - https://explain.depesz.com/s/GjRi
- with time constraint - https://explain.depesz.com/s/BxOD
- MAX/MIN calculations
- Query
SELECT
MAX("projects"."creator_id")
FROM
"projects"
INNER JOIN "project_features" ON "project_features"."project_id" = "projects"."id"
WHERE
"projects"."mirror" = TRUE
AND "projects"."mirror_trigger_builds" = TRUE
AND "project_features"."builds_access_level" = 20
AND "projects"."created_at" BETWEEN '2020-02-10 16:04:08.028772'
AND '2020-03-09 16:04:08.028850';
- Explain results(no index, far over acceptable thresholds) - https://explain.depesz.com/s/LgKZ
- After adding indexes
- no time constraint - https://explain.depesz.com/s/OGc4
- with time constraint - https://explain.depesz.com/s/T5Hm
- After adding indexes
Plan
- Add these indexes
CREATE INDEX index_project_features_on_project_id_and_bal_20
ON project_features(project_id, builds_access_level)
where project_features.builds_access_level = 20;
CREATE INDEX index_projects_on_creator_id_and_mirror
ON projects(mirror, mirror_trigger_builds, creator_id, created_at)
where projects.mirror = true and projects.mirror_trigger_builds = true;
After only index conditions are hit
- with time constraint - https://explain.depesz.com/s/BxOD
- no time constraint - https://explain.depesz.com/s/GjRi
Timing
After the index for batch counting takes 11_880 seconds super pessimistic using database-lab
- 5.5 million users,
- with 1_250 batch sizes
- 5.5M/1_250 = 4_400 loops
- Time: < 2.7s ( cold cache with no time constraint )
Migration output
12:29 $ rails db:migrate:up VERSION=20200309195209
== 20200309195209 AddIndexOnProjectIdAndBuildsAccessLevelToProjectFeatures: migrating
-- transaction_open?()
-> 0.0000s
-- index_exists?(:project_features, :project_id, {:where=>"builds_access_level = 20", :name=>"index_project_features_on_project_id_bal_20", :algorithm=>:concurrently})
-> 0.0025s
-- execute("SET statement_timeout TO 0")
-> 0.0005s
-- add_index(:project_features, :project_id, {:where=>"builds_access_level = 20", :name=>"index_project_features_on_project_id_bal_20", :algorithm=>:concurrently})
-> 0.0034s
-- execute("RESET ALL")
-> 0.0004s
== 20200309195209 AddIndexOnProjectIdAndBuildsAccessLevelToProjectFeatures: migrated (0.0071s)
✔ ~/projects/gdk/gitlab [208887-optimize-project-counters-mirrored-pipelines|✚ 1…1⚑ 1]
15:54 $ VERBOSE=true be rake db:migrate:down VERSION=20200309195209
== 20200309195209 AddIndexOnProjectIdAndBuildsAccessLevelToProjectFeatures: reverting
-- transaction_open?()
-> 0.0000s
-- indexes(:project_features)
-> 0.0028s
-- execute("SET statement_timeout TO 0")
-> 0.0004s
-- remove_index(:project_features, {:algorithm=>:concurrently, :name=>"index_project_features_on_project_id_bal_20"})
-> 0.0018s
-- execute("RESET ALL")
-> 0.0004s
== 20200309195209 AddIndexOnProjectIdAndBuildsAccessLevelToProjectFeatures: reverted (0.0055s)
12:27 $ rails db:migrate:up VERSION=20200309195710
== 20200309195710 AddIndexOnMirrorAndCreatorIdAndCreatedAtToProjects: migrating
-- transaction_open?()
-> 0.0000s
-- index_exists?(:projects, [:creator_id, :created_at], {:where=>"mirror = true and mirror_trigger_builds = true", :name=>"index_projects_on_mirror_creator_id_created_at", :algorithm=>:concurrently})
-> 0.0164s
-- execute("SET statement_timeout TO 0")
-> 0.0013s
-- add_index(:projects, [:creator_id, :created_at], {:where=>"mirror = true and mirror_trigger_builds = true", :name=>"index_projects_on_mirror_creator_id_created_at", :algorithm=>:concurrently})
-> 0.0051s
-- execute("RESET ALL")
-> 0.0008s
== 20200309195710 AddIndexOnMirrorAndCreatorIdAndCreatedAtToProjects: migrated (0.0238s)
✔ ~/projects/gdk/gitlab [208887-optimize-project-counters-mirrored-pipelines|✚ 1…2⚑ 1]
15:59 $ VERBOSE=true be rake db:migrate:down VERSION=20200309195710
== 20200309195710 AddIndexOnMirrorAndCreatorIdAndCreatedAtToProjects: reverting
-- transaction_open?()
-> 0.0000s
-- indexes(:projects)
-> 0.0116s
-- execute("SET statement_timeout TO 0")
-> 0.0004s
-- remove_index(:projects, {:algorithm=>:concurrently, :name=>"index_projects_on_mirror_creator_id_created_at"})
-> 0.0023s
-- execute("RESET ALL")
-> 0.0003s
== 20200309195710 AddIndexOnMirrorAndCreatorIdAndCreatedAtToProjects: reverted (0.0148s)
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry - [-] Documentation (if required)
-
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides - [-] Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. - [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention
@gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team
Related to #208887 (closed)
Edited by Mayra Cabrera