Harden CI pipelines usage data queries
What does this MR do?
Adds a better index, index_ci_pipelines_on_user_id_and_created_at_and_source,
that also covers ci_pipelines.source conditions, which speeds up the counters below:
- We precompute the MIN/MAX values of user_id and pass them as the start/finish parameters
- For Query 1 (count query):
  - Before: https://explain.depesz.com/s/tsde ~203 seconds
  - After: https://explain.depesz.com/s/b8Qb ~120 milliseconds
- We see the same improvement in all 4 counters, whether or not the optimizer chooses to use the source filter for the data at hand
- Drops the previous index index_ci_pipelines_on_user_id_and_created_at (see !26774 (merged)); the new index is backwards compatible because it keeps the same leading column order
Read the main issue #220477 (closed) for the investigation.
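The start/finish batching described above can be sketched in plain Ruby. This is a minimal in-memory illustration, not the implementation of Gitlab::UsageData.distinct_count: the PipelineRow struct, the batched_distinct_count helper, and the batch size of 1,250 (chosen to resemble the BETWEEN ranges in the generated SQL) are assumptions. It shows that once MIN/MAX(user_id) are known, a counter only has to walk fixed user_id ranges, and because each user_id falls into exactly one range, the per-batch distinct counts can simply be summed.

```ruby
# Hypothetical row type: only the columns the counters read.
PipelineRow = Struct.new(:user_id, :source, keyword_init: true)

# Sum of per-range distinct counts, in the spirit of
# "SELECT COUNT(DISTINCT user_id) ... WHERE user_id BETWEEN lo AND hi".
# start/finish stand in for the precomputed MIN/MAX(user_id).
def batched_distinct_count(rows, start:, finish:, batch_size: 1_250)
  total = 0
  lo = start
  while lo <= finish
    hi = [lo + batch_size - 1, finish].min
    # Ranges do not overlap, so no user_id is counted twice across batches.
    total += rows.select { |r| r.user_id && r.user_id.between?(lo, hi) }
                 .map(&:user_id).uniq.size
    lo = hi + 1
  end
  total
end

rows = [
  PipelineRow.new(user_id: 1, source: 6),
  PipelineRow.new(user_id: 1, source: 1),
  PipelineRow.new(user_id: 2, source: 6),
  PipelineRow.new(user_id: 3000, source: nil),
  PipelineRow.new(user_id: nil, source: 6) # NULL user_id is ignored
]

# Computed once and reused, like the precomputed MIN/MAX values.
ids = rows.map(&:user_id).compact
start, finish = ids.min, ids.max

puts batched_distinct_count(rows, start: start, finish: finish) # => 3
```

Computing start and finish once and passing them to every counter presumably spares each counter its own MIN/MAX lookup on the large table.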
Optimization
DROP INDEX index_ci_pipelines_on_user_id_and_created_at
CREATE INDEX index_ci_pipelines_on_user_id_and_created_at_and_source ON public.ci_pipelines USING btree (user_id, created_at, source)
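One way to read "backwards compatible in order of columns" is the B-tree leftmost-prefix rule. The sketch below is a simplified model, not PostgreSQL's planner, and seek_prefix and covers? are hypothetical helpers. It shows that queries filtering on (user_id, created_at) keep the same usable prefix in the new index, and that the counters, which reference only user_id and source, are fully contained in the new index (so an index-only scan becomes possible), whereas with the old index the source check would need heap fetches.

```ruby
# Leading index columns that a query constrains: the part of a B-tree
# index that can narrow the search (simplified leftmost-prefix rule).
def seek_prefix(index_columns, query_columns)
  index_columns.take_while { |col| query_columns.include?(col) }
end

# True when every column the query touches is stored in the index, which
# makes an index-only scan possible (PostgreSQL still checks the
# visibility map before skipping the heap).
def covers?(index_columns, query_columns)
  (query_columns - index_columns).empty?
end

OLD_INDEX = %i[user_id created_at].freeze
NEW_INDEX = %i[user_id created_at source].freeze

legacy_query  = %i[user_id created_at] # what the old index served
counter_query = %i[user_id source]     # usage-data counters

p seek_prefix(OLD_INDEX, legacy_query) # => [:user_id, :created_at]
p seek_prefix(NEW_INDEX, legacy_query) # => [:user_id, :created_at]
p covers?(OLD_INDEX, counter_query)    # => false
p covers?(NEW_INDEX, counter_query)    # => true
```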
Ruby code and SQL
Query 1
time_period = {}
ci_external_pipelines: Gitlab::UsageData.distinct_count(::Ci::Pipeline.external.where(time_period), :user_id),
SELECT COUNT(DISTINCT "ci_pipelines"."user_id") FROM "ci_pipelines" WHERE "ci_pipelines"."source" = 6 AND "ci_pipelines"."user_id" BETWEEN 0 AND 1250
Query 2
ci_internal_pipelines: Gitlab::UsageData.distinct_count(::Ci::Pipeline.internal.where(time_period), :user_id)
SELECT COUNT(DISTINCT "ci_pipelines"."user_id") FROM "ci_pipelines" WHERE ("ci_pipelines"."source" IN (1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12) OR "ci_pipelines"."source" IS NULL) AND "ci_pipelines"."user_id" BETWEEN 1 AND 1250
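The two counters differ only in their source predicate. Below is a minimal Ruby sketch of those predicates applied to a single user_id batch, assuming a hypothetical PipelineRecord struct and distinct_users helper. The integer lists are copied from the generated SQL above; which source names they correspond to is defined by the Ci::Pipeline source enum and is not reproduced here.

```ruby
# Source values as they appear in the generated SQL above.
EXTERNAL_SOURCE = 6
INTERNAL_SOURCES = [1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12].freeze

PipelineRecord = Struct.new(:user_id, :source)

# Mirrors: "source" = 6
def external?(row)
  row.source == EXTERNAL_SOURCE
end

# Mirrors: "source" IN (1, ..., 12) OR "source" IS NULL
def internal?(row)
  row.source.nil? || INTERNAL_SOURCES.include?(row.source)
end

# Mirrors: COUNT(DISTINCT user_id) ... AND user_id BETWEEN first AND last
def distinct_users(rows, range, &predicate)
  rows.select { |r| r.user_id && range.cover?(r.user_id) && predicate.call(r) }
      .map(&:user_id).uniq.size
end

rows = [
  PipelineRecord.new(10, 6),
  PipelineRecord.new(10, 1),
  PipelineRecord.new(20, nil),
  PipelineRecord.new(30, 6),
  PipelineRecord.new(2000, 6) # outside this batch
]

batch = 1..1250
puts distinct_users(rows, batch) { |r| external?(r) } # => 2
puts distinct_users(rows, batch) { |r| internal?(r) } # => 2
```

User 10 is counted by both counters because it has one external and one internal pipeline, so the two counts are not expected to add up to the total number of distinct users.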
Index size & timing
- Dropping the index takes 23 seconds
- Creating the index takes 20 minutes
gitlabhq_production=> \di+ index_ci_pipelines_on_user_id_and_created_at
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+----------------------------------------------+-------+--------+--------------+---------+-------------
public | index_ci_pipelines_on_user_id_and_created_at | index | gitlab | ci_pipelines | 6285 MB |
(1 row)
\di+ index_ci_pipelines_on_user_id_and_created_at_and_source
Schema | Name | Type | Owner | Table | Size | Description
--------+---------------------------------------------------------+-------+--------+--------------+---------+-------------
public | index_ci_pipelines_on_user_id_and_created_at_and_source | index | gitlab | ci_pipelines | 5534 MB |
(1 row)
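For reference, the size difference between the two indexes, computed from the \di+ figures above (those are rounded MB values, so the result is approximate):

```ruby
old_mb = 6285 # index_ci_pipelines_on_user_id_and_created_at
new_mb = 5534 # index_ci_pipelines_on_user_id_and_created_at_and_source

saved_mb  = old_mb - new_mb
saved_pct = (saved_mb * 100.0 / old_mb).round(1)

puts "#{saved_mb} MB smaller (#{saved_pct}%)" # => 751 MB smaller (11.9%)
```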
Migration output
VERBOSE=true bundle exec rake db:migrate:up VERSION=20200608075553
== 20200608075553 AddIndexOnUserIdAndCreatedAtAndSourceToCiPipelines: migrating
-- transaction_open?()
-> 0.0000s
-- index_exists?(:ci_pipelines, [:user_id, :created_at, :source], {:algorithm=>:concurrently})
-> 0.0059s
-- add_index(:ci_pipelines, [:user_id, :created_at, :source], {:algorithm=>:concurrently})
-> 0.0242s
-- transaction_open?()
-> 0.0000s
-- index_exists?(:ci_pipelines, [:user_id, :created_at], {:algorithm=>:concurrently})
-> 0.0197s
-- remove_index(:ci_pipelines, {:algorithm=>:concurrently, :column=>[:user_id, :created_at]})
-> 0.0244s
== 20200608075553 AddIndexOnUserIdAndCreatedAtAndSourceToCiPipelines: migrated (0.0756s)
VERBOSE=true bundle exec rake db:migrate:down VERSION=20200608075553
== 20200608075553 AddIndexOnUserIdAndCreatedAtAndSourceToCiPipelines: reverting
-- transaction_open?()
-> 0.0000s
-- index_exists?(:ci_pipelines, [:user_id, :created_at], {:algorithm=>:concurrently})
-> 0.0056s
-- transaction_open?()
-> 0.0000s
-- index_exists?(:ci_pipelines, [:user_id, :created_at, :source], {:algorithm=>:concurrently})
-> 0.0036s
-- remove_index(:ci_pipelines, {:algorithm=>:concurrently, :column=>[:user_id, :created_at, :source]})
-> 0.0105s
== 20200608075553 AddIndexOnUserIdAndCreatedAtAndSourceToCiPipelines: reverted (0.0202s)
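The index_exists? calls in the log act as guards that make each step safe to re-run. A minimal sketch of that pattern, assuming a hypothetical in-memory FakeSchema (its method names are illustrative, not GitLab's migration helpers); it models the intended up/down behaviour, not the exact log lines above.

```ruby
require "set"

# Hypothetical in-memory stand-in for the table's index list; the real
# migration uses GitLab's concurrent index helpers against PostgreSQL.
class FakeSchema
  attr_reader :log

  def initialize(indexes)
    @indexes = Set.new(indexes)
    @log = []
  end

  def index_exists?(columns)
    @log << "index_exists?(#{columns.inspect})"
    @indexes.include?(columns)
  end

  # Guarded add: a no-op when the index already exists.
  def add_index_if_missing(columns)
    return if index_exists?(columns)

    @log << "add_index(#{columns.inspect})"
    @indexes << columns
  end

  # Guarded remove: a no-op when the index is already gone.
  def remove_index_if_present(columns)
    return unless index_exists?(columns)

    @log << "remove_index(#{columns.inspect})"
    @indexes.delete(columns)
  end

  def indexes
    @indexes.to_a
  end
end

NEW_COLUMNS = %i[user_id created_at source].freeze
OLD_COLUMNS = %i[user_id created_at].freeze

def up(schema)
  schema.add_index_if_missing(NEW_COLUMNS)
  schema.remove_index_if_present(OLD_COLUMNS)
end

def down(schema)
  schema.add_index_if_missing(OLD_COLUMNS)
  schema.remove_index_if_present(NEW_COLUMNS)
end

schema = FakeSchema.new([OLD_COLUMNS])
up(schema)
up(schema) # a second run only performs the existence checks
p schema.indexes # => [[:user_id, :created_at, :source]]
down(schema)
p schema.indexes # => [[:user_id, :created_at]]
```

Creating the new index before dropping the old one means queries always have one of the two indexes available while the migration runs.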
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- [-] Documentation (if required)
- [-] Code review guidelines
- [-] Merge request performance guidelines
- Style guides
- [-] Database guides
- [-] Separation of EE specific content
Availability and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention @gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by Alper Akgun