Improve performance of CountUserAuth query
What does this MR do and why?
- The service ping metric has been failing and "most of the failures of metrics are caused by performance issues of underlying database queries"
- Reference: #374735 (comment 1134246520)
- A database lab query shows that the current metric is using a suboptimal index
- We swapping in a new index for an old index which was added here: !39747 (merged)
- The existing index only shows usage if we expand the usage rate to
[30d]
. But it's close to 0 usage because it only being used to generate metrics. - A postgres.ai query shows that the existing metric can also use this new index so it OK to remove the old indx: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/13508/commands/47456
Raw SQL
Gitlab::Usage::Metrics::Instrumentations::CountUserAuthMetric.new(time_frame: '28d', data_source: 'database').value
runs the following queries:
SELECT MIN("authentication_events"."user_id") FROM "authentication_events" WHERE "authentication_events"."result" = 1 AND "authentication_events"."created_at" BETWEEN '2022-10-24 18:09:02.893805' AND '2022-11-21 18:09:02.893880'
SELECT MAX("authentication_events"."user_id") FROM "authentication_events" WHERE "authentication_events"."result" = 1 AND "authentication_events"."created_at" BETWEEN '2022-10-24 18:09:02.893805' AND '2022-11-21 18:09:02.893880'
SELECT COUNT(DISTINCT "authentication_events"."user_id") FROM "authentication_events" WHERE "authentication_events"."result" = 1 AND "authentication_events"."created_at" BETWEEN '2022-10-24 18:09:02.893805' AND '2022-11-21 18:09:02.893880' AND "authentication_events"."user_id" >= 1 AND "authentication_events"."user_id" < 3
Query plans
Queries before this index added
MIN
- https://explain.depesz.com/s/VRJj
- https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/13468/commands/47168
MAX
- https://explain.depesz.com/s/PVIG
- https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/13468/commands/47169
COUNT
- https://explain.depesz.com/s/zylR
- https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/13468/commands/47171
Queries after this index added
MIN
- https://explain.depesz.com/s/WZjq
- https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/13508/commands/47455
MAX
- https://explain.depesz.com/s/nFPO
- https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/13508/commands/47454
COUNT
- https://explain.depesz.com/s/UiW8
- https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/13508/commands/47453
Output from migrations
Migrating
main: == 20221125222221 AddMetricsIndexToAuthenticationEvents: migrating ============
main: -- transaction_open?()
main: -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main: -> 0.0003s
main: -- index_exists?(:authentication_events, [:user_id, :provider, :created_at], {:where=>"result = 1", :name=>"index_successful_authentication_events_for_metrics", :algorithm=>:concurrently})
main: -> 0.0012s
main: -- add_index(:authentication_events, [:user_id, :provider, :created_at], {:where=>"result = 1", :name=>"index_successful_authentication_events_for_metrics", :algorithm=>:concurrently})
main: -> 0.0011s
main: == 20221125222221 AddMetricsIndexToAuthenticationEvents: migrated (0.0056s) ===
main: == 20221125222341 RemoveResultIndexFromAuthenticationEvents: migrating ========
main: -- transaction_open?()
main: -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main: -> 0.0023s
main: -- indexes(:authentication_events)
main: -> 0.0026s
main: -- current_schema()
main: -> 0.0001s
main: == 20221125222341 RemoveResultIndexFromAuthenticationEvents: migrated (0.0094s)
Rolling back
main: == 20221125222341 RemoveResultIndexFromAuthenticationEvents: reverting ========
main: -- transaction_open?()
main: -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main: -> 0.0024s
main: -- index_exists?(:authentication_events, [:provider, :user_id, :created_at], {:where=>"result = 1", :name=>"index_authentication_events_on_provider_user_id_created_at", :algorithm=>:concurrently})
main: -> 0.0023s
main: -- execute("SET statement_timeout TO 0")
main: -> 0.0001s
main: -- add_index(:authentication_events, [:provider, :user_id, :created_at], {:where=>"result = 1", :name=>"index_authentication_events_on_provider_user_id_created_at", :algorithm=>:concurrently})
main: -> 0.0015s
main: -- execute("RESET statement_timeout")
main: -> 0.0001s
main: == 20221125222341 RemoveResultIndexFromAuthenticationEvents: reverted (0.0131s)
main: == 20221125222221 AddMetricsIndexToAuthenticationEvents: reverting ============
main: -- transaction_open?()
main: -> 0.0000s
main: -- view_exists?(:postgres_partitions)
main: -> 0.0003s
main: -- indexes(:authentication_events)
main: -> 0.0016s
main: -- remove_index(:authentication_events, {:algorithm=>:concurrently, :name=>"index_successful_authentication_events_for_metrics"})
main: -> 0.0012s
main: == 20221125222221 AddMetricsIndexToAuthenticationEvents: reverted (0.0058s) ===
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR. -
I have confirmed that the indexes are created asynchronusly
Edited by Jessie Young