Fix timeouts on when informing users of expiring SSH keys
What does this MR do?
In this MR we sped up the query to inform users that their SSH keys had expired. Whilst this improvement was enough to cause the query to run successfully on staging.gitlab.com, it still timed out in production.
Database Review
Why keyset pagination
In the worker, we want to iterate over a date range (expired_at
) in batches. Unfortunately EachBatch
cannot iterate over timestamp columns, it needs to use a distinct column (id
).
When EachBatch
is used, the first query will try to find the minimum id
value (EachBatch
processes the records in id ASC
order). This makes the database use the primary key index:
SELECT min("keys"."id") FROM "keys" WHERE (date(expires_at AT TIME ZONE 'UTC') BETWEEN '2020-01-01' AND CURRENT_DATE AND expiry_notification_delivered_at IS NULL) ;
Result (cost=47.23..47.24 rows=1 width=4)
InitPlan 1 (returns $0)
-> Limit (cost=0.43..47.23 rows=1 width=4)
-> Index Scan using keys_pkey on keys (cost=0.43..1063333.59 rows=22722 width=4)
Index Cond: (id IS NOT NULL)
Filter: ((expiry_notification_delivered_at IS NULL) AND (date(timezone('UTC'::text, expires_at)) >= '2020-01-01'::date) AND (date(timezone('UTC'::text, expires_at)) <= CURRENT_DATE))
(6 rows)
For large database tables, this query will time out.
Keyset pagination has similar performance characteristics as EachBatch
and it can process the rows in arbitrary order. In this MR we paginate over the records ordered by date(expires_at AT TIME ZONE 'UTC') ASC, "keys"."id" ASC
.
The other option would be to build a loop using kaminari to use OFFSET pagination and order by date(expires_at AT TIME ZONE 'UTC') ASC
however, in the long term we might see performance problems. See the docs for more info: https://docs.gitlab.com/ee/development/database/pagination_guidelines.html#offset-on-a-large-dataset
Migration
Up
== 20210609090856 AddExpiryIdSshKeyNotificationIndex: migrating ===============
-- transaction_open?()
-> 0.0000s
-- index_exists?(:keys, "date(timezone('UTC', expires_at)), id", {:where=>"expiry_notification_delivered_at IS NULL", :name=>"index_keys_on_expires_at_and_id", :algorithm=>:concurrently})
-> 0.0062s
-- execute("SET statement_timeout TO 0")
-> 0.0005s
-- add_index(:keys, "date(timezone('UTC', expires_at)), id", {:where=>"expiry_notification_delivered_at IS NULL", :name=>"index_keys_on_expires_at_and_id", :algorithm=>:concurrently})
-> 0.0152s
-- execute("RESET ALL")
-> 0.0006s
== 20210609090856 AddExpiryIdSshKeyNotificationIndex: migrated (0.0234s) ======
Down
== 20210609090856 AddExpiryIdSshKeyNotificationIndex: reverting ===============
-- transaction_open?()
-> 0.0000s
-- indexes(:keys)
-> 0.0045s
-- execute("SET statement_timeout TO 0")
-> 0.0005s
-- remove_index(:keys, {:algorithm=>:concurrently, :name=>"index_keys_on_expires_at_and_id"})
-> 0.0050s
-- execute("RESET ALL")
-> 0.0005s
== 20210609090856 AddExpiryIdSshKeyNotificationIndex: reverted (0.0113s) ======
Query
- Plan 1: https://explain.depesz.com/s/aRhm
- Plan 2: https://explain.depesz.com/s/UuaMc
Does this MR meet the acceptance criteria?
Conformity
-
I have included a changelog entry, or it's not needed. (Does this MR need a changelog?) -
I have added/updated documentation, or it's not needed. (Is documentation required?) -
I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?) -
I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?) -
I have self-reviewed this MR per code review guidelines. -
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines) -
I have followed the style guides. -
This change is backwards compatible across updates, or this does not apply.
Availability and Testing
-
I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.) -
I have tested this MR in all supported browsers, or it's not needed. -
I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.
Security
Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team
Related to #333080 (closed)