Automate deactivation of dormant users
What does this MR do?
This MR adds new cron worker to automate deactivation of dormant users.
This new worker runs daily, and if new application setting deactivate_dormant_users
is enabled deactivates dormant users - those with no activity at all or with no activity in the last 90 days.
As per the existing code, deactivated users will be activated on their next sign in.
There is no UI for the new application setting yet.
The intention is that this will never be enabled for SaaS.
Related to #211754 (closed).
What is a dormant user?
Dormant users are users that have no recent activity and are considered OK to be deactivated. Currently we define this in User#can_be_deactivated?
:
def can_be_deactivated?
active? && no_recent_activity? && !internal?
end
If we break this down
-
active?
- can be translated towith_state(:active)
-
!internal?
- this is!(ghost? || (bot? && !project_bot?))
, which can be expressed in SQL with theUser.non_internal
scope - https://gitlab.com/gitlab-org/gitlab/-/blob/8a31f66b5633ec593941e39364425edf13e6592b/app/models/concerns/has_user_type.rb#L27.
Both these are already combined in the User.active
scope (https://gitlab.com/gitlab-org/gitlab/-/blob/8a31f66b5633ec593941e39364425edf13e6592b/app/models/user.rb#L370):
scope :active, -> { with_state(:active).non_internal }
User#no_recent_activity?
is a bit trickier. Currently it is defined as:
def no_recent_activity?
last_active_at.to_i <= MINIMUM_INACTIVE_DAYS.days.ago.to_i
end
where last_active_at
is not a database column, but another method on User
(https://gitlab.com/gitlab-org/gitlab/-/blob/8a31f66b5633ec593941e39364425edf13e6592b/app/models/user.rb#L1840-1845):
def last_active_at
last_activity = last_activity_on&.to_time&.in_time_zone
last_sign_in = current_sign_in_at
[last_activity, last_sign_in].compact.max
end
If we want to express this in SQL it gets a bit complicated to make it in performant way - first we need to take care for each of last_activity_on
and current_sign_in_at
having a value or being NULL, and then we have to compare against the one having bigger value (i.e. being more recent). Unfortunately we can't just create a index on GREATEST(last_activity_on, current_sign_in_at)
because these two columns are of different type - date
and timestamp with timezone
.
However, it turns out that taking into account current_sign_in_at
is not needed - whenever users signs in (which updates current_sign_in_at
), last_activity_on
is also updated (because the sign-in request is POST
).
Example:
Let's look at an user without sign-in:
gitlabhq_development=# select id, username, state, user_type, last_activity_on, current_sign_in_at, updated_at from users where id = 19;
id | username | state | user_type | last_activity_on | current_sign_in_at | updated_at
----+----------+--------+-----------+------------------+--------------------+----------------------------
19 | ressie | active | | | | 2021-04-06 04:26:50.980647
(1 row)
Time: 5.785 ms
This is what we see in the logs for the sign-in request:
and if we check the database record for this user after sign-in we see
gitlabhq_development=# select id, username, state, user_type, last_activity_on, current_sign_in_at, updated_at from users where id = 19;
id | username | state | user_type | last_activity_on | current_sign_in_at | updated_at
----+----------+--------+-----------+------------------+----------------------------+----------------------------
19 | ressie | active | | 2021-04-06 | 2021-04-06 04:31:35.345049 | 2021-04-06 04:31:36.647626
(1 row)
Time: 0.837 ms
All of the above means that we can ignore current_sign_in_at
and express no recent activity as where('last_activity_on <= ?', MINIMUM_INACTIVE_DAYS.day.ago.to_date)
.
Once we combine all of the above we end up with scope like
scope :dormant, -> { active.where('last_activity_on <= ?', MINIMUM_INACTIVE_DAYS.day.ago.to_date) }
and this can be supported very well with partial index on users (id, last_activity_on) WHERE (((state)::text = 'active'::text) AND ((user_type IS NULL) OR (user_type = ANY (ARRAY[NULL::integer, 6, 4]))))
.
In order to match the current definition of no_recent_activity?
and cover users with no activity at all we should do something like
last_activity_on <= ?
OR last_activity_on IS NULL
As we know OR
queries do not perform well - https://docs.gitlab.com/ee/development/sql.html#use-unions. The recommended approach is to use UNION
.
This way we end up with query like
UPDATE "users"
SET "state" = 'deactivated'
WHERE "users"."id" IN (
(SELECT "users"."id" FROM "users" WHERE ("users"."state" IN ('active')) AND ("users"."user_type" IS NULL OR "users"."user_type" IN (NULL, 6, 4)) AND (last_activity_on <= '2021-01-21') ORDER BY users.id ASC LIMIT 200)
UNION
(SELECT "users"."id" FROM "users" WHERE ("users"."state" IN ('active')) AND ("users"."user_type" IS NULL OR "users"."user_type" IN (NULL, 6, 4)) AND "users"."last_activity_on" IS NULL ORDER BY users.id ASC LIMIT 200)
LIMIT 200
)
Worker is executed twice every hour from 0 through 4 (21,42 0-4 * * *
) This locks the maximum number of users to be deactivated per day to 100_000 (50 * 200 * 2 * 5
).
Database Migrations
Up
$ bundle exec rails db:migrate:up VERSION=20210421021510
== 20210421021510 AddDeactivateDormantUsersToApplicationSettings: migrating ===
-- add_column(:application_settings, :deactivate_dormant_users, :boolean, {:default=>false, :null=>false})
-> 0.0054s
== 20210421021510 AddDeactivateDormantUsersToApplicationSettings: migrated (0.0054s)
$ bundle exec rails db:migrate:up VERSION=20210421022010
== 20210421022010 AddIndexForDormantUsers: migrating ==========================
-- transaction_open?()
-> 0.0000s
-- index_exists?(:users, [:id, :last_activity_on], {:where=>"state = 'active' AND (users.user_type IS NULL OR users.user_type IN (NULL, 6, 4))", :name=>"index_users_on_id_and_last_activity_on_for_non_internal_active", :algorithm=>:concurrently})
-> 0.0108s
-- execute("SET statement_timeout TO 0")
-> 0.0006s
-- add_index(:users, [:id, :last_activity_on], {:where=>"state = 'active' AND (users.user_type IS NULL OR users.user_type IN (NULL, 6, 4))", :name=>"index_users_on_id_and_last_activity_on_for_non_internal_active", :algorithm=>:concurrently})
-> 0.0225s
-- execute("RESET ALL")
-> 0.0009s
== 20210421022010 AddIndexForDormantUsers: migrated (0.0360s) =================
Down
$ bundle exec rails db:migrate:down VERSION=20210421022010
== 20210421022010 AddIndexForDormantUsers: reverting ==========================
-- transaction_open?()
-> 0.0000s
-- indexes(:users)
-> 0.0121s
-- execute("SET statement_timeout TO 0")
-> 0.0006s
-- remove_index(:users, {:algorithm=>:concurrently, :name=>"index_users_on_id_and_last_activity_on_for_non_internal_active"})
-> 0.0062s
-- execute("RESET ALL")
-> 0.0006s
== 20210421022010 AddIndexForDormantUsers: reverted (0.0208s) =================
$ bundle exec rails db:migrate:down VERSION=20210421021510
== 20210421021510 AddDeactivateDormantUsersToApplicationSettings: reverting ===
-- remove_column(:application_settings, :deactivate_dormant_users, :boolean, {:default=>false, :null=>false})
-> 0.0045s
== 20210421021510 AddDeactivateDormantUsersToApplicationSettings: reverted (0.0066s)
Index creation takes ~249s on when tested on postgres.ai:
exec CREATE INDEX CONCURRENTLY index_users_on_id_and_last_activity_on_for_non_internal_active ON users USING btree (id, last_activity_on) WHERE (((state)::text = 'active'::text) AND ((user_type IS NULL) OR (user_type = ANY (ARRAY[NULL::integer, 6, 4]))))
Session: webui-i3232
% time seconds wait_event
------ ------------ -----------------------------
90.77 226.406605 IO.DataFileRead
6.19 15.434963 Running
1.09 2.709451 IO.DataFileExtend
0.90 2.248166 LWLock.WALWriteLock
0.55 1.371173 IO.BufFileWrite
0.19 0.466658 IO.DataFileWrite
0.09 0.223423 IO.SLRURead
0.08 0.211910 IO.WALWrite
0.08 0.194466 IO.BufFileRead
0.05 0.135070 IPC.ParallelCreateIndexScan
0.01 0.024559 LWLock.buffer_mapping
------ ------------ -----------------------------
100.00 249.426444
The query has been executed. Duration: 249.426 s (estimated for prod: 22.169...228.373 s)
SQL queries
Here is the raw SQL query generated with its execution plans:
UPDATE "users"
SET "state" = 'deactivated'
WHERE "users"."id" IN (
(SELECT "users"."id" FROM "users" WHERE ("users"."state" IN ('active')) AND ("users"."user_type" IS NULL OR "users"."user_type" IN (NULL, 6, 4)) AND (last_activity_on <= '2021-01-21') ORDER BY users.id ASC LIMIT 200)
UNION
(SELECT "users"."id" FROM "users" WHERE ("users"."state" IN ('active')) AND ("users"."user_type" IS NULL OR "users"."user_type" IN (NULL, 6, 4)) AND "users"."last_activity_on" IS NULL ORDER BY users.id ASC LIMIT 200)
LIMIT 200
)
Does this MR meet the acceptance criteria?
Conformity
-
📋 Does this MR need a changelog?-
I have included a changelog entry. - [-] I have not included a changelog entry because _____.
-
- [-] Documentation (if required)
-
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides - [-] Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. - [-] Tested in all supported browsers
- [-] Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention
@gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team