Implement the loopless mode for the cleanup policy worker
🎱 Context
Cleanup policies are tasks executed by a background worker. We're not going to detail the task here but it's a non trivial process.
Currently, each enabled cleanup policy ready to run is read by a cron worker. This cron worker will:
- "Mark"
container repositories
- Enqueue jobs for the related limited capacity worker
The main problem is that (1.) is not scalable. The SQL query executed to get all the runnable policies is quite complex. In addition, the worker loops on each row to then mark the linked container repositories.
This is issue #267546 (closed).
The found solution is detailed in #267546 (closed).
Since we're changing the whole logic for both workers, this effort is splitted in steps and gated behind a feature flag so that we can "rollback" to the current logic at will:
- Introduce the feature flag + update the cron worker according to #267546 (closed). This is !56962 (merged).
- Prepare the limited worker for changes. This is !56989 (merged).
- (You are here
⛳ ) Update the limited worker to support the new logic
We have two sets that contain all the ContainerRepository
in need of a cleanup.
-
scheduled
: this set is for when we hit a policynext_run_at
. The linked container repositories must be cleaned up asap. This is the "high priority" set. The selection conditions for this set are:- policy
enabled
set totrue
- policy
next_run_at
in the past -
ContainerRepository
cleanup state inunscheduled
orscheduled
(for compatibility purposes with the old logic.) - these are implemented in the
ContainerRepository.with_scheduled_cleanup
scope
- policy
-
unfinished
: this is for all the leftover work from (1.). In other words, when there is no work in (1.), the backend should check this set. The conditions are:- policy
enabled
-
ContainerRepository
cleanup state inunfinished
- these are implemented in the
ContainerRepository.with_unfinished_cleanup
scope
- policy
The worker will need to mix those two sets and order the results in a way that elements of (1.) go first. That's already what we do in the old logic.
In addition, we will need to schedule the next policy run. We will do this when:
- All the cleanups
started_at
of the linked repositories are after the current policynext_run_at
.
🤔 What does this MR do?
- Updates the worker to use the above conditions/logic.
- This MR removes a previously introduced inner classes as we no longer need them
- The worker can support the feature flag by extracting the modified parts in functions. Each function will select the proper body depending on the feature flag state.
- Updates the related specs.
🖼 Screenshots (strongly suggested)
n / a
📏 Does this MR meet the acceptance criteria?
Conformity
-
📋 Does this MR need a changelog?-
I have included a changelog entry for the database migrations. -
I have not included a changelog entry for the cleanup logic changes because they are a feature flag .
-
- [-] Documentation (if required)
-
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides - [-] Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- [-] Label as security and @ mention
@gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team
💾 Database Review
Migration Up
$ rails db:migrate
== 20210422142647 AddProjectIdNextRunAtIndexToContainerExpirationPolicies: migrating
-- transaction_open?()
-> 0.0000s
-- index_exists?(:container_expiration_policies, [:project_id, :next_run_at], {:name=>"idx_container_exp_policies_on_project_id_next_run_at", :where=>"enabled = true", :algorithm=>:concurrently})
-> 0.0025s
-- execute("SET statement_timeout TO 0")
-> 0.0005s
-- add_index(:container_expiration_policies, [:project_id, :next_run_at], {:name=>"idx_container_exp_policies_on_project_id_next_run_at", :where=>"enabled = true", :algorithm=>:concurrently})
-> 0.0044s
-- execute("RESET ALL")
-> 0.0006s
== 20210422142647 AddProjectIdNextRunAtIndexToContainerExpirationPolicies: migrated (0.0088s)
Migration Down
$ rails db:rollback
== 20210422142647 AddProjectIdNextRunAtIndexToContainerExpirationPolicies: reverting
-- transaction_open?()
-> 0.0000s
-- index_exists?(:container_expiration_policies, [:project_id, :next_run_at], {:name=>"idx_container_exp_policies_on_project_id_next_run_at", :algorithm=>:concurrently})
-> 0.0025s
-- execute("SET statement_timeout TO 0")
-> 0.0005s
-- remove_index(:container_expiration_policies, {:name=>"idx_container_exp_policies_on_project_id_next_run_at", :algorithm=>:concurrently, :column=>[:project_id, :next_run_at]})
-> 0.0053s
-- execute("RESET ALL")
-> 0.0005s
== 20210422142647 AddProjectIdNextRunAtIndexToContainerExpirationPolicies: reverted (0.0096s)
Queries
- Count of scheduled cleanups: !58123 (comment 568125464)
- Count of unfinished cleanups: !58123 (comment 563870438)
- Select the next
ContainerRepository
available for cleanup. This query is not trivial so it has been split in two queries:- Select next "scheduled" cleanup: !58123 (comment 570893448)
- Select next "unfinished" cleanup: !58123 (comment 569037922)
- Check that any container repository has its cleanup
started_at
set before the given policynext_run_at
: !58123 (comment 563870462)
Note for all the explain plains, the following commands have been executed before the explain
command on pg.ai:
reset
exec CREATE INDEX idx_container_exp_policies_on_project_id_next_run_at ON container_expiration_policies USING btree (project_id, next_run_at) WHERE (enabled = true)
exec VACUUM ANALYZE container_expiration_policies