Move cleanup policies metrics to the parent worker
🌲 Context
Cleanup policies are executed by background jobs. To keep an eye on the work load, we have some metrics put in the log.
Those metrics are used to power the Kibana dashboard we have for cleanup policies: https://log.gprd.gitlab.net/goto/4be32f552be4011b6df4403a829f80a5 (internal)
In !58123 (merged), we refactored how the work is organized for cleanup policies. Unfortunately, we had to tone down the accuracy of the metrics in order to have a speedy query for those.
With #330315 (closed), we improved the underlying database index by removing referenced policies that are actually not interesting for background jobs.
Those improvements allow us to bring back precise cleanup policies metrics. To ease the load on the database, we decided to move those metrics from a background job that runs pretty much all the time to a background job that is a cron one: it runs only each 50 minutes. That's good enough to dump the metrics in the logs. This is issue #329196 (closed).
🤔 What does this MR do?
- Removes cleanup policies metrics logging from
ContainerExpirationPolicies::CleanupContainerRepositoryWorker
.- This is the job running all the time.
- Adds cleanup policies metrics logging to
ContainerExpirationPolicyWorker
.- This job runs every 50 minutes.
- Update the related specs
There is no documentation change as this is internal and we merely change how we log metrics.
🖼 Screenshots (strongly suggested)
n / a
📏 Does this MR meet the acceptance criteria?
Conformity
-
I have included a changelog entry, or it's not needed. (Does this MR need a changelog?) - [-] I have added/updated documentation, or it's not needed. (Is documentation required?)
-
I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?) -
I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?) -
I have self-reviewed this MR per code review guidelines. -
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines) -
I have followed the style guides. -
This change is backwards compatible across updates, or this does not apply.
Availability and Testing
- [-] I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.)
- [-] I have tested this MR in all supported browsers, or it's not needed.
- [-] I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.
Security
Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.
- [-] Label as security and @ mention
@gitlab-com/gl-security/appsec
- [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- [-] Security reports checked/validated by a reviewer from the AppSec team
💾 Database review
The metrics are basically two counts. One of them is a bit borderline with the ~performance guideline but keep in mind that these counts will run now only once per 50min instead of all the time.
- Requiring cleanup count: !63635 (comment 618908003)
- Unfinished cleanup count: !63635 (comment 618908001)
Migration up
$ rails db:migrate
== 20210705130919 CreateContainerReposOnExpCleanupStatusProjectIdStartDateIndex: migrating
-- transaction_open?()
-> 0.0000s
-- index_exists?(:container_repositories, [:expiration_policy_cleanup_status, :project_id, :expiration_policy_started_at], {:name=>"idx_container_repos_on_exp_cleanup_status_project_id_start_date", :algorithm=>:concurrently})
-> 0.0048s
-- execute("SET statement_timeout TO 0")
-> 0.0005s
-- add_index(:container_repositories, [:expiration_policy_cleanup_status, :project_id, :expiration_policy_started_at], {:name=>"idx_container_repos_on_exp_cleanup_status_project_id_start_date", :algorithm=>:concurrently})
-> 0.0037s
-- execute("RESET ALL")
-> 0.0005s
-- transaction_open?()
-> 0.0000s
-- index_exists?(:container_repositories, [:expiration_policy_cleanup_status, :expiration_policy_started_at], {:name=>"idx_container_repositories_on_exp_cleanup_status_and_start_date", :algorithm=>:concurrently})
-> 0.0027s
-- remove_index(:container_repositories, {:name=>"idx_container_repositories_on_exp_cleanup_status_and_start_date", :algorithm=>:concurrently, :column=>[:expiration_policy_cleanup_status, :expiration_policy_started_at]})
-> 0.0048s
== 20210705130919 CreateContainerReposOnExpCleanupStatusProjectIdStartDateIndex: migrated (0.0188s)
Migration down
$ rails db:rollback
== 20210705130919 CreateContainerReposOnExpCleanupStatusProjectIdStartDateIndex: reverting
-- transaction_open?()
-> 0.0000s
-- index_exists?(:container_repositories, [:expiration_policy_cleanup_status, :expiration_policy_started_at], {:name=>"idx_container_repositories_on_exp_cleanup_status_and_start_date", :algorithm=>:concurrently})
-> 0.0035s
-- execute("SET statement_timeout TO 0")
-> 0.0005s
-- add_index(:container_repositories, [:expiration_policy_cleanup_status, :expiration_policy_started_at], {:name=>"idx_container_repositories_on_exp_cleanup_status_and_start_date", :algorithm=>:concurrently})
-> 0.0035s
-- execute("RESET ALL")
-> 0.0005s
-- transaction_open?()
-> 0.0000s
-- index_exists?(:container_repositories, [:expiration_policy_cleanup_status, :project_id, :expiration_policy_started_at], {:name=>"idx_container_repos_on_exp_cleanup_status_project_id_start_date", :algorithm=>:concurrently})
-> 0.0023s
-- remove_index(:container_repositories, {:name=>"idx_container_repos_on_exp_cleanup_status_project_id_start_date", :algorithm=>:concurrently, :column=>[:expiration_policy_cleanup_status, :project_id, :expiration_policy_started_at]})
-> 0.0045s
== 20210705130919 CreateContainerReposOnExpCleanupStatusProjectIdStartDateIndex: reverted (0.0164s)