Skip to content

Move cleanup policies metrics to the parent worker

David Fernandez requested to merge 329196-move-cleanup-policies-metrics into master

🌲 Context

Cleanup policies are executed by background jobs. To keep an eye on the work load, we have some metrics put in the log.

Those metrics are used to power the Kibana dashboard we have for cleanup policies: https://log.gprd.gitlab.net/goto/4be32f552be4011b6df4403a829f80a5 (internal)

In !58123 (merged), we refactored how the work is organized for cleanup policies. Unfortunately, we had to tone down the accuracy of the metrics in order to have a speedy query for those.

With #330315 (closed), we improved the underlying database index by removing referenced policies that are actually not interesting for background jobs.

Those improvements allow us to bring back precise cleanup policies metrics. To ease the load on the database, we decided to move those metrics from a background job that runs pretty much all the time to a background job that is a cron one: it runs only each 50 minutes. That's good enough to dump the metrics in the logs. This is issue #329196 (closed).

🤔 What does this MR do?

  • Removes cleanup policies metrics logging from ContainerExpirationPolicies::CleanupContainerRepositoryWorker.
    • This is the job running all the time.
  • Adds cleanup policies metrics logging to ContainerExpirationPolicyWorker.
    • This job runs every 50 minutes.
  • Update the related specs

There is no documentation change as this is internal and we merely change how we log metrics.

🖼 Screenshots (strongly suggested)

n / a

📏 Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team

💾 Database review

The metrics are basically two counts. One of them is a bit borderline with the ~performance guideline but keep in mind that these counts will run now only once per 50min instead of all the time.

Migration up

$ rails db:migrate
== 20210705130919 CreateContainerReposOnExpCleanupStatusProjectIdStartDateIndex: migrating 
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:container_repositories, [:expiration_policy_cleanup_status, :project_id, :expiration_policy_started_at], {:name=>"idx_container_repos_on_exp_cleanup_status_project_id_start_date", :algorithm=>:concurrently})
   -> 0.0048s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- add_index(:container_repositories, [:expiration_policy_cleanup_status, :project_id, :expiration_policy_started_at], {:name=>"idx_container_repos_on_exp_cleanup_status_project_id_start_date", :algorithm=>:concurrently})
   -> 0.0037s
-- execute("RESET ALL")
   -> 0.0005s
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:container_repositories, [:expiration_policy_cleanup_status, :expiration_policy_started_at], {:name=>"idx_container_repositories_on_exp_cleanup_status_and_start_date", :algorithm=>:concurrently})
   -> 0.0027s
-- remove_index(:container_repositories, {:name=>"idx_container_repositories_on_exp_cleanup_status_and_start_date", :algorithm=>:concurrently, :column=>[:expiration_policy_cleanup_status, :expiration_policy_started_at]})
   -> 0.0048s
== 20210705130919 CreateContainerReposOnExpCleanupStatusProjectIdStartDateIndex: migrated (0.0188s)

Migration down

$ rails db:rollback
== 20210705130919 CreateContainerReposOnExpCleanupStatusProjectIdStartDateIndex: reverting 
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:container_repositories, [:expiration_policy_cleanup_status, :expiration_policy_started_at], {:name=>"idx_container_repositories_on_exp_cleanup_status_and_start_date", :algorithm=>:concurrently})
   -> 0.0035s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- add_index(:container_repositories, [:expiration_policy_cleanup_status, :expiration_policy_started_at], {:name=>"idx_container_repositories_on_exp_cleanup_status_and_start_date", :algorithm=>:concurrently})
   -> 0.0035s
-- execute("RESET ALL")
   -> 0.0005s
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:container_repositories, [:expiration_policy_cleanup_status, :project_id, :expiration_policy_started_at], {:name=>"idx_container_repos_on_exp_cleanup_status_project_id_start_date", :algorithm=>:concurrently})
   -> 0.0023s
-- remove_index(:container_repositories, {:name=>"idx_container_repos_on_exp_cleanup_status_project_id_start_date", :algorithm=>:concurrently, :column=>[:expiration_policy_cleanup_status, :project_id, :expiration_policy_started_at]})
   -> 0.0045s
== 20210705130919 CreateContainerReposOnExpCleanupStatusProjectIdStartDateIndex: reverted (0.0164s) 
Edited by David Fernandez

Merge request reports

Loading