Geo::MetricsUpdateWorker: slow count of total job artifacts
Summary
A customer on GitLab 14.9 is experiencing performance problems, which they narrowed down to Geo::MetricsUpdateWorker
running the query that counts the total number of job artifacts:
select count(*) from ci_job_artifacts
Steps to reproduce
Example Project
What is the current bug behavior?
What is the expected correct behavior?
Relevant logs and/or screenshots
Output of checks
Results of GitLab environment info
(For installations with the omnibus-gitlab package, run and paste the output of: `sudo gitlab-rake gitlab:env:info`)
(For installations from source, run and paste the output of: `sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)
Results of GitLab application Check
(For installations with the omnibus-gitlab package, run and paste the output of: `sudo gitlab-rake gitlab:check SANITIZE=true`)
(For installations from source, run and paste the output of: `sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true`)
(We will only investigate if the tests are passing.)
Possible fixes
Workarounds
Disable Geo metrics updates in the short term
Warning: This workaround is short-term only. If a failover is performed, geo_sidekiq_cron_config_worker
must be re-enabled for the Geo site to behave properly. While geo_metrics_update_worker
is disabled, Geo metrics are not updated, although an update can still be triggered manually.
If the problem is on the primary site (as it is in this issue), or on a secondary site whose external URL differs from the primary's, you can use the UI for this workaround:
- Visit the site directly and browse to Admin > Background Jobs.
- Click the Cron tab.
- Find geo_sidekiq_cron_config_worker and click Disable.
- Find geo_metrics_update_worker and click Disable.
If you are curious how to disable this worker on a secondary site
I'm collapsing this info because it's not relevant to this specific issue: the secondary worker counts registry rows, not the ci_job_artifacts
table. I'm keeping it here because it may be useful for future troubleshooting.
If the problem is on a secondary site with the same external URL as the primary, or if secondary proxying is enabled with separate URLs, SSH into a GitLab Rails node in the affected site and use the commands below:
To disable the Geo metrics worker (and cron manager) from the CLI:
sudo gitlab-rails runner 'cron_manager = Sidekiq::Cron::Job.find("geo_sidekiq_cron_config_worker"); geo_metrics_update = Sidekiq::Cron::Job.find("geo_metrics_update_worker"); cron_manager.disable!; geo_metrics_update.disable!'
To check whether the Geo metrics worker (and cron manager) are enabled or disabled from the CLI:
sudo gitlab-rails runner 'cron_manager = Sidekiq::Cron::Job.find("geo_sidekiq_cron_config_worker"); geo_metrics_update = Sidekiq::Cron::Job.find("geo_metrics_update_worker"); puts "geo_sidekiq_cron_config_worker is #{cron_manager.status}"; puts "geo_metrics_update_worker is #{geo_metrics_update.status}"'
To trigger a Geo metrics update immediately from the CLI:
sudo gitlab-rails runner 'Sidekiq::Cron::Job.find("geo_metrics_update_worker").enque!'
To re-enable the Geo metrics worker (and cron manager) from the CLI:
sudo gitlab-rails runner 'cron_manager = Sidekiq::Cron::Job.find("geo_sidekiq_cron_config_worker"); geo_metrics_update = Sidekiq::Cron::Job.find("geo_metrics_update_worker"); cron_manager.enable!; geo_metrics_update.enable!'
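The disable, status, and re-enable one-liners repeat the same find-and-toggle logic. As a sketch only, that logic can live in one Ruby file that is loaded from a Rails console. The file path and the helper names `geo_cron_jobs_set` and `geo_cron_jobs_status` are hypothetical and not part of GitLab. The only Sidekiq::Cron::Job methods used (`find`, `enable!`, `disable!`, `status`) are the ones the commands above already call. The job lookup is passed in as a `finder` argument, so `Sidekiq::Cron::Job` is only referenced when the default finder actually runs.

```ruby
# Hypothetical helper file (for example /tmp/geo_cron_toggle.rb), not shipped with GitLab.
# Wraps the same Sidekiq::Cron::Job calls used by the one-liners above.

# Cron manager listed first, matching the order used by the one-liners.
GEO_CRON_JOB_NAMES = %w[
  geo_sidekiq_cron_config_worker
  geo_metrics_update_worker
].freeze

# Default finder: looks the job up in Sidekiq-Cron (nil if it does not exist).
DEFAULT_GEO_CRON_FINDER = ->(name) { Sidekiq::Cron::Job.find(name) }

# Enables (enabled: true) or disables (enabled: false) both Geo cron jobs.
# All jobs are looked up first, so a missing job raises before anything changes.
def geo_cron_jobs_set(enabled:, finder: DEFAULT_GEO_CRON_FINDER)
  jobs = GEO_CRON_JOB_NAMES.map do |name|
    finder.call(name) || raise("cron job #{name} not found")
  end

  jobs.each { |job| enabled ? job.enable! : job.disable! }
end

# Returns a Hash of job name => status string ("enabled" or "disabled"),
# or nil for a job that could not be found.
def geo_cron_jobs_status(finder: DEFAULT_GEO_CRON_FINDER)
  GEO_CRON_JOB_NAMES.to_h { |name| [name, finder.call(name)&.status] }
end
```

Possible usage from `sudo gitlab-rails console` on a Rails node: `load '/tmp/geo_cron_toggle.rb'`, then `geo_cron_jobs_set(enabled: false)` to disable, `geo_cron_jobs_status` to check, and `geo_cron_jobs_set(enabled: true)` to re-enable. The `finder` keyword exists only so the helpers can be tried against stand-in objects; in a console the default is the one to use.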