Fix daemon memory killer jobs hash thread safety issue
What does this MR do?
There is a bug can't add a new key into hash during iteration
reported from Sentry https://sentry.gitlab.net/gitlab/devgitlaborg/issues/1837678.
The code causes this bug is: https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/sidekiq_daemon/memory_killer.rb#L233
def rss_increase_by_jobs
Gitlab::SidekiqDaemon::Monitor.instance.jobs.sum do |job| # rubocop:disable CodeReuse/ActiveRecord
rss_increase_by_job(job)
end
end
This code jobs.sum
will enter iteration without Mutex.synchronize.
During the jobs.sum
iteration, other Sidekiq job could call Gitlab::SidekiqDaemon::Monitor.instance.within_job
(https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/sidekiq_middleware/monitor.rb#L7). This will call jobs[jid] = { worker_class: worker_class, thread: Thread.current, started_at: Gitlab::Metrics::System.monotonic_time }
. Now we have a jobs[jid]=
assignment happen during jobs
iteration, thus we get the error.
The fix is to guard this code piece jobs.sum
with job_mutex
.
Local tests have been performed #249519 (comment 413582459)
Screenshots
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team
Closes #249519 (closed)