Improve logs of the Sidekiq reliable fetcher (!139932) · Merge requests · GitLab.org / GitLab

David Fernandez requested to merge 10io-add-job-class-to-reliable-fetch into master Dec 15, 2023

🔭 What does this MR do and why?

When sidekiq is shutdown (as it was explained to me here a few months ago, not sure if the situation is still the same), sometimes, we will try to re-enqueue jobs into their queues using this method.

As you can see the method will create a log entry. The problem is that, this entry contains not a lot of elements to identify the job. We do have the jid and queue but that doesn't help a lot.

Example of logs.

The problem is that it's not exactly clear which job (class) has been re enqueued. Let's say that we want to run an analysis on these logs to know which job get interrupted the most of the time. It's impossible.

This MR proposes to add the class field to the log entry, increasing the observability of this aspect.

🦄 Screenshots or screen recordings

Well, sidekiq, background jobs... Not really related to screens 😸

⚙ How to set up and validate locally

(credits to this comment)

Update config/initializers/sidekiq.rb to force a higher rate of jobs clean up

diff --git a/config/initializers/sidekiq.rb b/config/initializers/sidekiq.rb
index 8df12671f265..6a6dabf911ab 100644
--- a/config/initializers/sidekiq.rb
+++ b/config/initializers/sidekiq.rb
@@ -35,6 +35,8 @@ def enable_semi_reliable_fetch_mode?
 Sidekiq.configure_server do |config|
   config[:strict] = false
   config[:queues] = Gitlab::SidekiqConfig.expand_queues(config[:queues])
+  config[:lease_interval] = 5
+  config[:cleanup_interval] = 10

Restart the sidekiq process gdk restart rails-background-jobs
I navigated to http://localhost:3000/admin/sidekiq/cron and clicked Disable All. This will silence the logs.
Run gdk tail rails-background-jobs to tail logs.
Enqueue a job using gdk rails c, I enqueued a harmless long-running job

[1] pry(main)> Chaos::SleepWorker.perform_async(240)
=> "a382beb0da31c0e9e0e9db6b"

Use gdk redis-cli -n 1 and delete the heartbeat so that the cleanup process will think the job is dead.

redis /Users/sylvesterchin/work/gitlab-development-kit/redis/redis.socket[1]> keys *reliable-fetcher-heartbeat-*
1) "reliable-fetcher-heartbeat-SylvestersMBP2.localdomain-67195-4d63cd867499"
redis /Users/sylvesterchin/work/gitlab-development-kit/redis/redis.socket[1]> del reliable-fetcher-heartbeat-SylvestersMBP2.localdomain-67195-4d63cd867499
(integer) 1

Observe log message:

2023-12-18_02:19:30.50192 rails-background-jobs : {"severity":"INFO","time":"2023-12-18T02:19:30.501Z","message":"Cleaning working queues"}
2023-12-18_02:19:30.50598 rails-background-jobs : {"severity":"INFO","time":"2023-12-18T02:19:30.505Z","message":"Pushed job 794082f4a1847cea9b253ef1 back to queue queue:default","jid":"794082f4a1847cea9b253ef1","class":"Chaos::SleepWorker","queue":"queue:default","retry":0}

🏎 MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

I have evaluated the MR acceptance checklist for this MR.

Edited Dec 18, 2023 by David Fernandez

Improve logs of the Sidekiq reliable fetcher

🔭 What does this MR do and why?

🦄 Screenshots or screen recordings

⚙ How to set up and validate locally

🏎 MR acceptance checklist

Merge request reports