Rescue from and log errors in metrics exporters
What does this MR do and why?
In #349592 (closed) we discussed that the removal of rescuing from and logging port allocation conflicts when starting an in-process metrics server in Sidekiq makes it difficult to understand what happened, should it occur.
We eventually decided that the problem is two-fold:
- The metrics server should not crash sidekiq when coming up
- Logging is handled in a somewhat confusing fashion
Ultimately, we are looking to completely isolate these failures in &6409 (closed) and &7304 (closed). In this MR I am trying to make some progress in the right direction by:
- Rescuing from server start exceptions and logging them to the metrics-server log
- Refactor the handling of logs by moving its responsibility into the
BaseExporter
superclass. Interestingly this reducesSidekiqExporter
to merely an initializer, and I therefore removed its spec.
There are still some odd inconsistencies, for "historic reasons". For instance, due to an incident in the past, we added the ability to disable sidekiq exporter logs, and default them to off, but we did not do this for web_exporter (running in Puma), which means we need to jump through a few hoops to streamline logging logic.
Screenshots or screen recordings
Errors during start-up such as port conflicts are properly logged again:
$ docker exec -it gl-gck_sidekiq_1 bash
$ git@04470835775b:~/gitlab$ tail -fn 100 log/sidekiq_exporter.log
...
[2022-01-06T14:02:34.643+0000] INFO WEBrick 1.6.1
[2022-01-06T14:02:34.643+0000] INFO ruby 2.7.4 (2021-07-07) [x86_64-linux]
[2022-01-06T14:02:34.643+0000] ERROR Errno::EADDRINUSE: Address already in use - bind(2) for 0.0.0.0:3807
/usr/local/lib/ruby/2.7.0/socket.rb:201:in `bind'
/usr/local/lib/ruby/2.7.0/socket.rb:201:in `listen'
/usr/local/lib/ruby/2.7.0/socket.rb:765:in `block in tcp_server_sockets'
/usr/local/lib/ruby/2.7.0/socket.rb:227:in `each'
/usr/local/lib/ruby/2.7.0/socket.rb:227:in `foreach'
/usr/local/lib/ruby/2.7.0/socket.rb:763:in `tcp_server_sockets'
/usr/local/lib/ruby/2.7.0/webrick/utils.rb:65:in `create_listeners'
/usr/local/lib/ruby/2.7.0/webrick/server.rb:127:in `listen'
/usr/local/lib/ruby/2.7.0/webrick/server.rb:108:in `initialize'
/usr/local/lib/ruby/2.7.0/webrick/httpserver.rb:47:in `initialize'
/home/git/gitlab/lib/gitlab/metrics/exporter/base_exporter.rb:38:in `new'
/home/git/gitlab/lib/gitlab/metrics/exporter/base_exporter.rb:38:in `start_working'
/home/git/gitlab/lib/gitlab/daemon.rb:42:in `block in start'
/home/git/gitlab/lib/gitlab/daemon.rb:39:in `synchronize'
/home/git/gitlab/lib/gitlab/daemon.rb:39:in `start'
/home/git/gitlab/metrics_server/metrics_server.rb:59:in `start'
/home/git/gitlab/metrics_server/metrics_server.rb:25:in `spawn'
bin/metrics-server:12:in `<main>'
Plus, this doesn't crash the host process (sidekiq/puma worker) anymore
How to set up and validate locally
- Make something bind to the metrics server port, or find a different way to crash it
- Make sure the sidekiq metrics server log is enabled: https://docs.gitlab.com/ee/administration/logs.html#sidekiq_exporterlog-and-web_exporterlog
- Run sidekiq
- Sidekiq shouldn't crash, but
log/sidekiq_exporter.log
should exist and contain the stack trace
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #349592 (closed), #347205