Log db host refresh thread interruption
What does this MR do and why?
A thread runs in an infinite loop, sleeping between iterations, to refresh the database load balancer (DB LB) hosts. For an unknown reason the hosts were not refreshed as expected, which resulted in #364370 (closed).
As a first step, all unhandled exceptions were logged to check whether anything was going wrong, but nothing was captured.
This MR takes an alternative approach: it logs an error when the refresh is not taking place. More information can be found here.
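A minimal Ruby sketch of that idea follows. It assumes the refresh thread records a `refresh_thread_last_run` timestamp after each pass, and that some other code path periodically checks how old that timestamp is. The class name `HostRefreshWatchdog`, the `check_refresh_thread` method, the 60-second `STALE_AFTER` threshold and the `refresh_hosts` placeholder are all hypothetical; only the event name and the `refresh_thread_last_run` / `thread_status` fields come from this MR's logs.

```ruby
require 'json'
require 'time'

# Hypothetical sketch, not GitLab's implementation: a background thread that
# refreshes hosts in an infinite loop and records when it last completed a pass,
# plus a separate check that logs an error when that timestamp goes stale.
class HostRefreshWatchdog
  # Assumed threshold in seconds; the value used by the MR is not stated.
  STALE_AFTER = 60

  attr_reader :thread

  def initialize(interval: 1, logger: $stdout)
    @interval = interval
    @logger = logger # anything that responds to <<, e.g. $stdout or an Array
    @mutex = Mutex.new
    @refresh_thread_last_run = nil
  end

  def start
    @thread = Thread.new do
      loop do
        refresh_hosts
        @mutex.synchronize { @refresh_thread_last_run = Time.now }
        sleep @interval
      end
    end
  end

  def refresh_thread_last_run
    @mutex.synchronize { @refresh_thread_last_run }
  end

  # Meant to be called from outside the refresh thread. Logs one JSON line and
  # returns true when the last pass is older than STALE_AFTER; otherwise false.
  def check_refresh_thread(now: Time.now)
    last_run = refresh_thread_last_run
    return false if last_run.nil? || now - last_run <= STALE_AFTER

    entry = {
      severity: 'ERROR',
      time: now.getutc.iso8601(3),
      event: 'service_discovery_refresh_thread_interrupt',
      refresh_thread_last_run: last_run.getutc.iso8601(3),
      # Thread#status: "sleep", "run", false (finished) or nil (died from an exception)
      thread_status: @thread&.status
    }
    @logger << "#{JSON.generate(entry)}\n"
    true
  end

  private

  # Placeholder for the real service discovery lookup.
  def refresh_hosts; end
end
```

The check deliberately lives outside the refresh thread: a thread that has stopped looping cannot log anything about itself, so only an external observer can notice that the timestamp has stopped advancing.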
cc: @stomlinson @DylanGriffith
How to set up and validate locally
Instructions for setting up database load balancing with service discovery in a local environment can be found here.
Unhappy flow:
- To put `refresh_thread_last_run` in the past, change it to `Time.current - 1.hour` here.
- Watch the logs for the error event: `tail -f log/database_load_balancing.log | grep service_discovery_refresh_thread_interrupt`.
- Run `gdk restart` (or restart only `rails-web`).
- Within a few seconds, log entries like the following should appear:
{"severity":"ERROR","time":"2023-06-09T15:36:02.097Z","correlation_id":null,"event":"service_discovery_refresh_thread_interrupt","refresh_thread_last_run":"2023-06-09T14:36:01.038Z","thread_status":"sleep"}
{"severity":"ERROR","time":"2023-06-09T15:36:02.295Z","correlation_id":null,"event":"service_discovery_refresh_thread_interrupt","refresh_thread_last_run":"2023-06-09T14:36:01.929Z","thread_status":"sleep"}
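To double-check entries like these, a small Ruby sketch can parse the JSON log lines and compute how long the refresh thread had been idle when each entry was written. With the simulated `Time.current - 1.hour`, every entry should show roughly 3600 seconds. The `refresh_interrupts` helper is hypothetical; only the field names and the log path come from this MR.

```ruby
require 'json'
require 'time'

INTERRUPT_EVENT = 'service_discovery_refresh_thread_interrupt'

# Hypothetical helper: picks the refresh-thread interrupt events out of JSON log
# lines and reports how stale refresh_thread_last_run was when each was logged.
def refresh_interrupts(lines)
  lines.filter_map do |line|
    begin
      entry = JSON.parse(line)
    rescue JSON::ParserError
      next # skip lines that are not JSON
    end
    next unless entry.is_a?(Hash) && entry['event'] == INTERRUPT_EVENT

    logged_at = Time.iso8601(entry['time'])
    last_run = Time.iso8601(entry['refresh_thread_last_run'])
    {
      time: entry['time'],
      idle_seconds: (logged_at - last_run).round(3),
      thread_status: entry['thread_status']
    }
  end
end

# Usage: refresh_interrupts(File.foreach('log/database_load_balancing.log'))
```

In the happy flow, the same call should return an empty array.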
Happy flow:
- Revert the `refresh_thread_last_run` change.
- Run `gdk restart`; no error logs with event `service_discovery_refresh_thread_interrupt` should appear.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #364370 (closed)