Do not checkout connection from pool in DatabaseSampler
What does this MR do and why?
As part of debugging #423945 (comment 1544793516) we were looking into why Rails is constantly timing out when trying to close connections to the database. It keeps trying to close a whole pool of connections until no connections are in use in the pool. We had an arbitrary timeout waiting for the pool to empty but it would always timeout (even in local testing).
The reason we sometimes need to disconnect to the datbase is that we have periodic service discovery looking for the current list of replica databases. Every time that changes in Rails we need to disconnect any previous list of replicas.
In the end we tracked down this DatabaseSampler
which is started in a
Thread when the application boots up. This thread at some point asks
for a connection
which checks it out of the connection pool. But it
never checks it back in so after startup this connection will forever be
wasted by this Thread. It turns out that this DatabaseSampler
never
really needed a connection from the pool for the specific things it was
doing so I refactored the code to just avoid calling .connection
and
it seems to fix this offence.
Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other merge requests.
Before | After |
---|---|
How to set up and validate locally
- Follow setup instructions in https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/database_load_balancing_with_service_discovery.md
- You can spot that this connection is being checked out in many ways. I edited the code in ActiveRecord
::ActiveRecord::ConnectionAdapters::ConnectionPool#connection
to write the backtrace when it was happening so I could see where it's called from:def connection if Thread.current.inspect.include?('database_sampler') || Thread.current.inspect.include?('background_task') File.open('log/broken.log', 'a') do |f| f.puts("______________________________________") f.puts(caller.join("\n")) f.puts("======================================") end end @thread_cached_conns[connection_cache_key(current_thread)] ||= checkout end
- More debugging stuff is in #423945 (comment 1544792096) where I confirmed that this was causing the disconnect timeout
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #423945