Improve graceful disconnection handling for LoadBalancing::Host
Extracted from !130432 (comment 1536438094)
In !130432 (merged) we reduced the amount of time we spend waiting to gracefully disconnect hosts before force disconnecting. The idea of "graceful" disconnection, however, is flawed in 2 ways. It looks like:
def try_disconnect
if pool.connections.none?(&:in_use?)
pool.disconnect!
return true
end
The first problem here is a threading race condition where connections can become "in use" before we trigger disconnect!
. But the second flaw is that so long as the pool is healthy and the application is receiving steady traffic it's very likely there will never be pool.connections.none?(&:in_use?)
because the application will constantly be checking out and in connections from the pool for traffic.
A better way to handle both of the above problems is to mark a pool or LoadBalancing::Host
as @disconnecting
first and then ensure that our LoadBalancing::LoadBalancer
never picks a Host
that is @disconnecting
. Then we can fix the race condition because we know that once we have no connections in use then we won't get any more. Plus we can fix the main problem that we never actually get to 0 connections in use. This might get rid of the cases where graceful disconnects timeout and could substantially reduce the time it takes to handle disconnections in ServiceDiscovery
.