Avoid database reconnections when host disconnected from load balancer
What does this MR do and why?
As described in #490211 (closed), in Rails 7.0, whenever `ConnectionPool#disconnect!` is called, each connection in the `@available` queue is checked out by the calling thread and verified with a `;` SQL query. If the verification fails, Rails attempts to reconnect every one of those connections in the pool, even though they are about to be disconnected. These reconnections can unnecessarily saturate database connections. When many threads do the same thing, they also flood the PostgreSQL host with the `SET` statements that each new connection issues during setup.
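The cost of verifying before disconnecting can be illustrated with a small, self-contained model. This is a sketch only: `FakeConnection`, `FakePool`, and `SESSION_SETUP` are hypothetical stand-ins, not Rails classes. The two `SET` statements are illustrative, since the real adapter's session setup differs. The model assumes every idle connection is already stale, so each `verify!` fails its `;` probe and reconnects.

```ruby
# Illustrative session-setup statements (hypothetical; the real adapter's list differs).
SESSION_SETUP = [
  "SET client_min_messages TO 'warning'",
  "SET standard_conforming_strings = on"
].freeze

# Hypothetical stand-in for a PostgreSQL connection; records every query it sends.
class FakeConnection
  attr_reader :queries

  def initialize(stale:)
    @stale = stale # true when the server side has already dropped this session
    @queries = []
  end

  # Liveness check modeled on sending ";" to the server.
  def active?
    @queries << ";"
    !@stale
  end

  # A reconnect opens a new session, which is configured with SET statements.
  def reconnect!
    @stale = false
    @queries.concat(SESSION_SETUP)
  end

  # Like verify!: reconnect only when the liveness check fails.
  def verify!
    reconnect! unless active?
  end

  def disconnect!
    @stale = true
  end
end

# Hypothetical pool; disconnect! mimics the Rails 7.0 sequence of
# "check out (and verify) each available connection, then close it".
class FakePool
  def initialize(available, verify_on_disconnect:)
    @available = available
    @verify_on_disconnect = verify_on_disconnect
  end

  def disconnect!
    @available.each do |conn|
      conn.verify! if @verify_on_disconnect
      conn.disconnect!
    end
  end
end

# Counts the SET statements sent across a set of connections.
def set_count(connections)
  connections.sum { |conn| conn.queries.count { |sql| sql.start_with?("SET") } }
end

with_verify = Array.new(10) { FakeConnection.new(stale: true) }
FakePool.new(with_verify, verify_on_disconnect: true).disconnect!
puts "with verify!: #{set_count(with_verify)} SET statements" # 20

without_verify = Array.new(10) { FakeConnection.new(stale: true) }
FakePool.new(without_verify, verify_on_disconnect: false).disconnect!
puts "without verify!: #{set_count(without_verify)} SET statements" # 0
```

In this model, ten stale connections produce twenty `SET` statements just to be closed again. Skipping verification sends nothing. Multiply that by every thread and process that disconnects the same host, and the flood described above follows.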
Rails 7.1 fixed this in https://github.com/rails/rails/pull/44576. Until we upgrade, this patch disables the verification step.
This commit introduces the `load_balancing_disconnect_without_verify` feature flag to enable this change.
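One common way to gate a patch like this is to prepend a module to the pool class and consult the flag at call time. The sketch below is hypothetical and may not match the actual patch. `FeatureFlags` is an in-memory stand-in, not GitLab's `Feature` API. `SimplePool` and `checkout_and_verify` are illustrative names, and the sketch ignores the pool's locking and thread safety.

```ruby
# Hypothetical in-memory feature-flag store (not GitLab's Feature API).
module FeatureFlags
  @enabled = {}

  def self.enable(name)
    @enabled[name] = true
  end

  def self.disable(name)
    @enabled[name] = false
  end

  def self.enabled?(name)
    @enabled.fetch(name, false)
  end
end

# Hypothetical pool whose disconnect! verifies each connection before closing it.
class SimplePool
  attr_reader :verified

  def initialize(size)
    @connections = Array.new(size) { Object.new }
    @verified = 0
  end

  def disconnect!
    @connections.each { |conn| checkout_and_verify(conn) }
    @connections.clear # stands in for closing every connection
  end

  private

  # Stands in for conn.verify!, which may trigger a reconnect.
  def checkout_and_verify(conn)
    @verified += 1
    conn
  end
end

# Patch: while the flag is enabled, disconnect! skips the verification step.
module DisconnectWithoutVerify
  def disconnect!
    @skip_verify = FeatureFlags.enabled?(:load_balancing_disconnect_without_verify)
    super
  ensure
    @skip_verify = false
  end

  private

  def checkout_and_verify(conn)
    return conn if @skip_verify

    super
  end
end

SimplePool.prepend(DisconnectWithoutVerify)

pool = SimplePool.new(5)
pool.disconnect!
puts pool.verified # 5: flag off, every connection is verified

FeatureFlags.enable(:load_balancing_disconnect_without_verify)
pool = SimplePool.new(5)
pool.disconnect!
puts pool.verified # 0: flag on, verification is skipped
```

Checking the flag inside `disconnect!` rather than when the module is prepended means the behavior can be toggled at runtime without a deploy. That is the main reason to put a change like this behind a feature flag.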
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
How to set up and validate locally
I used an Omnibus installation with a PostgreSQL primary and secondary, each behind a local PgBouncer instance, to confirm that disabling `verify!` significantly reduces SQL reconnection attempts (and the resulting `SET` queries): gitlab-com/gl-infra/production#18565 (comment 2113924475). However, you should be able to validate that this feature flag still functions properly with these steps:
- Set up a GDK with a secondary PostgreSQL database and database load balancing enabled: https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/database_load_balancing.md
- Check out this branch, enable the feature flag in `bin/rails console` with `Feature.enable(:load_balancing_disconnect_without_verify)`, then restart with `gdk restart rails` and `gdk restart rails-background-jobs`.
- Take down the secondary: `gdk stop postgresql-replica`
- Check that `gitlab/log/database_load_balancing.log` marks the host offline:
{"severity":"WARN","time":"2024-09-17T05:24:16.485Z","correlation_id":null,"event":"host_offline","message":"Host is offline after replica status check","db_host":"10.128.15.243","db_port":null}
- Use the GDK and verify that it still works while the secondary is down.
- Bring back the secondary with `gdk start postgresql-replica` and check that the log marks the host online again:
{"severity":"INFO","time":"2024-09-17T05:25:54.473Z","correlation_id":"01J7Z73H44RV6GJPZGT8ECT2HR","event":"host_online","message":"Host is online after replica status check","db_host":"10.128.15.243","db_port":null}
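Instead of scanning the log by eye, a small script can report the most recent `host_offline` or `host_online` event for each `db_host`. This is a sketch that relies only on the one-JSON-object-per-line format shown in the sample entries above. The script name in the usage comment is hypothetical.

```ruby
require "json"

LOAD_BALANCER_EVENTS = %w[host_offline host_online].freeze

# Returns the most recent host_offline/host_online event for each db_host.
# Lines that are not JSON objects are skipped.
def host_states(lines)
  lines.each_with_object({}) do |line, states|
    entry = JSON.parse(line)
    next unless entry.is_a?(Hash) && LOAD_BALANCER_EVENTS.include?(entry["event"])

    states[entry["db_host"]] = entry["event"]
  rescue JSON::ParserError
    next
  end
end

# Usage (hypothetical script name):
#   ruby host_states.rb gitlab/log/database_load_balancing.log
if __FILE__ == $PROGRAM_NAME && ARGV.first
  host_states(File.foreach(ARGV.first)).each do |host, event|
    puts "#{host}: #{event}"
  end
end
```

Run it after stopping the replica to confirm the host shows `host_offline`. Run it again after restarting the replica to confirm it shows `host_online`.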