Avoid database reconnections when host disconnected from load balancer

Stan Hu requested to merge sh-pool-disconnect-without-verify into master

What does this MR do and why?

As described in #490211 (closed), in Rails 7.0, whenever ConnectionPool#disconnect! is called, the calling thread acquires each connection in the @available queue and verifies it with an empty SQL query (;). If verification fails, Rails attempts to reconnect each of those connections in the pool. These reconnections can saturate database connections unnecessarily and cause a flood of SET statements on a PostgreSQL host when many threads attempt the same thing.

Rails 7.1 has fixed this in https://github.com/rails/rails/pull/44576, but until we upgrade, this patch disables this verification step.

This commit introduces a load_balancing_disconnect_without_verify feature flag to enable this change.
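
To illustrate why skipping verification matters, here is a minimal toy model of the behavior described above. It is a sketch, not the Rails or GitLab implementation: FakeConnection, FakePool, and the verify: keyword are hypothetical names invented for this example, and the real patch may be structured differently.

```ruby
# Toy model only: FakeConnection and FakePool are hypothetical stand-ins,
# not ActiveRecord classes.
class FakeConnection
  attr_reader :queries, :reconnects

  def initialize(host_up:)
    @host_up = host_up
    @queries = 0
    @reconnects = 0
  end

  # Mimics a liveness check: send a trivial query and, if the host is
  # unreachable, attempt a reconnect.
  def verify!
    @queries += 1
    @reconnects += 1 unless @host_up
  end

  def disconnect!
    # Nothing to do in the toy model.
  end
end

class FakePool
  def initialize(connections)
    @available = connections
  end

  # With verify: true, every connection is checked before it is closed.
  # With verify: false, connections are closed directly.
  def disconnect!(verify: true)
    @available.each do |conn|
      conn.verify! if verify
      conn.disconnect!
    end
    @available = []
  end
end

# Counts reconnect attempts when a pool of three connections to an
# offline host is disconnected.
def reconnects_after_disconnect(verify:)
  conns = Array.new(3) { FakeConnection.new(host_up: false) }
  FakePool.new(conns).disconnect!(verify: verify)
  conns.sum(&:reconnects)
end

puts reconnects_after_disconnect(verify: true)  # prints 3
puts reconnects_after_disconnect(verify: false) # prints 0
```

In this model, each verified connection to an offline host costs one reconnect attempt just before it is discarded, which is the wasted work the feature flag is meant to avoid.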

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

How to set up and validate locally

I used an Omnibus installation with a PostgreSQL primary and secondary, each behind a local PgBouncer instance, to confirm that disabling verify! significantly reduces SQL reconnection attempts and SET queries: gitlab-com/gl-infra/production#18565 (comment 2113924475). However, you should be able to validate that this feature flag still functions properly:

  1. Set up a GDK with a secondary PostgreSQL with database load balancing: https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/database_load_balancing.md
  2. Check out this branch and enable the feature flag via bin/rails console: Feature.enable(:load_balancing_disconnect_without_verify)
  3. gdk restart rails
  4. gdk restart rails-background-jobs
  5. Take down the secondary: gdk stop postgresql-replica.
  6. Check that gitlab/log/database_load_balancing.log marks the host offline:
{"severity":"WARN","time":"2024-09-17T05:24:16.485Z","correlation_id":null,"event":"host_offline","message":"Host is offline after replica status check","db_host":"10.128.15.243","db_port":null}
  7. Use the GDK, and verify it still functions.
  8. Bring back the secondary via gdk start postgresql-replica, and check that the log marks the host online:
{"severity":"INFO","time":"2024-09-17T05:25:54.473Z","correlation_id":"01J7Z73H44RV6GJPZGT8ECT2HR","event":"host_online","message":"Host is online after replica status check","db_host":"10.128.15.243","db_port":null}
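
The log checks in the steps above can also be scripted. The following is a small sketch that extracts host_offline and host_online events from JSON log lines; the host_events helper is a hypothetical name for this example, and the sample lines are the two entries shown above.

```ruby
require 'json'

# Sample lines copied from the log output shown in the steps above.
LOG_LINES = [
  '{"severity":"WARN","time":"2024-09-17T05:24:16.485Z","correlation_id":null,"event":"host_offline","message":"Host is offline after replica status check","db_host":"10.128.15.243","db_port":null}',
  '{"severity":"INFO","time":"2024-09-17T05:25:54.473Z","correlation_id":"01J7Z73H44RV6GJPZGT8ECT2HR","event":"host_online","message":"Host is online after replica status check","db_host":"10.128.15.243","db_port":null}'
].freeze

# Hypothetical helper: returns [time, event, db_host] for each host
# state change found in the given lines.
def host_events(lines)
  lines.filter_map do |line|
    entry = JSON.parse(line)
    next unless %w[host_offline host_online].include?(entry['event'])

    [entry['time'], entry['event'], entry['db_host']]
  rescue JSON::ParserError
    nil # skip lines that are not valid JSON
  end
end

events = host_events(LOG_LINES)
events.each { |time, event, host| puts "#{time} #{event} #{host}" }
```

To run it against a real log, something like host_events(File.foreach('gitlab/log/database_load_balancing.log')) should work, assuming the file contains one JSON object per line.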
Edited by Stan Hu