Failures with `Undefined method 'load_balancer'` when unicorn is started without a working ActiveRecord connection
Seen in today's staging failover attempt: https://dev.gitlab.org/gitlab-com/migration/issues/44
When database load-balancing is enabled, we need to call `Gitlab::Database::LoadBalancing.configure_proxy`. Until this is done, certain database operations will raise the following error:
NoMethodError (undefined method `load_balancer' for nil:NilClass):
ee/lib/gitlab/database/load_balancing/sticking.rb:81:in `load_balancer'
ee/lib/gitlab/database/load_balancing/sticking.rb:50:in `stick'
ee/app/services/ee/user_project_access_changed_service.rb:7:in `block in execute'
ee/app/services/ee/user_project_access_changed_service.rb:6:in `each'
ee/app/services/ee/user_project_access_changed_service.rb:6:in `execute'
app/models/member.rb:387:in `refresh_member_authorized_projects'
config/initializers/forbid_sidekiq_in_transactions.rb:49:in `block in committed!'
config/initializers/forbid_sidekiq_in_transactions.rb:11:in `skipping_transaction_check'
config/initializers/forbid_sidekiq_in_transactions.rb:49:in `committed!'
app/models/member.rb:174:in `add_user'
app/models/group.rb:161:in `add_user'
app/models/group.rb:188:in `add_owner'
app/services/groups/create_service.rb:31:in `execute'
ee/app/services/ee/groups/create_service.rb:8:in `execute'
app/controllers/groups_controller.rb:39:in `create'
lib/gitlab/i18n.rb:51:in `with_locale'
lib/gitlab/i18n.rb:57:in `with_user_locale'
app/controllers/application_controller.rb:370:in `set_locale'
ee/lib/omni_auth/strategies/group_saml.rb:27:in `other_phase'
lib/gitlab/middleware/multipart.rb:97:in `call'
lib/gitlab/request_profiler/middleware.rb:14:in `call'
This is because `LoadBalancing.proxy` is only set by a call to the `configure_proxy` method.
In the codebase, we only call `configure_proxy` at startup in `config/initializers/load_balancing.rb`, and then only if the following guard statement passes:
if ActiveRecord::Base.connected? && ActiveRecord::Base.connection.data_source_exists?('licenses')
This means that if the database is down when unicorn starts up, we don't configure the load-balancing proxy. Unicorn boots up and some queries work fine, but others reliably fail. In particular, we were seeing the above exception when trying to create a group in the UI.
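To make the failure mode concrete, here is a minimal standalone sketch of it. `FakeLoadBalancing`, `run_initializer` and `try_stick` are hypothetical stand-ins I made up for illustration; this is not the real `Gitlab::Database::LoadBalancing` code, only the same shape: a proxy that is assigned in exactly one place, behind a guard that is evaluated once at boot.

```ruby
# Hypothetical stand-in for the load-balancing module; NOT the real GitLab code.
module FakeLoadBalancing
  Proxy = Struct.new(:load_balancer)

  class << self
    attr_reader :proxy

    # Plays the role of configure_proxy: the only place the proxy is assigned.
    def configure_proxy
      @proxy = Proxy.new(:balancer)
    end

    # Simulates a fresh process with no proxy configured.
    def reset!
      @proxy = nil
    end

    # Plays the role of the failing call in the trace: proxy is nil
    # if configure_proxy never ran.
    def load_balancer
      proxy.load_balancer
    end
  end
end

# Stand-in for the initializer: the guard is checked once, at boot only.
def run_initializer(database_up:)
  FakeLoadBalancing.configure_proxy if database_up
end

# Returns :ok, or the error message when the proxy was never configured.
def try_stick
  FakeLoadBalancing.load_balancer
  :ok
rescue NoMethodError => e
  e.message
end

# Boot with the database down: every later call fails, even once the DB is back.
FakeLoadBalancing.reset!
run_initializer(database_up: false)
result_when_down = try_stick

# Boot with the database up: the same call succeeds.
FakeLoadBalancing.reset!
run_initializer(database_up: true)
result_when_up = try_stick
```

The point of the sketch is that nothing after boot ever revisits the guard, so the process stays broken until it is restarted.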
As we'd fixed the database configuration issue, all we had to do was to restart unicorn - but it wasn't obvious that we needed to. It would be far better for unicorn to recover from this situation automatically, or to refuse to start altogether if load-balancing is configured but the database is unavailable.
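A rough sketch of what those two options could look like, again using made-up names (`SketchLoadBalancing`, `configure_at_boot!`, `NotReadyError`) rather than anything that exists in the codebase. I'm assuming the "enabled" and "database up" checks can be passed in as simple booleans, which glosses over how they would really be determined.

```ruby
# Hypothetical stand-in, not GitLab's real module; names are illustrative only.
module SketchLoadBalancing
  Proxy = Struct.new(:load_balancer)
  NotReadyError = Class.new(StandardError)

  class << self
    # Option 1: refuse to start. If load balancing is enabled but the
    # database is unavailable, raise so the process never finishes booting.
    def configure_at_boot!(enabled:, database_up:)
      return unless enabled

      unless database_up
        raise NotReadyError, 'load balancing is enabled but the database is unavailable'
      end

      @proxy = Proxy.new(:balancer)
    end

    # Option 2: recover automatically. Configure the proxy lazily on first
    # use, and retry on each call until the database is reachable.
    def proxy(database_up:)
      @proxy ||= (Proxy.new(:balancer) if database_up)
    end

    # Simulates a fresh process with no proxy configured.
    def reset!
      @proxy = nil
    end
  end
end
```

Option 1 makes the problem obvious at startup; option 2 avoids the restart but means callers still have to cope with a `nil` proxy while the database is down.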
/cc @yorickpeterse
I guess this falls into ~"devops:configure" so /cc @danielgruesso also, but do let me know if that's incorrect. It's not 100% clear to me where the database load-balancing feature would fall.
Perhaps we could modify the features page to note which part of the lifecycle each feature belongs to: https://about.gitlab.com/features/#scalability