Add timeout options for configure_postgresql of pg-upgrade
What does this MR do?
Problem
Related JH MR https://jihulab.com/gitlab-cn/gitlab/-/issues/686
When I upgraded a Patroni replica node, the upgrade failed. After investigation, I found that when the upgrade command `pg-upgrade` is executed, the replica database node uses the `pg_basebackup` command to pull a base backup again from the leader node. Because GitLab did not detect that the running database was available within the specified time, the upgrade was interrupted by a timeout, and the `pg_basebackup` command was interrupted immediately.
I then set `postgresql['max_service_checks'] = 20` and `postgresql['service_check_interval'] = 60`; neither setting appears in the default `gitlab.rb`. This raised the command timeout from 3 minutes to 10 minutes, but 10 minutes is still not enough. I do not know which parameter in `gitlab.rb` can change this limit. From the source code I found that 600s is the limit for a running command, and 10 minutes is not enough for a database of 100GB+.
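To make the arithmetic above concrete, here is a minimal sketch of why raising the service-check settings alone stops helping at 10 minutes. It assumes the wait ends at whichever limit is hit first: the service-check budget or the 600s command limit. The method name `effective_wait_seconds` and its parameters are hypothetical and are not part of omnibus-gitlab.

```ruby
# Hypothetical helper (not omnibus-gitlab code): models the assumption that
# the wait ends at whichever limit is reached first.
def effective_wait_seconds(max_service_checks:, service_check_interval:, command_timeout:)
  service_check_budget = max_service_checks * service_check_interval
  [service_check_budget, command_timeout].min
end

# Values from the description: 20 checks, 60s apart, 600s command limit.
budget = effective_wait_seconds(
  max_service_checks: 20,
  service_check_interval: 60,
  command_timeout: 600
)
puts "effective wait: #{budget}s (#{budget / 60} minutes)"
# prints "effective wait: 600s (10 minutes)"
```

Under this assumption the service checks would allow 1200s, but the 600s command limit cuts the run off first, which matches the 10 minute ceiling observed above.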
Command output error logs:
================================================================================
Error executing action `run` on resource 'ruby_block[wait for postgresql to start]'
================================================================================
RuntimeError
------------
PostgreSQL did not respond before service checks were exhausted
Cookbook Trace:
---------------
/opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/libraries/helpers/pg_status_helper.rb:56:in `ready?'
/opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/libraries/helpers/base_pg_helper.rb:28:in `is_ready?'
/opt/gitlab/embedded/cookbooks/cache/cookbooks/patroni/recipes/enable.rb:93:in `block (2 levels) in from_file'
Resource Declaration:
---------------------
# In /opt/gitlab/embedded/cookbooks/cache/cookbooks/patroni/recipes/enable.rb
92: ruby_block 'wait for postgresql to start' do
93: block { pg_helper.is_ready? }
94: only_if { omnibus_helper.should_notify?(patroni_helper.service_name) }
95: end
96:
Compiled Resource:
------------------
# Declared in /opt/gitlab/embedded/cookbooks/cache/cookbooks/patroni/recipes/enable.rb:92:in `from_file'
ruby_block("wait for postgresql to start") do
action [:run]
default_guard_interpreter :default
declared_type :ruby_block
cookbook_name "patroni"
recipe_name "enable"
block #<Proc:0x00000000044db008 /opt/gitlab/embedded/cookbooks/cache/cookbooks/patroni/recipes/enable.rb:93>
block_name "wait for postgresql to start"
only_if { #code block }
end
System Info:
------------
chef_version=15.14.0
platform=centos
platform_version=7.9.2009
ruby=ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]
program_name=/opt/gitlab/embedded/bin/chef-client
executable=/opt/gitlab/embedded/bin/chef-client
Running handlers:
Running handlers complete
Chef Infra Client failed. 4 resources updated in 01 minutes 39 seconds
===STDERR===
There was an error running gitlab-ctl reconfigure:
ruby_block[wait for postgresql to start] (patroni::enable line 92) had an error: RuntimeError: PostgreSQL did not respond before service checks were exhausted
======
== Fatal error ==
Error updating PostgreSQL configuration. Please check the output
== Reverting ==
ok: down: patroni: 1s, normally up
Symlink correct version of binaries: OK
ok: run: patroni: (pid 23741) 1s
== Reverted ==
== Reverted to 11.11. Please check output for what went wrong ==
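For readers unfamiliar with the failure mode in the log above, the following is a rough sketch of a bounded readiness check that raises the same error message once its checks run out. It is an illustration only: the class `ReadinessPoller`, its parameters, and the injectable `sleeper` are hypothetical, and this is not the actual code in `pg_status_helper.rb`.

```ruby
# Hypothetical readiness poller; NOT the actual pg_status_helper.rb code.
class ReadinessPoller
  # sleeper is injectable so the loop can be exercised without real waiting.
  def initialize(max_checks:, interval:, sleeper: ->(seconds) { sleep(seconds) })
    @max_checks = max_checks
    @interval = interval
    @sleeper = sleeper
  end

  # Calls the given block once per check and returns true as soon as it
  # reports ready. Raises when every check has been used up.
  def wait_until_ready
    @max_checks.times do
      return true if yield

      @sleeper.call(@interval)
    end
    raise 'PostgreSQL did not respond before service checks were exhausted'
  end
end
```

With a replica that is still running `pg_basebackup`, every check in such a loop would report "not ready", so the error is raised even though the database is making progress.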
Solution
Checking the log, I found that the `configure_postgresql` method in `files/gitlab-ctl-commands/pg-upgrade.rb` is what throws the exception.
There seems to be no timeout option in `configure_postgresql`, so I added a timeout option when calling `GitlabCtl::Util.chef_run` in `configure_postgresql`.
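The shape of the change can be sketched as follows. This is a simplified stand-in, not the actual diff: `SketchUtil` is a hypothetical replacement for `GitlabCtl::Util`, the `timeout:` keyword and the 600s default are assumptions based on the limit described above, and the file names passed in are placeholders.

```ruby
# Hypothetical stand-in for GitlabCtl::Util. It only records what would be
# run instead of shelling out to chef-client.
module SketchUtil
  DEFAULT_TIMEOUT = 600 # seconds; assumed default, matching the limit noted above

  def self.chef_run(config, attributes, timeout: nil)
    { config: config, attributes: attributes, timeout: timeout || DEFAULT_TIMEOUT }
  end
end

# Sketch of configure_postgresql forwarding a caller-supplied timeout.
# The config and attribute file names are placeholders.
def configure_postgresql(timeout: nil)
  SketchUtil.chef_run('solo.rb', 'postgresql-config.json', timeout: timeout)
end

# Without an override the assumed default applies; a large database can
# be given more time.
puts configure_postgresql[:timeout]
puts configure_postgresql(timeout: 3600)[:timeout]
```

Forwarding the value as an optional keyword keeps existing callers unchanged while letting the upgrade command extend the limit for large databases.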
/cc @prajnamas
Related issues
Checklist
See Definition of done.
For anything in this list which will not be completed, please provide a reason in the MR discussion
Required
- Merge Request Title, and Description are up to date, accurate, and descriptive
- MR targeting the appropriate branch
- MR has a green pipeline on GitLab.com
- Pipeline is green on dev.gitlab.org if the change is touching anything besides documentation or internal cookbooks
- `trigger-package` has a green pipeline running against latest commit
Expected (please provide an explanation if not completing)
- Test plan indicating conditions for success has been posted and passes
- Documentation created/updated
- Tests added
- Integration tests added to GitLab QA
- Equivalent MR/issue for the GitLab Chart opened