Geo: HTTP connectivity check may not be representative of actual connectivity
Summary
A customer's secondary was failing to connect to the primary to update its GeoNodeStatus. The POST request failed with a TCP connection failure because the primary's hostname resolved to an IPv6 address that NGINX was not listening on.
It makes sense that this was failing. However, rake gitlab:geo:check was unhelpfully reporting:
GitLab Geo HTTP(S) connectivity ...
* Can connect to the primary node ... yes
Steps to reproduce
What is the current bug behavior?
For a secondary that is failing to connect to the primary to post its status, rake gitlab:geo:check shows:
GitLab Geo HTTP(S) connectivity ...
* Can connect to the primary node ... yes
What is the expected correct behavior?
For a secondary that is failing to connect to the primary to post its status, rake gitlab:geo:check shows:
GitLab Geo HTTP(S) connectivity ...
* Can connect to the primary node ... no
TCP connection failure blah blah blah
Possible fixes
Most outgoing HTTP requests are handled by Gitlab::HTTP (and I believe we are supposed to standardize on it); however, the check uses Net::HTTP: https://gitlab.com/gitlab-org/gitlab/blob/v12.4.3-ee/ee/lib/system_check/geo/http_connection_check.rb#L36
I don't know why Net::HTTP succeeded, but we should switch the check to Gitlab::HTTP for consistency, since it should have caught this customer's problem.
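To help confirm this kind of failure while debugging, here is a minimal sketch that attempts a TCP connection to every address the primary's hostname resolves to, so an unreachable IPv6 address shows up even when an IPv4 address works. It uses only the Ruby standard library and is not part of GitLab or the existing check; the helper name tcp_reachability and its timeout parameter are made up for illustration.

```ruby
require 'socket'

# Hypothetical diagnostic helper (not GitLab code): resolve `host`, then try a
# TCP connection to each resolved address separately.
# Returns a Hash of ip_address => :ok, or the name of the error class raised.
def tcp_reachability(host, port, timeout: 5)
  addresses = Addrinfo.getaddrinfo(host, port, nil, :STREAM).map(&:ip_address).uniq

  addresses.to_h do |ip|
    begin
      # Connecting to the literal IP avoids any fallback to another address.
      Socket.tcp(ip, port, connect_timeout: timeout) { |_sock| [ip, :ok] }
    rescue SystemCallError, SocketError, IOError => e
      [ip, e.class.name]
    end
  end
end

# Example call (the hostname is a placeholder):
#   tcp_reachability('primary.example.com', 443).each do |ip, result|
#     puts "#{ip}: #{result}"
#   end
```

If any address reports an error while others report :ok, the hostname is only partially reachable, which would be consistent with the symptom described in the summary.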