Fix Prometheus connection error in usage ping
What does this MR do?
About 40% of topology usage pings received a `PrometheusClient::ConnectionError`.
After some investigation, we suspect two major causes:
- The API call's connection scheme does not match the TLS configuration: either the server enforces TLS but we connect using HTTP, or the server does not enable TLS but we connect using HTTPS.
- The Prometheus server enforces TLS, but the Sidekiq node is NOT configured to trust the Prometheus node's certificate.
A summary is available at #235739 (comment 405245426)
This MR addresses each of the two causes above in turn:
- Automatically detect the scheme: do a ready check over HTTPS first, then fall back to HTTP.
- Use the option `verify: false` in the `with_prometheus_client` call. This skips SSL certificate verification. The security risk is evaluated to be acceptable: #235739 (comment 406218403)
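The HTTPS-first detection described above can be sketched roughly as follows. This is a minimal illustration using only the Ruby standard library, not the code in this MR: the helper names `prometheus_ready?` and `detect_scheme`, the timeouts, and the use of Prometheus's `/-/ready` endpoint as the ready check are assumptions made for the example.

```ruby
require 'net/http'
require 'openssl'
require 'uri'

# Hypothetical helper (not GitLab's actual implementation).
# Returns true when GET <scheme>://<address>/-/ready answers with a 2xx status,
# and false on any connection, TLS, or timeout error.
def prometheus_ready?(scheme, address, open_timeout: 2, read_timeout: 2)
  uri = URI("#{scheme}://#{address}/-/ready")
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (scheme == 'https')
  # Similar in spirit to `verify: false`: skip certificate verification,
  # so a certificate the Sidekiq node does not trust is still accepted.
  http.verify_mode = OpenSSL::SSL::VERIFY_NONE if http.use_ssl?
  http.open_timeout = open_timeout
  http.read_timeout = read_timeout
  http.request(Net::HTTP::Get.new(uri)).is_a?(Net::HTTPSuccess)
rescue StandardError
  false
end

# Hypothetical helper: try HTTPS first, then HTTP.
# Returns 'https', 'http', or nil when neither scheme passes the ready check.
# The probe is injectable so the ordering logic can be exercised without a server.
def detect_scheme(address, probe: method(:prometheus_ready?))
  %w[https http].find { |scheme| probe.call(scheme, address) }
end
```

With these assumptions, `detect_scheme('localhost:9090')` would return the first scheme whose ready check succeeds, and `nil` if the server is unreachable over both.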
Conformity
- Changelog entry
- Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Availability and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- Tested in all supported browsers
- Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- Label as security and @ mention @gitlab-com/gl-security/appsec
- The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- Security reports checked/validated by a reviewer from the AppSec team
Closes #235739 (closed)
Edited by Qingyu Zhao