WIP: Geo: Exit LogCursor if health checks fail for too long
What does this MR do?
Exits geo-logcursor when serious failures (any health check failure) persist for too long.
The problem is described in gitlab-org/build/CNG!220 (comment 203815540):
- We stopped geo-postgresql, which is a serious problem for geo-logcursor and which we expect to cause it to exit.
- The new --stdout-logging option allowed a repeating error to show up in the logs, which led us to know what geo-logcursor was doing. 🎉
- All StandardErrors are caught here https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/gitlab/geo/log_cursor/lease.rb#L41.
- And nothing changes in the main loop https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/gitlab/geo/log_cursor/daemon.rb#L22-35.
- Therefore, infinite loop.
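To illustrate the intended behavior, here is a minimal sketch of a loop that keeps rescuing StandardError but gives up once failures have lasted too long. It is not the code in this MR: `HealthWatchdog`, `run_loop`, `max_unhealthy_seconds`, and the injectable `clock` are hypothetical names chosen for the example, and the real daemon may track health differently.

```ruby
# Hypothetical sketch only; these names do not come from Gitlab::Geo::LogCursor.
class HealthWatchdog
  def initialize(max_unhealthy_seconds:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @max_unhealthy_seconds = max_unhealthy_seconds
    @clock = clock
    @unhealthy_since = nil
  end

  # Record the outcome of one health check or loop iteration.
  def record(healthy)
    if healthy
      @unhealthy_since = nil          # any success resets the timer
    else
      @unhealthy_since ||= @clock.call # remember when the failures started
    end
  end

  # True once failures have been continuous for longer than the limit.
  def expired?
    return false if @unhealthy_since.nil?

    @clock.call - @unhealthy_since > @max_unhealthy_seconds
  end
end

# Runs the given block repeatedly. Errors are still rescued, but the loop
# now has an exit condition instead of retrying forever.
def run_loop(watchdog, max_iterations:)
  max_iterations.times do
    begin
      yield
      watchdog.record(true)
    rescue StandardError
      watchdog.record(false)
    end

    return :exited if watchdog.expired?
  end

  :still_running
end
```

The clock is injected so the timeout can be exercised in tests without sleeping. A monotonic clock is used as the default so that wall-clock adjustments do not shorten or extend the timeout.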
Closes https://gitlab.com/gitlab-org/gitlab-ee/issues/14627
Related issue: https://gitlab.com/charts/gitlab/issues/1211
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- Documentation created/updated or follow-up review issue created
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Performance and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- Tested in all supported browsers
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- Label as security and @ mention @gitlab-com/gl-security/appsec
- The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- Security reports checked/validated by a reviewer from the AppSec team
Edited by Michael Kozono