Separate local health status from consensus in HealthManager

Sami Hiltunen requested to merge smh-separate-local-health-from-consensu into master

HealthManager is the component used with repository-specific primaries to health check the Gitaly nodes. Currently, it returns the health consensus as determined by the Praefect nodes. Most of Praefect's components, however, care about whether the local Praefect node is able to contact the Gitaly node. The request router should not route to a Gitaly node that is unhealthy from the local Praefect's point of view, even if the consensus is that the node is healthy. Likewise, Praefect should not dequeue replication jobs for Gitaly nodes which it can't access.

Some other components should operate on the consensus though, such as the primary elector. The primary elector should only elect nodes that are deemed healthy by the majority and should not demote nodes which are only locally unhealthy. The reconciler should probably schedule jobs only for nodes that are deemed healthy by the consensus, but it currently uses the Praefect's local connection health.

To support both cases, HealthManager's HealthyNodes is changed to return the result of the local health check, which is what the majority of the components want. A HealthConsensus method is added alongside it to pipe the consensus to the locations that need it right now, namely the primary elector.
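The split between the two methods can be illustrated with a minimal sketch. This is not the code from this merge request: the `healthManager` type, its fields, the `newExample` helper, and the simple "more than half of the Praefect nodes" rule are all assumptions made for illustration. Only the method names HealthyNodes and HealthConsensus come from the description above.

```go
package main

import (
	"fmt"
	"sort"
)

// healthManager is a hypothetical, in-memory stand-in for Praefect's
// HealthManager. The real component performs health checks and stores
// results in the database; this sketch only models the two views.
type healthManager struct {
	// local records this Praefect's own health check results:
	// virtual storage -> Gitaly storage -> reachable from this Praefect.
	local map[string]map[string]bool
	// votes records how many Praefect nodes consider a Gitaly healthy:
	// virtual storage -> Gitaly storage -> number of Praefects.
	votes map[string]map[string]int
	// praefects is the assumed total number of Praefect nodes voting.
	praefects int
}

// HealthyNodes returns the Gitaly nodes that this Praefect can reach,
// regardless of what the other Praefect nodes observe.
func (m *healthManager) HealthyNodes() map[string][]string {
	out := make(map[string][]string, len(m.local))
	for virtualStorage, nodes := range m.local {
		names := []string{}
		for name, healthy := range nodes {
			if healthy {
				names = append(names, name)
			}
		}
		sort.Strings(names) // deterministic order for the example
		out[virtualStorage] = names
	}
	return out
}

// HealthConsensus returns the Gitaly nodes that a strict majority of the
// Praefect nodes consider healthy (assumed quorum rule for this sketch).
func (m *healthManager) HealthConsensus() map[string][]string {
	out := make(map[string][]string, len(m.votes))
	for virtualStorage, nodes := range m.votes {
		names := []string{}
		for name, count := range nodes {
			if count*2 > m.praefects {
				names = append(names, name)
			}
		}
		sort.Strings(names)
		out[virtualStorage] = names
	}
	return out
}

// newExample builds a scenario where the local view and the consensus differ.
func newExample() *healthManager {
	return &healthManager{
		praefects: 3,
		local: map[string]map[string]bool{
			"default": {"gitaly-1": true, "gitaly-2": false, "gitaly-3": true},
		},
		votes: map[string]map[string]int{
			"default": {"gitaly-1": 3, "gitaly-2": 2, "gitaly-3": 1},
		},
	}
}

func main() {
	m := newExample()
	// The router and the replication job dequeuer would use the local view.
	fmt.Println("local:", m.HealthyNodes())
	// The primary elector would use the consensus.
	fmt.Println("consensus:", m.HealthConsensus())
}
```

In this scenario `gitaly-2` is unreachable from the local Praefect but healthy according to two of three Praefects, so it appears only in the consensus: it remains eligible as a primary, yet the local router would not send requests to it. `gitaly-3` is the reverse case and appears only in the local view.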

In the future, we should not even load the consensus from the database into HealthManager and should instead query it directly where needed. It's implemented this way because we previously also supported the local elector, which did not have access to the database. As soon as the local elector is removed, we should drop the HealthConsensus method and query the database directly for the consensus.

Closes #3259 (closed) Related to #3492 (closed)
