Add SQL-based election for shard primaries
This commit adds the following strategy to enable redundant Praefect nodes to run simultaneously:
- Every Praefect node periodically (every second) performs a health check RPC with a Gitaly node.
- For each node, Praefect updates a row in a new table (`node_status`) with the following information:
  - The name of the Praefect instance (`praefect_name`)
  - The name of the virtual storage (`shard_name`)
  - The name of the Gitaly storage (`storage_name`)
  - The timestamp of the last time Praefect tried to reach that node (`last_contact_attempt_at`)
  - The timestamp of the last successful health check (`last_seen_active_at`)
- Periodically, every Praefect node does a `SELECT` from `node_status` to determine healthy nodes. A node is considered healthy when:
  - It has a recent successful health check (e.g. one in the last 10 s).
  - A majority of the available Praefect nodes have entries that satisfy the condition above.
- To determine the majority, we use a lightweight service discovery protocol: a Praefect node is deemed a voting member if its `praefect_name` has a recent `last_contact_attempt_at` in the `node_status` table. The name is derived from a combination of the hostname and listening port/socket.
- The primary of each shard is listed in the `shard_primaries` table. If the current primary is in the healthy node list, then no election needs to be done.
- Otherwise, if there is no primary or it is unhealthy, any Praefect node can elect a new primary by choosing a candidate from the healthy node list and inserting a row into that table.
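
The strategy above can be illustrated with a small in-memory sketch. This is not the code in this commit: it is plain Python over a list of rows rather than SQL, the field names mirror the `node_status` columns, and the 10-second windows, the per-shard voter set, the "pick the first healthy candidate" rule, and the names `NodeStatus`, `healthy_nodes`, and `elect_primary` are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumed thresholds; the commit message only gives "10 s" as an example.
ACTIVE_WINDOW = timedelta(seconds=10)
CONTACT_WINDOW = timedelta(seconds=10)


@dataclass(frozen=True)
class NodeStatus:
    """One row of node_status, using the column names from the commit message."""
    praefect_name: str
    shard_name: str
    storage_name: str
    last_contact_attempt_at: datetime
    last_seen_active_at: Optional[datetime]


def healthy_nodes(rows, shard, now):
    """Return the storages in `shard` that a majority of voting Praefects saw active recently."""
    shard_rows = [r for r in rows if r.shard_name == shard]
    # Voting members: Praefect instances with a recent contact attempt
    # (computed per shard here, which is an assumption).
    voters = {
        r.praefect_name
        for r in shard_rows
        if now - r.last_contact_attempt_at <= CONTACT_WINDOW
    }
    # Collect, per storage, the voters that recorded a recent successful check.
    votes = defaultdict(set)
    for r in shard_rows:
        if (
            r.praefect_name in voters
            and r.last_seen_active_at is not None
            and now - r.last_seen_active_at <= ACTIVE_WINDOW
        ):
            votes[r.storage_name].add(r.praefect_name)
    quorum = len(voters) // 2 + 1  # strict majority of voting members
    return sorted(s for s, seen_by in votes.items() if len(seen_by) >= quorum)


def elect_primary(current, healthy):
    """Keep a healthy primary; otherwise pick a candidate, or None if nothing is healthy."""
    if current in healthy:
        return current
    return healthy[0] if healthy else None


# Three Praefect instances, two Gitaly storages in one shard.
now = datetime(2020, 1, 1, 12, 0, 0)
fresh = now - timedelta(seconds=1)
stale = now - timedelta(seconds=60)
rows = [
    NodeStatus("praefect-a", "default", "gitaly-1", fresh, stale),
    NodeStatus("praefect-b", "default", "gitaly-1", fresh, stale),
    NodeStatus("praefect-c", "default", "gitaly-1", fresh, fresh),
    NodeStatus("praefect-a", "default", "gitaly-2", fresh, fresh),
    NodeStatus("praefect-b", "default", "gitaly-2", fresh, fresh),
    NodeStatus("praefect-c", "default", "gitaly-2", fresh, stale),
]
healthy = healthy_nodes(rows, "default", now)
new_primary = elect_primary("gitaly-1", healthy)
```

In this example only one of three voters has seen `gitaly-1` recently, so it falls out of the healthy list and `gitaly-2` becomes the candidate. The sketch only returns the chosen name; recording it in `shard_primaries` and handling concurrent writers is left out.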