2PC via pre-receive hook
As a result of the experiments in #2466 (closed) and #2529 (closed), we have concluded that the most promising way to implement strong consistency for reference updates is by going via Git hooks: given a reference update, a hook will execute on each Gitaly node that reports back to Praefect. Praefect will collect these reports from all Gitaly nodes that take part in the current update and, if all nodes post the same update, send them a message to go ahead.
While the mid-term goal is to hook directly into the reference transaction handling code in order to handle all kinds of reference updates and not only those invoked via git-receive-pack(1), this requires a new set of hooks on Git's side. We thus decided to implement a first POC of this mechanism by using the Git pre-receive hook, which executes after all reference updates have been announced by the Git client. This should give us a better picture of how the mechanism will work in the end.
The following diagram depicts the 2PC via a pre-receive hook:
```mermaid
sequenceDiagram
Praefect->>+Gitaly: ReceivePack
Gitaly->>+Git: git receive-pack
Git->>+Hook: update HEAD master
Hook->>+Gitaly: TX: update HEAD master
Gitaly->>+Praefect: TX: update HEAD master
Praefect->>+Praefect: TX: collect votes
Praefect->>+Gitaly: TX: commit
Gitaly->>+Hook: TX: commit
Hook->>+Git: exit 0
Git->>+Gitaly: exit 0
Gitaly->>+Praefect: success
```
The 2PC voting protocol will start as soon as the first "TX" message is received on the Praefect node. Each of the pre-receive hooks will block until it receives a message from Praefect telling it either to go on with the update or to abort. If the vote was successful, the hook will exit with 0 to indicate success; otherwise it will return an error code and thus abort the reference update.
The main goal of this issue is to establish a communication channel between the hook and Praefect to allow for transaction handling. The communication channel should be as transparent to Gitaly nodes as possible, so that Gitaly does not need to know whether, or how many, transactions a given Git command is going to start. This ensures we can swap out the hooks in the future and start multiple transactions for a single Git command.