coordinator: Only schedule replication for differing error states

Starting with commit d87747c8 (Consider primary modified only if a subtransaction was committed, 2021-05-14), we consider primaries to have been modified only if at least one subtransaction was committed. The intent of that change was to avoid queueing replication jobs when an RPC returned an error without having modified any on-disk state.

As it turns out, this optimization had unintended side effects: if an RPC fails on the first vote because of inconsistent state across all nodes, then we never schedule a replication job to fix this inconsistency. In some cases, this keeps us from making any progress at all because the nodes never converge towards the same state, for example in object pools.

This MR thus reverts the above commit and implements a different fix, described below. One alternative solution to this problem would be to make Praefect more aware of actual failure codes: if all nodes fail because of a FailedPrecondition, then we can likely assume that no changes were made. But this is quite fragile and intertwines the error handling of Praefect and Gitaly.

But thinking about it, what we really care about is not whether an RPC failed or not. It's that primary and secondary nodes behaved the same. If both primary and secondaries succeeded, we're good. But if both failed with the same error, then we're good too, as long as all transactions have been committed: quorum was reached in all relevant cases and the nodes failed in the same way, so we can assume that they did indeed perform the same changes.

This commit thus relaxes the error condition: instead of scheduling replication jobs whenever the primary failed, we only schedule replication jobs for nodes that returned a different error than the primary. This has two advantages: we schedule jobs selectively for disagreeing nodes instead of targeting all secondaries, and we avoid scheduling jobs in many cases where we do hit errors.
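
The relaxed condition can be illustrated with a minimal sketch. This is not the code from this MR: the `nodeResult` type, the `outdatedSecondaries` function and the node names are hypothetical, and comparing errors by their message string is an assumption made only to keep the example small.

```go
package main

import (
	"fmt"
	"sort"
)

// nodeResult is a hypothetical summary of how one secondary handled an RPC.
type nodeResult struct {
	// errMsg is the error message returned by the node; empty means success.
	errMsg string
	// committed reports whether all of the node's transactions were committed.
	committed bool
}

// outdatedSecondaries returns the names of the secondaries that would need a
// replication job: those whose transactions were not all committed, or whose
// result differs from the primary's. Comparing error messages as strings is
// an assumption of this sketch.
func outdatedSecondaries(primaryErr string, secondaries map[string]nodeResult) []string {
	var outdated []string
	for name, res := range secondaries {
		if !res.committed || res.errMsg != primaryErr {
			outdated = append(outdated, name)
		}
	}
	// Map iteration order is random, so sort for a stable result.
	sort.Strings(outdated)
	return outdated
}

func main() {
	// Hypothetical secondaries: one agrees with the primary, one fails
	// differently, one did not commit all of its transactions.
	secondaries := map[string]nodeResult{
		"gitaly-2": {errMsg: "reference already exists", committed: true},
		"gitaly-3": {errMsg: "repository not found", committed: true},
		"gitaly-4": {errMsg: "reference already exists", committed: false},
	}
	fmt.Println(outdatedSecondaries("reference already exists", secondaries))
}
```

In this sketch a failing primary alone does not trigger any replication: only secondaries that disagree with it, or that did not commit all of their transactions, end up in the returned list.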
