Teach Praefect dataloss command to list which Gitaly nodes are missing data
Problem to solve
The praefect dataloss subcommand shows which repository replication jobs failed within a time window. If there are two replicas and the job fails for one of them but succeeds on the other, data loss is still reported, without indicating which replica is missing the data.
Further details
Current subcommand output:
Failed replication jobs between [2020-01-02 00:00:00 +0000 UTC, 2020-01-03 00:00:00 +0000 UTC):
test-repo/relative-path/1: 1 jobs
test-repo/relative-path/2: 4 jobs
test-repo/relative-path/3: 2 jobs
Proposal
For each repository that is suspected to have data loss on one or more nodes, list which Gitaly nodes are suspected of having missing data.
Assuming gitaly-1 was the primary and just went down:
Failed replication jobs between [2020-01-02 00:00:00 +0000 UTC, 2020-01-03 00:00:00 +0000 UTC):
test-repo/relative-path/1: gitaly-2, gitaly-3
test-repo/relative-path/2: gitaly-2
test-repo/relative-path/3: gitaly-3
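The grouping behind the proposed output could look roughly like the Go sketch below. It is only an illustration, not Praefect's actual code: the FailedJob type, its fields, and the nodesMissingData and formatReport helpers are hypothetical names introduced here. The sketch assumes each failed replication job records the repository's relative path and the target Gitaly node. It collects the distinct target nodes per repository and prints one line per repository.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// FailedJob is a hypothetical record of one failed replication job.
// Praefect's real job type may look different.
type FailedJob struct {
	RelativePath string // repository the job was replicating
	TargetNode   string // Gitaly node that did not receive the update
}

// nodesMissingData groups failed jobs by repository and returns, for each
// repository, the sorted, de-duplicated list of nodes suspected of
// missing data.
func nodesMissingData(jobs []FailedJob) map[string][]string {
	seen := make(map[string]map[string]struct{})
	for _, j := range jobs {
		if seen[j.RelativePath] == nil {
			seen[j.RelativePath] = make(map[string]struct{})
		}
		seen[j.RelativePath][j.TargetNode] = struct{}{}
	}

	out := make(map[string][]string, len(seen))
	for repo, nodes := range seen {
		list := make([]string, 0, len(nodes))
		for n := range nodes {
			list = append(list, n)
		}
		sort.Strings(list)
		out[repo] = list
	}
	return out
}

// formatReport renders one "repository: node, node" line per repository,
// with repositories in sorted order so the output is stable.
func formatReport(byRepo map[string][]string) string {
	repos := make([]string, 0, len(byRepo))
	for r := range byRepo {
		repos = append(repos, r)
	}
	sort.Strings(repos)

	var b strings.Builder
	for _, r := range repos {
		fmt.Fprintf(&b, "%s: %s\n", r, strings.Join(byRepo[r], ", "))
	}
	return b.String()
}
```

De-duplicating nodes per repository keeps each node listed once even when several jobs targeting it failed in the window. The job count shown by the current output is dropped in this sketch and could be kept alongside the node names if it is still useful.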
Edited by James Ramsay (ex-GitLab)