Concurrent RepositoryExists requests on a non-existent repository lead to a failure with transactions
Staging Kibana has a bunch of errors saying "begin transaction: get partition: get partition ID: get partition ID after waiting: view: partition assignment not found". The error is returned from here. This path is triggered when multiple goroutines attempt to access a repository that doesn't have a partition assignment yet. The first goroutine takes a lock, and the rest wait for it to assign a partition.
If the repository does not exist on the disk, no partition assignment is made. The goroutine that attempted to assign the repository returns a descriptive error indicating that the repository did not exist. However, each goroutine waiting on the lock then attempts to get the partition ID again and fails because the assignment doesn't exist. It returns a generic error stating that the partition assignment was not found instead of the repository not found error. This breaks the error handling further up the stack.
The descriptive error is converted in the middleware to the usual repository not found error. If the request is a RepositoryExists, we instead return a successful response indicating the repository does not exist, to conform to the RPC's API.
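To make the conversion concrete, here is a minimal Go sketch of the behavior described above. It is not the actual middleware: the names errRepositoryNotFound, errPartitionAssignmentNotFound, existsResponse and handleRepositoryExists are all hypothetical, and treating a nil error as "exists" is a simplification for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors; the real errors in the codebase may differ.
var (
	errRepositoryNotFound          = errors.New("repository not found")
	errPartitionAssignmentNotFound = errors.New("partition assignment not found")
)

// existsResponse is a hypothetical stand-in for the RepositoryExists response.
type existsResponse struct{ Exists bool }

// handleRepositoryExists mimics the conversion: a repository not found error
// from beginning the transaction becomes a successful "does not exist"
// response, while any other error is passed through to the caller.
func handleRepositoryExists(beginErr error) (*existsResponse, error) {
	if beginErr == nil {
		// Simplification: assume the repository exists if the transaction began.
		return &existsResponse{Exists: true}, nil
	}
	if errors.Is(beginErr, errRepositoryNotFound) {
		return &existsResponse{Exists: false}, nil
	}
	return nil, beginErr
}

func main() {
	for _, err := range []error{
		fmt.Errorf("begin transaction: %w", errRepositoryNotFound),
		fmt.Errorf("begin transaction: %w", errPartitionAssignmentNotFound),
	} {
		resp, handlerErr := handleRepositoryExists(err)
		fmt.Printf("resp=%+v err=%v\n", resp, handlerErr)
	}
}
```

Only the first error is recognized and turned into a successful response; the generic partition assignment error falls through and fails the request.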
As the goroutine waiting on the lock doesn't return the descriptive error, the error doesn't get converted to the successful response as expected by the RepositoryExists API. This causes concurrent RepositoryExists requests on non-existing repositories to fail with a partition assignment not found error.
We should return the same expected error even if we were not the goroutine checking it. This way the caller can convert the error to the appropriate response for the RPC being called.
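One possible shape for this, as a rough Go sketch rather than the real partition assigner: the goroutine doing the assignment records its outcome, error included, and waiters return that outcome instead of looking up the assignment again. All names here (assigner, attempt, getPartitionID, errRepositoryNotFound, the exists callback) are hypothetical and only illustrate the idea.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errRepositoryNotFound is a hypothetical sentinel; the real error may differ.
var errRepositoryNotFound = errors.New("repository not found")

// attempt records one in-flight assignment and its outcome, so that
// waiters can return exactly what the assigning goroutine returned.
type attempt struct {
	done chan struct{}
	id   uint64
	err  error
}

type assigner struct {
	mu       sync.Mutex
	assigned map[string]uint64
	inflight map[string]*attempt
	nextID   uint64
	exists   func(path string) bool // stand-in for the on-disk check
}

func newAssigner(exists func(string) bool) *assigner {
	return &assigner{
		assigned: map[string]uint64{},
		inflight: map[string]*attempt{},
		exists:   exists,
	}
}

func (a *assigner) getPartitionID(path string) (uint64, error) {
	a.mu.Lock()
	if id, ok := a.assigned[path]; ok {
		a.mu.Unlock()
		return id, nil
	}
	if at, ok := a.inflight[path]; ok {
		a.mu.Unlock()
		<-at.done // wait for the assigning goroutine
		// Return its result, error included, instead of looking the
		// assignment up again and failing with a generic error.
		return at.id, at.err
	}
	at := &attempt{done: make(chan struct{})}
	a.inflight[path] = at
	a.mu.Unlock()

	found := a.exists(path) // checked outside the lock

	a.mu.Lock()
	if found {
		a.nextID++
		at.id = a.nextID
		a.assigned[path] = at.id
	} else {
		at.err = fmt.Errorf("get partition ID: %w", errRepositoryNotFound)
	}
	delete(a.inflight, path)
	a.mu.Unlock()
	close(at.done) // publish the outcome to all waiters
	return at.id, at.err
}

func isRepositoryNotFound(err error) bool { return errors.Is(err, errRepositoryNotFound) }

// runConcurrent calls getPartitionID from n goroutines and collects the errors.
func runConcurrent(a *assigner, path string, n int) []error {
	errs := make([]error, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			_, errs[i] = a.getPartitionID(path)
		}(i)
	}
	wg.Wait()
	return errs
}

func main() {
	a := newAssigner(func(string) bool {
		time.Sleep(10 * time.Millisecond) // make it likely that callers overlap
		return false
	})
	for i, err := range runConcurrent(a, "missing.git", 4) {
		fmt.Printf("caller %d: notFound=%t err=%v\n", i, isRepositoryNotFound(err), err)
	}
}
```

With this shape, every caller sees an error that wraps the repository not found error, whether it performed the check itself or waited on another goroutine, so the middleware conversion can apply uniformly.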