Add WalkParallel method to storage driver interface
Overview
Because paginated API calls depend on Walk returning results in a stable order, we have decided to handle parallelized walks via a new method on the storagedriver interface. See: !37 (comment 268823080)
This lets callers choose which walk behavior they get. Namely, they can use the faster WalkParallel provided that:
- Their WalkFn callback is thread-safe
- They intend to traverse all (or a large subset) of the files under the given path
- Processing order is not important
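The conditions above can be sketched in Go as follows. This is a minimal illustration, not the registry's code: the WalkFn signature here is simplified to a bare path, and walkParallel and collectPaths are hypothetical stand-ins for a driver's WalkParallel and a caller. The callback guards its shared slice with a mutex, and the caller sorts afterwards because the visit order is not guaranteed.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// WalkFn is a simplified, hypothetical stand-in for the driver callback type.
type WalkFn func(path string) error

// walkParallel is a toy substitute for a driver's WalkParallel: it calls f
// concurrently, once per path, in no guaranteed order.
func walkParallel(paths []string, f WalkFn) error {
	var wg sync.WaitGroup
	errs := make(chan error, len(paths))
	for _, p := range paths {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			if err := f(p); err != nil {
				errs <- err
			}
		}(p)
	}
	wg.Wait()
	close(errs)
	return <-errs // nil when no callback failed
}

// collectPaths shows a thread-safe callback: every write to the shared
// slice happens while holding the mutex.
func collectPaths(paths []string) ([]string, error) {
	var (
		mu   sync.Mutex
		seen []string
	)
	err := walkParallel(paths, func(path string) error {
		mu.Lock()
		defer mu.Unlock()
		seen = append(seen, path)
		return nil
	})
	// Visit order is arbitrary, so impose an order only where one is needed.
	sort.Strings(seen)
	return seen, err
}

func main() {
	got, err := collectPaths([]string{"/b", "/a", "/c"})
	fmt.Println(got, err)
}
```

Without the mutex, the concurrent appends would be a data race, which is the kind of problem the catalog_test change in this MR addresses.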
This merge request includes changes that:
- Add the WalkParallel method to the storagedriver interface and implement that method on all drivers except for s3. For s3, the original Walk method is added back.
- Add thread-safety to the catalog_test Walk function call
- Call the new WalkParallel method for non-paginated walks
- Fix a flaw in doWalkParallel where directory skipping error flags were incorrectly reported up to callers
- Fix an incorrect regex in driver testsuites
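To illustrate the directory-skipping fix, here is a rough sketch of the intended behavior, assuming the skip signal is a sentinel error that the walker consumes rather than returns. The names errSkipDir, node, walk, and run are hypothetical, and the walk is sequential for brevity; this is not the actual doWalkParallel implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// errSkipDir is a hypothetical sentinel a callback returns to skip a subtree.
var errSkipDir = errors.New("skip this directory")

// node is a toy in-memory file tree.
type node struct {
	name     string
	children []*node
}

// WalkFn is a simplified, hypothetical callback type.
type WalkFn func(path string) error

// walk visits n and its descendants. A skip signal stops descent into the
// subtree but is treated as a control flag, so it never reaches the caller.
func walk(n *node, prefix string, f WalkFn) error {
	path := prefix + "/" + n.name
	err := f(path)
	if errors.Is(err, errSkipDir) {
		return nil // swallow the flag instead of reporting it upward
	}
	if err != nil {
		return err // genuine errors still propagate
	}
	for _, c := range n.children {
		if err := walk(c, path, f); err != nil {
			return err
		}
	}
	return nil
}

// run walks a small tree, skipping /root/skip, and returns what was visited.
func run() ([]string, error) {
	tree := &node{name: "root", children: []*node{
		{name: "skip", children: []*node{{name: "a"}}},
		{name: "keep", children: []*node{{name: "b"}}},
	}}
	var visited []string
	err := walk(tree, "", func(path string) error {
		visited = append(visited, path)
		if path == "/root/skip" {
			return errSkipDir
		}
		return nil
	})
	return visited, err
}

func main() {
	visited, err := run()
	fmt.Println(visited, err)
}
```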
Potential weaknesses
This MR updates all storage drivers to use parallel walks for WalkParallel. It might be prudent to have production drivers, such as GCS, swift, etc., fall back to sequential Walks until we can verify their behavior when running highly parallelized workloads.
In order to do this, we would need to find a workaround for TestWalkParallelStopsProcessingOnError in the driver testsuites, which will time out for non-parallel Walks.
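One possible shape for such a fallback is a thin wrapper whose WalkParallel simply delegates to the sequential Walk. This is a sketch under assumptions: the walker interface, memDriver, sequentialFallback, and visited are hypothetical names, and the WalkFn signature is simplified to a bare path.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// WalkFn is a simplified, hypothetical callback type.
type WalkFn func(path string) error

// walker is a hypothetical subset of the driver interface.
type walker interface {
	Walk(root string, f WalkFn) error
}

// memDriver is a toy driver whose Walk visits paths in sorted order.
type memDriver struct{ paths []string }

func (d memDriver) Walk(root string, f WalkFn) error {
	sorted := append([]string(nil), d.paths...)
	sort.Strings(sorted)
	for _, p := range sorted {
		if !strings.HasPrefix(p, root) {
			continue
		}
		if err := f(p); err != nil {
			return err
		}
	}
	return nil
}

// sequentialFallback satisfies the WalkParallel contract by delegating to
// the wrapped driver's sequential Walk.
type sequentialFallback struct{ walker }

func (d sequentialFallback) WalkParallel(root string, f WalkFn) error {
	return d.Walk(root, f)
}

// visited records the order in which WalkParallel calls the callback.
func visited(d sequentialFallback, root string) []string {
	var out []string
	_ = d.WalkParallel(root, func(p string) error {
		out = append(out, p)
		return nil
	})
	return out
}

func main() {
	d := sequentialFallback{memDriver{paths: []string{"/x/b", "/y/c", "/x/a"}}}
	fmt.Println(visited(d, "/x"))
}
```

A wrapper like this keeps callers unchanged, but any test that relies on callbacks running concurrently would still need the workaround mentioned above.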