Switch GCS WalkParallel to use WalkFallbackParallel, rather than sequential Walk
Rationale
WalkFallbackParallel has been implemented, but it is not currently used by any production driver, because most production drivers are not built for heavily parallelized workloads. The GCS driver, however, does use a limiter that restricts the number of concurrent goroutines that may be launched, which should reduce both the risk and the effort of switching it over.
Concerns
It's possible the limiter is not effective against a workload as intensive as a full walk of a repository.
Other solutions
WalkFallbackParallel calls the driver's Stat method once per file. We could reduce the number of requests to GCS by implementing a custom WalkParallel (and potentially Walk) for GCS. That is a fundamentally more complicated change, however, and the goal of this issue is to reduce the time it takes to run garbage collection when using the GCS storage driver, not necessarily to reduce the number of HTTP requests made. It's possible that WalkFallbackParallel alone achieves a satisfactory result, letting us focus development efforts elsewhere.