blob: Buffer output of git-cat-file to speed up reading LFS pointers

In order to read LFS pointer candidates returned by git-rev-list(1), we
use git-cat-file(1) with the --batch flag. By default, git-cat-file(1)
flushes its output after each object so that a process can
interactively read from and write to it. In our context, though, we do
not care about interactivity. We only care about the stream of objects
until we've either hit the limit of objects set by the user or reached
EOF. As such, flushing after each object is unnecessary and slows down
processing.

Fix the issue by adding the --buffer flag to git-cat-file(1). This
causes it to use normal stdio buffering, which is more efficient than
flushing after each object. This brings a small speedup when reading
objects directly:

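To illustrate why per-object flushing is costly, here is a Go sketch that models the two modes with a `bufio.Writer` in front of a sink that counts write calls. It is an analogy only, assuming that each write reaching a pipe costs roughly one write(2) system call; it is not git's implementation, and the 4 KiB buffer size as well as the `countingWriter` and `emitObjects` names are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter counts how many Write calls reach the underlying sink.
// For a pipe, each such call would roughly correspond to one write(2).
type countingWriter struct {
	writes int
	bytes  int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.writes++
	c.bytes += len(p)
	return len(p), nil
}

// emitObjects writes n small fake objects through a 4 KiB buffer. With
// flushEach set it flushes after every object, like the default
// behaviour of `git cat-file --batch`; otherwise it flushes only when
// the buffer fills up and once at the end, like --buffer.
func emitObjects(sink io.Writer, n int, flushEach bool) error {
	w := bufio.NewWriterSize(sink, 4096)
	for i := 0; i < n; i++ {
		// A 40-character dummy object ID, a header and 3 bytes of data.
		if _, err := fmt.Fprintf(w, "%040d blob 3\nfoo\n", i); err != nil {
			return err
		}
		if flushEach {
			if err := w.Flush(); err != nil {
				return err
			}
		}
	}
	return w.Flush()
}

func main() {
	flushed, buffered := &countingWriter{}, &countingWriter{}
	if err := emitObjects(flushed, 1000, true); err != nil {
		panic(err)
	}
	if err := emitObjects(buffered, 1000, false); err != nil {
		panic(err)
	}
	fmt.Printf("flush per object: %d writes, %d bytes\n", flushed.writes, flushed.bytes)
	fmt.Printf("buffered:         %d writes, %d bytes\n", buffered.writes, buffered.bytes)
}
```

Both variants move the same number of bytes, but the flushing variant issues one write per object while the buffered one issues only about one write per 4 KiB of output.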
    # before
    BenchmarkFindLFSPointers/limitless-16          1   19982855076 ns/op
    BenchmarkFindLFSPointers/limit-16     1000000000         0.167 ns/op
    BenchmarkReadLFSPointers/limitless-16          1   18622537182 ns/op
    BenchmarkReadLFSPointers/limit-16     1000000000         0.120 ns/op

    # after
    BenchmarkFindLFSPointers/limitless-16          1   19988832079 ns/op
    BenchmarkFindLFSPointers/limit-16     1000000000         0.177 ns/op
    BenchmarkReadLFSPointers/limitless-16          1   17376539083 ns/op
    BenchmarkReadLFSPointers/limit-16     1000000000         0.103 ns/op

That is a speedup of about 7% when reading a large number of LFS pointer
candidates, and of about 14% for limited reads.

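The relative speedups for ReadLFSPointers can be recomputed from the ns/op values in the benchmark output as (before - after) / before. A small Go sketch of that arithmetic; the `relativeSpeedup` helper is hypothetical.

```go
package main

import "fmt"

// relativeSpeedup returns how much the time per operation dropped, as a
// fraction of the old time: (before - after) / before.
func relativeSpeedup(before, after float64) float64 {
	return (before - after) / before
}

func main() {
	// ns/op values for BenchmarkReadLFSPointers from the output above.
	limitless := relativeSpeedup(18622537182, 17376539083)
	limited := relativeSpeedup(0.120, 0.103)
	fmt.Printf("limitless: %.1f%%\n", limitless*100)
	fmt.Printf("limited:   %.1f%%\n", limited*100)
}
```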
Note that there is no change for FindLFSPointers, though. This is
because, by default, there is no buffering when a process writes
directly into the stdin of another process anyway, so disabling the
flushing semantics doesn't change anything in this context. As a
result, we shouldn't see any improvement for either GetNewLFSPointers
or GetAllLFSPointers, but there should be one for GetLFSPointers.