blob: Buffer output of git-catfile to speed up reading LFS pointers (!3241) · Merge requests · GitLab.org / gitaly

Patrick Steinhardt requested to merge pks-blob-lfs-catfile-buffering into master Mar 11, 2021

In order to read LFS pointer candidates which were returned by git-rev-list(1), we use git-cat-file(1) with the --batch flag. By default, git-cat-file(1) flushes its output after each object so that a process can interactively read from and write to it. In our context we do not care about interactivity, though: we only consume the stream of objects until we've either hit the object limit set by the user or reached EOF. As such, flushing after each object is unnecessary and slows down processing.

Fix the issue by adding the --buffer flag to git-cat-file(1). This causes it to use normal stdio buffering, which is more efficient than manually flushing after each object. This brings a small speedup when reading objects directly:

# before
BenchmarkFindLFSPointers/limitless-16         	       1	19982855076 ns/op
BenchmarkFindLFSPointers/limit-16             	1000000000	         0.167 ns/op
BenchmarkReadLFSPointers/limitless-16         	       1	18622537182 ns/op
BenchmarkReadLFSPointers/limit-16             	1000000000	         0.120 ns/op

# after
BenchmarkFindLFSPointers/limitless-16         	       1	19988832079 ns/op
BenchmarkFindLFSPointers/limit-16             	1000000000	         0.177 ns/op
BenchmarkReadLFSPointers/limitless-16         	       1	17376539083 ns/op
BenchmarkReadLFSPointers/limit-16             	1000000000	         0.103 ns/op

So for ReadLFSPointers we've got a speedup of about 7% when reading a large bunch of LFS pointer candidates (18.62s to 17.38s per op), and of roughly 14% for limited reads (0.120 to 0.103 ns/op).
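
The relative changes can be recomputed from the ns/op figures in the benchmark output above. The following is only a minimal sketch; `reductionPercent` is a hypothetical helper that is not part of Gitaly.

```go
package main

import "fmt"

// reductionPercent returns by how many percent the time per operation
// dropped from before to after. Negative values mean a slowdown.
// Hypothetical helper for illustration only.
func reductionPercent(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	// ns/op figures copied from the BenchmarkReadLFSPointers results.
	fmt.Printf("limitless: %.1f%%\n", reductionPercent(18622537182, 17376539083)) // prints "limitless: 6.7%"
	fmt.Printf("limit: %.1f%%\n", reductionPercent(0.120, 0.103))                 // prints "limit: 14.2%"
}
```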

Note that there is no change for FindLFSPointers, though. This is because, by default, there is no buffering anyway when one process writes directly into the stdin of another, so disabling the flushing semantics doesn't really change anything in this context. As a result, we shouldn't see any improvement for either GetNewLFSPointers or GetAllLFSPointers, but there should be one for GetLFSPointers.
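
To illustrate the non-interactive use described above, here is a minimal sketch of spawning git-cat-file(1) with --batch --buffer from Go. It is not the actual Gitaly code: `catFileArgs` and `readObjects` are hypothetical names, and the sketch assumes all revisions are known up front, so stdin can be written in full and closed before the output is consumed. With --buffer, output is not guaranteed to appear until git's stdio buffer fills or stdin reaches EOF.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

// catFileArgs builds the git-cat-file(1) arguments. When buffered is set,
// --buffer is appended so that git uses normal stdio buffering instead of
// flushing after each object. Hypothetical helper for illustration only.
func catFileArgs(buffered bool) []string {
	args := []string{"cat-file", "--batch"}
	if buffered {
		args = append(args, "--buffer")
	}
	return args
}

// readObjects feeds all revisions to git-cat-file(1) in one go and returns
// the raw batch output. Hypothetical helper for illustration only.
func readObjects(repoPath string, revisions []string, buffered bool) ([]byte, error) {
	var input strings.Builder
	for _, rev := range revisions {
		input.WriteString(rev)
		input.WriteByte('\n')
	}

	cmd := exec.Command("git", append([]string{"-C", repoPath}, catFileArgs(buffered)...)...)
	// os/exec copies the reader into the child's stdin and closes the pipe
	// at EOF, which lets git flush whatever is still buffered.
	cmd.Stdin = strings.NewReader(input.String())

	var stdout bytes.Buffer
	cmd.Stdout = &stdout

	if err := cmd.Run(); err != nil {
		return nil, fmt.Errorf("running git cat-file: %w", err)
	}
	return stdout.Bytes(), nil
}

func main() {
	// Assumes the current directory is a Git repository with a HEAD commit.
	out, err := readObjects(".", []string{"HEAD"}, true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("read %d bytes of batch output\n", len(out))
}
```

A caller that needs a response to each request before sending the next one should not pass --buffer, since it would block waiting for output that git has not flushed yet.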
