ref: Reimplement ListAllTags via object pipeline (!3645) · Merge requests · GitLab.org / gitaly

Patrick Steinhardt requested to merge pks-gitpipe-find-all-tags into master Jul 05, 2021

We're currently seeing a lot of scalability issues with ListAllTags in production. One likely root cause is the IO pattern we use: we start git-for-each-ref(1), and for each tag it returns we request the object info from git-cat-file(1) and then, depending on the object type, ask another git-cat-file(1) process for its data. If it's an annotated tag, we manually peel the tag until we hit a non-tag object. We're thus constantly bouncing between these three processes in a sequential way, which is extremely inefficient.

Now that we have extended the gitpipe package to support enumerating references via the new ForEachRef() step, let's convert the code to use a pipeline which drives all three processes in parallel. This should prove to be much more efficient given that it fixes the IO patterns described above.
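To illustrate the idea, here is a minimal sketch of a three-stage pipeline in which every stage runs in its own goroutine, so no stage has to wait for a full round trip through the others. It does not use the gitpipe API and spawns no git processes: `listRefs`, `lookupInfo`, `readObjects` and the map-based lookups are hypothetical stand-ins for git-for-each-ref(1) and the two git-cat-file(1) processes.

```go
package main

import "fmt"

// refEntry is a hypothetical stand-in for one line of reference listing.
type refEntry struct{ name, oid string }

// objectInfo adds the object type that an info lookup would report.
type objectInfo struct {
	refEntry
	objType string
}

// object adds the object contents that a data lookup would return.
type object struct {
	objectInfo
	data string
}

// listRefs stands in for git-for-each-ref(1): it streams references.
func listRefs(refs []refEntry) <-chan refEntry {
	out := make(chan refEntry)
	go func() {
		defer close(out)
		for _, r := range refs {
			out <- r
		}
	}()
	return out
}

// lookupInfo stands in for the git-cat-file(1) process reporting object info.
// It consumes references while listRefs is still producing them.
func lookupInfo(in <-chan refEntry, types map[string]string) <-chan objectInfo {
	out := make(chan objectInfo)
	go func() {
		defer close(out)
		for r := range in {
			out <- objectInfo{refEntry: r, objType: types[r.oid]}
		}
	}()
	return out
}

// readObjects stands in for the git-cat-file(1) process returning object data.
func readObjects(in <-chan objectInfo, contents map[string]string) <-chan object {
	out := make(chan object)
	go func() {
		defer close(out)
		for i := range in {
			out <- object{objectInfo: i, data: contents[i.oid]}
		}
	}()
	return out
}

// runPipeline chains the three stages and collects the results in order.
func runPipeline(refs []refEntry, types, contents map[string]string) []object {
	var result []object
	for o := range readObjects(lookupInfo(listRefs(refs), types), contents) {
		result = append(result, o)
	}
	return result
}

func main() {
	refs := []refEntry{{"refs/tags/v1.0", "aaa"}, {"refs/tags/v1.1", "bbb"}}
	types := map[string]string{"aaa": "tag", "bbb": "commit"}
	contents := map[string]string{"aaa": "tag body", "bbb": "commit body"}
	for _, o := range runPipeline(refs, types, contents) {
		fmt.Println(o.name, o.objType, o.data)
	}
}
```

Because each stage has a single goroutine and the channels are unbuffered, results arrive in the same order as the references were listed, while the stages still overlap their work.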

As additional low-hanging fruit, the new code transforms the output of git-for-each-ref(1) to request both the tag itself and, in case it's an annotated one, the non-tag object it peels to. While this does additional work for lightweight tags, given that we request the same object twice, this shouldn't be much of a problem because the previously parsed object is still going to be in git-cat-file(1)'s object cache. On the other hand, this is going to be a lot more efficient for annotated tags, given that we no longer have to manually peel the tag to its target but simply have git do it for us. This also lifts the current limit on nested tags.
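The transformation can be sketched roughly as follows. This is not the code from the merge request: `tagRequests` is a hypothetical helper, and the input is assumed to be one reference name per line. It relies on git's `<rev>^{}` revision syntax, which dereferences a tag recursively until a non-tag object is found.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// tagRequests takes a listing with one reference name per line (an assumed
// format) and returns the revisions to request: each reference as-is,
// followed by the same reference with "^{}" appended so that git peels it
// to the first non-tag object.
func tagRequests(refListing string) []string {
	var revisions []string
	scanner := bufio.NewScanner(strings.NewReader(refListing))
	for scanner.Scan() {
		ref := strings.TrimSpace(scanner.Text())
		if ref == "" {
			continue // skip blank lines
		}
		revisions = append(revisions, ref, ref+"^{}")
	}
	return revisions
}

func main() {
	for _, rev := range tagRequests("refs/tags/v1.0\nrefs/tags/v1.1\n") {
		fmt.Println(rev)
	}
}
```

For a lightweight tag both revisions resolve to the same object, which is the duplicate request mentioned above. For an annotated tag the second revision resolves directly to the peeled target, no matter how deeply the tag is nested.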

All in all, this should result in a nice performance boost for the RPC. Given that this may result in incompatibilities, the new implementation is hidden behind a feature flag for now.

Change: performance

Part of #3142 (closed)
