Skip to content

checks: Fix revalidation of preexisting commits

What does this MR do and why?

When pushing commits into a repository, then client and server negotiate a packfile containing all objects which are necessary to end up with a fully connected graph on the server-side. In general, this contains at least all objects which have been newly introduced between the set of all old and new references. But in some cases, it can be that Git will send packfiles which knowingly contain objects which existed on the server side already, e.g. when Git decides to reuse deltas from an existing packfile where the delta base is a preexisting commit. As a result, any well-formed packfile is a superset of objects required to satisfy the update.

In v14.3 we have refactored access checks to use the quarantine directory to enumerate new commits directly. Because of above property, we may get too many objects from the object quarantine directory, which means that as a result we may perform access checks on commits which in fact aren't new in case the client decided to include these in the pack. While this is not a problem in most access checks (an object which is in the main repository but which we re-check is going to still pass the checks), other checks are more sensitive. Most importantly, push rules may require a commit to be created by the author who is currently performing the change. If we include preexisting commits of a different author in such a check, then it is totally expected that the access check will now fail. As a result, we must never include preexisting commits in push rule access checks.

To determine new commits in push rules, we do an in-memory walk of commits returned from the quarantine directory, where we walk from the tip of each change until we are not able to satisfy the commit's parents anymore. And in this case, we happily traverse past commits which are known already inc ase those were returned from the quarantine directory. To fix this, we need to abort the walk as soon as we hit an already known object.

The problem is though that we have no easy way to determine the already known object in the general case. But we can do so in limited cases: when the change we're processing has both an old and a new revision (that is, it is an "update"), then we simply skip adding oldrev to the result set. This doesn't work though for branch creations, where we ain't got no oldrev. We thus fall back to enumerating commits not via the quarantine directory in that case, but instead by using a revision walk with --not --all. This walk will not contain any objects which are referenced by any reference, and thus we can be sure that the in-memory walk will not traverse past any preexisting object.

Implement this schema. Unfortunately this is going to be a lot less performant compared to using the quarantine directory in all cases. But better be less performant than wrong.

Part of gitaly#3942 (closed)

How to set up and validate locally

Please refer to gitaly#3942 (comment 750642474). Reproducing is not quite deterministic given that it depends on the packfile negotiation.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Merge request reports

Loading