Skip to content

push_rules: Implement bulk-checking of file sizes

What does this MR do?

The file size check checks each newly pushed blob's size to see whether it's bigger than a configured threshold and, if so, rejects the ref update. This is an expensive check though: we need to go both through all preexisting as well as all new refs in order to find out new blobs via a graph walk. As such, this check doesn't only scale with the number of changes, but also with the repository size itself.

This MR changes #new_blobs to handle multiple new revisions at once in a single RPC call to Gitaly so we can convert this check to use a single bulk-load of new blobs. While this doesn't help much with walking the positive side of the graph walk, it does amortize the negative walk of all preexisting refs and will thus in most cases result in a significant speedup if multiple changes are to be checked.

Ideally, we'd go even further and enumerate new blobs directly via the quarantine directory: we wouldn't have to do a graph walk at all in this case, but can just directly look up all new blobs. While this would be as fast as we can get, the downside is that we wouldn't have blob paths available anymore given that these blobs wouldn't have been walked via a tree object. We would still be able to at least present the blob ID to the user, but the user experience is definitely worse in this case.

We may still at a later point decide to go this step given that it si a huge performance win (e.g. on gitlab-org/gitlab, we're talking about 10ms vs 30s). But for now, this commit only does the uncontroversial part of batch-computing new blobs.

Closes #339637 (closed)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Merge request reports

Loading