git: Disable generation of server info in git-repack(1) and prune existing files

Patrick Steinhardt requested to merge pks-git-prune-server-info into master

To serve Git repositories via a dumb HTTP server, Git needs to create metadata that tells clients what they need to fetch. This metadata is generated via git-update-server-info(1), which can be called automatically by both git-receive-pack(1) and git-repack(1). While the former doesn't do so by default, the latter does.

For us this is a waste of resources because we don't ever serve repos via the dumb HTTP protocol. Generating this information thus wastes both precious CPU cycles and disk space for data that is ultimately never used by anything. The waste of disk space is even more pronounced because git-repack(1) doesn't always clean up the temporary files it uses to atomically update the final files. So when the command gets killed, we may accumulate more and more temporary files. In extreme cases we have seen in production, a repository whose on-disk size of actual data was less than 5GB had accumulated about 35GB of these temporary files.

Stop generating this information in git-repack(1) completely. Ideally, we'd do so by injecting configuration into all repack commands, but there is no such config option right now. Instead, we need to pass the -n flag everywhere we execute git-repack(1).
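As a rough illustration of passing -n everywhere, the sketch below routes every git-repack(1) invocation through one helper that always adds the flag. The helper names `repack_command` and `repack` are hypothetical and not taken from this MR; the only Git behavior relied on is that `git repack -n` skips the call to git-update-server-info(1).

```python
import subprocess


def repack_command(repo_path, extra_args=()):
    """Build a git-repack(1) command line that never writes server info.

    Hypothetical helper for illustration: "-n" is always injected so that
    callers cannot forget it.
    """
    return ["git", "-C", str(repo_path), "repack", "-n", *extra_args]


def repack(repo_path, extra_args=()):
    """Run git-repack(1) with "-n"; raises CalledProcessError on failure."""
    subprocess.run(repack_command(repo_path, extra_args), check=True)
```

A caller would then write `repack("/path/to/repo.git", ["-a", "-d"])` instead of spawning git-repack(1) directly, which keeps the flag in a single place until a config option is available.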

Note that this doesn't stop generating the data in all places: commands like git-gc(1) invoke git-repack(1), but we have no ability to tell it to pass -n to git-repack(1). Neither is there a config option which would allow us to globally disable the generation. The current approach is thus only a best-effort one as a stop-gap solution while we're in the process of upstreaming patches which introduce such a config option.

Furthermore, this MR introduces cleanup logic to automatically prune all server info that exists on-disk.
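The pruning step could look roughly like the sketch below. It is not the code from this MR: the function name `prune_server_info` is hypothetical, and the file layout is an assumption, namely that server info lives in `info/refs` and `objects/info/packs`, with leftover temporary files next to them carrying a mkstemp-style `_XXXXXX` suffix.

```python
import re
from pathlib import Path

# Assumed layout (not confirmed by the MR text): final files plus temporary
# files named "<file>_XXXXXX" with a six-character alphanumeric suffix.
SERVER_INFO_TARGETS = (
    ("info", re.compile(r"^refs(_[A-Za-z0-9]{6})?$")),
    ("objects/info", re.compile(r"^packs(_[A-Za-z0-9]{6})?$")),
)


def prune_server_info(git_dir):
    """Delete server info files and stale temporary copies in git_dir.

    Returns the list of removed paths. Unrelated files in the same
    directories, such as info/exclude or objects/info/alternates, are kept.
    """
    removed = []
    for subdir, pattern in SERVER_INFO_TARGETS:
        directory = Path(git_dir) / subdir
        if not directory.is_dir():
            continue
        for entry in directory.iterdir():
            if entry.is_file() and pattern.match(entry.name):
                entry.unlink()
                removed.append(entry)
    return removed
```

Matching on an exact name plus a fixed-length suffix, rather than a loose prefix glob, is a deliberate choice here so that unrelated metadata in the same directories is never touched.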

Closes #4027 (closed)

Edited by Patrick Steinhardt
