
Link existing LFS objects from parent fork during uploads

Stan Hu requested to merge sh-link-existing-lfs-uploads into master

What does this MR do and why?

Previously, LFS objects always had to be reuploaded to a fork, even if the parent project had already received them. This is unnecessary and wastes time and bandwidth. Consider this sequence of events:

  1. Push LFS file test.bin to project A.
  2. Fork project A to project B.
  3. Push LFS file test2.bin to project A.
  4. Push to project B.

When step 4 happens, GitLab should recognize that if the user has access to the parent project A, it can link the LFS objects already stored in A to project B instead of requesting a reupload.
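The decision described above can be modeled in a few lines. This is a minimal sketch, not GitLab's implementation: LfsObject and plan_fork_upload are hypothetical names, the object stores are plain sets of oids, and the permission check is reduced to a single hypothetical can_read_parent boolean. Objects the fork already has need nothing, objects the parent holds can be linked when the user may read the parent, and everything else still has to be uploaded.

```ruby
require "set"

# Hypothetical stand-in for an LFS object referenced in a batch request.
LfsObject = Struct.new(:oid, :size)

# Sketch of the fork-upload decision (not GitLab's actual code).
# requested       - objects listed in the client's batch "upload" request
# fork_oids       - oids already stored for the fork (project B)
# parent_oids     - oids stored for the fork's parent (project A)
# can_read_parent - hypothetical boolean: may this user read the parent?
# Returns { link: [...], upload: [...] }; objects the fork already has
# appear in neither list because nothing needs to happen for them.
def plan_fork_upload(requested, fork_oids:, parent_oids:, can_read_parent:)
  missing = requested.reject { |obj| fork_oids.include?(obj.oid) }

  # Without access to the parent, every missing object must be uploaded.
  return { link: [], upload: missing } unless can_read_parent

  # Objects the parent already has can be linked instead of re-uploaded.
  link, upload = missing.partition { |obj| parent_oids.include?(obj.oid) }
  { link: link, upload: upload }
end

requested = [LfsObject.new("aaa", 10), LfsObject.new("bbb", 20), LfsObject.new("ccc", 30)]
plan = plan_fork_upload(requested, fork_oids: Set["aaa"], parent_oids: Set["aaa", "bbb"], can_read_parent: true)
p plan[:link].map(&:oid)   # => ["bbb"]
p plan[:upload].map(&:oid) # => ["ccc"]
```

In the sequence above, test2.bin's oid is in project A's set but not in B's, so it lands in :link and the client is never asked to upload it.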

Relates to #297022 (closed)

How to set up and validate locally

  1. Enable the feature flag from a Rails console: Feature.enable(:lfs_auto_link_fork_source)
  2. Push a large LFS file test.bin to project A.
  3. Fork project A to project B.
  4. Push an even larger LFS file test2.bin to project A.
  5. Push to project B with tracing enabled: GIT_CURL_VERBOSE=1 GIT_TRACE=1 GIT_TRACE_PACKET=2 git push <projectB>
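To tie the oid and size that appear in the traces to a specific file such as test2.bin, you can compute them locally. Per the Git LFS pointer specification (v1), the oid is the SHA-256 of the file's raw content and size is its length in bytes. A minimal sketch using only Ruby's standard library; lfs_pointer and lfs_oid_and_size are illustrative helper names, not part of git-lfs or GitLab:

```ruby
require "digest"

# Returns the Git LFS v1 pointer text for the given content. The oid is
# the SHA-256 hex digest of the raw bytes; size is the byte length.
def lfs_pointer(content)
  oid = Digest::SHA256.hexdigest(content)
  "version https://git-lfs.github.com/spec/v1\n" \
    "oid sha256:#{oid}\n" \
    "size #{content.bytesize}\n"
end

# Returns [oid, size] for a file on disk. Digest::SHA256.file hashes the
# file in chunks, so large files are not read into memory all at once.
def lfs_oid_and_size(path)
  [Digest::SHA256.file(path).hexdigest, File.size(path)]
end
```

For example, lfs_oid_and_size("test2.bin") should return the same oid and size that git-lfs sends in the batch request when pushing to project B.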

The push should finish quickly and not request an upload. The curl output should show something like:

> POST /root/lfs-upload-1-fork.git/info/lfs/objects/batch HTTP/1.1
> Host: stanhu.gogitlab.com
> Accept: application/vnd.git-lfs+json; charset=utf-8
> Authorization: Basic * * * * *
> Content-Length: 203
> Content-Type: application/vnd.git-lfs+json; charset=utf-8
> User-Agent: git-lfs/2.13.3 (GitHub; darwin amd64; go 1.16.2)
>
{"operation":"upload","objects":[{"oid":"73cd9bfbe4371b4edadc1d154d59363700b54695caa094fab08d63f069f64b87","size":426285634}],"transfers":["basic","lfs-standalone-file"],"ref":{"name":"refs/heads/main"}}

The server should respond with something like:

< HTTP/2.0 200 OK
< Content-Length: 105
< Cache-Control: max-age=0, private, must-revalidate
< Content-Type: application/vnd.git-lfs+json; charset=utf-8
< Date: Mon, 06 Dec 2021 08:23:44 GMT
< Etag: W/"24b2e313e4d9a99dc5fc55aece061b46"
< Page-Title: GitLab
< Permissions-Policy: interest-cohort=()
< Referrer-Policy: strict-origin-when-cross-origin
< Server: nginx
< Strict-Transport-Security: max-age=63072000
< Vary: Accept
< X-Content-Type-Options: nosniff
< X-Download-Options: noopen
< X-Frame-Options: DENY
< X-Permitted-Cross-Domain-Policies: none
< X-Request-Id: 01FP7DEX2XJKGNHPMCM4BFZ56F
< X-Runtime: 0.107383
< X-Ua-Compatible: IE=edge
< X-Xss-Protection: 1; mode=block
<
00:23:44.602169 trace git-lfs: HTTP: {"objects":[{"oid":"73cd9bfbe4371b4edadc1d154d59363700b54695caa094fab08d63f069f64b87","size":426285634}]}
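The batch request in the trace above is a JSON POST to <repository>.git/info/lfs/objects/batch, as described by the Git LFS batch API. The sketch below builds an equivalent request object with Ruby's Net::HTTP without sending it. The host is a placeholder, authentication is omitted, and batch_upload_request is a hypothetical helper rather than anything git-lfs or GitLab provides.

```ruby
require "json"
require "net/http"
require "uri"

# Media type used by the Git LFS batch API (with the charset seen in the trace).
LFS_MEDIA_TYPE = "application/vnd.git-lfs+json; charset=utf-8"

# Builds, but does not send, a batch "upload" request like the traced one.
# repo_url is the repository URL ending in .git; objects is a list of
# [oid, size] pairs. Authentication is intentionally left out.
def batch_upload_request(repo_url, objects, ref:, transfers: ["basic"])
  req = Net::HTTP::Post.new(URI("#{repo_url}/info/lfs/objects/batch"))
  req["Accept"] = LFS_MEDIA_TYPE
  req["Content-Type"] = LFS_MEDIA_TYPE
  req.body = {
    "operation" => "upload",
    "objects" => objects.map { |oid, size| { "oid" => oid, "size" => size } },
    "transfers" => transfers,
    "ref" => { "name" => ref }
  }.to_json
  req
end

req = batch_upload_request(
  "https://gitlab.example.com/root/lfs-upload-1-fork.git",
  [["73cd9bfbe4371b4edadc1d154d59363700b54695caa094fab08d63f069f64b87", 426285634]],
  ref: "refs/heads/main"
)
puts req.path # prints /root/lfs-upload-1-fork.git/info/lfs/objects/batch
```

Only the transfers list differs from the traced body, which also advertised lfs-standalone-file; pass it explicitly if you want an exact match.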

You should not see any messages about uploading LFS files, or any batch response in which an object has an upload action with a header, such as:

01:25:37.911564 trace git-lfs: HTTP: {"objects":[{"oid":"a28847c9cff2980c3695ce9d6eee99fa66ce018100f1fe4f5af9a78630899734","size":33847418,"actions":{"upload":{"href":"https://example.com/root/lfs-upload1.git/gitlab-lfs/objects/a28847c9cff2980c3695ce9d6eee99fa66ce018100f1fe4f5af9a78630899734/33847418","header":{"Authorization":"Basic cm9vdDpleUpoYkdjaU9pSklVekk...
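The difference between the two responses is whether an object carries an actions.upload entry: under the Git LFS batch API, the server omits actions for objects it already has, and git-lfs then skips uploading them. A small helper for checking a captured response body, assuming you have copied the JSON out of the trace; oids_needing_upload is a hypothetical name, and the sample bodies use shortened placeholder oids.

```ruby
require "json"

# Returns the oids that a batch response still asks the client to upload,
# i.e. objects carrying an "actions" => { "upload" => ... } entry. Objects
# returned without actions are ones the server already has, so git-lfs
# does not upload them.
def oids_needing_upload(response_body)
  JSON.parse(response_body).fetch("objects", []).filter_map do |obj|
    obj["oid"] if obj.dig("actions", "upload")
  end
end

linked = '{"objects":[{"oid":"73cd9bfb","size":426285634}]}'
needs_upload = '{"objects":[{"oid":"a28847c9","size":33847418,' \
               '"actions":{"upload":{"href":"https://example.com/upload"}}}]}'
p oids_needing_upload(linked)       # => []
p oids_needing_upload(needs_upload) # => ["a28847c9"]
```

With the feature flag enabled, running it against the response for the push to project B should return an empty list.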

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Stan Hu
