Changes how project export tarballs are uploaded to external website
## What does this MR do and why?
When a user requests a project export via Import/Export, they can ask for the export tarball to be uploaded to an external website. When object storage is enabled, the upload uses `Gitlab::HttpIO#read` to stream-read the file from object storage. However, `Gitlab::HttpIO#read` doesn't perform well with large files because it makes many HTTP requests, each reading a small chunk of the file (see more detail about the problem in #31744 (comment 975715114)).
This change introduces a different method of streaming files from object storage that establishes a single HTTP connection to the object store and streams the file from the underlying socket. The downloaded file is read in 128KB chunks that are uploaded to the external website: one chunk is downloaded and uploaded, then the next, and so on until the whole file has been transferred.
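As an illustration, here is a minimal Ruby sketch of the single-connection streaming idea, assuming presigned download and upload URLs. The method and variable names are hypothetical, not the MR's actual classes: one GET request streams the file body from the object store, and the chunks are piped into a single streaming PUT to the external site.

```ruby
require 'net/http'
require 'uri'

# Hypothetical sketch: stream a file from the object store to an external
# URL using one download connection and one upload connection.
def stream_upload(download_url, upload_url)
  download_uri = URI(download_url)

  Net::HTTP.start(download_uri.host, download_uri.port, use_ssl: download_uri.scheme == 'https') do |http|
    http.request(Net::HTTP::Get.new(download_uri)) do |response|
      # Pipe the download body into the upload request so chunks are
      # uploaded as they arrive, without buffering the whole file.
      reader, writer = IO.pipe

      uploader = Thread.new do
        upload_uri = URI(upload_url)
        Net::HTTP.start(upload_uri.host, upload_uri.port, use_ssl: upload_uri.scheme == 'https') do |upload_http|
          request = Net::HTTP::Put.new(upload_uri)
          # Assumes the object store response includes Content-Length,
          # which the streamed PUT body requires.
          request['Content-Length'] = response['Content-Length']
          request.body_stream = reader
          upload_http.request(request)
        end
      end

      # read_body yields the body in chunks as they are read from the socket
      response.read_body { |chunk| writer.write(chunk) }
      writer.close
      uploader.join
    end
  end
end
```

In the actual change the source is read in 128KB chunks; in this sketch the chunk size is simply whatever `read_body` yields from the socket.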
For now, the new method is behind a feature flag, so we can test it for a while before making it the default.
### Comparison
Below are some comparisons of how long it took to upload an export file using different methods in my local environment (150Mbps connection):
| Method / File size | 15MB | 400MB |
| --- | --- | --- |
| `Gitlab::HttpIO` | ~70 seconds | Took more than 10 minutes and failed with `Gitlab::HttpIO::FailedToGetChunkError` |
| Download to disk and upload from disk | ~10 seconds | ~80 seconds |
| Remote Stream - 8MB buffer | ~10 seconds | ~101 seconds |
| Remote Stream - 2MB buffer | ~15 seconds | ~85 seconds |
| Remote Stream - 1MB buffer | ~11 seconds | ~74 seconds |
| Remote Stream - 128KB buffer | ~11 seconds | ~75 seconds |
I chose a buffer size of 128KB because I didn't notice any increase in upload speed with a larger buffer; in fact, a larger buffer made the upload slightly slower.
Related to: #31744 (closed)
Kudos to Kamil for suggesting the idea and sharing an example of the solution.
## Screenshots or screen recordings
These are strongly recommended to assist reviewers and reduce the time to merge your change.
## How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
- Enable the new method in a Rails console:

  ```ruby
  Feature.enable(:import_export_web_upload_stream)
  ```
- Configure the local environment to use object storage (see the GDK documentation on how to enable object storage).
- Request a project export via the API. In the request, provide the upload website URL:

  ```shell
  curl --location --request POST 'http://gdk.test:3000/api/v4/projects/[ID]/export' \
  --header 'PRIVATE-TOKEN: [TOKEN]' \
  --header 'Content-Type: application/json' \
  --data-raw '{
      "upload": {
          "URL": "[EXTERNAL_URL]",
          "method": "put"
      }
  }'
  ```
A presigned URL for S3 can be generated using the snippet below:
```ruby
#!/usr/bin/env ruby

require 'aws-sdk-s3'

# Replace with your bucket and credentials
bucket_name = 'bucket_example'
object_key = 'export.tar.gz'
access_key_id = 'access_key_id'
secret_access_key = 'secret_access_key'
expiration_time = 10_000 # URL validity, in seconds

client = Aws::S3::Client.new(region: 'us-east-1', secret_access_key: secret_access_key, access_key_id: access_key_id)
bucket = Aws::S3::Bucket.new(bucket_name, client: client)

# Generate a presigned PUT URL for the object
puts bucket.object(object_key).presigned_url(:put, expires_in: expiration_time)
```
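The URL printed by the snippet can then be used as the `[EXTERNAL_URL]` value in the export request above.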
- Wait for the project to be exported and uploaded to the external URL
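To see when the export completes, you can poll the export status endpoint (`GET /projects/:id/export`). A small sketch, assuming `PROJECT_ID` and `GITLAB_TOKEN` are set in the environment:

```ruby
require 'net/http'
require 'json'

# Poll the export status endpoint until the export finishes
uri = URI("http://gdk.test:3000/api/v4/projects/#{ENV['PROJECT_ID']}/export")

loop do
  request = Net::HTTP::Get.new(uri)
  request['PRIVATE-TOKEN'] = ENV['GITLAB_TOKEN']
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }

  status = JSON.parse(response.body)['export_status']
  puts status # e.g. "queued", "started", "finished"
  break if status == 'finished'

  sleep 5
end
```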
## MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- [ ] I have evaluated the MR acceptance checklist for this MR.