Add support for the workhorse google client
🃏 Context
During direct uploads, there is a step where Workhorse asks Rails (the `/authorize` request): "Hey, I want to upload this file. Tell me where to upload it." Rails replies: "Sure, upload it here."
In the initial version, "here" was a presigned URL: a URL that is presigned for the given object storage, so Workhorse can simply issue a standard `PUT` request against it to upload the file.
The consolidated configuration of object storage allows us to go a step further:
- Workhorse builds the configured object storage client on its side.
- When Rails replies, "here" is basically a bucket name + key. No more presigned URL.

Using native clients makes the upload more reliable.
The above is already used for Azure Blob Storage.
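To make the two flows concrete, here is a minimal sketch of how a client could branch on the `/authorize` reply. The struct and its JSON field names (`store_url`, `use_workhorse_client`, `bucket`, `key`) are hypothetical and chosen for illustration only; they are not taken from the Workhorse or Rails code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// authorizeResponse is a hypothetical, simplified shape for the /authorize
// reply. The field names are illustrative assumptions, not the real API.
type authorizeResponse struct {
	StoreURL           string `json:"store_url"`            // presigned PUT URL (initial flow)
	UseWorkhorseClient bool   `json:"use_workhorse_client"` // native client flow
	Bucket             string `json:"bucket"`
	Key                string `json:"key"`
}

// uploadTarget describes where and how the file would be uploaded.
func uploadTarget(raw []byte) (string, error) {
	var resp authorizeResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return "", err
	}
	if resp.UseWorkhorseClient {
		// Native client flow: only a bucket name and a key are needed.
		if resp.Bucket == "" || resp.Key == "" {
			return "", errors.New("native client requested but bucket or key is missing")
		}
		return "native client: " + resp.Bucket + "/" + resp.Key, nil
	}
	// Presigned flow: a plain PUT against the given URL.
	if resp.StoreURL == "" {
		return "", errors.New("no presigned URL in response")
	}
	return "presigned PUT: " + resp.StoreURL, nil
}

func main() {
	presigned := []byte(`{"store_url":"https://storage.example.com/signed"}`)
	native := []byte(`{"use_workhorse_client":true,"bucket":"uploads","key":"tmp/uploads/abc"}`)
	for _, raw := range [][]byte{presigned, native} {
		target, err := uploadTarget(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(target)
	}
}
```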
While checking how to add encryption for Google Cloud Storage, I noticed that we are not using the Workhorse client for the Google provider: we still use the presigned URL.
This is not great, as the client used for Azure Blob Storage is a "general" library that also supports GCS. So why not use that?
This MR aims to support the Workhorse Google client.
The Workhorse Google client can also bring more reliable uploads. See this:
> By default, resumable uploads occur automatically when the file is larger than 16 MiB. You change the cutoff for performing resumable uploads with Writer.ChunkSize. Resumable uploads are always chunked when using the Go client library.

The important word here is "automatically": the underlying Go client does the heavy lifting for us.
This is issue #372593 (closed).
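As a rough illustration of the cutoff arithmetic in the quote above, the sketch below checks whether an upload of a given size would go over the resumable threshold. `isResumable` is a hypothetical helper written for this description; it does not call the Google client library, and treating a zero chunk size as "chunking disabled" is an assumption based on the library documentation for `Writer.ChunkSize`.

```go
package main

import "fmt"

const miB = 1 << 20

// defaultChunkSize is the 16 MiB cutoff mentioned in the quoted documentation.
const defaultChunkSize = 16 * miB

// isResumable is a hypothetical helper: it reports whether a file of
// fileSize bytes is larger than the configured cutoff.
func isResumable(fileSize, chunkSize int64) bool {
	if chunkSize <= 0 {
		// Assumption: a zero chunk size disables chunking entirely.
		return false
	}
	return fileSize > chunkSize
}

func main() {
	for _, size := range []int64{8 * miB, 16 * miB, 17 * miB} {
		fmt.Printf("%d MiB with the default cutoff: resumable=%t\n",
			size/miB, isResumable(size, defaultChunkSize))
	}
	// Raising the cutoff changes the outcome for the same file size.
	fmt.Printf("17 MiB with a 32 MiB cutoff: resumable=%t\n",
		isResumable(17*miB, 32*miB))
}
```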
🔬 What does this MR do and why?
- `workhorse`
  - Support `google` as an object storage provider configuration.
    - Load and configure the proper URL opener when that configuration is read.
  - Support the `google_json_key_location`, `google_json_key_string` and `google_application_default` parameters from the GitLab config.
    - Credentials are checked in this order: `google_application_default`, `google_json_key_string` and `google_json_key_location`.
  - Create the background context earlier so that it's available when loading the configuration.
  - Add/Update the related tests.
- `rails`
  - Update Rails so that the response to the `/authorize` request indicates whether the Workhorse client should be used or not.
  - Add/Update the related specs.
This change is behind a feature flag: `workhorse_google_client`.
We will probably need an update in https://gitlab.com/gitlab-org/omnibus-gitlab so that the Google object storage configuration from GitLab Rails is "translated" into the Workhorse configuration.
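The credential lookup order described for Workhorse above can be sketched as follows. This is a minimal illustration of the precedence only: `googleConfig` and `credentialSource` are hypothetical names, and the sketch does not load or validate any real credentials.

```go
package main

import (
	"errors"
	"fmt"
)

// googleConfig mirrors the three parameters named in this MR.
// The struct itself is a hypothetical illustration.
type googleConfig struct {
	ApplicationDefault bool   // google_application_default
	JSONKeyString      string // google_json_key_string
	JSONKeyLocation    string // google_json_key_location
}

// credentialSource returns the name of the first configured credential
// source, following the order stated in the description:
// application default, then key string, then key location.
func credentialSource(c googleConfig) (string, error) {
	switch {
	case c.ApplicationDefault:
		return "google_application_default", nil
	case c.JSONKeyString != "":
		return "google_json_key_string", nil
	case c.JSONKeyLocation != "":
		return "google_json_key_location", nil
	}
	return "", errors.New("no Google credentials configured")
}

func main() {
	// Both a key string and a key location are set: the key string wins.
	cfg := googleConfig{
		JSONKeyString:   "<json key contents>",
		JSONKeyLocation: "<credentials key location>",
	}
	source, err := credentialSource(cfg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using", source)
}
```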
🖥 Screenshots or screen recordings
Here are the uploads I tested. I chose a list of uploads where direct upload is used for some and not for the others.

| Test | feature flag disabled | feature flag enabled |
| --- | --- | --- |
| nuget package | | |
| maven package | | |
| generic package | | |
| npm package | | |
| graphql | | |
| CI artifact | | |
| user avatar | | |
| git LFS | | |

The change looks stable.
⚙ How to set up and validate locally
- Have GDK ready with object storage support.
- Enable consolidated configuration.
- Create a Google Cloud Storage bucket and get the credentials file.
- Update the GitLab config to:

      object_store:
        enabled: true
        proxy_download: false
        direct_upload: true
        remote_directory: <bucket name>
        connection:
          provider: Google
          google_project: <project name>
          google_client_email: <client email>
          google_json_key_location: <credentials key location>
        objects: {"artifacts":{"bucket":"artifacts"},"external_diffs":{"bucket":"external-diffs"},"lfs":{"bucket":"lfs-objects"},"uploads":{"bucket":"uploads"},"packages":{"bucket":"<bucket name>"},"dependency_proxy":{"bucket":"dependency-proxy"},"terraform_state":{"bucket":"terraform"},"pages":{"bucket":"pages"}}

- Update the Workhorse config to:

      [object_storage]
        provider = "Google"

      [object_storage.google]
        google_json_key_location = "<credentials key location>"
We're now ready to play! We're going to use the Generic Package Registry for the test, as we can upload files there with simple `curl` commands.
- Create a project.
- Create a `dummy.txt` file with whatever content you want.
- Upload it:

      curl --header "PRIVATE-TOKEN: <pat>" --upload-file ./dummy.txt "http://gdk.test:8000/api/v4/projects/<project_id>/packages/generic/my_awesome_package/1.3.7/ananas.txt"

- Check the Workhorse logs:

      {"client_mode":"presigned_put","copied_bytes":8,"correlation_id":"01GBZE29AK2EN4WFK2RSA36S51","filename":"upload","is_local":false,"is_multipart":false,"is_remote":true,"level":"info","msg":"saved file","remote_id":"1662133548-89391-0001-1404-0e0c60b088070249d1ab665f10bb5864","remote_temp_object":"","time":"2022-09-02T17:45:49+02:00"}

  - Check the `client_mode`: `presigned_put` 😿
- Now, let's enable the feature flag:

      Feature.enable(:workhorse_google_client)

- Re-upload the same file (that's fine, duplicate uploads are allowed in the Generic Package Registry):

      curl --header "PRIVATE-TOKEN: <pat>" --upload-file ./dummy.txt "http://gdk.test:8000/api/v4/projects/<project_id>/packages/generic/my_awesome_package/1.3.7/ananas.txt"

- Check the Workhorse logs:

      {"client_mode":"go_cloud:Google","copied_bytes":8,"correlation_id":"01GBZE573K7Z18EK801XV52Y41","filename":"upload","is_local":false,"is_multipart":false,"is_remote":true,"level":"info","msg":"saved file","remote_id":"1662133644-89390-0001-0079-24772d1d8c49573933c5d7a673f113b6","remote_temp_object":"tmp/uploads/1662133644-89390-0001-0079-24772d1d8c49573933c5d7a673f113b6","time":"2022-09-02T17:47:25+02:00"}

  - Check the `client_mode`: `go_cloud:Google`. Woot! That means that the Workhorse client for Google has been used! 🎉
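If you would rather not read the `client_mode` by eye, a small helper can pull it out of the log stream. The sketch below is an illustration only: `savedFileEntry` and `clientModes` are hypothetical names, and the sample lines are trimmed copies of the two log entries from the validation steps, keeping only the `client_mode`, `level` and `msg` fields.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// savedFileEntry holds the two log fields this check cares about.
type savedFileEntry struct {
	ClientMode string `json:"client_mode"`
	Msg        string `json:"msg"`
}

// clientModes returns the client_mode of every "saved file" log line,
// in order. Lines that are not valid JSON are skipped.
func clientModes(logs string) []string {
	var modes []string
	scanner := bufio.NewScanner(strings.NewReader(logs))
	for scanner.Scan() {
		var entry savedFileEntry
		if err := json.Unmarshal(scanner.Bytes(), &entry); err != nil {
			continue // not a JSON log line
		}
		if entry.Msg == "saved file" {
			modes = append(modes, entry.ClientMode)
		}
	}
	return modes
}

func main() {
	// Trimmed versions of the two log lines shown in the steps.
	logs := `{"client_mode":"presigned_put","level":"info","msg":"saved file"}
{"client_mode":"go_cloud:Google","level":"info","msg":"saved file"}`

	for _, mode := range clientModes(logs) {
		fmt.Println("client_mode:", mode)
	}
}
```

With the feature flag enabled, the last reported mode should be `go_cloud:Google` rather than `presigned_put`.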
🚥 MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.