Add support for the Workhorse Google Cloud Storage client configuration
⚾ Context
In gitlab!96891 (merged), Workhorse was updated so that it can set up a Google Cloud Storage client. This makes uploads more reliable and unblocks bucket encryption. See #7324 (closed).
This configuration should be used in Workhorse only when:
- A consolidated object storage configuration is used.
- A Google provider is used.
- One of these parameters is set:
  - google_application_default
  - google_json_key_string
  - google_json_key_location
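To make these conditions concrete, here is a rough Ruby sketch of the check. The method and constant names are hypothetical (this is not the omnibus-gitlab code), and it assumes the consolidated settings arrive as the plain hash from gitlab_rails['object_store'], with "set" meaning either true or a non-empty string.

```ruby
# Hypothetical helper, not the omnibus-gitlab implementation: decides whether
# Workhorse should receive Google client settings.
GOOGLE_CREDENTIAL_KEYS = %w[
  google_application_default
  google_json_key_string
  google_json_key_location
].freeze

def workhorse_google_client?(object_store)
  # Condition 1: the consolidated object storage configuration is enabled.
  return false unless object_store && object_store['enabled']

  connection = object_store['connection'] || {}
  # Condition 2: the provider is Google.
  return false unless connection['provider'] == 'Google'

  # Condition 3: at least one credential parameter is set
  # (true, or a non-empty string).
  GOOGLE_CREDENTIAL_KEYS.any? do |key|
    value = connection[key]
    value == true || (value.is_a?(String) && !value.empty?)
  end
end
```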
Lastly, note that this part of Workhorse is gated behind a feature flag in Rails. Rails instructs Workhorse to use either:
- a presigned URL (the current behavior, also used when the feature flag is disabled), or
- the Workhorse Google Cloud Storage client (used when the feature flag is enabled).
Since the feature flag is currently disabled by default, this MR has no impact.
🔬 What does this MR do?
- Map the object storage configuration from the Rails config to the Workhorse config file when the conditions above are met.
- Update the related specs.
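For reviewers who want a mental model of the output, here is a simplified Ruby sketch of the mapping. It assumes Workhorse reads these credentials from an [object_storage.google] table next to provider = "Google" in config.toml; the method name and the exact TOML layout are assumptions, not the real omnibus-gitlab template.

```ruby
require 'json'

# Hypothetical sketch of the mapping step, not the omnibus-gitlab template.
# ASSUMPTION: Workhorse reads the credentials from an [object_storage.google]
# table, with provider = "Google" set in [object_storage].
def workhorse_google_toml(connection)
  lines = ['[object_storage]', 'provider = "Google"', '', '[object_storage.google]']

  lines << 'google_application_default = true' if connection['google_application_default'] == true

  %w[google_json_key_string google_json_key_location].each do |key|
    value = connection[key]
    next unless value.is_a?(String) && !value.empty?

    # Every escape JSON emits for a string (\", \\, \n, \uXXXX) is also a
    # valid TOML basic-string escape, so JSON.generate quotes the value safely.
    lines << "#{key} = #{JSON.generate(value)}"
  end

  lines.join("\n") + "\n"
end
```

Quoting matters mostly for google_json_key_string: the service account JSON contains double quotes and backslash escapes in the private key, which must survive the round trip into config.toml.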
⚙ How to validate this locally
🔧 Setup
- Set up an Omnibus development environment as described in https://gitlab.com/gitlab-org/omnibus-gitlab/-/blob/master/doc/development/setup.md.
- Make sure to pull the changes from this MR branch as described in https://gitlab.com/gitlab-org/omnibus-gitlab/-/blob/master/doc/development/setup.md#get-the-source-of-omnibus-gitlab.
- Make sure that you have a Google Cloud Storage bucket ready, with a service account and its related JSON key file.
- Enable the related feature flag in a Rails console (# gitlab-rails console):
Feature.enable(:workhorse_google_client)
  - This step is important. If the flag is not enabled, the Rails backend does not instruct Workhorse to use its Google client and uses a presigned URL instead.
Now that we have an omnibus "instance" running, let's configure object storage.
- In /etc/gitlab/gitlab.rb:
gitlab_rails['object_store']['enabled'] = true
gitlab_rails['object_store']['proxy_download'] = true
gitlab_rails['object_store']['connection'] = { <this is what we will update through our scenarios> }
gitlab_rails['object_store']['objects']['artifacts']['bucket'] = '<bucket>'
gitlab_rails['object_store']['objects']['artifacts']['proxy_download'] = false
gitlab_rails['object_store']['objects']['external_diffs']['bucket'] = '<bucket>'
gitlab_rails['object_store']['objects']['lfs']['bucket'] = '<bucket>'
gitlab_rails['object_store']['objects']['uploads']['bucket'] = '<bucket>'
gitlab_rails['object_store']['objects']['packages']['bucket'] = '<bucket>'
gitlab_rails['object_store']['objects']['dependency_proxy']['enabled'] = false
gitlab_rails['object_store']['objects']['dependency_proxy']['bucket'] = '<bucket>'
gitlab_rails['object_store']['objects']['terraform_state']['bucket'] = '<bucket>'
gitlab_rails['object_store']['objects']['pages']['bucket'] = '<bucket>'
⚗ The testing scenario
We keep it simple and use the generic package registry: we upload a dummy file to the GitLab generic package registry and check that Workhorse used its Google Cloud Storage client to upload that file to object storage.
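The upload done by the curl command in the steps that follow is an HTTP PUT of the file body to the generic package endpoint, authenticated with the username and personal access token embedded in the URL. For scripting the scenario, a minimal Ruby equivalent might look like this; the helper name, the example host, and the project ID are hypothetical.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper mirroring the curl command: an HTTP PUT of the file body
# to the generic package endpoint, authenticated with basic auth
# (username + personal access token) as in the curl URL.
def build_generic_package_upload(base_url, project_id, user, token, body,
                                 package: 'my', version: '1.1.2', file_name: 'file.txt')
  uri = URI("#{base_url}/api/v4/projects/#{project_id}/packages/generic/" \
            "#{package}/#{version}/#{file_name}")
  request = Net::HTTP::Put.new(uri)
  request.basic_auth(user, token)
  request.body = body
  [uri, request]
end

# Sending it (needs a running GitLab instance):
#   uri, req = build_generic_package_upload('http://gitlab.example.com', 42, 'root', ENV.fetch('GITLAB_PAT'), File.binread('dummy.txt'))
#   res = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') { |http| http.request(req) }
#   puts res.code
```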
- Have a project and a personal access token ready.
- Execute (from outside the Omnibus instance):
$ curl --upload-file <dummy file> "http://<user>:<pat>@<base_url>/api/v4/projects/<project_id>/packages/generic/my/1.1.2/file.txt"
- Check the Workhorse logs ($ tail -f /var/log/gitlab/gitlab-workhorse/current). They should contain a line similar to this one:
{"client_mode":"go_cloud:Google","copied_bytes":8,"correlation_id":"01GJG2WCGK5TFARQSY6QM7DJSV","filename":"upload","is_local":false,"is_multipart":false,"is_remote":true,"level":"info","msg":"saved file","remote_id":"1669134693-23742-0001-4032-0dde1427ae53d9167356b065ff491342","remote_temp_object":"tmp/uploads/1669134693-23742-0001-4032-0dde1427ae53d9167356b065ff491342","time":"2022-11-22T16:31:34Z"}
- The important part is client_mode. It MUST be set to go_cloud:Google. This is Workhorse saying that it used its own Google Cloud Storage client to upload the file, which is what we want ✅
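Instead of eyeballing the tail output, the check can be scripted. The Ruby sketch below is hypothetical (not a GitLab tool): it parses the Workhorse JSON log lines, keeps the client_mode of every "saved file" entry, and treats the most recent one as the result of our upload.

```ruby
require 'json'

# Hypothetical check, not a GitLab tool: reads Workhorse JSON log lines and
# returns the client_mode of every "saved file" entry, oldest first.
EXPECTED_CLIENT_MODE = 'go_cloud:Google'

def saved_file_client_modes(log_lines)
  log_lines.filter_map do |line|
    entry = JSON.parse(line)
    next unless entry.is_a?(Hash) && entry['msg'] == 'saved file'

    entry['client_mode']
  rescue JSON::ParserError
    nil # skip lines that are not JSON
  end
end

# True when the most recent upload went through the Google client.
def google_client_used?(log_lines)
  saved_file_client_modes(log_lines).last == EXPECTED_CLIENT_MODE
end
```

On the instance, something like google_client_used?(File.foreach('/var/log/gitlab/gitlab-workhorse/current')) should then return true right after the curl upload. Looking only at the latest entry avoids false failures from older uploads that used a presigned URL.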
1️⃣ With google_application_default
This configuration is challenging because, in this mode, the Google libraries look for credentials in default locations.
Fortunately, one of these locations is an environment variable, so we can set it to point to the JSON file.
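As a rough model of that lookup: Google's Application Default Credentials first honor the GOOGLE_APPLICATION_CREDENTIALS environment variable, then fall back to the gcloud well-known file in the user's home directory, and finally to the metadata server when running on Google Cloud. The Ruby sketch below only models the first two steps; the method name is hypothetical and it is not Google's implementation.

```ruby
# Simplified model of the Application Default Credentials file lookup, not
# Google's implementation. The metadata-server fallback is left out.
WELL_KNOWN_ADC_FILE = File.join('.config', 'gcloud', 'application_default_credentials.json')

def adc_credentials_path(env = ENV, file_exists = ->(path) { File.file?(path) })
  # 1. An explicit path in GOOGLE_APPLICATION_CREDENTIALS wins.
  explicit = env['GOOGLE_APPLICATION_CREDENTIALS']
  return explicit if explicit && !explicit.empty?

  # 2. Otherwise, the file written by `gcloud auth application-default login`
  #    in the user's home directory (Linux/macOS location).
  home = env['HOME']
  return nil if home.nil? || home.empty?

  candidate = File.join(home, WELL_KNOWN_ADC_FILE)
  file_exists.call(candidate) ? candidate : nil
end
```

Setting the variable for both services makes step 1 apply to Rails and Workhorse alike, regardless of which system user each service runs as.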
- Put the JSON file somewhere reachable, for example with # nano /etc/gitlab/object_storage.json.
- Update the /etc/gitlab/gitlab.rb file with these lines:
gitlab_rails['object_store']['connection'] = {
'provider' => 'Google',
'google_project' => 'dfernandez-5494dd2c',
'google_application_default' => true
}
- Then, update the /etc/gitlab/gitlab.rb file to set the environment variable. This must be done for both the Rails and Workhorse services:
gitlab_rails['env'] = {
'GOOGLE_APPLICATION_CREDENTIALS' => '/etc/gitlab/object_storage.json'
}
gitlab_workhorse['env'] = {
'GOOGLE_APPLICATION_CREDENTIALS' => '/etc/gitlab/object_storage.json'
}
- Reconfigure with # gitlab-ctl reconfigure (# gitlab-ctl restart might also be needed).
- Check the Workhorse configuration with # less /var/opt/gitlab/gitlab-workhorse/config.toml. The google_application_default setting should be set to true.
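To check a generated setting from a script rather than by reading the file, a small helper is enough, assuming the setting sits on a single key = value line. Ruby's standard library has no TOML parser, so this hypothetical helper uses a regex and ignores tables and multi-line values.

```ruby
# Hypothetical quick check: matches a single `key = value` line in TOML text
# with a regex and returns the raw value, or nil when the key is absent.
# It does not understand tables, comments after values, or multi-line strings.
def toml_setting(toml_text, key)
  match = toml_text.match(/^[ \t]*#{Regexp.escape(key)}[ \t]*=[ \t]*(.+?)[ \t]*$/)
  match && match[1]
end
```

For this scenario, toml_setting(File.read('/var/opt/gitlab/gitlab-workhorse/config.toml'), 'google_application_default') should return "true".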
Try the testing scenario; it should work.
2️⃣ With google_json_key_string
In this configuration, the parameter holds the entire contents of the JSON file.
- Update the /etc/gitlab/gitlab.rb file with these lines:
gitlab_rails['object_store']['connection'] = {
'provider' => 'Google',
'google_project' => 'dfernandez-5494dd2c',
'google_json_key_string' => '
<the exact contents of the json service account file>
'
}
- Reconfigure with # gitlab-ctl reconfigure (# gitlab-ctl restart might also be needed).
- Check the Workhorse configuration with # less /var/opt/gitlab/gitlab-workhorse/config.toml. The contents of the JSON file should be there.
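Pasting the key inline works, but it is easy to break the single-quoted string by hand. Since /etc/gitlab/gitlab.rb is evaluated as Ruby, a possible variant (not part of the tested scenario, and assuming the key file exists when reconfigure runs) is to read the contents at reconfigure time. Workhorse still receives the contents rather than a path, so this still exercises google_json_key_string.

```ruby
# /etc/gitlab/gitlab.rb (variant, assuming the key file exists when
# `gitlab-ctl reconfigure` runs): read the key instead of pasting it inline.
gitlab_rails['object_store']['connection'] = {
  'provider' => 'Google',
  'google_project' => 'dfernandez-5494dd2c',
  'google_json_key_string' => File.read('/etc/gitlab/object_storage.json')
}
```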
Try the testing scenario; it should work.
3️⃣ With google_json_key_location
In this configuration, the parameter holds the path to the JSON file.
- Put the JSON file somewhere reachable, for example with # nano /etc/gitlab/object_storage.json.
- Update the /etc/gitlab/gitlab.rb file with these lines:
gitlab_rails['object_store']['connection'] = {
'provider' => 'Google',
'google_project' => 'dfernandez-5494dd2c',
'google_json_key_location' => '/etc/gitlab/object_storage.json'
}
- Reconfigure with # gitlab-ctl reconfigure (# gitlab-ctl restart might also be needed).
- Check the Workhorse configuration with # less /var/opt/gitlab/gitlab-workhorse/config.toml. The JSON file path should be there.
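In this mode, a missing or unreadable file only shows up at upload time. A hypothetical Ruby pre-flight check for the file, assuming the standard Google service account key format (type, project_id, client_email, private_key), could look like this; run it as the user Workhorse runs as (typically git in Omnibus) so the readability check is meaningful.

```ruby
require 'json'

# Hypothetical pre-flight check for the file that google_json_key_location
# points to. Field names follow the standard Google service account key file.
REQUIRED_KEY_FIELDS = %w[type project_id client_email private_key].freeze

def service_account_key_problems(path)
  return ["#{path} does not exist"] unless File.file?(path)
  return ["#{path} is not readable by the current user"] unless File.readable?(path)

  data = JSON.parse(File.read(path))
  return ['top-level JSON value is not an object'] unless data.is_a?(Hash)

  problems = REQUIRED_KEY_FIELDS
             .reject { |field| data[field].is_a?(String) && !data[field].empty? }
             .map { |field| "missing or empty field: #{field}" }
  if data.key?('type') && data['type'] != 'service_account'
    problems << "type is #{data['type'].inspect}, expected \"service_account\""
  end
  problems
rescue JSON::ParserError => e
  ["invalid JSON: #{e.message}"]
end
```

An empty array means the file looks usable; it does not prove that the service account has access to the bucket.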
Try the testing scenario; it should work.
Related issues
Checklist
See Definition of done.
For anything in this list that will not be completed, please provide a reason in the MR discussion.
Required
- Merge Request Title and Description are up to date, accurate, and descriptive
- MR targeting the appropriate branch
- MR has a green pipeline on GitLab.com
- Pipeline is green on dev.gitlab.org if the change is touching anything besides documentation or internal cookbooks
- trigger-package has a green pipeline running against latest commit
Expected (please provide an explanation if not completing)
- Test plan indicating conditions for success has been posted and passes
- [-] Documentation created/updated
  - No documentation impact.
- Tests added
- [-] Integration tests added to GitLab QA
  - The QA suite already contains specs using different object storage options.
- Equivalent MR/issue for the GitLab Chart opened