Add support for the workhorse GCS client (!3060) · GitLab.org / charts / GitLab Chart

David Fernandez requested to merge 4009-workhorse-gcs-client-support into master Mar 30, 2023

🏀 Context

In gitlab-org/gitlab!96891 (merged), workhorse was updated so that a Google Cloud Storage client could be set up. This makes uploads more reliable and unblocks bucket encryption. See #4009 (closed).

This configuration should be used in workhorse only when:

- A consolidated object storage configuration is used.
- A Google provider is used.
- One of these parameters is set:
  - google_application_default
  - google_json_key_string
  - google_json_key_location
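The conditions above can be approximated with a small shell check against the connection file. This is a hypothetical helper: `should_configure_workhorse_gcs` is not part of the chart (the real logic lives in the workhorse.object_storage.config template). It assumes the file passed in is the consolidated object storage connection (condition 1 is taken for granted) and only greps top-level keys, so it is a sketch rather than a YAML parser.

```shell
#!/bin/sh
# Hypothetical helper approximating the chart's conditions with grep.
# Assumes $1 is the consolidated object storage connection file
# (e.g. rails.gcs.yaml); this is not the chart's implementation.
should_configure_workhorse_gcs() {
  connection_file="$1"

  # Condition: a Google provider is used (quoted or unquoted value).
  grep -Eq '^provider:[[:space:]]*"?Google"?[[:space:]]*$' "$connection_file" || return 1

  # Condition: one of the supported credential parameters is set.
  grep -Eq '^(google_application_default:[[:space:]]*true|google_json_key_string:|google_json_key_location:)' "$connection_file"
}
```

For example, `should_configure_workhorse_gcs rails.gcs.yaml && echo "expect an [object_storage.google] section"`.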

Lastly, note that this part of workhorse is gated behind a feature flag in rails. Rails instructs workhorse to use either:

- a presigned URL (what is used today, and what is used when the feature flag is disabled), or
- the workhorse Google Cloud Storage client (used when the feature flag is enabled).

Since the feature flag is currently disabled by default, this MR has no impact on uploads.

🔬 What does this MR do?

- Update the workhorse.object_storage.config template so that, when the conditions above are met, it generates the correct Google Cloud Storage configuration for workhorse.
- Update a related spec.

⛓ Related issues

#4009 (closed)

This mirrors the Omnibus change gitlab-org/omnibus-gitlab!6530 (merged).

🤔 How to validate this locally?

As listed above, there are 3 different settings.

We don't need all 3; only one of them is needed. That gives us 3 configurations to test.

We're going to need:

- a k8s cluster ready.
- a GCS bucket.
- a Google service account that can write to that bucket.
- a JSON key for that service account.

To follow the logs, we use kail.

As we will see, only one of the parameters (setting 2️⃣) can be used without updating the deployment files, but for completeness we go through all 3 possible parameters.

⚗ The testing scenario

We are going to keep it nice and simple and use the generic package registry. We upload a dummy file to the GitLab generic package registry and check in the workhorse logs how workhorse uploaded that file to object storage.

Have a project and a personal access token ready.

Execute (from outside the GitLab instance):

$ curl --upload-file <dummy file> "http://<user>:<pat>@<base_url>/api/v4/projects/<project_id>/packages/generic/my/1.1.2/file.txt"

Check the workhorse logs ($ kail -c gitlab-workhorse). They should contain a line similar to this one:

default/gitlab-webservice-default-6d9bbc8864-k25zv[gitlab-workhorse]: {"client_mode":"presigned_put","copied_bytes":8,"correlation_id":"01GWVPHHNP1HN3MV5RVV13FS1S","filename":"upload","is_local":false,"is_multipart":false,"is_remote":true,"level":"info","msg":"saved file","remote_id":"1680261826-132-0003-4213-603ba2ba17befa66fde116f5253fcb9e","remote_temp_object":"","time":"2023-03-31T11:23:47Z"}

This is the proof that the upload was successful. Note the client_mode value, presigned_put: we are not using the workhorse GCS client that this MR enables. That's because this decision is made by rails, and it is currently behind a feature flag that is disabled by default.
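When skimming kail output, it helps to pull out just the client_mode field. A minimal sketch, assuming the log format shown above (a kail prefix followed by single-line JSON); `client_mode_of` is a hypothetical helper name, and sed is used instead of jq so the kail prefix does not need to be stripped first.

```shell
# Hypothetical helper: print the client_mode value of one workhorse log line.
# Prints nothing if the line has no client_mode field.
client_mode_of() {
  printf '%s\n' "$1" | sed -n 's/.*"client_mode":"\([^"]*\)".*/\1/p'
}
```

For the line above, `client_mode_of '<log line>'` prints presigned_put.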

Another way to confirm that the scenario worked is to download the file:

$ curl "http://<user>:<pat>@<base_url>/api/v4/projects/<project_id>/packages/generic/my/1.1.2/file.txt"

You should get the file contents back.

🐰 Going further

So you want to use the workhorse GCS client? Fine, let's enable the feature flag:

$ kubectl exec -it <gitlab-webservice pod name> -c webservice -- /bin/bash

(in the container) $ cd /srv/gitlab/

$ ./bin/rails c

irb(main):001:0> Feature.enable(:workhorse_google_client)
irb(main):002:0> exit

$ exit

Try to upload the file with curl again.

This time around, workhorse logs will show this:

default/gitlab-webservice-default-6d9bbc8864-k25zv[gitlab-workhorse]: {"client_mode":"go_cloud:Google","copied_bytes":8,"correlation_id":"01GWVQCNH2MZ5YKR43WD29TBHW","filename":"upload","is_local":false,"is_multipart":false,"is_remote":true,"level":"info","msg":"saved file","remote_id":"1680262715-199-0001-4337-4e293e8145637540b9eb6b965d95ef30","remote_temp_object":"tmp/uploads/1680262715-199-0001-4337-4e293e8145637540b9eb6b965d95ef30","time":"2023-03-31T11:38:36Z"}

Notice the client_mode. It's go_cloud:Google. That means that workhorse used its own GCS client to upload the file 🎉
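To check this without reading every line, a hypothetical helper can scan workhorse log lines on stdin and succeed only if a "saved file" entry reports go_cloud:Google. It assumes the JSON field layout shown in the log lines above and is a plain text match, not a JSON parser.

```shell
# Hypothetical helper: read workhorse log lines on stdin and succeed if at
# least one "saved file" entry was uploaded with workhorse's own GCS client.
saw_go_cloud_upload() {
  grep '"msg":"saved file"' | grep -q '"client_mode":"go_cloud:Google"'
}
```

For example, `kubectl logs <gitlab-webservice pod name> -c gitlab-workhorse | saw_go_cloud_upload && echo "workhorse GCS client used"`. Reading a finished log dump rather than the live kail stream avoids waiting forever when no matching line ever shows up.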

If you still have doubts, you can always check the bucket on GCS. Your file will be there 😸

Setting 1️⃣ `google_application_default`

This configuration is challenging because, in this mode, the Google libraries look for credentials in a set of default locations.

Fortunately, one of these locations is an environment variable, so we can set it to point to the JSON key file.

To keep this simple, we store the contents of the Google JSON key file in a k8s secret, mount that secret as a file, and point the GOOGLE_APPLICATION_CREDENTIALS environment variable at that file.
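Once the deployment is updated, a quick way to confirm that the variable and the mounted file line up is a check like the one below, run in a shell inside the gitlab-workhorse container. `check_adc_credentials` is a hypothetical helper; it assumes the container has grep and that the key is a service account key (which carries "type": "service_account"). It does not validate the key against Google.

```shell
# Hypothetical helper: sanity-check that GOOGLE_APPLICATION_CREDENTIALS is set
# and points to a readable service account JSON key.
check_adc_credentials() {
  if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
    return 1
  fi
  if [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
    echo "cannot read $GOOGLE_APPLICATION_CREDENTIALS" >&2
    return 1
  fi
  # Rough text check for the service account key type.
  grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$GOOGLE_APPLICATION_CREDENTIALS"
}
```

With the deployment change used here, the variable should resolve to /etc/secret-volume/key.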

Update charts/gitlab/charts/webservice/templates/deployment.yaml with this:


diff --git a/charts/gitlab/charts/webservice/templates/deployment.yaml b/charts/gitlab/charts/webservice/templates/deployment.yaml
index 95111a72a..58017fd20 100644
--- a/charts/gitlab/charts/webservice/templates/deployment.yaml
+++ b/charts/gitlab/charts/webservice/templates/deployment.yaml
@@ -203,6 +203,8 @@ spec:
               value: '/var/opt/gitlab/templates'
             - name: CONFIG_DIRECTORY
               value: '/srv/gitlab/config'
+            - name: GOOGLE_APPLICATION_CREDENTIALS
+              value: '/etc/secret-volume/key'
             {{- if $.Values.metrics.enabled }}
             - name: prometheus_multiproc_dir
               value: /metrics
@@ -262,6 +264,9 @@ spec:
             - name: webservice-secrets
               mountPath: '/etc/gitlab'
               readOnly: true
+            - name: secret-volume
+              mountPath: /etc/secret-volume
+              readOnly: true
             - name: webservice-secrets
               mountPath: /srv/gitlab/config/secrets.yml
               subPath: rails-secrets/secrets.yml
@@ -359,6 +364,8 @@ spec:
               value: '/var/opt/gitlab/templates'
             - name: CONFIG_DIRECTORY
               value: '/srv/gitlab/config'
+            - name: GOOGLE_APPLICATION_CREDENTIALS
+              value: '/etc/secret-volume/key'
             {{- if .workhorse.sentryDSN }}
             - name: GITLAB_WORKHORSE_SENTRY_DSN
               value: {{ .workhorse.sentryDSN }}
@@ -372,6 +379,9 @@ spec:
             - name: workhorse-secrets
               mountPath: '/etc/gitlab'
               readOnly: true
+            - name: secret-volume
+              mountPath: /etc/secret-volume
+              readOnly: true
             - name: shared-upload-directory
               mountPath: /srv/gitlab/public/uploads/tmp
               readOnly: false
@@ -429,6 +439,9 @@ spec:
       - name: workhorse-config
         configMap:
             name: {{ $.Release.Name }}-workhorse-{{ .name }}
+      - name: secret-volume
+        secret:
+          secretName: google-key-json
       - name: init-webservice-secrets
         projected:
           defaultMode: 0400

Let's create a rails.gcs.yaml file:

provider: Google
google_project: <google project id>
google_application_default: true

Let's create the object storage secret:

$ kubectl create secret generic gitlab-object-storage --from-file=connection=rails.gcs.yaml

Let's create a secret with the google key json file:

$ kubectl create secret generic google-key-json --from-file=key=<full path to google key json file>

Lastly, let's create an additional values.yml file to read that object storage secret (and also disable MinIO):

global:
  minio:
    enabled: false
  registry:
    bucket: <bucket name>
  appConfig:
    object_store:
      enabled: true
      connection:
        secret: gitlab-object-storage
        key: connection
    lfs:
      bucket: <bucket name>
    artifacts:
      bucket: <bucket name>
    uploads:
      bucket: <bucket name>
    packages:
      bucket: <bucket name>
    backups:
      bucket: <bucket name>

Let's deploy the gitlab chart with the additional file (we use the "minikube minimum" base):

$ helm upgrade --install gitlab . --timeout 600s -f ./examples/values-minikube-minimum.yaml -f values.yml

Checking the workhorse logs ($ kail -c gitlab-workhorse):

default/gitlab-webservice-default-675c6cddc5-9d46l[gitlab-workhorse]: {"address":"0.0.0.0:8181","level":"info","msg":"Running upstream server","network":"tcp","time":"2023-03-30T13:03:54Z"}
default/gitlab-webservice-default-675c6cddc5-9d46l[gitlab-workhorse]: {"address":"/tmp/gitlab/workhorse.sock","level":"info","msg":"Running upstream server","network":"unix","time":"2023-03-30T13:03:54Z"}

Workhorse booted normally ✅

Let's check its config:

$ kubectl exec -it <gitlab-webservice pod name> -c gitlab-workhorse -- /bin/bash 

(inside the gitlab-workhorse container) $ cat /srv/gitlab/config/workhorse-config.toml

We get this config content:

shutdown_timeout = "61s"
[redis]
URL = "redis://gitlab-redis-master.default.svc:6379"
Password = "xxx"
[object_storage]
provider = "Google"
# Google storage configuration.
[object_storage.google]
google_application_default = true
[image_resizer]
max_scaler_procs = 2
max_filesize = 250000
[[listeners]]
network = "tcp"
addr = "0.0.0.0:8181"

The object_storage and object_storage.google sections are properly configured ✅
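Rather than eyeballing the file, the same checks can be scripted. A rough sketch, assuming you first copy the rendered file out of the pod, for example with `kubectl exec <gitlab-webservice pod name> -c gitlab-workhorse -- cat /srv/gitlab/config/workhorse-config.toml > workhorse-config.toml`. `check_workhorse_gcs_config` is hypothetical and greps line by line: it is not a TOML parser and does not verify that provider sits inside [object_storage].

```shell
# Hypothetical helper: line-oriented check of a rendered workhorse-config.toml
# for the Google object storage sections and one credential setting.
check_workhorse_gcs_config() {
  config="$1"
  grep -q '^\[object_storage\]$' "$config" || return 1
  grep -Eq '^provider = "Google"$' "$config" || return 1
  grep -q '^\[object_storage\.google\]$' "$config" || return 1
  grep -Eq '^(google_application_default = true|google_json_key_string = |google_json_key_location = )' "$config"
}
```

The same helper applies unchanged to the config files of settings 2️⃣ and 3️⃣.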

The testing scenario is working with this config ✅

Setting 2️⃣ `google_json_key_string`

Alright, this is the easiest configuration to test because it's the one in the charts example file.

Basically, we pass the contents of the Google JSON key file directly.

With a k8s cluster ready (and empty):

Create a rails.gcs.yaml file with:

provider: Google
google_project: <google project id>
google_json_key_string: |
  <exact contents of the json key file>
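Indenting the whole JSON key under the block scalar by hand is error-prone. A minimal sketch that generates the file, assuming the key is a standard JSON key file; `write_gcs_key_string_connection` is a hypothetical helper name, and the two-space indentation is just one valid choice for a YAML block scalar.

```shell
# Hypothetical helper: write a connection file that embeds the JSON key as a
# YAML block scalar, indenting every line of the key by two spaces.
write_gcs_key_string_connection() {
  project_id="$1"
  key_file="$2"
  output="$3"
  {
    echo "provider: Google"
    echo "google_project: $project_id"
    echo "google_json_key_string: |"
    sed 's/^/  /' "$key_file"
  } > "$output"
}
```

For example, `write_gcs_key_string_connection <google project id> <full path to google key json file> rails.gcs.yaml`.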

Create a k8s secret out of that file:

$ kubectl create secret generic gitlab-object-storage --from-file=connection=rails.gcs.yaml

Create an additional values.yml file to read that secret (and also disable MinIO):

global:
  minio:
    enabled: false
  registry:
    bucket: <bucket name>
  appConfig:
    object_store:
      enabled: true
      connection:
        secret: gitlab-object-storage
        key: connection
    lfs:
      bucket: <bucket name>
    artifacts:
      bucket: <bucket name>
    uploads:
      bucket: <bucket name>
    packages:
      bucket: <bucket name>
    backups:
      bucket: <bucket name>

Let's deploy the gitlab chart with the additional file (we use the "minikube minimum" base):

$ helm upgrade --install gitlab . --timeout 600s -f ./examples/values-minikube-minimum.yaml -f values.yml

Checking the workhorse logs ($ kail -c gitlab-workhorse):

default/gitlab-webservice-default-745f57c88d-9ck7c[gitlab-workhorse]: {"address":"0.0.0.0:8181","level":"info","msg":"Running upstream server","network":"tcp","time":"2023-03-30T11:59:48Z"}
default/gitlab-webservice-default-745f57c88d-9ck7c[gitlab-workhorse]: {"address":"/tmp/gitlab/workhorse.sock","level":"info","msg":"Running upstream server","network":"unix","time":"2023-03-30T11:59:48Z"}

Workhorse was able to boot normally 👍

Let's check its config:

$ kubectl exec -it <gitlab-webservice pod name> -c gitlab-workhorse -- /bin/bash 

(inside the gitlab-workhorse container) $ cat /srv/gitlab/config/workhorse-config.toml

We get this config content:

shutdown_timeout = "61s"
[redis]
URL = "redis://gitlab-redis-master.default.svc:6379"
Password = "xxx"
[object_storage]
provider = "Google"
# Google storage configuration.
[object_storage.google]
google_json_key_string = '''
<exact google key json file contents>
'''
[image_resizer]
max_scaler_procs = 2
max_filesize = 250000
[[listeners]]
network = "tcp"
addr = "0.0.0.0:8181"

That's the expected config for object_storage and object_storage.google.

The testing scenario is working with this config ✅

Setting 3️⃣ `google_json_key_location`

This time, the value must point to the location of the Google JSON key file.

For this, we use the same approach as 1️⃣, but instead of setting an environment variable, we point directly to the mounted key file.

Update charts/gitlab/charts/webservice/templates/deployment.yaml with this:


diff --git a/charts/gitlab/charts/webservice/templates/deployment.yaml b/charts/gitlab/charts/webservice/templates/deployment.yaml
index 95111a72a..58017fd20 100644
--- a/charts/gitlab/charts/webservice/templates/deployment.yaml
+++ b/charts/gitlab/charts/webservice/templates/deployment.yaml
@@ -262,6 +264,9 @@ spec:
             - name: webservice-secrets
               mountPath: '/etc/gitlab'
               readOnly: true
+            - name: secret-volume
+              mountPath: /etc/secret-volume
+              readOnly: true
             - name: webservice-secrets
               mountPath: /srv/gitlab/config/secrets.yml
               subPath: rails-secrets/secrets.yml
@@ -372,6 +379,9 @@ spec:
             - name: workhorse-secrets
               mountPath: '/etc/gitlab'
               readOnly: true
+            - name: secret-volume
+              mountPath: /etc/secret-volume
+              readOnly: true
             - name: shared-upload-directory
               mountPath: /srv/gitlab/public/uploads/tmp
               readOnly: false
@@ -429,6 +439,9 @@ spec:
       - name: workhorse-config
         configMap:
             name: {{ $.Release.Name }}-workhorse-{{ .name }}
+      - name: secret-volume
+        secret:
+          secretName: google-key-json
       - name: init-webservice-secrets
         projected:
           defaultMode: 0400

Let's create a rails.gcs.yaml file:

provider: Google
google_project: <google project id>
google_json_key_location: /etc/secret-volume/key

Let's create the object storage secret:

$ kubectl create secret generic gitlab-object-storage --from-file=connection=rails.gcs.yaml

Let's create a secret with the google key json file:

$ kubectl create secret generic google-key-json --from-file=key=<full path to google key json file>

Lastly, let's create an additional values.yml file to read that object storage secret (and also disable MinIO):

global:
  minio:
    enabled: false
  registry:
    bucket: <bucket name>
  appConfig:
    object_store:
      enabled: true
      connection:
        secret: gitlab-object-storage
        key: connection
    lfs:
      bucket: <bucket name>
    artifacts:
      bucket: <bucket name>
    uploads:
      bucket: <bucket name>
    packages:
      bucket: <bucket name>
    backups:
      bucket: <bucket name>

Let's deploy the gitlab chart with the additional file (we use the "minikube minimum" base):

$ helm upgrade --install gitlab . --timeout 600s -f ./examples/values-minikube-minimum.yaml -f values.yml

Checking the workhorse logs ($ kail -c gitlab-workhorse):

default/gitlab-webservice-default-7b65945595-h4r8p[gitlab-workhorse]: {"address":"0.0.0.0:8181","level":"info","msg":"Running upstream server","network":"tcp","time":"2023-03-30T13:25:52Z"}
default/gitlab-webservice-default-7b65945595-h4r8p[gitlab-workhorse]: {"address":"/tmp/gitlab/workhorse.sock","level":"info","msg":"Running upstream server","network":"unix","time":"2023-03-30T13:25:52Z"}

Workhorse was able to boot normally ✅

Let's check its config:

$ kubectl exec -it <gitlab-webservice pod name> -c gitlab-workhorse -- /bin/bash 

(inside the gitlab-workhorse container) $ cat /srv/gitlab/config/workhorse-config.toml

We get this config content:

shutdown_timeout = "61s"
[redis]
URL = "redis://gitlab-redis-master.default.svc:6379"
Password = "xxx"
[object_storage]
provider = "Google"
# Google storage configuration.
[object_storage.google]
google_json_key_location = "/etc/secret-volume/key"
[image_resizer]
max_scaler_procs = 2
max_filesize = 250000
[[listeners]]
network = "tcp"
addr = "0.0.0.0:8181"

That's the expected config for object_storage and object_storage.google.
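To confirm that the configured path actually resolves to the mounted secret, a check like the following can be run inside the gitlab-workhorse container (for example in the shell opened above). `check_key_location` is a hypothetical helper; it assumes sed is available in the container and that the value is rendered as a double-quoted TOML string, as in the output above.

```shell
# Hypothetical helper: extract google_json_key_location from a rendered
# workhorse-config.toml and check that the referenced file is readable.
check_key_location() {
  config="$1"
  key_path=$(sed -n 's/^google_json_key_location = "\(.*\)"$/\1/p' "$config")
  if [ -z "$key_path" ]; then
    echo "google_json_key_location not found in $config" >&2
    return 1
  fi
  [ -r "$key_path" ]
}
```

For example, `check_key_location /srv/gitlab/config/workhorse-config.toml && echo "key file readable"`.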

The testing scenario is working with this config ✅

Checklist

See Definition of done.

For anything in this list which will not be completed, please provide a reason in the MR discussion.

Required

- Merge Request Title and Description are up to date, accurate, and descriptive
- MR targeting the appropriate branch
- MR has a green pipeline on GitLab.com
- When ready for review, MR is labeled "~workflow::ready for review" per the Distribution MR workflow

Expected (please provide an explanation if not completing)

- Test plan indicating conditions for success has been posted and passes
- Documentation created/updated
- Tests added
- Integration tests added to GitLab QA
- Equivalent MR/issue for omnibus-gitlab opened
- Validate potential values for new configuration settings. Formats such as integer 10, duration 10s, URI scheme://user:passwd@host:port may require quotation or other special handling when rendered in a template and written to a configuration file.

Edited Apr 18, 2023 by Jason Plum
