Make resource checking disabled by default
## What does this MR do?

Make the resource checking introduced in !3399 (merged) disabled by default.
## Why was this MR needed?

The resource check requires additional permissions on the service account, which breaks setups lacking those permissions. This MR also adds an integration test that successfully runs a job using only the minimal permissions needed by a custom service account.
## What's the best way to test this MR?

- Use the following `config.toml`. In this configuration, `service_account` is set to `cs-sa` instead of the default service account; `cs-sa` does not exist in the cluster:

```toml
check_interval = 1
log_level = "debug"

[session_server]
  session_timeout = 1800

[[runners]]
  request_concurrency = 1
  url = "https://gitlab.com/"
  token = "__REDACTED__"
  executor = "kubernetes"
  [runners.custom_build_dir]
  [runners.kubernetes]
    service_account = "cs-sa"
    pull_policy = "always"
    image = "alpine:latest"
    namespace_overwrite_allowed = ""
    privileged = true
    allow_privilege_escalation = true
    service_account_overwrite_allowed = ""
    pod_annotations_overwrite_allowed = ""
    terminationGracePeriodSeconds = 30
    [runners.kubernetes.affinity]
    [runners.kubernetes.volumes]
    [runners.kubernetes.dns_config]
```
- Use the following `.gitlab-ci.yml`:

```yaml
job:
  script:
    - sleep 15
```
- The job fails. In the debug log, there is no reference to service account check attempts, because the check is disabled:

```plaintext
Preparing environment job=2675328079 project=25452826 runner=DzfSJrxx
Starting Kubernetes command with attach... job=2675328079 project=25452826 runner=DzfSJrxx
Setting up secrets job=2675328079 project=25452826 runner=DzfSJrxx
Feeding runners to channel builds=1
Setting up scripts config map job=2675328079 project=25452826 runner=DzfSJrxx
Setting up build pod job=2675328079 project=25452826 runner=DzfSJrxx
DNSPolicy string is blank, using "ClusterFirst" as default
Checking for ImagePullSecrets or ServiceAccount existence job=2675328079 project=25452826 runner=DzfSJrxx
Resources check has been disabled job=2675328079 project=25452826 runner=DzfSJrxx
Creating build pod job=2675328079 project=25452826 runner=DzfSJrxx
ERROR: Job failed (system failure): prepare environment: setting up build pod: pods "runner-dzfsjrxx-project-25452826-concurrent-0" is forbidden: error looking up service account default/cs-sa: serviceaccount "cs-sa" not found. Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information duration_s=3.28418601 job=2675328079 project=25452826 runner=DzfSJrxx
```
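The disabled-by-default behavior shown above ("Resources check has been disabled") amounts to a simple guard: the check only runs when a positive maximum number of attempts is configured, and the new default of `0` skips it entirely. The following is an illustrative Python sketch, not the runner's actual Go implementation; the function and parameter names are hypothetical.

```python
def check_resource_availability(resource_exists, max_attempts=0, sleep=lambda s: None):
    """Illustrative sketch of a resource availability check.

    With max_attempts left at its new default of 0, the check is
    skipped entirely and pod creation proceeds immediately.
    """
    if max_attempts <= 0:
        return "Resources check has been disabled"
    for attempt in range(1, max_attempts + 1):
        if resource_exists():
            return "resource found"
        # Mirrors the "Pausing check of the ServiceAccount availability
        # ... (attempt N)" lines seen when the check is enabled.
        sleep(5)
    raise TimeoutError(
        "Timed out while waiting for ServiceAccount to be present in the cluster"
    )

# With the default, no check is attempted even for a missing resource:
print(check_resource_availability(lambda: False))  # Resources check has been disabled
```

This is why the job above fails only at pod creation time, with the Kubernetes API error, rather than during a pre-flight check.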
- Use the following `config.toml`, which enables the check by setting `resource_availability_check_max_attempts` to `3`:

```toml
check_interval = 1
log_level = "debug"

[session_server]
  session_timeout = 1800

[[runners]]
  request_concurrency = 1
  url = "https://gitlab.com/"
  token = "__REDACTED__"
  executor = "kubernetes"
  [runners.custom_build_dir]
  [runners.kubernetes]
    service_account = "cs-sa"
    pull_policy = "always"
    resource_availability_check_max_attempts = 3
    image = "alpine:latest"
    namespace_overwrite_allowed = ""
    privileged = true
    allow_privilege_escalation = true
    service_account_overwrite_allowed = ""
    pod_annotations_overwrite_allowed = ""
    terminationGracePeriodSeconds = 30
    [runners.kubernetes.affinity]
    [runners.kubernetes.volumes]
    [runners.kubernetes.dns_config]
```
- The job fails. In the debug log, the service account check attempts are logged:

```plaintext
Preparing environment job=2675271396 project=25452826 runner=DzfSJrxx
Starting Kubernetes command with attach... job=2675271396 project=25452826 runner=DzfSJrxx
Setting up secrets job=2675271396 project=25452826 runner=DzfSJrxx
Setting up scripts config map job=2675271396 project=25452826 runner=DzfSJrxx
Feeding runners to channel builds=1
Setting up build pod job=2675271396 project=25452826 runner=DzfSJrxx
DNSPolicy string is blank, using "ClusterFirst" as default
Checking for ImagePullSecrets or ServiceAccount existence job=2675271396 project=25452826 runner=DzfSJrxx
Checking for ServiceAccount existence job=2675271396 project=25452826 runner=DzfSJrxx
Appending trace to coordinator... ok code=202 job=2675271396 job-log=0-640 job-status=running runner=DzfSJrxx sent-log=0-639 status=202 Accepted update-interval=3s
Pausing check of the ServiceAccount availability for 5000000000 (attempt 1) job=2675271396 project=25452826 runner=DzfSJrxx
Pausing check of the ServiceAccount availability for 5000000000 (attempt 2) job=2675271396 project=25452826 runner=DzfSJrxx
Pausing check of the ServiceAccount availability for 5000000000 (attempt 3) job=2675271396 project=25452826 runner=DzfSJrxx
ERROR: Job failed (system failure): prepare environment: setting up build pod: Timed out while waiting for ServiceAccount/cs-sa to be present in the cluster. Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information duration_s=21.291653373 job=2675271396 project=25452826 runner=DzfSJrxx
```
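The `5000000000` in the pause messages above is most likely a Go `time.Duration` printed in nanoseconds, i.e. 5 s between attempts; with `resource_availability_check_max_attempts=3` the pauses alone account for roughly 15 s of the 21.3 s reported job duration. A quick Python check of that arithmetic (the helper names are illustrative):

```python
NS_PER_SECOND = 1_000_000_000

def pause_seconds(duration_ns):
    """Convert a duration logged in nanoseconds to seconds."""
    return duration_ns / NS_PER_SECOND

def minimum_wait(max_attempts, pause_ns=5_000_000_000):
    """Lower bound on the time spent pausing before the check times out."""
    return max_attempts * pause_seconds(pause_ns)

print(pause_seconds(5_000_000_000))  # 5.0
print(minimum_wait(3))               # 15.0
```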
- Use the following `config.toml`, which relies on the default service account (no `service_account` set):

```toml
check_interval = 1
log_level = "debug"

[session_server]
  session_timeout = 1800

[[runners]]
  request_concurrency = 1
  url = "https://gitlab.com/"
  token = "__REDACTED__"
  executor = "kubernetes"
  [runners.custom_build_dir]
  [runners.kubernetes]
    pull_policy = "always"
    image = "alpine:latest"
    namespace_overwrite_allowed = ""
    privileged = true
    allow_privilege_escalation = true
    service_account_overwrite_allowed = ""
    pod_annotations_overwrite_allowed = ""
    terminationGracePeriodSeconds = 30
    [runners.kubernetes.affinity]
    [runners.kubernetes.volumes]
    [runners.kubernetes.dns_config]
```
- The job succeeds.
## Test permissions

Using the Helm chart project, test the permissions for each mode:
### Attach mode
- Service account YAML (`wi-sa.yaml`):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: wi-role
rules:
  - apiGroups: [""]
    resources: ["serviceaccounts"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["pods/exec", "pods/attach"]
    verbs: ["create", "patch", "delete"]
  - apiGroups: [""]
    resources: ["pods", "services"]
    verbs: ["create", "get", "delete"]
  - apiGroups: [""]
    resources: ["configmaps", "secrets"]
    verbs: ["create", "get", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: wi-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: wi-role
subjects:
  - kind: ServiceAccount
    name: wi-sa
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: wi-sa
```
- `values.yaml`:

```yaml
image:
  registry: registry.gitlab.com
  image: gitlab-org/gitlab-runner
  # tag: alpine-v11.6.0

imagePullPolicy: IfNotPresent
replicas: 1
gitlabUrl: https://gitlab.com/
runnerRegistrationToken: __YOUR_TOKEN_
terminationGracePeriodSeconds: 0
concurrent: 1
checkInterval: 1
logLevel: "debug"

sessionServer:
  enabled: false
  annotations: {}
  timeout: 1800
  internalPort: 8093
  externalPort: 9000
  # publicIP: ""
  # loadBalancerSourceRanges:
  #   - 1.2.3.4/32

## For RBAC support:
rbac:
  create: false
  clusterWideAccess: false
  serviceAccountName: wi-sa
  serviceAccountAnnotations: {}
  podSecurityPolicy:
    enabled: true
    resourceNames:
      - gitlab-runner

metrics:
  enabled: true
  portName: metrics
  port: 9252
  serviceMonitor:
    enabled: false

service:
  enabled: false
  type: ClusterIP

runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        namespace = "{{.Release.Namespace}}"
        image = "alpine"
  cache: {}
  builds: {}
  services: {}
  helpers: {}

securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: false
  runAsNonRoot: false
  # privileged: false
  # capabilities:
  #   drop: ["ALL"]

podSecurityContext:
  runAsUser: 100
  # runAsGroup: 65533
  # fsGroup: 65533
  # supplementalGroups: [65533]
  ## Note: values for the ubuntu image:
  # runAsUser: 999
  # fsGroup: 999

resources: {}
# limits:
#   memory: 256Mi
#   cpu: 200m
# requests:
#   memory: 128Mi
#   cpu: 100m

affinity: {}
nodeSelector: {}
tolerations: []

hostAliases: []
# Example:
# - ip: "127.0.0.1"
#   hostnames:
#     - "foo.local"
#     - "bar.local"
# - ip: "10.1.2.3"
#   hostnames:
#     - "foo.remote"
#     - "bar.remote"

podAnnotations: {}
podLabels: {}
secrets: []
configMaps: {}
volumeMounts: []
volumes: []
```
- The job succeeds.
### Exec mode
- Service account YAML (`wi-sa.yaml`):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: wi-role
rules:
  - apiGroups: [""]
    resources: ["serviceaccounts"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["create", "patch", "delete"]
  - apiGroups: [""]
    resources: ["pods", "services", "secrets"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: wi-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: wi-role
subjects:
  - kind: ServiceAccount
    name: wi-sa
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: wi-sa
```
- `values.yaml` (identical to the attach-mode values except that the runner config sets `FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY=true`):

```yaml
image:
  registry: registry.gitlab.com
  image: gitlab-org/gitlab-runner
  # tag: alpine-v11.6.0

imagePullPolicy: IfNotPresent
replicas: 1
gitlabUrl: https://gitlab.com/
runnerRegistrationToken: __YOUR_TOKEN_
terminationGracePeriodSeconds: 0
concurrent: 1
checkInterval: 1
logLevel: "debug"

sessionServer:
  enabled: false
  annotations: {}
  timeout: 1800
  internalPort: 8093
  externalPort: 9000
  # publicIP: ""
  # loadBalancerSourceRanges:
  #   - 1.2.3.4/32

## For RBAC support:
rbac:
  create: false
  clusterWideAccess: false
  serviceAccountName: wi-sa
  serviceAccountAnnotations: {}
  podSecurityPolicy:
    enabled: true
    resourceNames:
      - gitlab-runner

metrics:
  enabled: true
  portName: metrics
  port: 9252
  serviceMonitor:
    enabled: false

service:
  enabled: false
  type: ClusterIP

runners:
  config: |
    [[runners]]
      environment = ["FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY=true"]
      [runners.kubernetes]
        namespace = "{{.Release.Namespace}}"
        image = "alpine"
  cache: {}
  builds: {}
  services: {}
  helpers: {}

securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: false
  runAsNonRoot: false
  # privileged: false
  # capabilities:
  #   drop: ["ALL"]

podSecurityContext:
  runAsUser: 100
  # runAsGroup: 65533
  # fsGroup: 65533
  # supplementalGroups: [65533]
  ## Note: values for the ubuntu image:
  # runAsUser: 999
  # fsGroup: 999

resources: {}
# limits:
#   memory: 256Mi
#   cpu: 200m
# requests:
#   memory: 128Mi
#   cpu: 100m

affinity: {}
nodeSelector: {}
tolerations: []

hostAliases: []
# Example:
# - ip: "127.0.0.1"
#   hostnames:
#     - "foo.local"
#     - "bar.local"
# - ip: "10.1.2.3"
#   hostnames:
#     - "foo.remote"
#     - "bar.remote"

podAnnotations: {}
podLabels: {}
secrets: []
configMaps: {}
volumeMounts: []
volumes: []
```
- The job succeeds.
## What are the relevant issue numbers?
Edited by Romuald Atchadé