Switch deletion propagation to background for Pod's dependents
What does this MR do?
In order to reduce the resources left over when the job Pod is deleted, the OwnerReference was implemented for the `kubernetes` executor (see !2983 (merged)). Generally speaking, when the job finishes, the job Pod is successfully deleted. However, in the use case described in issue #29291 (closed), the `Foreground` deletion propagation policy prevents the deletion of the job Pod, which then gets stuck in the `Terminating` state.
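For context, the ownership link set up by !2983 (merged) conceptually looks like the following. This is a minimal client-go sketch for illustration only, not the runner's actual code; the helper name `ownByBuildPod` and the use of a Secret as the example dependent are assumptions of mine.

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// ownByBuildPod is a hypothetical helper: it marks a dependent resource (here a
// Secret) as owned by the build Pod, so the Kubernetes garbage collector removes
// it automatically once the Pod is deleted.
func ownByBuildPod(ctx context.Context, client kubernetes.Interface, pod *corev1.Pod, secret *corev1.Secret) error {
	secret.OwnerReferences = []metav1.OwnerReference{{
		APIVersion: "v1",
		Kind:       "Pod",
		Name:       pod.Name,
		UID:        pod.UID,
	}}
	_, err := client.CoreV1().Secrets(secret.Namespace).Update(ctx, secret, metav1.UpdateOptions{})
	return err
}
```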
There were three options, in my opinion, to handle it:

- Add a configuration setting to disable the OwnerReference behaviour altogether
- Switch from `foreground` to `background`, as the problem doesn't occur with this policy
- Manually delete all the resources, as was done prior to MR !2983 (merged)
I went for the second option. With the `background` policy, according to the Kubernetes documentation:

> In background cascading deletion, the Kubernetes API server deletes the owner object immediately and the controller cleans up the dependent objects in the background. By default, Kubernetes uses background cascading deletion unless you manually use foreground deletion or choose to orphan the dependent objects.

The Kubernetes leftovers are therefore still deleted, just after the owner is removed rather than before.
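In practice, the change boils down to passing a different propagation policy when the job Pod is deleted. Below is a hedged client-go sketch of the idea, not the actual diff; the helper name is made up.

```go
package example

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deletePodBackground is a hypothetical helper: it deletes the job Pod and lets
// the garbage collector clean up the dependents asynchronously. With
// metav1.DeletePropagationForeground, the Pod stays in Terminating until every
// dependent is gone, which is the hang described in #29291.
func deletePodBackground(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
	propagation := metav1.DeletePropagationBackground
	return client.CoreV1().Pods(namespace).Delete(ctx, name, metav1.DeleteOptions{
		PropagationPolicy: &propagation,
	})
}
```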
Why was this MR needed?
Prevent the job Pod from getting stuck in the `Terminating` state.
What's the best way to test this MR?
- Install the latest version of the GitLab Runner Helm chart. This was tested with a cluster on GKE. Use the following image for the test: `registry.gitlab.com/gitlab-org/gitlab-runner:alpine3.17-k8s-owner-reference-management`
`values.yaml`:
```yaml
image:
  registry: registry.gitlab.com
  image: gitlab-org/gitlab-runner
  tag: alpine3.17-k8s-owner-reference-management
useTini: false
imagePullPolicy: IfNotPresent
replicas: 1
gitlabUrl: https://gitlab.com/
runnerToken: "__REDACTED__"
unregisterRunners: true
terminationGracePeriodSeconds: 0
concurrent: 1
checkInterval: 1
logLevel: "debug"
sessionServer:
  enabled: false
  annotations: {}
rbac:
  create: true
  rules:
    - apiGroups: [""]
      resources: ["events", "pods", "pods/attach", "secrets", "services", "serviceAccounts"]
      verbs: ["get", "list", "watch", "create", "patch", "update", "delete"]
    - apiGroups: [""]
      resources: ["pods/exec"]
      verbs: ["create", "patch", "delete"]
  clusterWideAccess: false
  podSecurityPolicy:
    enabled: false
    resourceNames:
      - gitlab-runner
metrics:
  enabled: true
  portName: metrics
  port: 9252
  serviceMonitor:
    enabled: false
service:
  enabled: false
  type: ClusterIP
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        image = "alpine"
        memory_request = "10Mi"
  builds: {}
  services: {}
  helpers: {}
securityContext:
  allowPrivilegeEscalation: true
  readOnlyRootFilesystem: false
  runAsNonRoot: true
podSecurityContext:
  runAsUser: 100
  fsGroup: 65533
resources:
  requests:
    memory: 10Mi
    cpu: 100m
affinity: {}
nodeSelector: {}
tolerations: []
hostAliases: []
podAnnotations: {}
podLabels: {}
hpa: {}
secrets: []
configMaps: {}
volumeMounts: []
volumes: []
```
- Deploy the digester webhook to your cluster
- Run any pipeline using the newly installed runner
- The job Pod should be gone once the job succeeds or fails (see the verification sketch after this list)
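If you want to check the cleanup programmatically rather than by watching `kubectl get pods`, the following client-go sketch polls until the job Pod has disappeared. The function name and the polling interval/timeout are arbitrary choices of mine, not part of this MR.

```go
package example

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForPodGone is a hypothetical helper: it polls until the given Pod no
// longer exists, i.e. the job Pod was fully cleaned up after the job finished.
func waitForPodGone(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
	return wait.PollUntilContextTimeout(ctx, 2*time.Second, 2*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			_, err := client.CoreV1().Pods(namespace).Get(ctx, name, metav1.GetOptions{})
			if apierrors.IsNotFound(err) {
				return true, nil // the Pod (and, shortly after, its dependents) is gone
			}
			// Keep polling while the Pod still exists; abort on unexpected errors.
			return false, err
		})
}
```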
What are the relevant issue numbers?
Fixes #29291 (closed)