Ignored kubernetes error in review-cleanup should not cause the job to abort
When one k8s cleanup action fails and we are ignoring the error, the failure should not abort the rest of `delete_by_matching_name`.
https://gitlab.com/gitlab-org/gitlab/-/jobs/440612249#L399
Running command: `kubectl delete ingress,svc,pdb,hpa,deploy,statefulset,job,pod,secret,configmap,pvc,secret,clusterrole,clusterrolebinding,role,rolebinding,sa,crd --namespace "review-apps-ee" --now --ignore-not-found --include-uninitialized --wait=false -l 'release in (review-10429-set-xri0sp)'`
Running command: `kubectl get ingress,svc,pdb,hpa,deploy,statefulset,job,pod,secret,configmap,pvc,secret,clusterrole,clusterrolebinding,role,rolebinding,sa,crd --namespace "review-apps-ee" -o name`
Running command: `kubectl delete --namespace "review-apps-ee" pod/review-10429-set-xri0sp-gitaly-0`
Running command: `kubectl delete --namespace "review-apps-ee" pod/review-10429-set-xri0sp-gitlab-shell-949c457b6-9qmm4`
Ignoring the following Kubernetes error:
The `kubectl delete --namespace "review-apps-ee" pod/review-10429-set-xri0sp-gitlab-shell-949c457b6-9qmm4` command failed (status: pid 369 exit 1) with the following error:
Error from server (NotFound): pods "review-10429-set-xri0sp-gitlab-shell-949c457b6-9qmm4" not found
In this example, the cleanup retrieves the outstanding resources by name and then attempts to delete them one by one. At this point, the earlier `kubectl delete` using the label selector might already have started deleting the pod, but the pod is still in the `Terminating` state and therefore shows up in the output of the `kubectl get ... -o name` command. By the time we get to `kubectl delete` for this pod, the termination has completed and `kubectl` can no longer find the pod. The job then aborts the rest of `delete_by_matching_name`.
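The expected behavior could be sketched roughly as below. This is a minimal Python illustration, not the project's actual cleanup code: `delete_each`, `is_ignorable`, `CommandFailedError`, and the injected `run` callable are all hypothetical names, and treating only `NotFound` errors as ignorable is an assumption made for the example.

```python
from typing import Callable, Iterable, List


class CommandFailedError(Exception):
    """Hypothetical error raised by the runner when a command exits non-zero."""


def is_ignorable(error: CommandFailedError) -> bool:
    # Assumption: a NotFound error means the resource is already gone,
    # which is the outcome the cleanup wanted anyway.
    return "(NotFound)" in str(error)


def delete_each(
    resource_names: Iterable[str],
    namespace: str,
    run: Callable[[str], None],
) -> List[str]:
    """Delete resources one by one, continuing past ignorable errors.

    `run` is a hypothetical callable that executes a shell command and
    raises CommandFailedError on failure. Returns the names whose
    deletion failed with an ignored error.
    """
    ignored = []
    for name in resource_names:
        command = f'kubectl delete --namespace "{namespace}" {name}'
        try:
            run(command)
        except CommandFailedError as error:
            if not is_ignorable(error):
                # Unexpected failures still abort the job.
                raise
            # Handle the error per resource so the loop keeps going.
            print(f"Ignoring the following Kubernetes error:\n{error}")
            ignored.append(name)
    return ignored
```

The point of the sketch is that the error is caught inside the loop, per resource, so a pod that finished terminating between `kubectl get` and `kubectl delete` is skipped while the remaining resources are still deleted.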
Edited by Albert Salim