Mixed Kubernetes Windows runner stuck in runner_script_trap
Summary
On a mixed Kubernetes cluster (Linux/Windows), the runner that handles Windows pipelines hangs. It creates its pod with two containers (helper and build), and both come online. The helper container logs {"command_exit_code": 0, "script": "runner_script_trap"} while the build container does nothing. The job log in GitLab gains a few empty lines, and the job then waits until the timeout eventually kills the pipeline.
Steps to reproduce
- Create a mixed Kubernetes cluster. For local testing I have:
NAME              STATUS   ROLES                  AGE     VERSION
dev-builder       Ready    control-plane,master   33d     v1.21.0
dev-builder-win   Ready    <none>                 4d19h   v1.21.1
dev-builder doubles as the main server and runs Ubuntu Server 20.04, with Kubernetes installed on top of CRI-O. dev-builder-win is a worker node running Windows Server 2019 (10.0.17763.1999) with Kubernetes on containerd. I joined the two into a cluster following the Kubernetes manual.
- Create a namespace in Kubernetes called gitlab.
- Deploy a GitLab Runner to the cluster using Helm, following the instructions from the GitLab manual, with the following values.yaml:
name: "Runner-Windows"
gitlabUrl: https://git.mydomain.com
runnerRegistrationToken: thetoken
certsSecretName: certname
log-level: info
tags: "windows"
envVars:
  - name: CI_SERVER_TLS_CA_FILE
    value: /home/gitlab-runner/.gitlab-runner/certs/mydomain.com.crt
clusterWideAccess: true
serviceAccountName: gitlab
rbac:
  create: true
  rules:
    - resources: ["pods", "secrets"]
      verbs: ["get", "list", "watch", "create", "patch", "delete"]
    - apiGroups: [""]
      resources: ["pods/exec", "configmaps", "pods/attach", "secrets"]
      verbs: ["create", "patch", "delete", "update"]
nodeSelector:
  kubernetes.io/os: "linux"
runners:
  environment: ["FF_USE_POWERSHELL_PATH_RESOLVER=1"]
  nodeSelector:
    kubernetes.io/os: "windows"
    node.kubernetes.io/windows-build: "10.0.17763"
    kubernetes.io/arch: "amd64"
- In GitLab, tag the new runner as 'windows'. (You might also need to increase the timeout; pulling Windows images tends to exceed the default.)
- Create a repository. Add the following CI file:
image: mcr.microsoft.com/dotnet/sdk:5.0-windowsservercore-ltsc2019
stages:
  - hello
hello:
  tags:
    - windows
  stage: hello
  script:
    - "echo hello"
- Run the pipeline. It spins up the containers but then hangs until the timeout.
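For reference, the runners nodeSelector in the values.yaml from the steps above is what pins the job pods to the Windows node: a node is eligible only if its labels contain every listed key/value pair. The following is a minimal Python sketch of that matching rule, not the scheduler's actual code. The node label sets are assumed for illustration (they were not captured from this cluster), and selector_matches is a hypothetical helper.

```python
# Sketch of how a Kubernetes nodeSelector is evaluated: every key/value pair
# in the selector must be present, with the same value, in the node's labels.
# The label sets below are ASSUMED examples, not output from the real cluster.

def selector_matches(node_labels: dict, selector: dict) -> bool:
    """Return True if the node carries every label the selector asks for."""
    return all(node_labels.get(key) == value for key, value in selector.items())


# runners.nodeSelector from values.yaml (applies to the job pods)
job_selector = {
    "kubernetes.io/os": "windows",
    "node.kubernetes.io/windows-build": "10.0.17763",
    "kubernetes.io/arch": "amd64",
}

# Hypothetical labels for the two nodes in the cluster
nodes = {
    "dev-builder": {"kubernetes.io/os": "linux", "kubernetes.io/arch": "amd64"},
    "dev-builder-win": {
        "kubernetes.io/os": "windows",
        "kubernetes.io/arch": "amd64",
        "node.kubernetes.io/windows-build": "10.0.17763",
    },
}

# Names of the nodes on which a job pod could be scheduled
eligible = [name for name, labels in nodes.items() if selector_matches(labels, job_selector)]
print(eligible)
```

Under these assumed labels only dev-builder-win is eligible, which is consistent with the job pod being created and its containers starting before the hang.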
Actual behavior
The pipeline starts a new pod with two containers (helper and build). The job registers the containers, and a few whitespace-only lines appear in the pipeline log. After that the build freezes until it is killed by the timeout (default 1h).
The pipeline log and the kubectl logs are included below. There is no log output for the build container.
Expected behavior
The pipeline starts, echoes hello on a Windows image, and succeeds.
Relevant logs and/or screenshots
pipeline log
Running with gitlab-runner 14.2.0 (58ba2b95)
on gitlab-runner-windows-gitlab-runner-64fcbbdf57-xdnkt USaiYpXT
feature flags: FF_USE_POWERSHELL_PATH_RESOLVER:true
Preparing the "kubernetes" executor
00:00
Using Kubernetes namespace: gitlab
Using Kubernetes executor with image mcr.microsoft.com/dotnet/sdk:5.0-windowsservercore-ltsc2019 ...
Using attach strategy to execute scripts...
Preparing environment
Waiting for pod gitlab/runner-usaiypxt-project-571-concurrent-0c6htl to be running, status is Pending
ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
ContainersNotReady: "containers with unready status: [build helper]"
ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-usaiypxt-project-571-concurrent-0c6htl to be running, status is Pending
ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
ContainersNotReady: "containers with unready status: [build helper]"
ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-usaiypxt-project-571-concurrent-0c6htl to be running, status is Pending
ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
ContainersNotReady: "containers with unready status: [build helper]"
ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-usaiypxt-project-571-concurrent-0c6htl to be running, status is Pending
ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
ContainersNotReady: "containers with unready status: [build helper]"
ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-usaiypxt-project-571-concurrent-0c6htl to be running, status is Pending
ContainersNotReady: "containers with unready status: [build helper]"
ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-usaiypxt-project-571-concurrent-0c6htl to be running, status is Pending
ContainersNotReady: "containers with unready status: [build helper]"
ContainersNotReady: "containers with unready status: [build helper]"
ERROR: Job failed: execution took longer than 1h0m0s seconds
The empty lines containing just a \t are not a typo; they are written out once both containers are running.
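One way to confirm that such lines really contain a tab rather than being blank is to inspect the raw job log. A minimal Python sketch, assuming the raw log has been saved as text; the sample excerpt and the helper name whitespace_only_lines are made up for illustration:

```python
# Sketch: list the lines of a job log that consist only of whitespace and show
# their exact characters via repr(). The sample text is an ASSUMED excerpt.

def whitespace_only_lines(log_text: str) -> list:
    """Return (line_number, repr_of_line) for non-empty, all-whitespace lines."""
    found = []
    for number, line in enumerate(log_text.split("\n"), start=1):
        # Skip truly empty lines; keep lines made up of tabs, spaces, etc.
        if line and line.strip() == "":
            found.append((number, repr(line)))
    return found


sample = (
    'ContainersNotReady: "containers with unready status: [build helper]"\n'
    "\t\n"
    "\t\n"
    "ERROR: Job failed: execution took longer than 1h0m0s seconds\n"
)

for number, shown in whitespace_only_lines(sample):
    print(number, shown)
```

Printing the repr makes a tab visible as '\t', so it cannot be mistaken for an empty line or a run of spaces.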
helper container logs
PS C:\git\gitlabrunnerconfig> kubectl logs runner-usaiypxt-project-571-concurrent-0b7n8l --namespace=gitlab -c helper
Running on RUNNER-USAIYPXT via gitlab-runner-windows-gitlab-runner-64fcbbdf57-xdnkt...
{"command_exit_code": 0, "script": "runner_script_trap"}
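The last line of the helper log is a JSON object, which makes it easy to pick out programmatically when comparing runs. A minimal Python sketch, assuming the output of kubectl logs has been captured as text; find_status_lines is a hypothetical helper, and it only extracts the fields without interpreting them, since their exact meaning is not documented in this report:

```python
import json


def find_status_lines(log_text: str) -> list:
    """Return the JSON objects found in the log that carry a 'script' key."""
    statuses = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # plain text line, not a JSON status object
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            continue  # starts with '{' but is not valid JSON; ignore it
        if isinstance(parsed, dict) and "script" in parsed:
            statuses.append(parsed)
    return statuses


# Helper container output as shown in this report
helper_log = """Running on RUNNER-USAIYPXT via gitlab-runner-windows-gitlab-runner-64fcbbdf57-xdnkt...
{"command_exit_code": 0, "script": "runner_script_trap"}
"""

for status in find_status_lines(helper_log):
    print(status["script"], status["command_exit_code"])
```

For the log above this yields a single status object with script runner_script_trap and exit code 0, and nothing for any later script stage.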
Environment description
- GitLab version: 14.2.3
- Runner version: 14.2.0
- Cluster info:
NAME              STATUS   ROLES                  AGE     VERSION
dev-builder       Ready    control-plane,master   33d     v1.21.0
dev-builder-win   Ready    <none>                 4d19h   v1.21.1
There is also a second GitLab Runner active on the cluster that targets Linux. Those builds have no problems.