Gitlab Runner not able to resolve Gitlab URL - Kubernetes Executor
Status update: 2021-05-13
- We believe the root cause is the Alpine DNS issue affecting the runner-helper image.
- This MR is in flight to provide an Ubuntu flavor of the runner-helper image and is at the maintainer review stage.
Eventually, users should be able to set:
helper_image_flavor = "ubuntu"
in their runner TOML config, and we hope this issue will then disappear.
Summary
GitLab Runner is intermittently unable to resolve the GitLab URL. When GitLab and GitLab Runner are both deployed in a Kubernetes cluster, jobs intermittently fail with the following output:
Running with gitlab-runner 10.8.0 (079aad9e)
on big-runner-gitlab-runner-7bd85f8f65-qqvpj f533f16d
Using Kubernetes namespace: gitlab-runner
Using Kubernetes executor with image $IMAGE ...
Waiting for pod gitlab-runner/runner-f533f16d-project-38-concurrent-4ttnck to be running, status is Pending
Waiting for pod gitlab-runner/runner-f533f16d-project-38-concurrent-4ttnck to be running, status is Pending
Waiting for pod gitlab-runner/runner-f533f16d-project-38-concurrent-4ttnck to be running, status is Pending
Running on runner-f533f16d-project-38-concurrent-4ttnck via big-runner-gitlab-runner-7bd85f8f65-qqvpj...
Cloning repository for master with git depth set to 20...
Cloning into '/repo/proj'...
fatal: unable to access 'https://gitlab-ci-token:xxxxxxxxxxxxxxxxxxxx@code.repo.io/repo/proj.git/': Could not resolve host: code.repo.io
/bin/bash: line 114: cd: /repo/proj: No such file or directory
ERROR: Job failed: error executing remote command: command terminated with non-zero exit code: Error executing in Docker Container: 1
Most of the time the runner resolves the domain name and the job continues successfully. Sometimes, however, it fails with the error above. Retrying these jobs has become quite a pain.
How can I debug this? I tried debugging Kubernetes DNS: I created a pod every minute and tried to resolve the URL from it, and resolution always succeeded. I also checked whether my DNS server (AWS) was throttling queries: I was able to send 1000 queries per second without a single failure.
I also added a trailing dot to the URL (https://code.repo.io.), as mentioned here, but it didn't help.
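The repeated-lookup check described above can be sketched as a small probe. This is only an illustration, not part of the original report: the function name `probe_dns` and its parameters are hypothetical, and it assumes Python is available wherever it runs. Because `socket.gethostbyname` goes through the system resolver of the container it runs in, a probe like this is most informative when run in an image based on the same distribution as the failing container (Alpine, per the status update) rather than in an arbitrary test pod.

```python
import socket
import time
from typing import Callable, List, Tuple


def probe_dns(
    host: str,
    attempts: int = 100,
    delay_s: float = 0.1,
    resolver: Callable[[str], object] = socket.gethostbyname,
) -> List[Tuple[int, str]]:
    """Resolve `host` repeatedly and collect the failures.

    Returns a list of (attempt_index, error_text) tuples, one per failed
    lookup. `resolver` is injectable so the loop can be exercised without
    touching the network; by default the system resolver is used.
    """
    failures: List[Tuple[int, str]] = []
    for attempt in range(attempts):
        try:
            resolver(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            failures.append((attempt, str(exc)))
        time.sleep(delay_s)
    return failures
```

For example, `probe_dns("code.repo.io", attempts=500)` would return an empty list if every lookup succeeded (`code.repo.io` is the placeholder host from the job log). An intermittent resolver problem should show up as a non-empty list with scattered attempt indices.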
Steps to reproduce
- Deploy GitLab on Kubernetes via the Helm chart: gitlab
- Deploy GitLab Runner on Kubernetes via the Helm chart: gitlab-runner
- Create a project in GitLab with a .gitlab-ci.yml file
- Run a job in CI/CD. The clone fails intermittently.
What is the current bug behavior?
Runner job fails with:
fatal: unable to access 'https://gitlab-ci-token:xxxxxxxxxxxxxxxxxxxx@code.repo.io/repo/proj.git/': Could not resolve host: code.repo.io
What is the expected correct behavior?
The clone should not fail with the host resolution error above.