Docker in Docker 19.03 service fails
Summary
CE and EE jobs are failing with an error like:
docker: Cannot connect to the Docker daemon at tcp://docker:2375. Is the docker daemon running?.
E.g.:
- https://gitlab.com/gitlab-org/gitlab-ee/-/jobs/256902084
- https://gitlab.com/gitlab-org/gitlab-ce/-/jobs/256907818
This is also affecting customers. Presumably it will affect anyone using docker:stable-dind on our runners.
Details
Docker has released a new version, 19.03 (https://hub.docker.com/_/docker?tab=tags), which enables TLS by default. From the image documentation:
Starting in 18.09+, the dind variants of this image will automatically generate TLS certificates in the directory specified by the DOCKER_TLS_CERTDIR environment variable.
Warning: in 18.09, this behavior is disabled by default (for compatibility). If you use --network=host, shared network namespaces (as in Kubernetes pods), or otherwise have network access to the container (including containers started within the dind instance via their gateway interface), this is a potential security issue (which can lead to access to the host system, for example). It is recommended to enable TLS by setting the variable to an appropriate value (-e DOCKER_TLS_CERTDIR=/certs or similar). In 19.03+, this behavior is enabled by default.
This means that when the service starts, it tries to create the certificates, which GitLab Runner doesn't seem to accept.
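To check which port the dind service is actually listening on, a small probe like the one below can be run from the build container. This is only a diagnostic sketch: the `docker` hostname and the ports 2375 and 2376 come from the error message and the configuration in this issue, while the helper names `port_is_open` and `describe_dind_ports` are made up for illustration.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def describe_dind_ports(host: str = "docker") -> str:
    """Report which of the two daemon ports mentioned in this issue accepts connections."""
    plain = port_is_open(host, 2375)  # port from the error message (no TLS)
    tls = port_is_open(host, 2376)    # port used when TLS is enabled
    if tls and not plain:
        return "only 2376 is open (TLS); a client pointed at 2375 will fail"
    if plain and not tls:
        return "only 2375 is open (no TLS)"
    if plain and tls:
        return "both 2375 and 2376 accept connections"
    return "neither port accepts connections; the service may not be up yet"


if __name__ == "__main__":
    print(describe_dind_ports())
```

If only 2376 is open while the job still reports `tcp://docker:2375`, the client and the daemon disagree about TLS, which is consistent with the failure described above.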
Notes
With the workaround below you might still see errors saying that the service did not start, even though your job succeeds. @tmaczukin left a detailed explanation of why this happens in #4501 (comment 195033385).
Workaround
Support TLS
With 19.03, TLS is enabled by default. To use TLS, you need to update the GitLab Runner configuration so that the certificates are shared between the service and build container. To do this, update your config.toml to look something like the following:
[[runners]]
name = "My Docker Runner"
url = "http://127.0.0.1:3000/"
token = "oXA2AxcKb8mdGEUrB-3L"
executor = "docker"
[runners.custom_build_dir]
[runners.docker]
tls_verify = false
image = "docker:stable"
privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/certs/client", "/cache"] #<-------------- Notice the extra mount to /certs/client
shm_size = 0
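A quick way to confirm that a runner configuration includes the extra mount is to check the parsed config.toml for it. The sketch below assumes the configuration has already been parsed into a dictionary (for example with the standard-library `tomllib` module on Python 3.11+); the function name `docker_runners_missing_mount` is hypothetical and not part of GitLab Runner.

```python
REQUIRED_MOUNT = "/certs/client"


def docker_runners_missing_mount(config: dict, mount: str = REQUIRED_MOUNT) -> list:
    """Return the names of Docker-executor runners whose volumes lack `mount`."""
    missing = []
    for runner in config.get("runners", []):
        if runner.get("executor") != "docker":
            continue
        volumes = runner.get("docker", {}).get("volumes", [])
        container_paths = []
        for entry in volumes:
            # An entry is either "path" or "host_path:container_path[:mode]";
            # only the container-side path matters for this check.
            parts = entry.split(":")
            container_paths.append(parts[1] if len(parts) > 1 else parts[0])
        if mount not in container_paths:
            missing.append(runner.get("name", "<unnamed>"))
    return missing


# Mirrors the example configuration above, already parsed into a dict.
example = {
    "runners": [
        {
            "name": "My Docker Runner",
            "executor": "docker",
            "docker": {"privileged": True, "volumes": ["/certs/client", "/cache"]},
        }
    ]
}

if __name__ == "__main__":
    print(docker_runners_missing_mount(example))
```

An empty result means every Docker runner in the file shares /certs/client between the service and build container.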
Then you need to update your .gitlab-ci.yml file to explicitly specify that you want the certificates to be generated in a specific path:
image: docker:19.03
variables:
  # When using dind service we need to instruct docker, to talk with
  # the daemon started inside of the service. The daemon is
  # available with a network connection instead of the default
  # /var/run/docker.sock socket. docker:19.03-dind does this
  # automatically by setting the DOCKER_HOST in
  # https://github.com/docker-library/docker/blob/d45051476babc297257df490d22cbd806f1b11e4/19.03/docker-entrypoint.sh#L23-L29
  #
  # The 'docker' hostname is the alias of the service container as described at
  # https://docs.gitlab.com/ee/ci/docker/using_docker_images.html#accessing-the-services.
  #
  # Note that if you're using the Kubernetes executor, the variable should be set to
  # tcp://localhost:2376/ because of how the Kubernetes executor connects services
  # to the job container
  # DOCKER_HOST: tcp://localhost:2376/
  #
  # When using dind, it's wise to use the overlayfs driver for
  # improved performance.
  DOCKER_DRIVER: overlay2
  # Specify to Docker where to create the certificates, Docker will
  # create them automatically on boot, and will create
  # `/certs/client` that will be shared between the service and
  # build container.
  DOCKER_TLS_CERTDIR: "/certs"
services:
  - docker:19.03-dind
before_script:
  - docker info
build:
  stage: build
  script:
    - docker build -t my-docker-image .
    - docker run my-docker-image /script/to/run/tests
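The comments in the job definition mention that the image's entrypoint picks DOCKER_HOST automatically. The sketch below is a rough Python model of that selection, written under assumptions: it simplifies the linked docker-entrypoint.sh (which remains the authoritative source), the function name `choose_docker_host` and its parameters are hypothetical, and `env` is meant to be the effective environment of the build container, including any defaults set by the image.

```python
def choose_docker_host(env: dict, client_certs_present: bool,
                       local_socket_present: bool = False) -> str:
    """Simplified model of how the docker client image settles on DOCKER_HOST."""
    # An explicit DOCKER_HOST (for example the Kubernetes value above) wins.
    if env.get("DOCKER_HOST"):
        return env["DOCKER_HOST"]
    # Assumption: a local daemon socket, if present, is preferred over TCP.
    if local_socket_present:
        return "unix:///var/run/docker.sock"
    # TLS is assumed to be used when DOCKER_TLS_VERIFY is set, or when
    # DOCKER_TLS_CERTDIR is non-empty and the client certificates are visible.
    tls_dir = env.get("DOCKER_TLS_CERTDIR", "")
    use_tls = bool(env.get("DOCKER_TLS_VERIFY")) or (bool(tls_dir) and client_certs_present)
    return "tcp://docker:2376" if use_tls else "tcp://docker:2375"


if __name__ == "__main__":
    # Certificates shared through the /certs/client mount: client uses the TLS port.
    print(choose_docker_host({"DOCKER_TLS_CERTDIR": "/certs"}, client_certs_present=True))
    # Certificates generated by the service but not visible to the build container.
    print(choose_docker_host({"DOCKER_TLS_CERTDIR": "/certs"}, client_certs_present=False))
```

Under this model, a build container that cannot see the generated client certificates falls back to port 2375 while the service listens with TLS, which would match the error in the summary. That is why the /certs/client mount has to be shared between both containers.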
Disable TLS
Set DOCKER_TLS_CERTDIR= as an environment variable to disable TLS. This can be done in a few ways:
config.toml
# config.toml
[[runners]]
environment = ["DOCKER_TLS_CERTDIR="]
Per job
# .gitlab-ci.yml
variables:
  DOCKER_TLS_CERTDIR: ""
Use older Docker in Docker image
variables:
  DOCKER_HOST: tcp://docker:2375/
  DOCKER_DRIVER: overlay2
services:
  - docker:18.09-dind
Proposal
- Update documentation on how to use Docker in Docker with 19.03. This is done in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/31051
- Make the new job template specified in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/31051 work for the shared runners. More details can be found in #4501 (comment 195155325)
- Enable this on gitlab-runner-builder.gitlap.com
  - Chef: https://ops.gitlab.net/gitlab-cookbooks/chef-repo/merge_requests/1473 (internal link only)
  - .gitlab-ci.yml change: !1495 (closed)
  - Verified jobs:
    - Using old template: https://gitlab.com/gitlab-org/gitlab-runner/-/jobs/258373151 (job is not using TLS)
    - Using new template: https://gitlab.com/gitlab-org/gitlab-runner/-/jobs/258373143 (job is using TLS)
- Enable on prmX
  - Chef: https://ops.gitlab.net/gitlab-cookbooks/chef-repo/merge_requests/1488 (internal link only) and https://ops.gitlab.net/gitlab-cookbooks/chef-repo/merge_requests/1494 (internal link only)
  - .gitlab-ci.yml change: !1495 (closed)
  - Verified jobs:
    - Using old template: https://gitlab.com/gitlab-org/gitlab-runner/-/jobs/258469343
    - Using new template: https://gitlab.com/gitlab-org/gitlab-runner/-/jobs/258469342
- Enable on srmX
  - Chef: https://ops.gitlab.net/gitlab-cookbooks/chef-repo/merge_requests/1489 (internal link only)
  - .gitlab-ci.yml change: !1495 (closed)
  - Verified jobs:
    - Using old template: https://gitlab.com/gitlab-org/gitlab-runner/-/jobs/259083787
    - Using new template: https://gitlab.com/gitlab-org/gitlab-runner/-/jobs/259083672
- Enable this on
Timeline
- 2019-07-23 04:50 UTC - Applied a config change to the shared Runner fleet to disable TLS for now. More information in gitlab-com/gl-infra/production#982 (closed)
- 2019-07-24 - Update documentation in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/31051
- 2019-07-24 14:15 UTC - In gitlab-runner-builder.gitlap.com, /certs/client is mounted for both the service and build container. Verified that it's working: https://gitlab.com/gitlab-org/gitlab-runner/-/jobs/258373143
- 2019-07-24 13:00 UTC - In the private runner manager, /certs/client is mounted for both the service and build container. Verified that it's working: https://gitlab.com/gitlab-org/gitlab-runner/-/jobs/258469342
- 2019-07-25 10:09 UTC - Shared Runners mount /certs/client to the service and build container.