Remove usage of umask 0000 in GitLab Runner Docker executor

Release notes

This release introduces new logic that enables a user to remove umask 0000 for jobs executed with the runner docker executor.

To use the new logic and associated workflow, set the feature flag FF_DISABLE_UMASK_FOR_DOCKER_EXECUTOR to true. If enabled, the runner will attempt to retrieve the user identifier (UID) and group identifier (GID) of the user configured for the image used by the build container, and will change the ownership of the working directory and files by running the chown command in the predefined container.

Background

In !57 (merged) we introduced a command that runs before the git cloning process starts inside of the helper image: umask 0000. It is executed by the gitlab-runner-build command. This was done because the helper image that clones the repository uses the root user, so the cloned directory ends up owned by root:root. This isn't a problem if you are using an image that runs as root, such as alpine:3.12: writing to the directory just works.

However, when the image doesn't run as root (which is considered best practice), any command that writes to the cloned directory fails with permission denied. umask 0000 was introduced so that anyone can write to it. As explained below and in the issue comments, this is not ideal, because it makes everything world writeable, which users don't expect.
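To make the effect of the umask concrete, here is a minimal sketch that creates a file and a directory under a given umask and prints the resulting modes. The function name show_modes is hypothetical, and the sketch assumes a stat that supports the -c format flag (GNU coreutils or BusyBox).

```shell
#!/usr/bin/env bash
# Show the modes that a new file and directory get under a given umask.
# Assumption: `stat -c` is available (GNU coreutils or BusyBox).
show_modes() {
  local mask="$1" dir
  dir="$(mktemp -d)"
  (
    # Subshell so the umask change does not leak into the caller.
    umask "$mask"
    touch "$dir/file"
    mkdir "$dir/subdir"
  )
  echo "umask=$mask file=$(stat -c '%a' "$dir/file") dir=$(stat -c '%a' "$dir/subdir")"
  rm -rf "$dir"
}

show_modes 0000   # umask=0000 file=666 dir=777
show_modes 0022   # umask=0022 file=644 dir=755
```

With umask 0000 nothing is masked out, so files are created as 666 and directories as 777, which matches the permissions seen on the cloned repository.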

Issues the problem is causing

The newly cloned directory is world writeable

Imagine that in the pipeline we take the cloned directory and either build a Docker image with the directory inside of it, or zip it and move it to some other server.

.gitlab-ci.yml

stages:
- root

root:
  image: alpine:3.12
  stage: root
  script:
  - echo "hello" > text.txt
  - mkdir vendor
  - echo "hello" > vendor/text.txt
  - ls -la $CI_PROJECT_DIR
  - ls -la $CI_BUILDS_DIR
  - ls -la vendor

With the current implementation, we can see that everything cloned by the runner is world writeable (777 for directories, 666 for files), i.e. everyone can read and write it:

job log

 Running with gitlab-runner development version (HEAD)
   on docker fL_5iHR7
Preparing the "docker" executor
00:01
 Using Docker executor with image alpine:3.12 ...
 Using locally found image version due to if-not-present pull policy
 Using docker image sha256:a24bb4013296f61e89ba57005a7b3e52274d8edd3ae2077d04395f806b63d83e for alpine:3.12 ...
Preparing environment
00:01
 Running on runner-fl5ihr7-project-19-concurrent-0 via steve-mbp-gitlab.local...
Getting source from Git repository
00:01
 Fetching changes with git depth set to 50...
 Reinitialized existing Git repository in /builds/root/playground/.git/
 Checking out db86a50d as umask...
 Removing text2.txt
 Skipping Git submodules setup
Executing "step_script" stage of the job script
00:01
 $ echo "hello" > text.txt
 $ mkdir vendor
 $ echo "hello" > vendor/text.txt
 $ ls -la $CI_PROJECT_DIR
 total 32
 drwxrwxrwx    5 root     root          4096 Jun 25 06:51 .
 drwxrwxrwx    4 root     root          4096 Jun 25 06:41 ..
 drwxrwxrwx    6 root     root          4096 Jun 25 06:51 .git
 drwxrwxrwx    2 root     root          4096 Jun 25 06:41 .gitlab
 -rw-rw-rw-    1 root     root           374 Jun 25 06:51 .gitlab-ci.yml
 -rw-rw-rw-    1 root     root             6 Jun 25 06:46 readonly
 -rw-r--r--    1 root     root             6 Jun 25 06:51 text.txt
 drwxr-xr-x    2 root     root          4096 Jun 25 06:51 vendor # Newly created directory has the correct default permissions
 $ ls -la $CI_BUILDS_DIR
 total 12
 drwxrwxrwx    3 root     root          4096 Jun 25 06:41 .
 drwxr-xr-x    1 root     root          4096 Jun 25 06:51 ..
 drwxrwxrwx    4 root     root          4096 Jun 25 06:41 root # This is where the project is cloned, permissions are `777` everything can write to it.
 $ ls -la vendor
 total 12
 drwxr-xr-x    2 root     root          4096 Jun 25 06:51 .
 drwxrwxrwx    5 root     root          4096 Jun 25 06:51 ..
 -rw-r--r--    1 root     root             6 Jun 25 06:51 text.txt
 Job succeeded

As we can see, the project root directory is 777, so anything can write to it. This is not best practice, and it is not obvious to the user.
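A job can check for this itself before it packages or deploys the checkout. The following is a minimal sketch, not something the runner provides: the function name list_world_writable is hypothetical, and it relies only on the standard find -perm test.

```shell
#!/usr/bin/env bash
# Print every file or directory under "$1" that is writable by "other".
# Symlinks are skipped because their own mode bits are not meaningful.
list_world_writable() {
  find "$1" ! -type l -perm -0002 -print | sort
}

# Hypothetical usage in a deploy job: fail if anything is world writeable.
# if [ -n "$(list_world_writable "$CI_PROJECT_DIR")" ]; then exit 1; fi
```

On a checkout made with umask 0000 this would list every cloned file and directory; on a checkout made with umask 0022 it would print nothing.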

When we don't run umask 0000 we get the correct permissions:

job log

 Running with gitlab-runner 13.2.0~beta.90.g24046369 (24046369)
   on docker fL_5iHR7
Preparing the "docker" executor
00:01
 Using Docker executor with image alpine:3.12 ...
 Using locally found image version due to if-not-present pull policy
 Using docker image sha256:a24bb4013296f61e89ba57005a7b3e52274d8edd3ae2077d04395f806b63d83e for alpine:3.12 ...
Preparing environment
00:00
 Running on runner-fl5ihr7-project-19-concurrent-0 via steve-mbp-gitlab.local...
Getting source from Git repository
00:02
 Fetching changes with git depth set to 50...
 Initialized empty Git repository in /builds/root/playground/.git/
 Created fresh repository.
 Checking out db86a50d as umask...
 Skipping Git submodules setup
Executing "step_script" stage of the job script
00:00
 $ echo "hello" > text.txt
 $ mkdir vendor
 $ echo "hello" > vendor/text.txt
 $ ls -la $CI_PROJECT_DIR
 total 32
 drwxr-xr-x    5 root     root          4096 Jun 25 07:11 .
 drwxr-xr-x    4 root     root          4096 Jun 25 07:11 ..
 drwxr-xr-x    6 root     root          4096 Jun 25 07:11 .git
 drwxr-xr-x    2 root     root          4096 Jun 25 07:11 .gitlab
 -rw-r--r--    1 root     root           374 Jun 25 07:11 .gitlab-ci.yml
 -rw-r--r--    1 root     root             6 Jun 25 07:11 readonly
 -rw-r--r--    1 root     root             6 Jun 25 07:11 text.txt
 drwxr-xr-x    2 root     root          4096 Jun 25 07:11 vendor
 $ ls -la $CI_BUILDS_DIR
 total 12
 drwxrwxrwx    3 root     root          4096 Jun 25 07:11 .
 drwxr-xr-x    1 root     root          4096 Jun 25 07:11 ..
 drwxr-xr-x    4 root     root          4096 Jun 25 07:11 root # Not everyone can write to it!
 $ ls -la vendor
 total 12
 drwxr-xr-x    2 root     root          4096 Jun 25 07:11 .
 drwxr-xr-x    5 root     root          4096 Jun 25 07:11 ..
 -rw-r--r--    1 root     root             6 Jun 25 07:11 text.txt
 Job succeeded

Issues that `umask 0000` is trying to "solve"

Non-root images

The following scenario was explained well by @vermeeren in #1736 (comment 358400958):

So if you have one project with job A and then job B. Job A runs as root, job B runs as user, then this may happen:

Job A runs, first time execution means no build caches exist. The runner performs a "fresh clone" of the git repository. This repository environment is now owned by root:root.

Job B runs, build caches now exist for this project. The runner does a "reinitialised existing git repository", avoiding the need to re-clone everything including LFS, and tries to update via fetch and checkout to the desired branch/commit/etc.

However, because the build cache is owned by root and job B runs as user, things will break with permission denied, as user of course may not modify files owned by root.

We can see this with the following .gitlab-ci.yml:

.gitlab-ci.yml

stages:
- root
- non-root

root:
  image: alpine:3.12
  stage: root
  script:
  - echo "hello" > text.txt

non-root:
  image: registry.gitlab.com/gitlab-org/gitlab-runner/alpine-no-root:latest
  stage: non-root
  script:
  - echo "hello2" > text2.txt

git diff to remove `umask`

diff --git a/helpers/container/helperimage/linux_info.go b/helpers/container/helperimage/linux_info.go
index f52591557..d06bd2d68 100644
--- a/helpers/container/helperimage/linux_info.go
+++ b/helpers/container/helperimage/linux_info.go
@@ -15,7 +15,7 @@ const (
        archArm64       = "arm64"
 )

-var bashCmd = []string{"gitlab-runner-build"}
+var bashCmd = []string{"/bin/bash"}

 type linuxInfo struct{}

Job failed

Running with gitlab-runner 13.2.0~beta.90.g24046369 (24046369)
   on docker fL_5iHR7
Preparing the "docker" executor
00:01
 Using Docker executor with image registry.gitlab.com/gitlab-org/gitlab-runner/alpine-no-root:latest ...
 Using locally found image version due to if-not-present pull policy
 Using docker image sha256:377ff9461e933ac670101e35361588c0698e1db8fb785a046854c8e47ad992fe for registry.gitlab.com/gitlab-org/gitlab-runner/alpine-no-root:latest ...
Preparing environment
00:01
 Running on runner-fl5ihr7-project-19-concurrent-0 via steve-mbp-gitlab.local...
Getting source from Git repository
00:02
 Fetching changes with git depth set to 50...
 Reinitialized existing Git repository in /builds/root/playground/.git/
 Checking out 2f45236c as umask...
 Removing text.txt
 Skipping Git submodules setup
Executing "step_script" stage of the job script
00:00
 $ echo "hello2" > text2.txt
 /bin/sh: eval: line 90: can't create text2.txt: Permission denied
 ERROR: Job failed: exit code 1

The project is cloned by root because the helper image, which does the clone, uses root. Without umask, the permissions look like this:

docker run --rm -it -v runner-fl5ihr7-project-19-concurrent-1-cache-c33bcaa1fd2c77edfc3893b41966cea8:/code alpine:3.12
/ # cd /code/
/code # ls -la
total 12
drwxrwxrwx    3 root     root          4096 Jun 24 07:25 .
drwxr-xr-x    1 root     root          4096 Jun 24 07:26 ..
drwxr-xr-x    4 root     root          4096 Jun 24 07:25 root

So a user that isn't root can't write to it, for example the user of registry.gitlab.com/gitlab-org/gitlab-runner/alpine-no-root:latest.

Other cases affected by this change:

Using file-based CI variables, since we have to write the variables to that directory.

Proposal

Before running the user script, check if the user image runs as root or not. If it doesn't run as root, get the UID and GID of the user and chown the build directory.

Details

Steps

graph TD
  jobStarted[Job received] --> start

  start((...)) --> pA

  subgraph predefined steps
  pA{{Preparing predefined container}} --> hFF{Feature Flag}

  hFF --> |OFF| pB[Set umask script as container's command]
  hFF --> |ON| pC[Set `/bin/bash` script as container's command]

  pB --> pD{{Continuing with predefined steps execution}}
  pC --> pD
  end

  pD --> middle((...)) --> bA

  subgraph build steps
  bA{{Preparing build container}} --> bFF{Feature Flag}

  bFF --> |OFF| bI{{Continuing with job execution}}
  bFF --> |ON| bB(Run `docker inspect` on user's image)

  bB --> bC{Check value of `user`}
  bC --> |user == '' or user == 'root' | bD[Assuming user is root]
  bD --> bI

  bC --> |user = 'something'| bE[User specified]
  bE --> bF(Run `id -u` inside of the build container)
  bE --> bG(Run `id -g` inside of the build container)

  bF --> bH{UID==0}
  bG --> bH

  bH --> |YES| bI
  bH --> |NO| bJ(Run `chown -RP -- $UID:$GID $ROOT_DIR $PROJECT_TMP_DIR` inside of the predefined container)

  bJ --> bI
  end

  bI --> final((...))

  final --> jobFinished[Job finalized]
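The build-steps branch of the diagram above can be sketched as a small shell function. This is only an illustration of the decision logic, not the runner's implementation: the function name plan_ownership_fix and its argument order are hypothetical, and instead of running chown it prints the command that would be run.

```shell
#!/usr/bin/env bash
# Decide what the build step should do, following the diagram.
#   $1 = feature flag ("true" or "false")
#   $2 = user reported by `docker inspect` for the image (may be empty)
#   $3 = output of `id -u` in the build container
#   $4 = output of `id -g` in the build container
#   remaining args = directories to chown
# Prints "skip", or the chown command that would be run.
plan_ownership_fix() {
  local flag="$1" image_user="$2" uid="$3" gid="$4"
  shift 4
  # Feature flag OFF: continue with job execution unchanged.
  if [ "$flag" != "true" ]; then echo "skip"; return; fi
  # Empty or "root" user: assume root, nothing to change.
  if [ -z "$image_user" ] || [ "$image_user" = "root" ]; then echo "skip"; return; fi
  # A named user can still map to UID 0.
  if [ "$uid" = "0" ]; then echo "skip"; return; fi
  echo "chown -RP -- $uid:$gid $*"
}
```

For example, plan_ownership_fix true alpine 1000 1000 /builds prints the chown command, while any call with the flag off, an empty user, or UID 0 prints skip.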

Implementation

Create a new feature flag to turn this ON/OFF. By default, for now, it should be OFF. We will turn it on in the future. This is to prevent any major regressions that we might cause because we missed a use case or made an invalid assumption.
Extract the execution of commands in a Docker container into its own package, for example exec, under the internal directory, since we will be using this in multiple places.
Create a user package under the internal directory which will do the following:
1. Check if the user is root or not. Run docker inspect on the user image. If the user value is empty, assume it's root.
2. Get the UID of the user by running id -u, using the exec package. Or parse /etc/passwd as proposed in #1736 (comment 384347304).
3. Get the GID of the user by running id -g, using the exec package. Or parse /etc/group as proposed in #1736 (comment 384347304).
Using the exec package, run the chown command with the specified UID:GID on the build directory.
Add logging to the job trace that we are changing permissions (info is enough) so users are aware.
Integration tests/unit tests. (We might want to prepare different images for this, and also use the noroot image.)
If the feature flag is turned ON, instead of running umask, just run /bin/bash.
We shouldn't have to touch Windows, since we don't suffer from the same problem there; we have #25480.
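As an illustration of the parsing alternative mentioned in steps 2 and 3, the sketch below looks up a user's UID and primary GID in a passwd-format file. The function name lookup_uid_gid is hypothetical. It assumes the standard seven-field passwd layout, where the fourth field is the user's primary GID, which is usually the value id -g reports for that user.

```shell
#!/usr/bin/env bash
# Look up "UID:GID" for a user name in a passwd-format file.
# Line format: name:password:UID:GID:gecos:home:shell
#   $1 = user name
#   $2 = passwd file (defaults to /etc/passwd)
# Exits non-zero if the user is not found.
lookup_uid_gid() {
  local name="$1" passwd_file="${2:-/etc/passwd}"
  awk -F: -v name="$name" '
    $1 == name { print $3 ":" $4; found = 1; exit }
    END { exit !found }
  ' "$passwd_file"
}
```

Running id -u and id -g inside the container avoids having to read files out of the image, while parsing avoids an extra exec; which one fits better depends on how the exec package ends up being used.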

Proof of concept

A proof of concept was created in fafd9366. Note that I've done all this in requestBuildcontainer, but it should be done somewhere where we only run it once before running the user script.

.gitlab-ci.yml used for testing

variables:
  SLEEP: 0

stages:
- root
- alpine
- debian

root:
  image: alpine
  stage: root
  before_script:
  - id
  script:
  - mkdir -p test-root
  - echo "test" > test-root/test-root.txt
  - echo "test" > test-root.txt
  - sleep ${SLEEP}

alpine:
  image: steveazz/noroot-alpine
  stage: alpine
  before_script:
  - id
  script:
  - mkdir -p test-alpine
  - echo "test" > test-alpine/test-alpine.txt
  - echo "test" > test-alpine.txt
  - sleep ${SLEEP}

debian:
  image: steveazz/noroot-debian
  stage: debian
  before_script:
  - id
  script:
  - mkdir -p test-debian
  - echo "test" > test-debian/test-debian.txt
  - echo "test" > test-debian.txt
  - sleep ${SLEEP}

Dockerfiles used by the `.gitlab-ci.yml`

# Debian based image
FROM debian:buster

RUN apt-get update && apt-get install -y git

RUN groupadd -g 1001 debian
RUN useradd -ms /bin/bash -u 1001 -g 1001 debian

USER debian
WORKDIR /home/debian

# Alpine based image
FROM alpine:3.12

RUN apk add --no-cache git

RUN addgroup -S alpine -g 1000 && adduser -S alpine -G alpine -u 1000

USER alpine
WORKDIR /home/alpine

Current permissions

# runner-xi8eccda-project-20-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8 volume created by Runner to clone directories.

$ docker run --rm -it -v runner-xi8eccda-project-20-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8:/runner/build alpine:3.12

/ # cd /runner
/runner # ls -la
total 12
drwxr-xr-x    3 root     root          4096 Jul  9 08:18 .
drwxr-xr-x    1 root     root          4096 Jul  9 08:18 ..
drwxrwxrwx    3 root     root          4096 Jul  9 08:17 build                # Everyone can read/write to this.
/runner # cd build/
/runner/build # ls -la
total 12
drwxrwxrwx    3 root     root          4096 Jul  9 08:17 .
drwxr-xr-x    3 root     root          4096 Jul  9 08:18 ..
drwxrwxrwx    4 root     root          4096 Jul  9 08:17 root                  # Everyone can read/write to this.
/runner/build # cd root/
/runner/build/root # ls -la
total 16
drwxrwxrwx    4 root     root          4096 Jul  9 08:17 .
drwxrwxrwx    3 root     root          4096 Jul  9 08:17 ..
drwxrwxrwx    4 root     root          4096 Jul  9 08:17 playground
drwxrwxrwx    3 root     root          4096 Jul  9 08:17 playground.tmp       # Everyone can read/write to this.
/runner/build/root # cd playground
/runner/build/root/playground # ls -la
total 24
drwxrwxrwx    4 root     root          4096 Jul  9 08:17 .
drwxrwxrwx    4 root     root          4096 Jul  9 08:17 ..
drwxrwxrwx    6 root     root          4096 Jul  9 08:17 .git
-rw-rw-rw-    1 root     root           249 Jul  9 08:17 .gitlab-ci.yml
drwxr-xr-x    2 1000     1000          4096 Jul  9 08:17 test
-rw-r--r--    1 1000     1000             6 Jul  9 08:17 test2.txt

New permissions

# runner-xi8eccda-project-20-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8 Is the name of the volume created by the build
$ docker run --rm -it -v runner-xi8eccda-project-20-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8:/runner/build alpine:3.12
/ # cd /runner/
/runner # ls -la
total 12
drwxr-xr-x    3 root     root          4096 Jul  9 07:57 .
drwxr-xr-x    1 root     root          4096 Jul  9 07:57 ..
drwxrwxrwx    3 1000     1000          4096 Jul  9 07:38 build          # World writeable however belongs to 1000
/runner # cd build/
/runner/build # ls -la
total 12
drwxrwxrwx    3 1000     1000          4096 Jul  9 07:38 .
drwxr-xr-x    3 root     root          4096 Jul  9 07:57 ..
drwxr-xr-x    4 1000     1000          4096 Jul  9 07:38 root            # Only 1000 Can read/write to it
/runner/build # cd root/
/runner/build/root # ls -la
total 16
drwxr-xr-x    4 1000     1000          4096 Jul  9 07:38 .
drwxrwxrwx    3 1000     1000          4096 Jul  9 07:38 ..
drwxr-xr-x    4 1000     1000          4096 Jul  9 07:38 playground      # Only 1000 Can read/write to it
drwxr-xr-x    3 1000     1000          4096 Jul  9 07:38 playground.tmp  # Only 1000 Can read/write to it
/runner/build/root # cd playground
/runner/build/root/playground # ls -la
total 24
drwxr-xr-x    4 1000     1000          4096 Jul  9 07:38 .
drwxr-xr-x    4 1000     1000          4096 Jul  9 07:38 ..
drwxr-xr-x    6 1000     1000          4096 Jul  9 07:38 .git
-rw-r--r--    1 1000     1000           249 Jul  9 07:38 .gitlab-ci.yml
drwxr-xr-x    2 1000     1000          4096 Jul  9 07:38 test
-rw-r--r--    1 1000     1000             6 Jul  9 07:38 test2.txt

Possible follow up issues that we can create

For the cache/build volumes we run chmod 777 on the root directory with permissions for #25440 (closed). We might want to investigate if we can do something similar, although I don't see that we will get much benefit from doing so.

Other proposals considered

Not running git inside of the helper image. A lot of work, which might cause more problems because we lose control over the environment in which we run our git commands. Also, we would have to figure out how to statically link everything.
Using Docker additional groups. The permissions are still a bit open, and it's not always the case that root is available.

Planning log

2020-06-08

Planning log: 2020-04-08

I've read through this issue and the related issues/merge requests and have a better understanding of the problem, but still not the full picture. I'm going to summarize the problem for my own sake and list all the possible solutions, both to understand the problem better and to be as transparent as possible about the current progress.

Problem

We run umask 0000 so that everything is writable by anyone. When we clone the repository we end up overriding the permissions it has set, making everything world writable. This, of course, is less than ideal because basically anything can write to it. These permissions are also preserved: if the user ends up deploying the resulting state to some server, the user ends up with the same insecure permissions there.

This was introduced in !57 (merged) to allow non-root images to work correctly.

Solutions

Any solution that is going to be implemented will end up being a breaking change, so it's something we need to keep in mind, warn users about, and plan a proper rollout strategy for.

It seems like the ideal solution would be:

Change to umask 0022 which is a sane default.
Allow the user to change this inside of .gitlab-ci.yml to set the UID and GID correctly so that the user has the correct permissions to change something. (This is the part I'm still hazy about how to implement and why we need it)
Show a warning to the user when using umask 0000 about the problems that this might cause.
Update documentation properly to educate users about this problem inside of the Runner and how to enable the correct fix.

Action points for me

Create test scenarios (like a non-root image) where things break for umask 0022 and see how setting the correct UID/GID helps.
Read up on #3188 (comment 124544958)
Create a merge request to document current behavior.
Does this somehow relate to !2047 (merged)?

2020-06-24

Spent time trying to get a reproducible environment/testing environment.
Started writing down the proposal.

2020-07-01

#1736 (comment 371832429)

2020-07-07

I've pushed a small PoC in 3cc4e043 which fixes this issue by running chown -RP $UID:$GID on the build directory if the container doesn't use the root user.

What I'm thinking for now:

Print log message when we change the permissions
Behind feature flag
We still mess with the permissions, are we OK with that?
So we always have the feature flag that the user turned on when needed?
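The chown step together with the log message from the list above could look roughly like the following sketch. It is not the PoC code: the function name fix_ownership and the wording of the log line are hypothetical. Only the chown -RP invocation is taken from the text; -R recurses into the directory, and -P keeps chown from following symbolic links.

```shell
#!/usr/bin/env bash
# Change ownership of the given directories and log the change first.
#   $1 = UID, $2 = GID, remaining args = directories
fix_ownership() {
  local uid="$1" gid="$2"
  shift 2
  if [ "$uid" = "0" ]; then
    return 0   # the clone is already owned by root; nothing to do
  fi
  # Hypothetical log line so the change is visible in the job trace.
  echo "Changing ownership of $* to $uid:$gid"
  chown -RP -- "$uid:$gid" "$@"
}
```

In the runner this would have to run as root inside the predefined container, because an unprivileged user cannot hand files owned by root over to another user.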

Original report from the community member

Overview

I'm assuming that this is not a bug, but maybe someone can tell me the rationale behind this decision.

We're currently re-building our continuous delivery setup on GitLab/Gitlab CI and noticed that file permissions are odd when deploying through Gitlab CI. When gitlab-ci-multi-runner checks out a repository inside a docker container, it appears to be doing so with umask 0000, with the result that everything that's checked out is world-writable. This is not a problem inside the container, but since we deploy from inside the container, the permissions are also transferred to the production system.

It seems like this behavior cannot be influenced, since before_script runs after the repository has been cloned. To verify that my assumption about the umask was correct, I threw in a w.Command("umask", "0022") in abstract.go L61, which resulted in the correct (for me) file permissions being set.

So my questions are:

Is this intended behavior, and if so, what is the rationale?
Can this be influenced in any way?
- If it is not, I'd be willing to submit a merge request providing such functionality after discussing what the best approach would be to do so

Thanks for your time :)

Edited Dec 10, 2020 by Darren Eastman