Remove usage of umask 0000 in GitLab Runner Docker executor
Release notes
This release introduces new logic that enables a user to remove `umask 0000` for jobs executed with the runner docker executor.
To use the new logic and associated workflow, you will need to set the feature flag `FF_DISABLE_UMASK_FOR_DOCKER_EXECUTOR` to true. If enabled, the runner will attempt to retrieve the user identifier (UID) and group identifier (GID) of the user configured for the image used by the build container and will change the ownership of the working directory and files by running the `chown` command in the predefined container.
Background
In !57 (merged) we introduced a command that runs inside the helper image before the git cloning process starts: `umask 0000`. It is executed by the `gitlab-runner-build` command. This was done because the helper image that clones the repository uses the `root` user, so the cloned project directory ends up belonging to `root:root`. This isn't a problem if the job image also runs as `root`, such as `alpine:3.12`: writing to the directory just works. However, when the job image doesn't use the root user (which is considered best practice), any command that writes to the cloned directory fails with permission denied. That is why `umask 0000` was introduced, so that anyone can write to the directory. As explained below and in the issue comments, this is not ideal because it makes everything world writeable, which is unexpected by the user.
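The effect of the two masks can be reproduced outside of the runner. The following is a minimal sketch, not the runner's code: the helper name `mode_under_umask` is made up for this example, and it assumes a POSIX shell with `mktemp` available.

```shell
#!/bin/sh
# Hypothetical helper (not part of GitLab Runner): create a file or a
# directory under the given umask and print its permission string.
mode_under_umask() {
    mask="$1"   # e.g. 0000 or 0022
    kind="$2"   # "file" or "dir"
    tmp="$(mktemp -d)" || return 1
    (
        # The subshell keeps the umask change local to this block.
        umask "$mask"
        if [ "$kind" = "dir" ]; then
            mkdir "$tmp/entry"
        else
            : > "$tmp/entry"
        fi
    )
    # Keep only the 10-character type/permission field of `ls -ld`.
    ls -ld "$tmp/entry" | cut -c1-10
    rm -rf "$tmp"
}

mode_under_umask 0000 dir    # drwxrwxrwx
mode_under_umask 0000 file   # -rw-rw-rw-
mode_under_umask 0022 dir    # drwxr-xr-x
mode_under_umask 0022 file   # -rw-r--r--
```

With `umask 0000` nothing is masked out, so directories get `777` and files `666`. With `umask 0022` the write bit is removed for group and others.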
Issues the problem is causing
The newly cloned directory is world writeable
Imagine that in the pipeline we take the cloned directory and either build a Docker image with the directory inside of it, or zip it and move it to some other server.
.gitlab-ci.yml
stages:
  - root
root:
  image: alpine:3.12
  stage: root
  script:
    - echo "hello" > text.txt
    - mkdir vendor
    - echo "hello" > vendor/text.txt
    - ls -la $CI_PROJECT_DIR
    - ls -la $CI_BUILDS_DIR
    - ls -la vendor
With the current implementation, we can see that the cloned directories have `777` permissions and the cloned files `666` (i.e. everyone can read and write them):
job log
Running with gitlab-runner development version (HEAD)
on docker fL_5iHR7
Preparing the "docker" executor
00:01
Using Docker executor with image alpine:3.12 ...
Using locally found image version due to if-not-present pull policy
Using docker image sha256:a24bb4013296f61e89ba57005a7b3e52274d8edd3ae2077d04395f806b63d83e for alpine:3.12 ...
Preparing environment
00:01
Running on runner-fl5ihr7-project-19-concurrent-0 via steve-mbp-gitlab.local...
Getting source from Git repository
00:01
Fetching changes with git depth set to 50...
Reinitialized existing Git repository in /builds/root/playground/.git/
Checking out db86a50d as umask...
Removing text2.txt
Skipping Git submodules setup
Executing "step_script" stage of the job script
00:01
$ echo "hello" > text.txt
$ mkdir vendor
$ echo "hello" > vendor/text.txt
$ ls -la $CI_PROJECT_DIR
total 32
drwxrwxrwx 5 root root 4096 Jun 25 06:51 .
drwxrwxrwx 4 root root 4096 Jun 25 06:41 ..
drwxrwxrwx 6 root root 4096 Jun 25 06:51 .git
drwxrwxrwx 2 root root 4096 Jun 25 06:41 .gitlab
-rw-rw-rw- 1 root root 374 Jun 25 06:51 .gitlab-ci.yml
-rw-rw-rw- 1 root root 6 Jun 25 06:46 readonly
-rw-r--r-- 1 root root 6 Jun 25 06:51 text.txt
drwxr-xr-x 2 root root 4096 Jun 25 06:51 vendor # Newly created directory has the correct default permissions
$ ls -la $CI_BUILDS_DIR
total 12
drwxrwxrwx 3 root root 4096 Jun 25 06:41 .
drwxr-xr-x 1 root root 4096 Jun 25 06:51 ..
drwxrwxrwx 4 root root 4096 Jun 25 06:41 root # This is where the project is cloned, permissions are `777` everything can write to it.
$ ls -la vendor
total 12
drwxr-xr-x 2 root root 4096 Jun 25 06:51 .
drwxrwxrwx 5 root root 4096 Jun 25 06:51 ..
-rw-r--r-- 1 root root 6 Jun 25 06:51 text.txt
Job succeeded
As we can see, the project root directory is `777`, so anything can write to it. This is not best practice, and it is not obvious to the user.
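One way to make this visible before the directory is packaged or deployed is to list the world-writeable entries. This is a minimal sketch, assuming a POSIX `find`. The function name `list_world_writable` is hypothetical and not something the runner provides.

```shell
#!/bin/sh
# Hypothetical helper: print every file or directory below "$1" that has the
# "others may write" bit (octal 0002) set. Symlinks are skipped because their
# own mode bits are not meaningful.
list_world_writable() {
    find "$1" ! -type l -perm -0002 -print
}

# Possible usage inside a job script:
#   list_world_writable "$CI_PROJECT_DIR"
```

An empty result means nothing below the directory is writeable by "others".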
When we don't run `umask 0000` we get the correct permissions:
job log
Running with gitlab-runner 13.2.0~beta.90.g24046369 (24046369)
on docker fL_5iHR7
Preparing the "docker" executor
00:01
Using Docker executor with image alpine:3.12 ...
Using locally found image version due to if-not-present pull policy
Using docker image sha256:a24bb4013296f61e89ba57005a7b3e52274d8edd3ae2077d04395f806b63d83e for alpine:3.12 ...
Preparing environment
00:00
Running on runner-fl5ihr7-project-19-concurrent-0 via steve-mbp-gitlab.local...
Getting source from Git repository
00:02
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/root/playground/.git/
Created fresh repository.
Checking out db86a50d as umask...
Skipping Git submodules setup
Executing "step_script" stage of the job script
00:00
$ echo "hello" > text.txt
$ mkdir vendor
$ echo "hello" > vendor/text.txt
$ ls -la $CI_PROJECT_DIR
total 32
drwxr-xr-x 5 root root 4096 Jun 25 07:11 .
drwxr-xr-x 4 root root 4096 Jun 25 07:11 ..
drwxr-xr-x 6 root root 4096 Jun 25 07:11 .git
drwxr-xr-x 2 root root 4096 Jun 25 07:11 .gitlab
-rw-r--r-- 1 root root 374 Jun 25 07:11 .gitlab-ci.yml
-rw-r--r-- 1 root root 6 Jun 25 07:11 readonly
-rw-r--r-- 1 root root 6 Jun 25 07:11 text.txt
drwxr-xr-x 2 root root 4096 Jun 25 07:11 vendor
$ ls -la $CI_BUILDS_DIR
total 12
drwxrwxrwx 3 root root 4096 Jun 25 07:11 .
drwxr-xr-x 1 root root 4096 Jun 25 07:11 ..
drwxr-xr-x 4 root root 4096 Jun 25 07:11 root # Not everyone can write to it!
$ ls -la vendor
total 12
drwxr-xr-x 2 root root 4096 Jun 25 07:11 .
drwxr-xr-x 5 root root 4096 Jun 25 07:11 ..
-rw-r--r-- 1 root root 6 Jun 25 07:11 text.txt
Job succeeded
Issues that `umask 0000` is trying to "solve"
Non-root images
The following scenario is explained perfectly well by @vermeeren in #1736 (comment 358400958):
So if you have one project with job A and then job B. Job A runs as `root`, job B runs as `user`, then this may happen:
- Job A runs, first time execution means no build caches exist. The runner performs a "fresh clone" of the git repository. This repository environment is now owned by `root:root`.
- Job B runs, build caches now exist for this project. The runner does a "reinitialised existing git repository", avoiding the need to re-clone everything including LFS, and tries to update via fetch and checkout to the desired branch/commit/etc.
- However, because the build cache is owned by `root` and job B runs as `user`, things will break with permission denied, as `user` of course may not modify files owned by `root`.
We can see this with the following `.gitlab-ci.yml`:
.gitlab-ci.yml
stages:
  - root
  - non-root
root:
  image: alpine:3.12
  stage: root
  script:
    - echo "hello" > text.txt
non-root:
  image: registry.gitlab.com/gitlab-org/gitlab-runner/alpine-no-root:latest
  stage: non-root
  script:
    - echo "hello2" > text2.txt
git diff to remove `umask`
diff --git a/helpers/container/helperimage/linux_info.go b/helpers/container/helperimage/linux_info.go
index f52591557..d06bd2d68 100644
--- a/helpers/container/helperimage/linux_info.go
+++ b/helpers/container/helperimage/linux_info.go
@@ -15,7 +15,7 @@ const (
archArm64 = "arm64"
)
-var bashCmd = []string{"gitlab-runner-build"}
+var bashCmd = []string{"/bin/bash"}
type linuxInfo struct{}
Job failed
Running with gitlab-runner 13.2.0~beta.90.g24046369 (24046369)
on docker fL_5iHR7
Preparing the "docker" executor
00:01
Using Docker executor with image registry.gitlab.com/gitlab-org/gitlab-runner/alpine-no-root:latest ...
Using locally found image version due to if-not-present pull policy
Using docker image sha256:377ff9461e933ac670101e35361588c0698e1db8fb785a046854c8e47ad992fe for registry.gitlab.com/gitlab-org/gitlab-runner/alpine-no-root:latest ...
Preparing environment
00:01
Running on runner-fl5ihr7-project-19-concurrent-0 via steve-mbp-gitlab.local...
Getting source from Git repository
00:02
Fetching changes with git depth set to 50...
Reinitialized existing Git repository in /builds/root/playground/.git/
Checking out 2f45236c as umask...
Removing text.txt
Skipping Git submodules setup
Executing "step_script" stage of the job script
00:00
$ echo "hello2" > text2.txt
/bin/sh: eval: line 90: can't create text2.txt: Permission denied
ERROR: Job failed: exit code 1
The project is cloned by `root` because the helper image, which does the clone, uses root. Without `umask` the permissions look like this:
docker run --rm -it -v runner-fl5ihr7-project-19-concurrent-1-cache-c33bcaa1fd2c77edfc3893b41966cea8:/code alpine:3.12
/ # cd /code/
/code # ls -la
total 12
drwxrwxrwx 3 root root 4096 Jun 24 07:25 .
drwxr-xr-x 1 root root 4096 Jun 24 07:26 ..
drwxr-xr-x 4 root root 4096 Jun 24 07:25 root
So a user that isn't root can't write to it, for example the user of `registry.gitlab.com/gitlab-org/gitlab-runner/alpine-no-root:latest`.
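The permission check behind this failure can be sketched as a small function. This is a simplified model under stated assumptions: it only looks at the classic owner/group/other mode bits and ignores ACLs, capabilities and supplementary groups. The name `can_write` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a process with the given UID/GID may
# write to an entry, using only the owner/group/other mode bits.
# Usage: can_write UID GID OWNER_UID OWNER_GID OCTAL_MODE
can_write() {
    uid="$1"; gid="$2"; owner="$3"; group="$4"
    mode=$((0$5))                      # leading 0 => parsed as octal
    if [ "$uid" -eq 0 ]; then
        return 0                       # root bypasses the mode bits
    elif [ "$uid" -eq "$owner" ]; then
        [ $((mode & 0200)) -ne 0 ]     # owner write bit
    elif [ "$gid" -eq "$group" ]; then
        [ $((mode & 0020)) -ne 0 ]     # group write bit
    else
        [ $((mode & 0002)) -ne 0 ]     # others write bit
    fi
}

# UID 1000 against a root-owned 755 directory: the failing case above.
can_write 1000 1000 0 0 755 || echo "permission denied"
# The same directory with 777, which is what umask 0000 produces.
can_write 1000 1000 0 0 777 && echo "writable via world-writable bits"
# The same directory owned by 1000:1000 with 755, as after a chown.
can_write 1000 1000 1000 1000 755 && echo "writable as owner"
```

The last two cases both allow the write, but only the third keeps the directory closed to other users.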
Other cases where this happens:
- Using `file` based CI variables, since we have to write the variables to the directory.
Proposal
Before running the user script, check whether the user image runs as `root`. If it doesn't run as root, get the `uid` and `gid` of the user and `chown` the build directory.
Details
Steps
graph TD
jobStarted[Job received] --> start
start((...)) --> pA
subgraph predefined steps
pA{{Preparing predefined container}} --> hFF{Feature Flag}
hFF --> |OFF| pB[Set umask script as container's command]
hFF --> |ON| pC[Set `/bin/bash` script as container's command]
pB --> pD{{Continuing with predefined steps execution}}
pC --> pD
end
pD --> middle((...)) --> bA
subgraph build steps
bA{{Preparing build container}} --> bFF{Feature Flag}
bFF --> |OFF| bI{{Continuing with job execution}}
bFF --> |ON| bB(Run `docker inspect` on user's image)
bB --> bC{Check value of `user`}
bC --> |user == '' or user == 'root' | bD[Assuming user is root]
bD --> bI
bC --> |user = 'something'| bE[User specified]
bE --> bF(Run `id -u` inside of the build container)
bE --> bG(Run `id -g` inside of the build container)
bF --> bH{UID==0}
bG --> bH
bH --> |YES| bI
bH --> |NO| bJ(Run `chown -RP -- $UID:$GID $ROOT_DIR $PROJECT_TMP_DIR` inside of the predefined container)
bJ --> bI
end
bI --> final((...))
final --> jobFinished[Job finalized]
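The "build steps" branch of the diagram can be sketched as a shell function. This is an illustration only, not the runner's implementation, which is written in Go. The function name `chown_command_for` and the example paths are hypothetical, and the function prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical sketch of the decision made when the feature flag is ON.
# $1: the user configured in the image (empty when the image sets none)
# $2, $3: UID and GID reported by `id -u` / `id -g` in the build container
# $4...: directories to hand over (e.g. $ROOT_DIR $PROJECT_TMP_DIR)
# Prints the chown command that would be run in the predefined container,
# or prints nothing when ownership can stay as it is.
chown_command_for() {
    image_user="$1"; uid="$2"; gid="$3"
    shift 3
    # Empty or "root" user: assume root, nothing to do.
    if [ -z "$image_user" ] || [ "$image_user" = "root" ]; then
        return 0
    fi
    # A named user can still map to UID 0.
    if [ "$uid" -eq 0 ]; then
        return 0
    fi
    echo "chown -RP -- $uid:$gid $*"
}

chown_command_for "" 0 0 /builds/root/playground
chown_command_for alpine 1000 1000 /builds/root/playground /builds/root/playground.tmp
```

The first call prints nothing because an empty user is treated as root. The second prints `chown -RP -- 1000:1000 /builds/root/playground /builds/root/playground.tmp`.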
Implementation
- Create a new feature flag to turn this ON/OFF. By default, for now, it should be OFF. We will turn it on in the future. This is to prevent any major regressions that we might cause because we missed a use case or made an invalid assumption.
- Extract the execution of commands in a docker container into its own package, for example `exec`, under the internal directory, since we will be using this in multiple places.
- Create a `user` package under the internal directory which will do the following:
  - Check if the user is root or not. Run `docker inspect` on the user image. If the `user` value is empty, assume it's root.
  - Get the UID of the user by running `id -u` using the exec package, or parse `/etc/passwd` as proposed in #1736 (comment 384347304).
  - Get the GID of the user by running `id -g` using the exec package, or parse `/etc/group` as proposed in #1736 (comment 384347304).
- Using the `exec` package, run the `chown` command with the specified `UID:GID` on the build directory.
- Add logging to the job trace that we are changing permissions (info is enough) so users are aware.
- Integration tests/Unit tests (We might want to prepare different images for this, and also use the noroot image)
- If the feature flag is turned ON, instead of running `umask`, just run `/bin/bash`.
- We shouldn't have to touch Windows, since it doesn't suffer from the same problem; we have #25480.
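The `/etc/passwd` alternative mentioned in the list could look roughly like this. It is a sketch under assumptions: the function name `uid_gid_from_passwd` is hypothetical, the file is in the standard `name:password:UID:GID:gecos:home:shell` format, and the GID returned is the primary group from field 4 rather than something looked up in `/etc/group`.

```shell
#!/bin/sh
# Hypothetical helper: look up "UID:GID" for a user name in a passwd-format
# file. The file path is a parameter so the function can be pointed at a
# file copied out of the image. Returns non-zero when the user is not found.
uid_gid_from_passwd() {
    awk -F: -v name="$1" '
        $1 == name { print $3 ":" $4; found = 1; exit }
        END { exit !found }
    ' "$2"
}

# Possible usage:
#   uid_gid_from_passwd alpine /etc/passwd
```

Parsing the file avoids executing `id` inside the build container, but it only works for users that are defined in the image's `/etc/passwd`.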
Proof of concept
A proof of concept was created in fafd9366. Note that I've done all of this in `requestBuildcontainer`, but it should be done somewhere where we only run it once before running the user script.
.gitlab-ci.yml used for testing
variables:
  SLEEP: 0
stages:
  - root
  - alpine
  - debian
root:
  image: alpine
  stage: root
  before_script:
    - id
  script:
    - mkdir -p test-root
    - echo "test" > test-root/test-root.txt
    - echo "test" > test-root.txt
    - sleep ${SLEEP}
alpine:
  image: steveazz/noroot-alpine
  stage: alpine
  before_script:
    - id
  script:
    - mkdir -p test-alpine
    - echo "test" > test-alpine/test-alpine.txt
    - echo "test" > test-alpine.txt
    - sleep ${SLEEP}
debian:
  image: steveazz/noroot-debian
  stage: debian
  before_script:
    - id
  script:
    - mkdir -p test-debian
    - echo "test" > test-debian/test-debian.txt
    - echo "test" > test-debian.txt
    - sleep ${SLEEP}
Dockerfiles used by the `.gitlab-ci.yml`
# Debian based image
FROM debian:buster
RUN apt-get update && apt-get install -y git
RUN groupadd -g 1001 debian
RUN useradd -ms /bin/bash -u 1001 -g 1001 debian
USER debian
WORKDIR /home/debian
# Alpine based image
FROM alpine:3.12
RUN apk add --no-cache git
RUN addgroup -S alpine -g 1000 && adduser -S alpine -G alpine -u 1000
USER alpine
WORKDIR /home/alpine
Current permissions
# runner-xi8eccda-project-20-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8 volume created by Runner to clone directories.
$ docker run --rm -it -v runner-xi8eccda-project-20-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8:/runner/build alpine:3.12
/ # cd /runner
/runner # ls -la
total 12
drwxr-xr-x 3 root root 4096 Jul 9 08:18 .
drwxr-xr-x 1 root root 4096 Jul 9 08:18 ..
drwxrwxrwx 3 root root 4096 Jul 9 08:17 build # Everyone can read/write to this.
/runner # cd build/
/runner/build # ls -la
total 12
drwxrwxrwx 3 root root 4096 Jul 9 08:17 .
drwxr-xr-x 3 root root 4096 Jul 9 08:18 ..
drwxrwxrwx 4 root root 4096 Jul 9 08:17 root # Everyone can read/write to this.
/runner/build # cd root/
/runner/build/root # ls -la
total 16
drwxrwxrwx 4 root root 4096 Jul 9 08:17 .
drwxrwxrwx 3 root root 4096 Jul 9 08:17 ..
drwxrwxrwx 4 root root 4096 Jul 9 08:17 playground
drwxrwxrwx 3 root root 4096 Jul 9 08:17 playground.tmp # Everyone can read/write to this.
/runner/build/root # cd playground
/runner/build/root/playground # ls -la
total 24
drwxrwxrwx 4 root root 4096 Jul 9 08:17 .
drwxrwxrwx 4 root root 4096 Jul 9 08:17 ..
drwxrwxrwx 6 root root 4096 Jul 9 08:17 .git
-rw-rw-rw- 1 root root 249 Jul 9 08:17 .gitlab-ci.yml
drwxr-xr-x 2 1000 1000 4096 Jul 9 08:17 test
-rw-r--r-- 1 1000 1000 6 Jul 9 08:17 test2.txt
New permissions
# runner-xi8eccda-project-20-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8 Is the name of the volume created by the build
$ docker run --rm -it -v runner-xi8eccda-project-20-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8:/runner/build alpine:3.12
/ # cd /runner/
/runner # ls -la
total 12
drwxr-xr-x 3 root root 4096 Jul 9 07:57 .
drwxr-xr-x 1 root root 4096 Jul 9 07:57 ..
drwxrwxrwx 3 1000 1000 4096 Jul 9 07:38 build # World writeable however belongs to 1000
/runner # cd build/
/runner/build # ls -la
total 12
drwxrwxrwx 3 1000 1000 4096 Jul 9 07:38 .
drwxr-xr-x 3 root root 4096 Jul 9 07:57 ..
drwxr-xr-x 4 1000 1000 4096 Jul 9 07:38 root # Only 1000 can write to it
/runner/build # cd root/
/runner/build/root # ls -la
total 16
drwxr-xr-x 4 1000 1000 4096 Jul 9 07:38 .
drwxrwxrwx 3 1000 1000 4096 Jul 9 07:38 ..
drwxr-xr-x 4 1000 1000 4096 Jul 9 07:38 playground # Only 1000 can write to it
drwxr-xr-x 3 1000 1000 4096 Jul 9 07:38 playground.tmp # Only 1000 can write to it
/runner/build/root # cd playground
/runner/build/root/playground # ls -la
total 24
drwxr-xr-x 4 1000 1000 4096 Jul 9 07:38 .
drwxr-xr-x 4 1000 1000 4096 Jul 9 07:38 ..
drwxr-xr-x 6 1000 1000 4096 Jul 9 07:38 .git
-rw-r--r-- 1 1000 1000 249 Jul 9 07:38 .gitlab-ci.yml
drwxr-xr-x 2 1000 1000 4096 Jul 9 07:38 test
-rw-r--r-- 1 1000 1000 6 Jul 9 07:38 test2.txt
Possible follow up issues that we can create
For the cache/build volumes we run chmod 777 on the root directory with permissions for #25440 (closed). We might want to investigate if we can do something similar, although I don't see that we will get much benefit from doing so.
Other proposals considered
- Not running git inside of the helper image. A lot of work, which might cause more problems because we lose control over the environment in which we run our git commands. Also we would have to figure out how to statically link everything.
- Using Docker additional groups. The permissions are still a bit open, and it's not always the case that `root` is available.
Planning log
2020-06-08
Planning log: 2020-04-08
I've read through this whole issue and the related issues/merge requests and have a better understanding of the problem, however still not the full picture. I'm going to try to summarize the problem for my own sake and list out all the possible solutions, both to understand the problem better and to be as transparent as possible about the current progress.
Problem
We run `umask 0000` to make everything writeable by anyone. When we clone the repository, we end up overriding the permissions it has set, making everything world writeable. This, of course, is less than ideal because basically anything can write to it. These permissions are also preserved: if the user ends up deploying the resulting state to some server, the user ends up with the same insecure permissions there.
This was introduced in !57 (merged) to allow non-root images to work correctly.
Solutions
Any solution that is going to be implemented will end up being a breaking change, so it's something we need to keep in mind, warn users about, and think of a proper rollout strategy for.
It seems like the ideal solution would be:
- Change to `umask 0022`, which is a sane default.
- Allow the user to change this inside of `.gitlab-ci.yml` to set the `uid` and `gid` correctly so that the user has the correct permissions to change something. (This is the part I'm still hazy about how to implement and why we need it)
- Show a warning to the user when using `umask 0000` about the problems that this might cause.
- Update documentation properly to educate users about this problem inside of the Runner and how to enable the correct fix.
Action points for me
- Create test scenarios (like a non-root image) where things break for `umask 0022` and see how setting the correct `uid/gid` affects them.
- Read up on #3188 (comment 124544958)
- Create a merge request to document current behavior.
- Does this somehow relate to !2047 (merged)?
2020-06-24
- Spend time trying to get a reproducible environment/testing environment.
- Start writing down proposal
2020-07-01
2020-07-07
I've pushed a small PoC in 3cc4e043 which fixes this issue by running `chown -RP $UID:$GID` on the build directory if the container doesn't use the `root` user.
What I'm thinking for now:
- Print log message when we change the permissions
- Behind feature flag
- We still mess with the permissions, are we OK with that?
- So do we always keep the feature flag, which the user turns on when needed?
Original report from the community member
Overview
I'm assuming that this is not a bug, but maybe someone can tell me the rationale behind this decision.
We're currently re-building our continuous delivery setup on GitLab/GitLab CI and noticed that file permissions are odd when deploying through GitLab CI. When `gitlab-ci-multi-runner` checks out a repository inside a docker container, it appears to be doing so with `umask 0000`, with the result that everything that's checked out is world-writable. This is not a problem inside the container, but since we deploy from inside the container, the permissions are also transferred to the production system.
It seems like this behavior cannot be influenced, since `before_script` runs after the repository has been cloned. To verify that my assumption about the umask was correct, I threw in a `w.Command("umask", "0022")` in abstract.go L61, which resulted in the correct (for me) file permissions being set.
So my questions are:
- Is this intended behavior, and if so, what is the rationale?
- Can this be influenced in any way?
- If it is not, I'd be willing to submit a merge request providing such functionality after discussing what the best approach would be to do so
Thanks for your time :)