Kubernetes execute commands with attach instead of exec
Overview
This MR is the POC of moving from kubernetes+exec to kubernetes+attach for executing scripts inside containers.
The main problem with using exec to execute scripts is that it keeps a connection open until the script exits. If the connection is cut off for some reason, the process is also killed. It also doesn't allow us to differentiate between a properly exited script and a bad connection, as seen in #4119 (closed). We also use the open connection to read the output of the commands.
Using attach, we can execute scripts through the main shell that is keeping the container alive. We can do that by leveraging the fact that writing commands to the stdin of the shell executes them and writes the output to the stdout of the main process. We can get that stdout by using kubectl logs. You can try that by running the following command in your shell:
sh <<<'sh script.sh'
script.sh:
echo from stdin
In the example above you can imagine that the first sh is the container's PID 1 process and we write the second sh command to its stdin with kubectl attach.
The catch here is that we need to make sure the first sh process never dies, because if it does the container will be killed as well. That's why every shell script has a trap which catches the exit code of the process and replaces it with an exit code of 0.
sh <<<'sh script.sh'
script.sh:
function tr {
echo "process exited with exit code $?"
exit 0
}
trap tr EXIT
echo from stdin
exit 1
This makes sure that the main process is never killed. It also allows us to catch a process' exit status in the logs. We use the logs to monitor when a process exits in order to start the next command. The line printed by the trap is parsed and used but not shown in the final logs. In the end the trap looks something like:
function tr {
command_exit_code=$?
out_json='{"command_exit_code": %s, "script": "%s"}\n'
printf "$out_json" "$command_exit_code" "$0"
exit 0
}
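To illustrate how the status line could be picked out of the log stream, here is a minimal sketch. It is not the runner's actual parsing code; it assumes the status line has exactly the JSON shape printed by the trap above, and the function name filter_job_log is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads log lines on stdin, prints user output unchanged,
# and hides the status line emitted by the trap. The exit code found in the
# status line is written to stderr so the caller can act on it.
filter_job_log() {
  prefix='{"command_exit_code": '
  while IFS= read -r line; do
    case "$line" in
      "$prefix"*', "script": "'*'"}')
        # Drop the fixed prefix, then everything from the first comma on.
        code=${line#"$prefix"}
        code=${code%%,*}
        echo "exit_code=$code" >&2
        ;;
      *)
        printf '%s\n' "$line"
        ;;
    esac
  done
}

# Example: one line of user output followed by the trap's status line.
printf '%s\n' 'from stdin' '{"command_exit_code": 1, "script": "script.sh"}' | filter_job_log
```

With the example input, "from stdin" goes to stdout and "exit_code=1" goes to stderr, so the status line never reaches the user-visible output.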
The scripts themselves are mounted through a configmap in each container. I tried using Here Documents to pass the scripts through stdin; however, Kubernetes didn't allow me to pass a quoted heredoc (it simply froze the stdin of the process), which meant that the passed scripts would be expanded beforehand, which could make them behave unexpectedly.
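The difference between a quoted and an unquoted heredoc can be seen locally with a small sketch. It only demonstrates the expansion behaviour of the shell, not the Kubernetes stdin freeze; the variable GREETING is made up for the example.

```shell
#!/bin/sh
GREETING="expanded by the outer shell"

# Unquoted delimiter: $GREETING is substituted before the text is passed on.
unquoted=$(cat <<EOF
echo $GREETING
EOF
)

# Quoted delimiter: the text is passed through verbatim.
quoted=$(cat <<'EOF'
echo $GREETING
EOF
)

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```

With the unquoted form, the script that reaches the container already has its variables replaced by values from the sending side, which is the unexpected behaviour described above.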
Testing/Development setup
-
Set up a VM from the following Vagrantfile by running
vagrant up
Vagrantfile:
# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure("2") do |config|
  # Every Vagrant development environment requires a box. You can search for
  # boxes at https://vagrantcloud.com/search.
  config.vm.box = "ubuntu/bionic64"

  # Create a private network, which allows host-only access to the machine
  # using a specific IP.
  config.vm.network "private_network", ip: "192.168.33.10"

  # Sync minikube certs
  config.vm.synced_folder "~/.minikube", "/home/vagrant/.minikube"

  config.vm.provision "shell", inline: <<-SHELL
    apt-get update
    apt-get install -y dsniff
  SHELL
end
-
Inside of the VM, have the following config.toml:
[[runners]]
  name = "kubernetes"
  url = "http://192.168.1.79:3000" # URL to GitLab instance for example gitlab.com
  token = "xxx"
  executor = "kubernetes"
  [runners.kubernetes]
    host = "https://192.168.99.219:8443"
    cert_file = "/home/vagrant/.minikube/client.crt"
    key_file = "/home/vagrant/.minikube/client.key"
    ca_file = "/home/vagrant/.minikube/ca.crt"
    bearer_token_overwrite_allowed = false
    bearer_token = "xx"
    image = "alpine:3.10"
    namespace = ""
    namespace_overwrite_allowed = ""
    privileged = true
    service_account_overwrite_allowed = ""
    pod_annotations_overwrite_allowed = ""
    helper_image = "gitlab/gitlab-runner-helper:x86_64-latest"
-
Compile Runner for Linux:
GOOS=linux make build_simple
-
Copy the binary out/binaries/gitlab-runner to /home/vagrant/gitlab-runner. The best option would be with scp:
scp -P 2222 -i $PATH_TO_VAGRANT_FILE/.vagrant/machines/default/virtualbox/private_key out/binaries/gitlab-runner vagrant@127.0.0.1:/home/vagrant/gitlab-runner
-
Run GitLab Runner
gitlab-runner run -c config.toml
-
Have the following .gitlab-ci.yml:
job:
  script:
    - 'for i in $(seq 1 60); do echo $(date); sleep 1; done'
    - echo "done"
-
When the job is running, run
sudo tcpkill -i <your_ethernet_adapter> -9 port 8443
You can find the ethernet adapter with ip a.
- With the environment variable FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY=true you will be using the old way of execution with kube exec, which should show the problem in #4119 (closed). (Screenshot: Screen_Shot_2020-02-05_at_15.19.58)
- With the environment variable FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY=false you should still be able to run the job and have it fail or succeed. (Screenshot: Screen_Shot_2020-02-05_at_15.19.07)
Problems I noticed with this implementation
-
kubectl logs only processes lines, which means that we only get complete lines back. If a user script writes continuously without a newline character, we won't get that output back until it emits one, which also means that it won't be visible in the GitLab.com logs. An example of such a script is:
for i in $(seq 1 30); do echo -n .; sleep 1; done;
Currently I haven't found a workaround for this, but I also haven't had the chance to search that much. How much of an impact do you think this would have if we can't work around it?
-
If the runner crashes during a long-running job, the job will continue running until it finishes its work. I feel this is a separate problem to solve that should go into a separate issue. I think there are similar implications with some of the other executors.
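The line-based behaviour from the first problem can be mimicked locally with a small sketch. It does not call kubectl logs; the function line_reader is a hypothetical stand-in that, like any line-oriented consumer, only hands over input once a newline arrives.

```shell
#!/bin/sh
# Hypothetical stand-in for a line-based log consumer: it only emits
# input once a complete, newline-terminated line has arrived.
line_reader() {
  while IFS= read -r line; do
    printf 'got: %s\n' "$line"
  done
}

# Dots without a trailing newline never form a complete line,
# so this call prints nothing.
printf '...' | line_reader

# The same dots followed by a newline are delivered.
printf '...\n' | line_reader
```

Only the second call produces output, which matches the observation that progress dots written with echo -n stay invisible until the script finally emits a newline.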
Related issue
Closes #4119 (closed) #6567 (closed)