Georgi N. Georgiev requested to merge k8s-attach-poc into master Jan 20, 2020

Overview

This MR is the POC of moving from kubernetes+exec to kubernetes+attach for executing scripts inside containers.

The main problem with using exec to execute scripts is that it keeps a connection open until the script exits. If the connection is cut off for some reason, the process is also killed. Exec also doesn't let us differentiate between a script that exited properly and a bad connection, as seen in #4119 (closed). We also use the open connection to read the output of the commands.

Using attach we can execute scripts through the main shell that is keeping the container alive. We can do that by leveraging the fact that writing commands to the stdin of the shell executes them and writes the output to the stdout of the main process. We can get that stdout by using kubectl logs. You can try that by running the following command in your shell:

sh <<<'sh script.sh'

script.sh:

echo from stdin

In the example above you can imagine that the first sh is the container's PID 1 process and we write the second sh command to its stdin with kubectl attach.
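The same idea can be simulated locally without a cluster. The following is only a sketch under stated assumptions: a FIFO stands in for the stdin that kubectl attach would write to, and a plain file stands in for what kubectl logs would return. The names workdir, main_pid and output are made up for this illustration and are not part of the runner.

```sh
#!/bin/sh
# Local simulation only: a FIFO plays the role of the attached stdin,
# a log file plays the role of `kubectl logs`.
workdir=$(mktemp -d)
mkfifo "$workdir/stdin"

cat > "$workdir/script.sh" <<'EOF'
echo from stdin
EOF

# "PID 1": a long-lived shell that reads commands from the FIFO and
# writes everything it prints to the log file.
sh < "$workdir/stdin" > "$workdir/log" 2>&1 &
main_pid=$!

# "attach": write a command to the stdin of the main shell.
echo "sh $workdir/script.sh" > "$workdir/stdin"

# The writer closed the FIFO, so the main shell sees EOF and exits.
wait "$main_pid"

# "logs": read back what the main shell printed.
output=$(cat "$workdir/log")
echo "$output"

rm -rf "$workdir"
```

In this simulation the main shell exits once the writer closes the FIFO. In a real pod the stdin would stay open, so the main shell would keep waiting for the next command.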

The catch here is that we need to make sure the first sh process never dies, because if it does the container will be killed as well. That's why every shell script has a trap which catches the exit code of the process and replaces it with an exit code of 0.

sh <<<'sh script.sh'

script.sh:

function tr {
    echo "process exited with exit code $?"
    exit 0
}

trap tr EXIT

echo from stdin
exit 1

This makes sure that the main process is never killed. It also allows us to catch a process's exit status in the logs. We use the logs to monitor when a process exits in order to start the next command. The exit-status line is parsed and used, but not shown in the final job logs. In the end the trap looks something like:

function tr {
    command_exit_code=$?
    out_json='{"command_exit_code": %s, "script": "%s"}\n'
    printf "$out_json" "$command_exit_code" "$0"

    exit 0
}
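To illustrate how such a status line could be picked out of the output and hidden from the visible log, here is a minimal sketch. The grep/sed parsing is an assumption made for this example and is not how the runner parses the line. The function name on_exit and the variables log, shell_status, exit_code and visible_log are hypothetical too.

```sh
#!/bin/sh
workdir=$(mktemp -d)

# A script that installs a trap like the one above and then fails.
cat > "$workdir/script.sh" <<'EOF'
on_exit() {
    command_exit_code=$?
    out_json='{"command_exit_code": %s, "script": "%s"}\n'
    printf "$out_json" "$command_exit_code" "$0"
    exit 0
}
trap on_exit EXIT

echo from stdin
exit 1
EOF

# Capture the "logs" and the status the shell itself reports.
log=$(sh "$workdir/script.sh")
shell_status=$?

# Extract the real exit code from the JSON status line (illustrative parsing).
exit_code=$(printf '%s\n' "$log" |
    sed -n 's/.*"command_exit_code": \([0-9]*\).*/\1/p')

# Hide the status line from what the user would see.
visible_log=$(printf '%s\n' "$log" | grep -v '"command_exit_code"')

echo "shell exit status: $shell_status"
echo "parsed exit code:  $exit_code"
echo "visible log:       $visible_log"

rm -rf "$workdir"
```

The shell reports status 0 because the trap replaces the exit code, while the parsed value still carries the original exit code of 1.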

The scripts themselves are mounted through a ConfigMap in each container. I tried using here-documents to pass the scripts through stdin; however, Kubernetes didn't allow me to pass a quoted heredoc (it simply froze the stdin of the process). That meant the passed scripts would be expanded beforehand, which could make them behave unexpectedly.
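The difference between a quoted and an unquoted heredoc can be seen in plain shell, independent of Kubernetes. This is a small sketch; the variable greeting and the function names run_unquoted and run_quoted are made up for the illustration.

```sh
#!/bin/sh
# Set in the outer shell only; it is not exported to child processes.
greeting="set in the outer shell"

# Unquoted delimiter: the outer shell expands ${greeting...} before the
# inner sh ever sees the script text.
run_unquoted() {
    sh <<EOF
echo "value: ${greeting:-unset in the inner shell}"
EOF
}

# Quoted delimiter: the text is passed through untouched, so the inner sh
# does the expansion itself and does not know the variable.
run_quoted() {
    sh <<'EOF'
echo "value: ${greeting:-unset in the inner shell}"
EOF
}

unquoted=$(run_unquoted)
quoted=$(run_quoted)

echo "unquoted heredoc: $unquoted"
echo "quoted heredoc:   $quoted"
```

With the unquoted form the script text is changed before it reaches the inner shell, which is the kind of early expansion that could make user scripts behave unexpectedly.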

Testing/Development setup

Create the following Vagrantfile and start the VM by running vagrant up

Vagrantfile

# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure("2") do |config|
  # Every Vagrant development environment requires a box. You can search for
  # boxes at https://vagrantcloud.com/search.
  config.vm.box = "ubuntu/bionic64"

  # Create a private network, which allows host-only access to the machine
  # using a specific IP.
  config.vm.network "private_network", ip: "192.168.33.10"

  # Sync minikube certs
  config.vm.synced_folder "~/.minikube", "/home/vagrant/.minikube"

  config.vm.provision "shell", inline: <<-SHELL
    apt-get update
    apt-get install -y dsniff
  SHELL
end

Inside the VM, create the following config.toml

config.toml

[[runners]]
  name = "kubernetes"
  url = "http://192.168.1.79:3000" # URL to GitLab instance for example gitlab.com
  token = "xxx"
  executor = "kubernetes"
  [runners.kubernetes]
    host = "https://192.168.99.219:8443"
    cert_file = "/home/vagrant/.minikube/client.crt"
    key_file = "/home/vagrant/.minikube/client.key"
    ca_file = "/home/vagrant/.minikube/ca.crt"
    bearer_token_overwrite_allowed = false
    bearer_token = "xx"
    image = "alpine:3.10"
    namespace = ""
    namespace_overwrite_allowed = ""
    privileged = true
    service_account_overwrite_allowed = ""
    pod_annotations_overwrite_allowed = ""
    helper_image = "gitlab/gitlab-runner-helper:x86_64-latest"

Compile Runner for Linux: GOOS=linux make build_simple
Copy the binary out/binaries/gitlab-runner to /home/vagrant/gitlab-runner. The best option is scp: scp -P 2222 -i $PATH_TO_VAGRANT_FILE/.vagrant/machines/default/virtualbox/private_key out/binaries/gitlab-runner vagrant@127.0.0.1:/home/vagrant/gitlab-runner
Run GitLab Runner: gitlab-runner run -c config.toml

Use the following .gitlab-ci.yml

.gitlab-ci.yml

job:
  script:
  - 'for i in $(seq 1 60); do echo $(date); sleep 1; done'
  - echo "done"

While the job is running, run sudo tcpkill -i <your_ethernet_adapter> -9 port 8443. You can find the ethernet adapter with ip a.
- With the environment variable FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY=true you will be using the old way of execution with exec, which should show the problem in #4119 (closed).
- With the environment variable FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY=false the job should keep running and still finish with a proper failure or success status.

Problems I noticed with this implementation

kubectl logs only processes whole lines, which means we only get output back line by line. If a user script writes continuously without a newline character, we won't get that output back until a newline is written, which also means it won't be visible in the GitLab.com logs until then. An example of such a script is: for i in $(seq 1 30); do echo -n .; sleep 1; done;. I haven't found a workaround for this yet, but I also haven't had the chance to search much. How much of an impact do you think this would have if we can't work around it?
If the runner crashes during a long-running job, the job will continue running until it finishes its work. I feel this is a separate problem that should go into a separate issue. I think there are similar implications for some of the other executors.
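The first problem above can be reproduced locally with any line-oriented reader. The sketch below is only an analogy: the shell's read builtin stands in for a line-based log stream and is not what kubectl logs does internally. The loop is shortened to three dots, and the variables start and result are made up for the example.

```sh
#!/bin/sh
start=$(date +%s)

# Producer: prints one dot per second and a newline only at the very end.
# Consumer: reads whole lines, so it sees nothing until that newline arrives.
result=$(
    { for i in 1 2 3; do printf '.'; sleep 1; done; echo; } |
    while IFS= read -r line; do
        now=$(date +%s)
        echo "got '$line' after $((now - start))s"
    done
)

echo "$result"
```

All three dots arrive together, and only after the producer has finished writing the line.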

Related issue

Closes #4119 (closed) #6567 (closed)

Edited Feb 05, 2020 by Steve Xuereb

Kubernetes execute commands with attach instead of exec
