
Improve signal handling in both Helper and Build containers for the k8s executor in attach mode

What does this MR do?

dumb-init is available in all GitLab Runner Helper images for Linux-like systems. When the containers (Helper and Build) were created, the previous container.Command caused /bin/sh (on build) or /bin/bash (on helper) to run as PID 1, as shown below.

Mem: 4417224K used, 11973080K free, 56248K shrd, 121304K buff, 3351276K cached
CPU:   1% usr   1% sys   0% nic  97% idle   0% io   0% irq   0% sirq
Load average: 0.06 0.25 0.30 2/638 43
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
   30    25 root     S     1716   0%   0   0% /bin/sh /scripts-25452826-5437135578/step_script
   37     0 root     S     1688   0%   3   0% sh
   25    24 root     S     1648   0%   1   0% /bin/sh /scripts-25452826-5437135578/step_script
    1     0 root     S     1620   0%   1   0% /bin/sh
   24     1 root     S     1620   0%   0   0% sh -c (/scripts-25452826-5437135578/detect_shell_script /scripts-25452826-5437135578/step_script 2>&1 | tee -a /logs-25452826-5437135578/output.log) &
   43    37 root     R     1612   0%   0   0% top
   36    30 root     S     1608   0%   0   0% sleep 120
   26    24 root     S     1604   0%   0   0% tee -a /logs-25452826-5437135578/output.log

This prevents termination signals sent to PID 1 from being propagated to the child processes.
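
To illustrate the missing behaviour, here is a minimal Go sketch of what an init process conceptually does: start the child, then relay termination signals to it. It is not how dumb-init is implemented (dumb-init is written in C and also handles process groups and zombie reaping), and `runAsInit` is a hypothetical name used only for this example.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// runAsInit is a hypothetical helper: it starts the command, relays
// SIGTERM and SIGINT to it, and returns the child's exit code.
func runAsInit(name string, args ...string) (int, error) {
	cmd := exec.Command(name, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	// Subscribe before starting the child so no early signal is missed.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)
	defer signal.Stop(sigs)

	if err := cmd.Start(); err != nil {
		return -1, err
	}

	// Wait for the child in the background so signals can be handled meanwhile.
	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()

	for {
		select {
		case s := <-sigs:
			// Relay the signal to the child instead of swallowing it.
			_ = cmd.Process.Signal(s)
		case err := <-done:
			if exitErr, ok := err.(*exec.ExitError); ok {
				return exitErr.ExitCode(), nil
			}
			if err != nil {
				return -1, err
			}
			return 0, nil
		}
	}
}

func main() {
	// Assumes an sh binary is available on PATH.
	code, err := runAsInit("sh", "-c", "exit 3")
	fmt.Println(code, err)
}
```

A plain shell running as PID 1 has no equivalent of the relay branch, which is why the job script never sees the termination signal.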

In this MR, dumb-init is copied into the scripts directory for attach mode and then used to run the shell when the containers are created. This makes dumb-init PID 1.

Mem: 4436728K used, 11953576K free, 56248K shrd, 121304K buff, 3352880K cached
CPU:   1% usr   1% sys   0% nic  96% idle   0% io   0% irq   0% sirq
Load average: 0.38 0.16 0.12 1/639 51
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
   31    26 root     S     1716   0%   2   0% /bin/sh /scripts-25452826-5437650637/step_script
   45     0 root     S     1688   0%   0   0% sh
   26    25 root     S     1648   0%   1   0% /bin/sh /scripts-25452826-5437650637/step_script
    7     1 root     S     1620   0%   2   0% /bin/sh
   25     1 root     S     1620   0%   0   0% sh -c (/scripts-25452826-5437650637/detect_shell_script /scripts-25452826-5437650637/step_script 2>&1 | tee -a /logs-25452826-5437650637/output.log) &
   51    45 root     R     1612   0%   2   0% top
   37    31 root     S     1608   0%   0   0% sleep 120
   27    25 root     S     1604   0%   2   0% tee -a /logs-25452826-5437650637/output.log
    1     0 root     S      220   0%   1   0% /scripts-25452826-5437650637/dumb-init -- sh -c if [ -x /usr/local/bin/bash ]; then  exec /usr/local/bin/bash  elif [ -x /usr/bin/bash ]; then  exec /usr/bin/bash  elif [ -x /bin/bash ]; then  exec /bin/bash  elif [ -x /usr/local/bin/sh ]; then  exec /usr/local/bin/sh  elif [ -x /usr/bin/sh ]; then  exec /usr/bin/sh  elif [ -x /bin/sh ]; then  exec /bin/sh  elif [ -x /busybox/sh ]; then  exec /bus ...
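
The wrapping visible in the PID 1 entry above can be sketched in Go roughly as follows. This is an illustration under assumptions, not the Runner's actual code: `wrapWithDumbInit` and its parameters are hypothetical, the `enabled` argument stands in for the feature flag check, and the shell-detection script is replaced by a short placeholder.

```go
package main

import (
	"fmt"
	"path"
)

// wrapWithDumbInit is a hypothetical helper. When enabled, it prefixes the
// container command with the dumb-init binary copied into the scripts
// directory, so that dumb-init becomes PID 1 and the shell runs as its child.
func wrapWithDumbInit(scriptsDir string, shellCmd []string, enabled bool) []string {
	if !enabled {
		// Previous behaviour: the shell itself is the container command.
		return shellCmd
	}
	wrapped := []string{path.Join(scriptsDir, "dumb-init"), "--"}
	return append(wrapped, shellCmd...)
}

func main() {
	// Placeholder for the shell-detection script; the real one is longer.
	shellCmd := []string{"sh", "-c", "exec /bin/sh"}
	fmt.Println(wrapWithDumbInit("/scripts-25452826-5437650637", shellCmd, true))
}
```

Keeping the original command as a suffix means the shell-detection logic is unchanged; only the process that ends up as PID 1 differs.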

A few tests were run with alpine and ubuntu images, and the job behaves as expected.

The feature is currently hidden behind the feature flag FF_USE_DUMB_INIT_WITH_KUBERNETES_EXECUTOR.

Left to do:

  • More tests with a strict securityContext (Pod/container) ==> Need to generate GitLab Runner UBI Images for those tests
  • Exec Mode support (?)
  • Use of random images

Exec mode support will be added in a follow-up MR.

Why was this MR needed?

To enable better handling of termination signals in the Kubernetes executor.

What's the best way to test this MR?

  • k8s integration tests passing locally (see this comment #36827 (comment 1619252629) for exceptions)
  • Any random job which used to pass should still pass

What are the relevant issue numbers?

Fixes #28162 (closed), #29355 (closed), #36322 (closed)
