
Fix panic when err is nil on retry for k8s executor

Romuald Atchadé requested to merge k8s-fix-panic-on-retry into main

What does this MR do?

Fix a panic in the kubernetes executor when the error returned after the script execution is nil.

When err is nil, GitLab Runner panics and exits with the following stacktrace:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x0 pc=0x102b18c9c]

goroutine 167 [running]:
gitlab.com/gitlab-org/gitlab-runner/executors/kubernetes.(*executor).checkScriptExecution.func1({0x102b8b89c, 0x15})
	/Users/ratchade/projects/main-runner/executors/kubernetes/kubernetes.go:2533 +0xbc
slices.IndexFunc[...]({0x104795980, 0xd, 0xd}, 0x14000f5b948)
	/opt/homebrew/Cellar/go/1.22.4/libexec/src/slices/slices.go:107 +0x94
slices.ContainsFunc[...]({0x104795980, 0xd, 0xd}, 0x14000f5b948)
	/opt/homebrew/Cellar/go/1.22.4/libexec/src/slices/slices.go:122 +0x54
gitlab.com/gitlab-org/gitlab-runner/executors/kubernetes.(*executor).checkScriptExecution(0x1400140ea08, {0x102b79afd, 0xe}, {0x0, 0x0})
	/Users/ratchade/projects/main-runner/executors/kubernetes/kubernetes.go:2529 +0xb8
gitlab.com/gitlab-org/gitlab-runner/executors/kubernetes.(*executor).runInContainer.func1.1()
	/Users/ratchade/projects/main-runner/executors/kubernetes/kubernetes.go:2443 +0x210
gitlab.com/gitlab-org/gitlab-runner/helpers/retry.RunFunc.toValueFunc.func1()
	/Users/ratchade/projects/main-runner/helpers/retry/retry.go:44 +0x34
gitlab.com/gitlab-org/gitlab-runner/helpers/retry.retryRun[...](0x1400000e138, 0x14000c0ca30)
	/Users/ratchade/projects/main-runner/helpers/retry/retry.go:173 +0x64
gitlab.com/gitlab-org/gitlab-runner/helpers/retry.(*NoValueRetry).Run(0x14000cb6100)
	/Users/ratchade/projects/main-runner/helpers/retry/retry.go:185 +0x64
gitlab.com/gitlab-org/gitlab-runner/executors/kubernetes.(*executor).runInContainer.func1()
	/Users/ratchade/projects/main-runner/executors/kubernetes/kubernetes.go:2446 +0x31c
created by gitlab.com/gitlab-org/gitlab-runner/executors/kubernetes.(*executor).runInContainer in goroutine 8
	/Users/ratchade/projects/main-runner/executors/kubernetes/kubernetes.go:2424 +0x1f4
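
The fix amounts to guarding the error before the retry check dereferences it. Below is a minimal sketch of that guard, assuming checkScriptExecution matches the returned error message against a list of known script-execution failure messages; the signature, variable names, and message list are illustrative rather than copied from kubernetes.go:

package kubernetes

import (
	"fmt"
	"slices"
	"strings"
)

// scriptExecutionErrors is an illustrative list of messages that identify a
// failure of the user script; the real matching logic lives in
// executors/kubernetes/kubernetes.go.
var scriptExecutionErrors = []string{
	"command terminated with exit code",
}

// checkScriptExecution guards the error before matching it, since the retry
// callback in runInContainer can pass a nil error when the command succeeded.
// Without the early return, the slices.ContainsFunc predicate calls
// err.Error() on a nil error, which is the dereference at kubernetes.go:2533
// in the trace above.
func checkScriptExecution(containerName string, err error) error {
	if err == nil {
		return nil
	}

	if slices.ContainsFunc(scriptExecutionErrors, func(msg string) bool {
		return strings.Contains(err.Error(), msg)
	}) {
		return fmt.Errorf("script execution failed in container %q: %w", containerName, err)
	}

	return err
}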

Why was this MR needed?

Prevent GitLab Runner from panicking.

What's the best way to test this MR?

.gitlab-ci.yml
variables:
  FF_USE_POWERSHELL_PATH_RESOLVER: "true"
  FF_RETRIEVE_POD_WARNING_EVENTS: "true"
  FF_PRINT_POD_EVENTS: "true"
  FF_SCRIPT_SECTIONS: "true"
  CI_DEBUG_SERVICES: "true"
  GIT_DEPTH: 5
  MY_TEST_VARIABLE_1: gitlab-ci
  MY_TEST_VARIABLE_2: gitlab-ci

simple-job:
  script:
    - echo $MY_TEST_VARIABLE_1
    - echo $MY_TEST_VARIABLE_2
  after_script:
    - echo "this is the after_script running"
config.toml
concurrent = 1
check_interval = 1
log_level = "debug"
shutdown_timeout = 0

listen_address = ':9252'

[session_server]
  session_timeout = 1800

[[runners]]
  name = "investigation"
  url = "https://gitlab.com/"
  id = 0
  token = "glrt_REDACTED"
  token_obtained_at = "0001-01-01T00:00:00Z"
  token_expires_at = "0001-01-01T00:00:00Z"
  executor = "kubernetes"
  shell = "bash"
  limit = 1
  [runners.kubernetes]
    host = ""
    bearer_token_overwrite_allowed = false
    image = "alpine"
    pod_termination_grace_period_seconds = 0
    namespace = ""
    namespace_overwrite_allowed = ""
    pod_labels_overwrite_allowed = ""
    service_account_overwrite_allowed = ""
    pod_annotations_overwrite_allowed = ""
    node_selector_overwrite_allowed = ".*"
    allow_privilege_escalation = false
    [[runners.kubernetes.services]]
    [runners.kubernetes.dns_config]
    [runners.kubernetes.pod_labels]
      user = "ratchade"

With the main branch, this job hangs because GitLab Runner exits with the panic above.

With the MR branch, the job succeeds, as seen in the job log.

What are the relevant issue numbers?
