Use executor's context to enforce timeouts on VirtualBox commands
What does this MR do?
Pass the VirtualBox executor's context.Context to all commands performed through the VirtualBox helper.
Why was this MR needed?
The following description assumes that the runner is configured with concurrent = 1.
In some cases, the prepare stage of the VirtualBox executor hangs indefinitely when restoring a VM snapshot.
When this happens, the job is marked as failed after the configured timeout.
To get the GitLab Runner instance to process another job (i.e. to continue requesting jobs from the GitLab server), manual intervention is required (e.g. rebooting the VM host system or restarting the GitLab Runner service).
With the changes in this MR, the command that was started to restore the snapshot is cancelled when the job times out. The runner can then continue requesting jobs from the GitLab server and executing them.
What's the best way to test this MR?
Note: this assumes testing on a Linux or other Unix-like machine.
- Create a script named vboxmanage, with the executable bit set (i.e. chmod +x), and the following content:
    #!/bin/bash
    /bin/sleep 3600
- Configure a runner with the virtualbox executor and concurrent = 1
- When starting the runner, prepend the directory containing the vboxmanage script to the path, e.g. PATH=/home/myusername/runner-test:$PATH ./gitlab-runner
- Start two jobs with a short timeout (e.g. 10m)
- If the MR works, both jobs should start and then time out after 10 minutes each.
What are the relevant issue numbers?
#26583 is vaguely related: it describes a situation where the runner properly cancels the prepare stage after the job timeout when pulling a Docker image doesn't finish in time. The same behaviour should be expected from the VirtualBox executor when restoring a snapshot. This MR does not address the proposal in #26583.