Run cleanup when prepare stage fails
What does this MR do?
When the prepare stage fails after a virtual machine has been created, attempt to delete the VM.
Why was this MR needed?
When the autoscaler fails to connect to a newly created virtual machine, it currently does not delete the machine, leaving orphaned machines in the cluster. In addition, because the machine still exists, the automated retry fails: two VMs cannot share the same name.
See https://gitlab.com/gitlab-org/ci-cd/shared-runners/images/macstadium/orka/-/jobs/1093844747
What's the best way to test this MR?
- Connect to the Orka VPN using the script in the orka repository:
<path to orka>/scripts/vpn.sh
- Execute:
eval "$(op signin gitlab.1password.com)"
export ORKA_API_ENDPOINT="$(op get item --vault Verify --account gitlab --fields "API URL" "Orka VPN")"
export ORKA_API_TOKEN="$(jq -r ".token" ~/.config/configstore/orka-cli.json)"
cd scripts/orka_integration
gsed -i "s|ORKA_API_ENDPOINT|$ORKA_API_ENDPOINT|" config.toml
gsed -i "s|ORKA_API_TOKEN|$ORKA_API_TOKEN|" config.toml
# Put wrong password in configuration to trigger the failure
gsed -i 's|Password = "gitlab"|Password = "wrong"|' config.toml
export CUSTOM_ENV_CI_JOB_ID="$(date +%s)"
export CUSTOM_ENV_CI_JOB_IMAGE="macos-11-xcode-12.img"
export BUILD_FAILURE_EXIT_CODE=1
export SYSTEM_FAILURE_EXIT_CODE=2
go run ../../cmd/autoscaler/main.go custom prepare
- The VM should get deleted, the original error should appear, and the exit code should be non-zero
What are the relevant issue numbers?
Closes #72
Edited by Adrien Kohlbecker