The source project of this merge request has been removed.
WIP: use k8s jobs instead of pod+exec
What does this MR do?
This WIP MR shows what we did to solve the problem described in issue #3814 (closed).
I have set the target branch to the point where we originally branched off.
I have altered the CI YAML to work with our infrastructure and, to get this working for our use case, have broken other functionality (the non-Kubernetes executors). Unfortunately, I do not have the bandwidth to spend much time on this, so I'm hoping it can serve as a useful starting point for discussion.
High-level changes:
- use of k8s Jobs instead of direct pod creation
- helper and build commands are placed into a ConfigMap, which is deployed and cleaned up along with the Job
- the Job mounts the ConfigMap as a volume and runs the scripts
- added GPU support
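
To make the changes above concrete, here is a minimal sketch of the two objects involved: a ConfigMap holding the scripts and a Job that mounts it and runs one of them. It is not the code from this MR (the runner itself is written in Go). The function names, object names, image, mount path `/scripts`, and `backoffLimit` value are all assumptions chosen for illustration, and the GPU limit assumes the NVIDIA device plugin's `nvidia.com/gpu` resource name.

```python
import json


def build_configmap(name, scripts):
    """Return a ConfigMap manifest; each key in `scripts` becomes a file."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": dict(scripts),
    }


def build_job(name, image, configmap_name, entrypoint, gpus=0):
    """Return a Job manifest that mounts the ConfigMap and runs one script."""
    container = {
        "name": "build",
        "image": image,
        # Run the script through sh, so the mounted file needs no exec bit.
        "command": ["/bin/sh", f"/scripts/{entrypoint}"],
        "volumeMounts": [{"name": "scripts", "mountPath": "/scripts"}],
    }
    if gpus > 0:
        # Assumed resource name (NVIDIA device plugin); adjust for your cluster.
        container["resources"] = {"limits": {"nvidia.com/gpu": gpus}}
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Illustrative value: how many failed pods the Job tolerates.
            "backoffLimit": 3,
            "template": {
                "spec": {
                    # Jobs require restartPolicy Never or OnFailure.
                    "restartPolicy": "Never",
                    "containers": [container],
                    "volumes": [
                        {"name": "scripts", "configMap": {"name": configmap_name}}
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical names and script content, for demonstration only.
    cm = build_configmap("build-scripts", {"build.sh": "echo building\n"})
    job = build_job("build-1", "alpine:3.19", "build-scripts", "build.sh", gpus=1)
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [cm, job]}, indent=2))
```

The printed JSON list could be passed to `kubectl apply -f -`. Because the pod is owned by a Job, the Job controller creates a replacement pod when one fails, up to `backoffLimit`, rather than the runner having to hold an exec session open for the whole build.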
Benefits:
- Huge savings on cloud costs (we use GKE)
- we can enable autoscaling and preemptible node types
- builds tolerate the node being shut down at any time; they simply restart (thanks to k8s Jobs!)
- Higher pipeline reliability
- we do not see broken exec pipes on our long-running (>60 min) builds (issue #3814, closed)
Why was this MR needed?
Refactoring to use k8s Job objects instead of the existing pod + exec strategy made our pipelines much more stable overall.
What are the relevant issue numbers?
Edited by Chet Lemon