feat: Add job name to kubernetes pod labels
-
Please check this box if this contribution uses AI-generated content (including content generated by GitLab Duo features) as outlined in the GitLab DCO & CLA
What does this MR do?
This adds the job name to pod labels.
Why was this MR needed?
While the job name already exists on pod annotations, these annotations cannot be used to filter metrics in GCP. This is important to help rightsize job resources.
Pod labels can be used as a filter in any GCP k8s_container metric (cpu/memory requests/limit utilization) via "user metadata labels".
I suspect this label would also help on other observability platforms, though I haven't researched any others.
Some common off-the-shelf tools do not help in this scenario:
- https://github.com/kubernetes-sigs/metrics-server cannot expose annotations as Prometheus labels
- https://github.com/kubernetes/kube-state-metrics does not have CPU / memory metrics
Workaround 1 (toil):
- Add pod_labels_overwrite_allowed
- Manually add job name labels to every job
Workaround 2 (incomplete):
- It is possible to use GKE audit logs to create a log-based metric that has both the job name (from the annotation) and pod name as labels
- This metric can be joined with the k8s_container metrics in PromQL and filtered by job name
This workaround is incomplete because GitLab CI often creates pods large enough that they are not included in the GKE audit logs; those entries are instead marked audit.k8s.io/truncated, which results in missing job metrics for the most important (large) jobs. I believe these pod objects are so large because GitLab CI injects CI/CD variables directly into the pod spec instead of referencing them from a Kubernetes Secret.
What's the best way to test this MR?
- Automated tests
- Creating a job with the Kubernetes executor and verifying that the label exists on the resulting pod