Create Enterprise guide for deploying and scaling a GitLab Runner Fleet
What does this MR do?
Creates a guide to help organizations plan for and configure runners at scale.
Why was this MR needed?
Today we provide little or no guidance on how to deploy and scale a GitLab Runner fleet. When customers or technical account managers ask for guidance, we have to address each request individually. This is inefficient and results in a negative customer experience.
Questions to answer with this MR and future iterations:

- Which executor option should I choose for my runner fleet?
- Which computing platform should I consider for hosting my fleet (VMs, Kubernetes)?
- What inputs should I be aware of when making runner fleet configuration decisions?
- How do I plan the setup of the runner fleet to meet my organization's needs?
- What do I need to monitor (metrics), and how, once the fleet is set up (Day-0 configuration)?
- Does GitLab have any recommendations for scaling the fleet to meet my organization's needs?
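To make the configuration questions above concrete, the guide could include a sketch like the following of a single runner manager's `config.toml`. All values here are illustrative assumptions (hostname, token, concurrency numbers), not recommendations; the key names (`concurrent`, `check_interval`, `listen_address`, `executor`) are standard GitLab Runner configuration settings.

```toml
# Hypothetical config.toml for one runner manager using the docker executor.
# Values are placeholders for illustration only.
concurrent = 20          # max jobs this manager runs in parallel
check_interval = 3       # seconds between polls for new jobs
listen_address = ":9252" # expose built-in Prometheus metrics endpoint

[[runners]]
  name = "docker-fleet-01"
  url = "https://gitlab.example.com"   # assumed instance URL
  token = "REDACTED"
  executor = "docker"
  [runners.docker]
    image = "alpine:latest"            # default job image
```

Setting `listen_address` is what makes the monitoring questions below answerable, since it exposes the runner manager's Prometheus metrics.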
Monitoring Questions
- How many jobs are in each status at a given time (waiting to be picked up, running)?
- What is the error rate of the runners requesting jobs from the GitLab instance (that is, are the runners communicating effectively with the GitLab instance)?
- If auto-scaling, how many resources are in use (VMs, pods, nodes, etc.)?
- Where are the jobs coming from, that is, which projects/namespaces?
- If auto-scaling, are the runner managers oversaturated? That is, basic metrics (CPU, memory usage, etc.) of the runner managers.
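A few of these questions can be answered today from the runner manager's built-in Prometheus exporter. The PromQL below is a sketch under the assumptions that metrics are exposed (see `listen_address` in `config.toml`) and scraped under a job label of `gitlab-runner`; metric names should be verified against the installed runner version.

```promql
# Error rate of runner API requests to the GitLab instance:
# share of requests returning 4xx/5xx over the last 5 minutes.
sum(rate(gitlab_runner_api_request_statuses_total{status=~"4..|5.."}[5m]))
  /
sum(rate(gitlab_runner_api_request_statuses_total[5m]))

# Runner manager saturation: CPU and memory of the runner process,
# via the standard Go process metrics the exporter ships.
rate(process_cpu_seconds_total{job="gitlab-runner"}[5m])
process_resident_memory_bytes{job="gitlab-runner"}
```

Per-project/namespace job attribution is not available from these manager-level metrics and would need instance-side data, which is one of the gaps this guide could call out.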
What are the relevant issue numbers?
Edited by Suzanne Selhorn