Create a Ruby script to populate the GDK with a representative sample of namespaces, projects, and runners

Problem

As evidenced by Slow page load times for Admin Area > Runners (#384066 - closed), backend and frontend developers lack a GDK work environment that is representative of the complex relationships found in real-world deployments (both self-managed instances and GitLab.com). We normally create projects, namespaces, and runners ad hoc, using the simplest setup that lets us test the functionality at hand (this also helps reviewers by keeping the reproduction steps short).

Requirements

The N+1 issues uncovered in the issue above have a common theme - they happen when we have:

lots of runners (3000+);
these runners belong not to a single project but to several projects, which in turn belong to different parent groups and even different root namespaces;
the runners have hundreds of thousands of executed jobs (ci_builds records).

Proposal

MVP

One solution would be a rake task in the GitLab repo. It would create the simplest representation of the different scenarios that we must support in our daily work in Category:Runner Fleet, optionally adding enough runners and jobs to approximate the load of a production system. This could look like the following (where runner count and job count are configurable arguments):

graph TD
    G1[Top level group 1] --> G11
    G2[Top level group 2] --> G21
    G11[Group 1.1] --> G111
    G11[Group 1.1] --> G112
    G111[Group 1.1.1] --> P1111
    G112[Group 1.1.2] --> P1121
    G21[Group 2.1] --> P211

    P1111[Project 1.1.1.1<br><i>70% of jobs, sent to first 5 runners</i>]
    P1121[Project 1.1.2.1<br><i>15% of jobs, sent to first 5 runners</i>]
    P211[Project 2.1.1<br><i>15% of jobs, sent to first 5 runners</i>]

    IR1[Instance runner]
    P1111R1[Shared runner]
    P1111R[Project 1.1.1.1 runners<br>20% total runners]
    P1121R[Project 1.1.2.1 runners<br>49% total runners]
    G111R[Group 1.1.1 runners<br>30% total runners<br><i>remaining jobs</i>]
    G21R[Group 2.1 runners<br>1% total runners]

    P1111 --> P1111R1
    P1111 --> G111R
    P1111 --> IR1
    P1111 --> P1111R
    P1121 --> P1111R1
    P1121 --> IR1
    P1121 --> P1121R
    P211 --> P1111R1
    P211 --> G21R
    P211 --> IR1

    classDef groups fill:#09f6,color:#000000,stroke:#333,stroke-width:3px;
    classDef projects fill:#f96a,color:#000000,stroke:#333,stroke-width:2px;
    class G1,G2,G11,G111,G112,G21 groups
    class P1111,P1121,P211 projects
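
As a rough illustration of how the percentages in the diagram could be turned into concrete counts, here is a minimal Ruby sketch. The constant names and the `distribute` helper are hypothetical and not part of any existing GitLab code. Reading the per-project job percentages as a share of the total job count is also an assumption. The helper uses largest-remainder rounding so that the counts always add up to the configured total.

```ruby
# Hypothetical share of runners per scope, copied from the diagram above.
RUNNER_SHARES = {
  'Project 1.1.1.1 runners' => 20,
  'Project 1.1.2.1 runners' => 49,
  'Group 1.1.1 runners'     => 30,
  'Group 2.1 runners'       => 1
}.freeze

# Assumed reading of the diagram: share of all jobs created per project.
JOB_SHARES = {
  'Project 1.1.1.1' => 70,
  'Project 1.1.2.1' => 15,
  'Project 2.1.1'   => 15
}.freeze

# Splits `total` into integer counts proportional to `shares`
# (integer percentages that sum to 100). Leftover units from rounding
# down go to the entries with the largest remainders.
def distribute(total, shares)
  parts = shares.transform_values { |pct| (total * pct).divmod(100) }
  counts = parts.transform_values(&:first)
  leftover = total - counts.values.sum

  parts.sort_by { |_name, (_count, remainder)| -remainder }
       .first(leftover)
       .each { |name, _| counts[name] += 1 }

  counts
end

runner_counts = distribute(3000, RUNNER_SHARES)
# => 600, 1470, 900 and 30 runners for the four scopes, in diagram order
job_counts = distribute(100_000, JOB_SHARES)
puts runner_counts
puts job_counts
```

With small totals the rounding matters: for 10 runners the split is 2, 5, 3 and 0, so a very small run would leave Group 2.1 without runners of its own.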

Sub-tasks

Create rake task
Option to set group/project prefix to avoid clash on subsequent runs, and allow creating more load;
Create group/project hierarchy
Create runners with random versions and ci_runner_versions records
Assign random tags to runners
Assign random executors to runners
Create fake jobs with actual durations
Create an instance runner as part of the seed
Create fake merge requests associated with pipelines
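
The prefix, random version, random tag, and random executor sub-tasks could be sketched along these lines. This is plain Ruby that only builds attribute hashes; it does not touch the database or any GitLab model. The method name, the hash keys, and the lists of versions, tags, and executors are illustrative assumptions, and the real seed would likely draw from different values.

```ruby
require 'securerandom'

# Illustrative values only (assumptions, not taken from the issue).
EXECUTORS = %w[shell docker docker+machine kubernetes].freeze
TAGS = %w[linux macos windows gpu arm64 docker].freeze
VERSIONS = %w[15.5.0 15.6.1 15.7.0].freeze

# Builds attribute hashes for `count` fake runners.
# `prefix` is prepended to every name so that repeated runs do not clash.
# Passing a seeded `rng` makes the output reproducible.
def build_runner_attributes(count, prefix:, rng: Random.new)
  Array.new(count) do |index|
    {
      name: "#{prefix}runner-#{index + 1}",
      version: VERSIONS.sample(random: rng),
      executor: EXECUTORS.sample(random: rng),
      tags: TAGS.sample(rng.rand(1..3), random: rng) # 1 to 3 distinct tags
    }
  end
end

# A short random prefix keeps each run's records distinguishable.
prefix = "seed-#{SecureRandom.hex(3)}-"
build_runner_attributes(5, prefix: prefix, rng: Random.new(42)).each do |runner|
  puts runner
end
```

Keeping the attribute generation separate from record creation would make it easy to check the random distribution without a running GDK.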

Future iterations

Option to nuke existing data and replace with up-to-date version;

Advantages

Working day-to-day with a database that more closely resembles that of a production system;
Having a standardized work environment that is common to all involved stakeholders (backend/frontend/UX/product). This allows them to more easily communicate about usage scenarios and be aware of performance issues earlier in the development cycle;
Any future improvements are now done in an SSOT location and can be quickly distributed to other team members;
Having this tool additionally represents a significant improvement in MR creation/review experience, since instead of going through the motions of explaining to a reviewer how to create a couple of projects and then do a gitlab-runner register to create a shared project runner, we could just link the tool, have them do a simple run, and refer them to runner 3.1.1 🎉

Edited Dec 21, 2022 by Pedro Pombeiro