
Switch to opinionated data and a "master" data generation script for performance tests

Nailia Iskhakova (OOO) requested to merge opinionated-test-data into master

Epic gitlab-org&3356 (closed), relates to #201 (closed)

Changes:

  • Added generate-gpt-data that generates horizontal and vertical data by default:
generate-gpt-data
GPT Data Generator v1.0.0 - opinionated test data for the GitLab Performance Tool

Usage: generate-gpt-data [options]

Generates opinionated test data for the GitLab Performance Tool. Data generated can be 'horizontal' (many groups and projects) and/or 'vertical' (large project
imports).

Options:
  -e, --environment=<s>                  Name of optional Environment Config file in environments directory that will be used for test data generation.
                                         Alternative filepath can also be given.
  --environment-url=<s>                  Full URL for the environment to import to.
  --root-group=<s>                       Root group for GPT data. (Default: gpt)
  --horizontal, --no-horizontal          Generate horizontal GPT data with multiple subgroups and projects. (Default: true)
  --group=<s>                            Group name that the subgroups and projects will be generated under. (Default: many_groups_and_projects)
  --subgroup-prefix=<s>                  Prefix that the subgroups will be generated with. (Default: gpt-subgroup-)
  -s, --subgroups=<i>                    Number of subgroups to create
  --project-prefix=<s>                   Prefix that the projects will be generated with. (Default: gpt-project-)
  -p, --projects=<i>                     Number of projects to create in each subgroup
  --vertical, --no-vertical              Generate vertical GPT data with large projects (default: true)
  --vert-group=<s>                       Group name that the vertical data will be generated to. (Default: large_projects)
  --project-tarball=<s>                  Location of project tarball to import. Can be local or remote. (Default:
                                         https://gitlab.com/gitlab-org/quality/performance-data/raw/master/projects_export/gitlabhq_export.tar.gz)
  --gitaly-nodes=<s+>                    Repository storages that will be used to import vertical data.
  -f, --force                            Force the data generation ignoring the existing data
  -u, --unattended                       Skip the data injection warning
  -m, --max-wait-for-delete-group=<i>    Maximum wait time(seconds) for groups to be deleted (default: 300)
  -h, --help                             Show help message

Environment Variables:
  ACCESS_TOKEN             A valid GitLab Personal Access Token for the specified environment. The token should have admin access for the ability to create and
import projects. (Default: nil)

Examples:
  Generate horizontal and vertical data using 10k.json environment file:
    bin/generate-gpt-data --environment 10k.json
  Generate only horizontal using 10k.json environment file:
    bin/generate-gpt-data --environment 10k.json --horizontal --no-vertical
  Generate only vertical data using 10k.json environment file:
    bin/generate-gpt-data --environment 10k.json --no-horizontal --vertical
  Generate only horizontal data with 10 subgroups and 100 projects in each:
    bin/generate-gpt-data --environment-url 10k.testbed.gitlab.net --subgroups 10 --projects 100 --no-vertical
  Generate only vertical data using custom project tarball path:
    bin/generate-gpt-data --environment 10k.json --no-horizontal --vertical --project-tarball=/home/user/test-project.tar.gz
  • Refactored import-project so that the "master" script can use it
  • Updated tests to use new structure
  • Added GPT Logger to write output to the file and console at the same time
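
Writing to the file and the console at the same time can be sketched roughly as follows. This is only an illustration of the idea, not the GPT Logger added in this MR: `MultiIO` and `build_tee_logger` are hypothetical names, and `StringIO` objects stand in for the console and the log file so the sketch has no side effects.

```ruby
require 'logger'
require 'stringio'

# Hypothetical helper: forwards every write to several IO-like targets.
class MultiIO
  def initialize(*targets)
    @targets = targets
  end

  def write(*args)
    @targets.each { |target| target.write(*args) }
  end

  def close
    # Closing is left to whoever opened the targets.
  end
end

# Builds a Logger that sends each message, unformatted, to all targets.
def build_tee_logger(*targets)
  logger = Logger.new(MultiIO.new(*targets))
  logger.formatter = proc { |_severity, _time, _progname, msg| "#{msg}\n" }
  logger
end

console_like = StringIO.new # stand-in for $stdout
file_like = StringIO.new    # stand-in for File.open('some.log', 'a')
logger = build_tee_logger(console_like, file_like)
logger.info('Creating group gpt/many_groups_and_projects')
```

In real use the two targets would be `$stdout` and an opened log file; the log file name and location are not specified in this description.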

New GPT data structure

Group: gpt

  • Group: large_projects
    • A gitlabhq import for each Gitaly node (aka vertical data)
  • Group: many_groups_and_projects
    • A number of subgroups (gpt-subgroup-x), each with a number of projects (gpt-project-x) (aka horizontal data)
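
To make the horizontal layout concrete, here is a minimal sketch that lists the project paths the structure above implies. `horizontal_paths` is a hypothetical helper, not part of the script. The defaults are taken from the option defaults and the sample environment file in this description, and the project counter runs across subgroups because that is how the example output in this description numbers projects.

```ruby
# Hypothetical helper: returns the full path of every horizontal project.
def horizontal_paths(root_group: 'gpt', group: 'many_groups_and_projects',
                     subgroups: 5, subgroup_prefix: 'gpt-subgroup-',
                     projects: 5, project_prefix: 'gpt-project-')
  base = "#{root_group}/#{group}"
  project_number = 0 # keeps counting across subgroups, as in the example output
  (1..subgroups).flat_map do |subgroup_number|
    subgroup_path = "#{base}/#{subgroup_prefix}#{subgroup_number}"
    (1..projects).map do
      project_number += 1
      "#{subgroup_path}/#{project_prefix}#{project_number}"
    end
  end
end

puts horizontal_paths(subgroups: 2, projects: 2)
```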

New environment file structure:

{
  "environment": {
    "name": "localhost",
    "url": "http://localhost",
    "user": "root",
    "config": {
      "latency": "0"
    },
    "storage_nodes": ["default"]
  },
  "gpt_data": {
    "large_project": "gitlabhq",
    "many_groups_and_projects": {
      "root_group": "gpt",
      "group": "many_groups_and_projects",
      "subgroups": 5,
      "subgroup_prefix": "gpt-subgroup-",
      "projects": 5,
      "project_prefix": "gpt-project-"
    }
  }
}

Added:

  • "storage_nodes": ["default"] - used for vertical data, so that gitlabhq is imported once for each listed Gitaly node
  • "large_project" for vertical data
  • "many_groups_and_projects" - for horizontal data
    • "group" - name of the group that holds the horizontal data
    • "subgroups": 10, "projects": 10 - generates 10 subgroups with 10 projects in each
    • "subgroup_prefix" and "project_prefix" - prefixes used when naming the generated subgroups and projects
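
As a rough illustration of how the new keys fit together, the sketch below parses a trimmed copy of the environment file and pulls out the values that drive data generation. `summarize_gpt_data` is a hypothetical helper written for this example; how the actual script reads the file is not shown in this description.

```ruby
require 'json'

# Trimmed copy of the sample environment file from this description.
ENV_FILE_JSON = <<~JSON
  {
    "environment": {
      "name": "localhost",
      "url": "http://localhost",
      "storage_nodes": ["default"]
    },
    "gpt_data": {
      "large_project": "gitlabhq",
      "many_groups_and_projects": {
        "root_group": "gpt",
        "group": "many_groups_and_projects",
        "subgroups": 5,
        "subgroup_prefix": "gpt-subgroup-",
        "projects": 5,
        "project_prefix": "gpt-project-"
      }
    }
  }
JSON

# Hypothetical helper: collects the settings relevant to data generation.
def summarize_gpt_data(env)
  horizontal = env.fetch('gpt_data').fetch('many_groups_and_projects')
  {
    storage_nodes: env.fetch('environment').fetch('storage_nodes'),
    large_project: env.fetch('gpt_data').fetch('large_project'),
    horizontal_group: "#{horizontal.fetch('root_group')}/#{horizontal.fetch('group')}",
    total_projects: horizontal.fetch('subgroups') * horizontal.fetch('projects')
  }
end

puts summarize_gpt_data(JSON.parse(ENV_FILE_JSON))
```

Using `fetch` rather than `[]` makes a missing key fail loudly, which seems a reasonable choice for a config file that every environment is expected to update.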

generate-gpt-data

  • Because the script injects data into the environment, it shows a warning that users must confirm before it runs, similar to the one in the performance-data project.
  • The script has a version, which is added to the groups' descriptions. This helps us track which version generated the GPT data if the script changes later.
  • The script can take its settings from an environment file or from command-line options.

Horizontal data

Creates X subgroups with Y projects in each.

  • If the required number of subgroups already exists, each with the required number of projects, the script does nothing.
  • If there are more subgroups or projects than required, the old data is removed and new data is created. For example, suppose we previously had 10 subgroups with 10 projects in each and now want 5 subgroups with 5 projects. The script deletes many_groups_and_projects along with its old subgroups and projects, waits until the deletion completes, creates a new many_groups_and_projects, and adds the horizontal data.
Example output.
GPT data v1.0.0 - opinionated test data for the GPT
Checking that GitLab environment 'http://localhost' is available and that provided Access Token works...
Environment and Access Token check was successful - URL: http://localhost, Version: 13.0.0-pre 6affa9fa4f7

Group gpt already exists
Group gpt/many_groups_and_projects already exists
Delete old group gpt/many_groups_and_projects
Waiting for group gpt/many_groups_and_projects to be deleted........................
Creating group gpt/many_groups_and_projects
Creating group gpt/many_groups_and_projects/gpt-subgroup-1
Creating group gpt/many_groups_and_projects/gpt-subgroup-2
Creating group gpt/many_groups_and_projects/gpt-subgroup-3
Creating group gpt/many_groups_and_projects/gpt-subgroup-4
Creating group gpt/many_groups_and_projects/gpt-subgroup-5
Creating project gpt/many_groups_and_projects/gpt-subgroup-1/gpt-project-1
Creating project gpt/many_groups_and_projects/gpt-subgroup-1/gpt-project-2
Creating project gpt/many_groups_and_projects/gpt-subgroup-1/gpt-project-3
Creating project gpt/many_groups_and_projects/gpt-subgroup-1/gpt-project-4
Creating project gpt/many_groups_and_projects/gpt-subgroup-1/gpt-project-5
Creating project gpt/many_groups_and_projects/gpt-subgroup-2/gpt-project-6
Creating project gpt/many_groups_and_projects/gpt-subgroup-2/gpt-project-7
Creating project gpt/many_groups_and_projects/gpt-subgroup-2/gpt-project-8
Creating project gpt/many_groups_and_projects/gpt-subgroup-2/gpt-project-9
Creating project gpt/many_groups_and_projects/gpt-subgroup-2/gpt-project-10
Creating project gpt/many_groups_and_projects/gpt-subgroup-3/gpt-project-11
Creating project gpt/many_groups_and_projects/gpt-subgroup-3/gpt-project-12
Creating project gpt/many_groups_and_projects/gpt-subgroup-3/gpt-project-13
Creating project gpt/many_groups_and_projects/gpt-subgroup-3/gpt-project-14
Creating project gpt/many_groups_and_projects/gpt-subgroup-3/gpt-project-15
Creating project gpt/many_groups_and_projects/gpt-subgroup-4/gpt-project-16
Creating project gpt/many_groups_and_projects/gpt-subgroup-4/gpt-project-17
Creating project gpt/many_groups_and_projects/gpt-subgroup-4/gpt-project-18
Creating project gpt/many_groups_and_projects/gpt-subgroup-4/gpt-project-19
Creating project gpt/many_groups_and_projects/gpt-subgroup-4/gpt-project-20
Creating project gpt/many_groups_and_projects/gpt-subgroup-5/gpt-project-21
Creating project gpt/many_groups_and_projects/gpt-subgroup-5/gpt-project-22
Creating project gpt/many_groups_and_projects/gpt-subgroup-5/gpt-project-23
Creating project gpt/many_groups_and_projects/gpt-subgroup-5/gpt-project-24
Creating project gpt/many_groups_and_projects/gpt-subgroup-5/gpt-project-25

<-> Horizontal data: successfully generated!
█ GPT data generation finished after 29 seconds.
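
The two horizontal-data rules described in this section can be sketched as a small decision function. This is a sketch under assumptions, not the script's code: `horizontal_action` and its return symbols are hypothetical, and the case where fewer subgroups or projects exist than required is not covered by this description, so `:create_missing` is only a placeholder for it.

```ruby
# existing: Hash of subgroup name => number of projects currently in it.
# Returns :skip, :recreate, or :create_missing (the last one is an assumption).
def horizontal_action(existing, subgroups:, projects:)
  too_many = existing.size > subgroups ||
             existing.values.any? { |count| count > projects }
  return :recreate if too_many # delete the group, wait, then regenerate

  complete = existing.size == subgroups &&
             existing.values.all? { |count| count == projects }
  complete ? :skip : :create_missing
end

# The example from this section: 10 x 10 exists, 5 x 5 is wanted.
old_data = (1..10).to_h { |i| ["gpt-subgroup-#{i}", 10] }
puts horizontal_action(old_data, subgroups: 5, projects: 5)
```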


Vertical data

Vertical data generation:

  • Imports a project for each entry in "storage_nodes" using the existing import-project module
  • Idempotent: imports only if the project doesn't exist or if the version number in the project's description doesn't match the config
Example output
GPT data v1.0.0 - opinionated test data for the GPT
Checking that GitLab environment 'http://localhost' is available and that provided Access Token works...
Environment and Access Token check was successful - URL: http://localhost, Version: 13.0.0-pre 6affa9fa4f7

Group gpt already exists

| Vertical data: importing large projects for GPT...
Group gpt/large_projects already exists
Delete old group gpt/large_projects
Waiting for group gpt/large_projects to be deleted.....
Creating group gpt/large_projects
Starting import of Project 'gitlabhq0' from tarball '/Users/test/project_exports/toolbox_export.tar.gz' under namespace 'gpt/large_projects' to GitLab environment 'http://localhost'


Checking that GitLab environment 'http://localhost' is available and that provided Access Token works...
Environment and Access Token check was successful - URL: http://localhost, Version: 13.0.0-pre 6affa9fa4f7

Importing project gitlabhq0...
{"id"=>304,
 "description"=>nil,
 "name"=>"gitlabhq0",
 "name_with_namespace"=>"gpt / large_projects / gitlabhq0",
 "path"=>"gitlabhq0",
 "path_with_namespace"=>"gpt/large_projects/gitlabhq0",
 "created_at"=>"2020-05-04T13:43:50.948Z",
 "import_status"=>"scheduled",
 "correlation_id"=>"bZZzCzwIgGa",
 "failed_relations"=>[]}

Project tarball has successfully uploaded and started to be imported with ID '304'
Waiting until Project '304' has imported successfully......
Project has successfully imported in 16 seconds:
http://localhost/gpt/large_projects/gitlabhq0


| Vertical data: successfully generated!
█ GPT data generation finished after 18 seconds.
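
The idempotency rule for vertical data can be sketched as a single predicate. The description does not say how the version is written into the project's description, so the `Version: x.y.z` format below is an assumption, as are the name `import_needed?` and the use of a plain Hash for the project.

```ruby
# project: nil when the project does not exist, otherwise a Hash with a
# 'description' key. The "Version: x.y.z" format is assumed for illustration.
def import_needed?(project, expected_version)
  return true if project.nil?

  found_version = project['description'].to_s[/Version: (\S+)/, 1]
  found_version != expected_version
end

existing = { 'name' => 'gitlabhq0', 'description' => 'GPT data. Version: 1.0.0' }
puts import_needed?(existing, '1.0.0')
puts import_needed?(nil, '1.0.0')
```

A project whose description carries no version at all is treated as needing an import, which matches the rule that only a matching version lets the script skip the work.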

Questions/complications/todos:

  • Soft-delete should be disabled so that subgroups and projects are deleted without delay.
    • Solution: we need to explicitly call out that we will update the environment setting before data generation and restore it afterwards.
  • Several tests use the new groups. What should we do if, for some reason, a user has no many_groups_and_projects group or any horizontal data?
    • Solution: With this move we're switching to opinionated data. Tests should be moved to use these projects, and the projects should be present in the environment before a test runs. If they don't exist, the tests should be skipped with the reason given.
  • It would probably help to write script logs to a file in case we need to debug generation issues reported by users.
    • Solution: GPT Logger was added.
  • Another possible future use of the generate-gpt-data version would be to check the GPT data version before a test run and warn when, for example, GPT is running against old data.
    • Solution: We need to aim for separation between the two to keep things simple.
  • TODO:
    • Check script with docker image
      • Solution: We need to create a new docker image for this script -> #241 (closed)
    • Add docs
    • Review wording
    • Update all env files
    • Switch to opinionated test data on all env -> #246 (closed)
Edited by Nailia Iskhakova (OOO)
