Use default and max workspace resources on workspace reconcile
## What does this MR do and why?
Issue: Backend: Add logic for using the agent's defaul... (#427144 - closed)
Use default and max workspace resources on workspace reconcile.
With Workspace config_version 2 migration (!131402 - merged), we no longer need `desired_config_generator_prev1` and `devfile_parser_prev1`, since all non-terminated workspaces have been migrated to config version 2. However, this change needs to introduce a new config version. So instead of removing the files in one MR and then reintroducing them in this MR, I've updated the files directly in this MR.
This MR is broken down into 4 commits for easier review.
- Update previous versions of workspace resources generation
  - Update `desired_config_generator_prev1` with contents of `desired_config_generator`
  - Update `devfile_parser_prev1` with contents of `devfile_parser`
  - Update remote development shared contexts
- Apply container resource defaults and create resource quota (see the sketch after this list)
  - Generate a resource quota using the agent's `max_resources_per_workspace`.
  - Add an annotation for the SHA256 of `max_resources_per_workspace` to force a workspace restart when the value changes.
  - Deep merge the `default_resources_per_workspace_container` into the containers and init containers of the workspace to apply the defaults.
- Update version of newly created workspaces
- Update naming convention for config versions (Related: Revisit versioning of #create_config_to_apply i... (#425227 - closed))
  - Rename all `_prev1` files to `_v2` to make the workspace config version explicit in these files
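Roughly, the quota generation and restart-forcing annotation work like the following sketch. This is a simplified illustration, not the actual GitLab implementation: the quota name, annotation key, and hash input format are assumptions.

```ruby
require 'digest'

# Agent's max_resources_per_workspace (example values).
max_resources = {
  limits:   { cpu: "5", memory: "5Gi" },
  requests: { cpu: "3", memory: "3Gi" }
}

# 1. Build a Kubernetes ResourceQuota for the workspace namespace so that
#    Kubernetes rejects any pod exceeding the agent-wide maximums.
resource_quota = {
  "apiVersion" => "v1",
  "kind"       => "ResourceQuota",
  "metadata"   => { "name" => "workspace-resource-quota" }, # name is hypothetical
  "spec"       => {
    "hard" => {
      "limits.cpu"      => max_resources[:limits][:cpu],
      "limits.memory"   => max_resources[:limits][:memory],
      "requests.cpu"    => max_resources[:requests][:cpu],
      "requests.memory" => max_resources[:requests][:memory]
    }
  }
}

# 2. Annotate the workspace pod template with a SHA256 of the value: any
#    change to max_resources_per_workspace changes the annotation, which
#    changes the pod template and therefore forces the pod to be recreated.
annotations = {
  "workspaces.gitlab.com/max-resources-per-workspace-sha256" => # key assumed
    Digest::SHA256.hexdigest(max_resources.to_s)
}
```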
This MR will be followed by a migration in Rails: Migrate workspaces with config_version=2... (#434494 - closed) to migrate non-terminated workspaces from config version 2 to 3.
## How to set up and validate locally
### Setup

- Set the remote development agent config where `max_resources_per_workspace` and `default_resources_per_workspace_container` are not set:

  ```yaml
  remote_development:
    enabled: true
    dns_zone: workspaces.localdev.me
    network_policy:
      enabled: true
      egress:
        - allow: 0.0.0.0/0
          except:
            - 10.0.0.0/8
            - 172.16.0.0/12
            - 192.168.0.0/16
        - allow: 172.16.123.1/32
  ```

- Create a devfile in a project (the `cpuLimit` of `2.3` matters for the quota checks later; see the sketch after this list):

  ```yaml
  schemaVersion: 2.2.0
  components:
    - name: gitlab-ui
      attributes:
        gl/inject-editor: true
      container:
        image: registry.gitlab.com/gitlab-org/remote-development/gitlab-remote-development-docs/debian-bullseye-ruby-3.2.patched-golang-1.20-rust-1.65-node-18.16-postgresql-15@sha256:216b9bf0555349f4225cd16ea37d7a627f2dad24b7e85aa68f4d364319832754
        env:
          - name: STORYBOOK_HOST
            value: "0.0.0.0"
        endpoints:
          - name: storybook
            targetPort: 9001
            secure: true
            protocol: http
        memoryLimit: "2048Mi"
        cpuLimit: "2.3"
  ```

- Create a new workspace for this project. Open this workspace, create a new file in it called `TEST.md`, and type something. This will be used later for some of the validations.
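Per the devfile 2.x schema, a container's `cpuLimit` and `memoryLimit` translate into the Kubernetes container resource limits that the later quota checks operate on. A minimal sketch of that mapping, for illustration only:

```ruby
# Devfile container attributes (from the devfile above).
devfile_container = { "cpuLimit" => "2.3", "memoryLimit" => "2048Mi" }

# The Kubernetes container resource limits these translate into, which the
# validation steps below inspect via `kubectl describe po`.
resources = {
  limits: {
    cpu:    devfile_container["cpuLimit"],
    memory: devfile_container["memoryLimit"]
  }
}
```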
### Verifying `default_resources_per_workspace_container` behaviour

- Set the `default_resources_per_workspace_container` in the remote development agent config:

  ```yaml
  remote_development:
    enabled: true
    dns_zone: workspaces.localdev.me
    network_policy:
      enabled: true
      egress:
        - allow: '0.0.0.0/0'
          except:
            - '10.0.0.0/8'
            - '172.16.0.0/12'
            - '192.168.0.0/16'
        - allow: '172.16.123.1/32'
    default_resources_per_workspace_container:
      limits:
        cpu: "1.5"
        memory: "786Mi"
      requests:
        cpu: "0.6"
        memory: "512Mi"
  ```

- This will result in the pod for the existing workspace being terminated and a new pod being created, because the `default_resources_per_workspace_container` values are merged into the workspace's config during reconciliation (see the merge sketch after this list). You can verify this by running `kubectl describe po` and checking that the container's `resources.requests` matches the agent's `default_resources_per_workspace_container.requests`.
- Once the workspace is ready, open the workspace and verify that it contains the `TEST.md` file.
- Thus, any change in the agent's `default_resources_per_workspace_container` results in all workspaces being immediately restarted and the value being enforced, without losing any data in the workspace.
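The defaults are applied with a deep merge in which values the devfile already specifies win over the agent defaults, so only unset fields (the requests, in this walkthrough) are filled in. A sketch of that behaviour, assuming ActiveSupport's `Hash#deep_merge` (not necessarily the exact implementation):

```ruby
require 'active_support/core_ext/hash/deep_merge'

# Agent's default_resources_per_workspace_container (values from above).
agent_defaults = {
  limits:   { cpu: "1.5", memory: "786Mi" },
  requests: { cpu: "0.6", memory: "512Mi" }
}

# Limits already set by the devfile (cpuLimit / memoryLimit);
# the devfile sets no requests.
devfile_resources = { limits: { cpu: "2.3", memory: "2048Mi" } }

# In deep_merge, keys from the argument win: devfile values override the
# defaults, and the defaults only fill in what the devfile left unset.
merged = agent_defaults.deep_merge(devfile_resources)
# => { limits:   { cpu: "2.3", memory: "2048Mi" },
#      requests: { cpu: "0.6", memory: "512Mi" } }
```

This is why `kubectl describe po` shows the container's requests matching the agent defaults while the limits stay at the devfile's values.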
### Verifying `max_resources_per_workspace` behaviour

- Set the `max_resources_per_workspace` in the remote development agent config:

  ```yaml
  remote_development:
    enabled: true
    dns_zone: workspaces.localdev.me
    network_policy:
      enabled: true
      egress:
        - allow: '0.0.0.0/0'
          except:
            - '10.0.0.0/8'
            - '172.16.0.0/12'
            - '192.168.0.0/16'
        - allow: '172.16.123.1/32'
    default_resources_per_workspace_container:
      limits:
        cpu: "1.5"
        memory: "786Mi"
      requests:
        cpu: "0.6"
        memory: "512Mi"
    max_resources_per_workspace:
      limits:
        cpu: "5"
        memory: "5Gi"
      requests:
        cpu: "3"
        memory: "3Gi"
  ```

- This will result in the pod for the existing workspace being terminated and a new pod being created, because `max_resources_per_workspace` has changed and it is used to generate the Kubernetes ResourceQuota during reconciliation. The reason it restarts is that we add an annotation on the workspace pod which is a SHA256 of the agent's `max_resources_per_workspace` value. You can verify this by running `kubectl describe po` and checking the pod's annotations. You can check the generated resource quota by running `kubectl describe resourcequota`.
- Once the workspace is ready, open the workspace and verify that it contains the `TEST.md` file.
- Thus, any change in the agent's `max_resources_per_workspace` results in all workspaces being immediately restarted and the value being enforced, without losing any data in the workspace.
- Update the agent's `max_resources_per_workspace.limits.cpu` to `2` and `max_resources_per_workspace.requests.cpu` to `1.8`.
- This will result in the pod for the existing workspace being terminated but no new pod being created. This is because the workspace's devfile has `cpuLimit` set to `2.3` while the agent's `max_resources_per_workspace.limits.cpu` is now `2`, violating the constraint (see the sketch after this list). You can further verify this by running `kubectl get rs -o yaml` and checking the status of the latest ReplicaSet, which will have a message similar to `message: 'pods "workspace-10-1-1mljkt-7bbcb9698-qp95n" is forbidden: exceeded quota`. Eventually (after 10 minutes), the workspace will have an actual state of `Failed`.
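The rejection follows from simple quota arithmetic: the container's limit exceeds the new maximum, so Kubernetes refuses to admit the replacement pod. A toy illustration of the comparison (the real enforcement is done by the Kubernetes quota admission controller, not by GitLab code):

```ruby
# Values after the update above.
quota_cpu_limit     = 2.0 # max_resources_per_workspace.limits.cpu
workspace_cpu_limit = 2.3 # devfile cpuLimit, which survived the deep merge

if workspace_cpu_limit > quota_cpu_limit
  # Mirrors the ReplicaSet message: pod creation is forbidden, no new pod
  # appears, and the workspace eventually reports an actual state of Failed.
  puts "forbidden: exceeded quota"
end
```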
## MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.