Improve review apps
What does this MR do?
Done:
- Enable autoscaling for the review-apps-ee cluster, with a maximum of 30 preemptible nodes
- Merge the review-app-image job into the review job
- Make the review job automatic
- Soft cleanup review apps before setting up a review app
  - Fetch deployments via API (https://docs.gitlab.com/ee/api/deployments.html), and for each deployment, if its environment starts with review/:
    - If last deployed more than 6 days ago, delete the environment (https://docs.gitlab.com/ee/api/environments.html#delete-an-environment)
      - Deleting an environment only deletes its database record, so if the review app is still running in GCP, GitLab won't know (that's why we also do a hard cleanup using Helm below)
    - If last deployed more than 5 days ago, stop the environment (https://docs.gitlab.com/ee/api/environments.html#stop-an-environment)
      - Stopping an environment first ensures that the stop CI action is called, thus gracefully stopping the review app
- Hard cleanup stale (last updated more than 7 days ago) Helm releases
  - This ensures that if stopping the review app failed at the step above, we still clear the actual Helm release (and the environment will also be cleared in GitLab thanks to the "delete the environment" step above)
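The cleanup rules above can be sketched as two small decision functions. This is only an illustration of the thresholds described in this MR, not the actual script: the function names, the constant names, and the choice to let "delete" take precedence over "stop" when both thresholds are exceeded are assumptions made for the example. The API calls and Helm commands that would act on the result are left out.

```python
from datetime import datetime, timedelta, timezone

# Thresholds taken from the description above.
STOP_AFTER = timedelta(days=5)        # stop review/* environments
DELETE_AFTER = timedelta(days=6)      # delete review/* environments
HELM_STALE_AFTER = timedelta(days=7)  # hard cleanup of Helm releases
REVIEW_PREFIX = "review/"


def soft_cleanup_action(environment_name: str,
                        last_deployed_at: datetime,
                        now: datetime) -> str:
    """Return "delete", "stop" or "keep" for one deployment's environment.

    Hypothetical helper: only environments whose name starts with
    "review/" are ever touched. The caller would then hit the
    environments API (stop or delete endpoint) accordingly.
    """
    if not environment_name.startswith(REVIEW_PREFIX):
        return "keep"
    age = now - last_deployed_at
    if age > DELETE_AFTER:
        return "delete"
    if age > STOP_AFTER:
        return "stop"
    return "keep"


def is_stale_release(last_updated_at: datetime, now: datetime) -> bool:
    """Return True if a Helm release should be hard-cleaned (hypothetical helper)."""
    return now - last_updated_at > HELM_STALE_AFTER


if __name__ == "__main__":
    # Arbitrary reference time, used only to make the example deterministic.
    now = datetime(2018, 1, 10, tzinfo=timezone.utc)
    for days in (4, 5.5, 6.5, 8):
        deployed = now - timedelta(days=days)
        print(days,
              soft_cleanup_action("review/some-branch", deployed, now),
              is_stale_release(deployed, now))
```

With these thresholds, an environment is normally stopped on one cleanup run and deleted on a later one, and the Helm check acts as a safety net for releases that survived both steps.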
To do:
- Make sure review app deployment works, currently blocked by Error: error when upgrading: current Tiller version is newer, use --force-upgrade to downgrade
- Make sure review app deployment works, currently blocked by Error: release review-improve-re-ffvbep failed: timed out waiting for the condition
  - That may be caused by the self-signed TLS certificate, which prevents the runner from connecting properly:
      E ERROR: Registering runner... failed runner=rE3Vn6af status=couldn't execute POST against https://gitlab-review-improve-re-ffvbep.gitlab-review.app/api/v4/runners: Post https://gitlab-review-improve-re-ffvbep.gitlab-review.app/api/v4/runners: x509: certificate is valid for ingress.local, not gitlab-review-improve-re-ffvbep.gitlab-review.app
      E PANIC: Failed to register this runner. Perhaps you are having network problems
    I've created https://gitlab.com/gitlab-com/infrastructure/issues/4735 to ask for a proper certificate.
- Move the review apps cleanup to a scheduled pipeline (twice per day should be enough) instead of before setting up a review app
- Disable the review job for forks
Are there points in the code the reviewer needs to double check?
- What do you think about stopping GitLab review/* environments after 5 days?
- What do you think about deleting GitLab review/* environments after 6 days?
- What do you think about cleaning up Helm releases after 7 days?
Does this MR meet the acceptance criteria?
- Documentation created/updated
- Tests added for this feature/bug
- Conforms to the code review guidelines
  - Has been reviewed by a Backend maintainer
- EE specific content should be in the top level /ee folder
- Conforms to the merge request performance guidelines
- Conforms to the style guides
- Conforms to the database guides
- If you have multiple commits, please combine them into a few logically organized commits by squashing them
- End-to-end tests pass (package-and-qa manual pipeline job)
What are the relevant issue numbers?
Edited by Marin Jankovski