RepositoryService::Fsck Acceptance Testing
~Conversation: #769 (closed)
See the Migration Process documentation for more information on the Acceptance Testing stage of the process.
Details
- Feature Toggle Name: gitaly_git_fsck
- gRPC Service: RepositoryService::Fsck
- Required Gitaly Version: v0.58.0
- Required GitLab Version: v10.3
1. Preparation
- Routes: which routes use this migration? Not yet known; possibly routes that push a lot?
2. Development Trial
Check Dev Server Versions
- Gitaly: Gitaly Dev Version Tracker Dashboard
- GitLab: https://dev.gitlab.org/help
Enable on dev.gitlab.org:
- !feature-set gitaly_git_fsck true in #dev-gitlab
Then leave it running while monitoring and performing some testing through the web UI, API, or SSH.
Monitor (initially)
- Monitor the Grafana feature dashboard on dev: Gitaly Feature Status Dashboard
- Inspect logs in ELK:
  - Fsck invocations, last hour, for unusual activity
  - Fsck errors, last hour, for unusual activity
- Check for errors in Gitaly Dev Sentry
- Check for errors in GitLab Dev Sentry
Continue?
- On unexpectedly high call rates, error rates, CPU activity, etc., disable the trial immediately with !feature-set gitaly_git_fsck false in #dev-gitlab; otherwise, leave it running and proceed to the next stage.
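The "Continue?" decision is a judgment call. As a rough illustration only, the rule of thumb above could be encoded as a comparison of current metrics against a pre-trial baseline. The Snapshot fields, the should_continue helper, and the 2x threshold are all hypothetical; the process itself does not define numeric limits.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical metrics sampled during a trial window."""
    calls_per_sec: float
    error_rate: float  # fraction of failed Fsck calls, 0.0 to 1.0
    cpu_percent: float


def should_continue(baseline: Snapshot, current: Snapshot, factor: float = 2.0) -> bool:
    """Return False (disable the flag) if any metric exceeds factor x baseline.

    The default factor of 2.0 is an illustrative assumption, not a documented limit.
    """
    checks = [
        (current.calls_per_sec, baseline.calls_per_sec),
        (current.error_rate, baseline.error_rate),
        (current.cpu_percent, baseline.cpu_percent),
    ]
    for value, base in checks:
        # With a zero baseline, any non-zero reading counts as unexpected.
        limit = base * factor if base > 0 else 0.0
        if value > limit:
            return False
    return True


baseline = Snapshot(calls_per_sec=10.0, error_rate=0.01, cpu_percent=20.0)
print(should_continue(baseline, Snapshot(12.0, 0.015, 25.0)))  # True
print(should_continue(baseline, Snapshot(12.0, 0.05, 25.0)))   # False: error rate spiked
```

Comparing against a baseline rather than fixed numbers keeps the check meaningful across dev, staging, and production, whose normal traffic differs widely.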
3. Staging Trial
Check Staging Server Versions
Enable on staging.gitlab.com:
- !feature-set gitaly_git_fsck true in #development
Then leave it running and monitor for at least 15 minutes while performing some testing through the web UI, API, or SSH.
Monitor (at least every 5 minutes, preferably in real time)
- Monitor the Grafana feature dashboard on staging: Gitaly Feature Status Dashboard
- Inspect logs in ELK:
  - Fsck invocations, last hour, for unusual activity
  - Fsck errors, last hour, for unusual activity
- Check for errors in Gitaly Staging Sentry
- Check for errors in GitLab Staging Sentry
Continue?
- On unexpectedly high call rates, error rates, CPU activity, etc., disable the trial immediately with !feature-set gitaly_git_fsck false in #development; otherwise, leave it running and proceed to the next stage.
4. Production Server Version Check
- Gitaly: Gitaly Production Version Tracker Dashboard
- GitLab: https://gitlab.com/help
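Each version check amounts to confirming that the running versions are at least the required ones from the Details section (Gitaly v0.58.0, GitLab v10.3). A minimal sketch of that comparison follows; the helper names are hypothetical, the running versions would in practice be read off the version tracker dashboards or /help pages, and dropping suffixes such as -ee or -rc1 is a simplifying assumption, not a rule of the process.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn 'v0.58.0' or '10.3.0-ee' into a tuple of ints for comparison."""
    # Strip a leading 'v' and anything after the first '-' (edition or pre-release tag).
    core = text.strip().lstrip("v").split("-")[0]
    return tuple(int(part) for part in core.split("."))


def meets_minimum(running: str, required: str) -> bool:
    """True if the running version is greater than or equal to the required one."""
    run, req = parse_version(running), parse_version(required)
    # Pad the shorter tuple with zeros so "10.3" compares equal to "10.3.0".
    width = max(len(run), len(req))
    run += (0,) * (width - len(run))
    req += (0,) * (width - len(req))
    return run >= req


# Minimum versions taken from the Details section of this issue.
REQUIRED = {"Gitaly": "v0.58.0", "GitLab": "v10.3"}

print(meets_minimum("v0.59.1", REQUIRED["Gitaly"]))  # True
print(meets_minimum("10.2.5", REQUIRED["GitLab"]))   # False
```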
5. Initial Impact Check
- Create an issue in the infrastructure tracker.
- Set Gitaly to 1% using the command !feature-set gitaly_git_fsck 1 in #production
Then leave it running and monitor for at least 15 minutes while performing some testing through the web UI, API, or SSH.
Monitor (at least every 5 minutes, preferably in real time)
- Monitor the Grafana feature dashboard on production: Gitaly Feature Status Dashboard
- Inspect logs in ELK:
  - Fsck invocations, last hour, for unusual activity
  - Fsck errors, last hour, for unusual activity
- Check for errors in Gitaly Sentry
- Check for errors in GitLab Sentry
Continue?
- On unexpectedly high call rates, error rates, CPU activity, etc., disable the trial immediately with !feature-set gitaly_git_fsck false in #production; otherwise, leave it running and proceed to the next stage.
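Stages 5 to 8 pass a number rather than true to !feature-set. Assuming that number acts as a percentage-of-time gate (each Fsck call is independently sent to Gitaly with that probability, as the Flipper gem's percentage_of_time gate behaves), the ramp can be simulated as below. The gitaly_enabled and simulate functions are illustrative only and are not GitLab code.

```python
import random


def gitaly_enabled(percentage: float, rng: random.Random) -> bool:
    """Hypothetical percentage-of-time gate: each call is independently
    routed to Gitaly with probability percentage / 100."""
    if percentage <= 0:
        return False
    if percentage >= 100:
        return True
    return rng.random() * 100 < percentage


def simulate(percentage: float, calls: int, seed: int = 0) -> float:
    """Fraction of simulated calls that took the Gitaly path."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    hits = sum(gitaly_enabled(percentage, rng) for _ in range(calls))
    return hits / calls


# The percentages used in stages 5 to 8 of this issue.
for pct in (1, 5, 50, 100):
    print(pct, round(simulate(pct, 100_000), 3))
```

Under this assumption the gate is re-evaluated on every call, so the same repository can take the Gitaly path on one Fsck and the existing non-Gitaly path on the next; both paths therefore need to stay healthy for the whole trial.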
6. Low Impact Trial
- Set Gitaly to 5% using the command !feature-set gitaly_git_fsck 5 in #production
Then leave it running and monitor for at least 2 hours.
Monitor (at least every 20 minutes)
- Monitor the Grafana feature dashboard on production: Gitaly Feature Status Dashboard
- Inspect logs in ELK:
  - Fsck invocations, last 2 hours, for unusual activity
  - Fsck errors, last 2 hours, for unusual activity
- Check for errors in Gitaly Sentry
- Check for errors in GitLab Sentry
Continue?
- On unexpectedly high call rates, error rates, CPU activity, etc., disable the trial immediately with !feature-set gitaly_git_fsck false in #production; otherwise, leave it running and proceed to the next stage.
7. Mid Impact Trial
- Set Gitaly to 50% using the command !feature-set gitaly_git_fsck 50 in #production
Then leave it running and monitor for at least 24 hours.
Monitor (at least every few hours)
- Monitor the Grafana feature dashboard on production: Gitaly Feature Status Dashboard
- Inspect logs in ELK:
  - Fsck invocations, last 24 hours, for unusual activity
  - Fsck errors, last 24 hours, for unusual activity
- Check for errors in Gitaly Sentry
- Check for errors in GitLab Sentry
Continue?
- On unexpectedly high call rates, error rates, CPU activity, etc., disable the trial immediately with !feature-set gitaly_git_fsck false in #production; otherwise, leave it running and proceed to the next stage.
8. Full Impact Trial
- Set Gitaly to 100% using the command !feature-set gitaly_git_fsck 100 in #production
Then leave it running and monitor for at least 1 week.
Monitor (at least once a day)
- Monitor the Grafana feature dashboard on production: Gitaly Feature Status Dashboard
- Inspect logs in ELK:
  - Fsck invocations, last 7 days, for unusual activity
  - Fsck errors, last 7 days, for unusual activity
- Check for errors in Gitaly Sentry
- Check for errors in GitLab Sentry
Success?
- Close this issue and mark the ~Conversation as ~"Migration:Opt-In".
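The production stages follow a fixed ramp. Expressed as data, the minimum soak time and monitoring cadence of each stage can be checked at a glance. The Stage type and helper functions are illustrative, not part of any GitLab tooling, and the Mid Impact Trial's "every few hours" is left unquantified rather than guessed.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class Stage:
    name: str
    percentage: int                   # value passed to !feature-set gitaly_git_fsck
    min_soak: timedelta               # minimum time to leave the flag running
    check_every: Optional[timedelta]  # monitoring interval; None if not fixed


PRODUCTION_STAGES = [
    Stage("Initial Impact Check", 1, timedelta(minutes=15), timedelta(minutes=5)),
    Stage("Low Impact Trial", 5, timedelta(hours=2), timedelta(minutes=20)),
    # "at least every few hours" is not a fixed interval, so it is left as None.
    Stage("Mid Impact Trial", 50, timedelta(hours=24), None),
    Stage("Full Impact Trial", 100, timedelta(weeks=1), timedelta(days=1)),
]


def min_checks(stage: Stage) -> Optional[int]:
    """Minimum number of monitoring passes implied by soak time and interval."""
    if stage.check_every is None:
        return None
    return int(stage.min_soak / stage.check_every)


def total_min_soak(stages: list[Stage]) -> timedelta:
    """Shortest possible time from 1% to the end of the 100% trial."""
    return sum((s.min_soak for s in stages), timedelta())


for stage in PRODUCTION_STAGES:
    print(stage.name, stage.percentage, min_checks(stage))
print(total_min_soak(PRODUCTION_STAGES))  # 8 days, 2:15:00
```

Read this way, the production rollout cannot finish in less than roughly eight days, most of it spent in the full impact trial.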
Edited by Kim Carlbäcker