Geo: Document Runner behavior on failover
Problem to solve
Document Runner behavior on failover.
From an internal Slack conversation https://gitlab.slack.com/archives/C32LCGC1H/p1611079853037100:
If there is a failover, and the Geo URL is updated to be the original primary's, will runners re-attach or do they need to be registered again?
They will just retry the same URL. Once that URL points to the new primary, they will connect there as expected, as long as the gitlab-secrets.json files were the same (=> the runner tokens didn't change and are decryptable).
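The condition in that answer (identical gitlab-secrets.json on both sites) could be checked before a planned failover. The following is only a rough sketch, not an official GitLab tool: the helper names are hypothetical, and it assumes you have already copied both files to one machine. It reports which key paths differ without printing any secret values.

```python
import json

_MISSING = object()  # sentinel so a missing key never compares equal to null


def flatten(value, prefix=""):
    """Yield (dotted_path, leaf) pairs for a nested JSON object."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from flatten(child, path)
    else:
        # Lists and scalars are treated as leaves and compared as a whole.
        yield prefix, value


def differing_keys(primary, secondary):
    """Return the sorted dotted paths whose values differ between two dicts."""
    a = dict(flatten(primary))
    b = dict(flatten(secondary))
    return sorted(
        path
        for path in a.keys() | b.keys()
        if a.get(path, _MISSING) != b.get(path, _MISSING)
    )


def load_secrets(path):
    """Load a gitlab-secrets.json file from a local path (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `differing_keys(load_secrets("primary-secrets.json"), load_secrets("secondary-secrets.json"))` (both file names are placeholders) returns an empty list when the two files match. Any path it does return is a candidate reason for runners failing to authenticate after promotion.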
Further details
Proposal
We should cover the following:
- What happens to runners when a secondary is promoted.
- Do they need to re-register with the new primary? Is their registration automatic?
- Do secrets need to be updated?
- How long will the runners take to become available after a failover?
- What happens if the runners are in the same DC as the primary that failed?
- Best practice on how to configure runners to cope with a failover event.
- Should runners be located in the same DC as the primary? Should they be located in a different DC or do we recommend a mixture?
- What happens to runners that were accelerated by Geo secondaries following a failover event, both when using a Unified URL and when using separate secondary URLs?
Who can address the issue
Other links/references
Edited by Sampath Ranasinghe