Add command for base certificates regeneration
Related to https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/13795
What critical bug this MR is fixing?
How does this change help reduce cost of usage? What scale of cost reduction is it?
In what scenarios is this change usable with GitLab Runner's docker+machine executor?
It addresses the known problem related to docker+machine
executor.
In short:
Docker Machine uses TLS Client certificates authentication to authenticate Docker Client requests against the Docker Engine API running on the created machine. Initially, the certificates don't exist. They are created upon the first docker-machine create
call. Also in case when the certificate (CA and/or the client certificate) got outdated, they are regenerated upon the docker-machine create
call.
This however creates a problem in a highly concurrent setups, which is mostly the case of docker+machine
executor. As then multiple concurrently running docker-machine create
commands are trying to create/recreate the keys and certificates. Which creates a race conditions and in most times causes the Runner to enter a machines creation failing loop (as the certificates can't be created or concurrent updates created certificate mismatches).
Unfortunately, there is no way to generate new or regenerate outdated certificates out of the scope of docker-machine create
. There is a docker-machine regenerate-certs
command, but it requires an existing machine to be used for that which is a really strange design choice.
This MR addresses that problem. It adds docker-machine regenerate-base-certs
command, which uses Docker Machine internals to generate/regenerate required certificates but only the base ones (so the CA and the client certificate). It also doesn't require any existing machine to be used for that.
With this change, any orchestrating around GitLab Runner (for example - our Chef cookbook that we use to manage GitLab.com runner managers fleet) can be updated to run docker-machine regenerate-base-certs
frequently. Which will have two effects:
- on new runner manager hosts - it will generate the initial keys and certificates, after Docker Machine installation,
- on existing runner manager hosts - it will regenerate the certificates that are already outdated.
The --regenerate-before
flag allows to define an earlier regeneration than what Not After
, which allows the externally managed regeneration (for example the chef-based one) to be done before the certificates are really outdated. This will prevent falling into the same race-condition with the docker-machine create
calls.