Support for environments with only AWS IMDSv2 enabled
Proposal
AWS recommends that customers disable IMDSv1. For example, AWS EKS best practice advises disabling IMDSv1 for nodes and pods.
A number of GitLab components rely on AWS IMDS, and it's not clear how many of them support IMDSv2.
Customers who deploy GitLab with IMDSv1 disabled are likely to have a bad experience, as some functionality will work and some will not.
This is likely to result in issues and tickets in which the broken functionality, such as IAM, is identified as the problem, rather than the fact that the customer has disabled IMDSv1.
Purpose of this issue
This is a pseudo-epic, to act as a SSOT for IMDSv2 issues and to provide more context for the shift to IMDSv2.
A number of issues will be raised to specific engineering groups. Product managers may want to associate those issues to their own epics, so this issue has not been promoted to an epic.
Origin of this issue
A customer raised a ticket to troubleshoot cloud-native backups that weren't working. GitLab team members can read more in the ticket.
The reason is that they've disabled IMDSv1, and s3cmd doesn't seem to support IMDSv2.
The customer has turned IMDSv1 back on, as they are concerned that they would keep finding functionality that won't work with IMDSv1 disabled.
What is IMDSv2
- Instance Metadata Service Version 2
- IMDS is the AWS API that's available at 169.254.169.254
- One use case is obtaining credentials in an environment that uses IAM.
For example, the s3cmd code makes an HTTP connection to 169.254.169.254, and then issues:
request('GET', "/latest/meta-data/iam/security-credentials/")
This returns the name of the IAM role. A follow-up request for that role returns a JSON payload, from which AccessKeyId, SecretAccessKey, and Token can be extracted.
However, it looks like s3cmd is only using IMDSv1, because the steps documented for using IMDSv2 are:
- obtain a session token with:
PUT "http://169.254.169.254/latest/api/token"
- include that token in GET requests to the instance metadata service
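The two steps above can be sketched in Python using only the standard library. This is an illustrative sketch, not s3cmd's code: the function names are hypothetical, the TTL value is an assumption (21600 seconds is the maximum AWS accepts), and the header names are the ones AWS documents for IMDSv2. The `send` parameter is an assumption made here so the flow can be exercised without a real metadata endpoint.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254"
TOKEN_TTL_SECONDS = "21600"  # assumption: request the maximum session lifetime


def build_token_request() -> urllib.request.Request:
    """Step 1: a PUT to the token endpoint, with the requested TTL header."""
    req = urllib.request.Request(f"{IMDS_BASE}/latest/api/token", method="PUT")
    req.add_header("X-aws-ec2-metadata-token-ttl-seconds", TOKEN_TTL_SECONDS)
    return req


def build_metadata_request(path: str, token: str) -> urllib.request.Request:
    """Step 2: a GET to the metadata path, carrying the session token."""
    req = urllib.request.Request(f"{IMDS_BASE}{path}", method="GET")
    req.add_header("X-aws-ec2-metadata-token", token)
    return req


def read_body(req: urllib.request.Request, timeout: float = 2.0) -> str:
    """Send a request and return the decoded response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def fetch_with_imdsv2(path: str, send=read_body) -> str:
    """Obtain a session token, then use it to fetch the given metadata path."""
    token = send(build_token_request())
    return send(build_metadata_request(path, token))
```

On an EC2 instance, `fetch_with_imdsv2("/latest/meta-data/iam/security-credentials/")` would perform both requests. A client that skips the first step and sends a bare GET is speaking IMDSv1, which is what fails once IMDSv1 is disabled.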
Details on availability
- IMDSv2 announced 2019-11-19 to address four vulnerabilities with IMDSv1
- Amazon EKS support for IMDSv2 was announced 2020-08-24
- AWS EKS best practices:
  - Recommend blocking pod access to IMDS to minimise permissions, as pods can inherit the rights of the node profile. If pods don't use host networking, this recommendation covers both IMDSv1 and IMDSv2.
  - Recommend blocking node and pod access to IMDSv1.
  - 2022-02-11: the URL has changed, and the page looks like it has been rewritten. A very quick look found a reference to preferring IMDSv2 and restricting access.
Demand is likely to grow
A quick search identified a number of articles and posts recommending IMDSv1 be disabled. (List moved to a comment)
It took a few months for EKS support to be announced, and I found other suggestions that some AWS components didn't fully support IMDSv2 until various later dates.
However, as AWS advises turning off IMDSv1, it's only a matter of time before this becomes common practice and customers require full support.
What GitLab support exists?
- 2021-01 Docker machine. Issue: gitlab-org/ci-cd/docker-machine#15 (closed), MR: gitlab-org/ci-cd/docker-machine!49 (merged)
- 2020-11 Rails - Fog update to v3.6.7. Issue: #287816 (closed), MR (13.7): !48519 (merged)
What is the nature of this support?
If we have added support for IMDSv2, it would be useful to know what functionality in GitLab:
- Requires configuration to use IMDSv2, and if so only uses IMDSv2 (so customers can 'tick boxes' on which bits are switched over) or
- Uses IMDSv2 if available, falling back to v1 only if necessary.
- Can handle IMDSv1 being turned off.
Customers who want to follow best practice, or who are directed to use IMDSv2 for compliance reasons, don't want to scope GitLab's support by turning off IMDSv1 and seeing what breaks.
Adding a section to our documentation would be helpful, so customers can see what's supported, what isn't, and what explicit configuration changes are needed.
Example:
In the spec code associated with the fog update it is commented:
# If IMDSv2 is disabled, we should still fall back to IMDSv1
This implies that Fog will automatically use v2 if available.
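A minimal sketch of what "use v2 if available, fall back to v1" could look like is shown below. It is not Fog's implementation: `get_metadata`, `fetch_token`, `fetch`, and `require_v2` are hypothetical names introduced for illustration. The `require_v2` flag models the stricter behaviour in which a component only ever uses IMDSv2.

```python
from typing import Callable, Optional


def get_metadata(
    path: str,
    fetch_token: Callable[[], str],
    fetch: Callable[[str, Optional[str]], str],
    require_v2: bool = False,
) -> str:
    """Fetch a metadata path, preferring IMDSv2 over IMDSv1.

    fetch_token: obtains an IMDSv2 session token, raising OSError on failure
                 (urllib's URLError and socket timeouts are OSError subclasses).
    fetch:       performs the GET; a token of None means an IMDSv1-style request.
    require_v2:  when True, never fall back to IMDSv1.
    """
    try:
        token: Optional[str] = fetch_token()
    except OSError:
        if require_v2:
            raise
        token = None  # IMDSv2 token unavailable: fall back to IMDSv1
    return fetch(path, token)
```

With a design like this, the v1 fallback path is only exercised when the token request fails, so the component keeps working when IMDSv1 is turned off as long as the token endpoint is reachable.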
Functionality inventory
What functionality do we have in GitLab, in the broadest sense, that uses IMDSv1 to obtain IAM credentials, or uses IMDS for anything else, and so would be impacted if IMDSv1 were disabled, per AWS best practice?
description | status | group | scoping issue | resolve - issue/MR |
---|---|---|---|---|
Fog (Rails) | fixed | ~"group::ecosystem" | n/a | MR (13.7): !48519 (merged) |
Docker Machine Executor | fixed | ~"group::runner" | n/a | MR (13.9?): gitlab-org/ci-cd/docker-machine!49 (merged) |
Helm backups (s3cmd) | fixed | ~"group::distribution" | n/a | gitlab-org/charts/gitlab#2787 |
Runners - shared cache | unknown | ~"group::runner" | gitlab-runner#28027 | TBC |
Runners - uploading artifacts | unknown | ~"group::runner" | gitlab-runner#28027 | TBC |
Fargate runner executor | unknown | ~"group::runner" | gitlab-runner#28027 | TBC |
Kubernetes executor | unknown | ~"group::runner" | gitlab-runner#28027 | TBC |
Using ECR for runner containers | unknown | ~"group::runner" | gitlab-runner#28027 | TBC |
Container registry (S3 storage) | OK | ~"group::package" | #334890 (closed) | |
Dependency proxy | OK | ~"group::package" | #334890 (closed) | TBC |
Backups to object storage | OK | ~"group::geo" | #334891 | |
Deploying Lambda functions | OK | ~"group::configure" | #334894 (closed) (feature removed) | |
Kubernetes agent | Doesn't apply | ~"group::configure" | #334894 (comment 615371261) | TBC |
Continuous Deployment to AWS Elastic Container Service | unknown | ~"group::environments" | #334895 | TBC |