Respect the host-template annotation - do not serve traffic if the value of this annotation has changed
MRs:
- handle k8s service updates properly (!1 - merged) (Fix)
- gitlab-org/gitlab!164215 (merged) (Doc Update)
Tag:
Description
There is a bug where updates to the k8s service objects that the workspace proxy monitors are not handled properly. This leads to stale data being used when routing traffic to the appropriate workspace.
TLDR:
Expected behaviour: GitLab workspace proxy respects service annotations for a workspace and routes traffic only to the currently configured service hostnames, which are based on the agent DNS zone.
Observed behaviour: GitLab workspace proxy does not respect service annotations and routes traffic to an incorrect (stale) service hostname for a workspace.
Current flow that includes the bug
- Workspace tools attach a callback to the k8s client's service informer to listen for three service object events, `InformerActionAdd`, `InformerActionUpdate` and `InformerActionDelete`, as seen here.
- When a workspace is created, the service `InformerActionAdd` action is triggered, and the `tracker` parses the manifest tied to that service and saves a mapping by hostname and workspace name to the service details. This mapping is later used for in-cluster traffic routing to the workspace. This is done in `addPorts`.
- When the k8s agent config is updated to use a new DNS zone, the relevant workspace services have their `workspaces.gitlab.com/host-template` annotation updated. This causes `addPorts` to run again, creating a new upstream host mapping even though only an annotation changed (THIS IS THE BUG SOURCE). This can be confirmed and reproduced by checking the container logs.
- As a result, multiple upstream mappings exist for the same service, so a previous hostname remains accessible despite the k8s annotation update, as demonstrated in this discussion.
Proposed fix:
Handle updates from the informer properly. The informer provides the old manifest when an update occurs, but we currently do not use it in the callback that handles events. For updates, we should invalidate the previously cached host mappings and register the new ones.
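A minimal Go sketch of the proposed fix, using the same hypothetical `Tracker` model (only `addPorts` comes from this issue; `removePorts`, `OnUpdate`, `OnDelete` and `Lookup` are illustrative names). The update handler uses the old object supplied by the informer to invalidate the previous host mappings, then registers the new ones.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Hypothetical sketch of the proposed fix; not the proxy's actual API.

type Service struct {
	Name         string
	Workspace    string
	HostTemplate string // workspaces.gitlab.com/host-template
	Ports        []int
}

type Upstream struct {
	Workspace string
	Service   string
	Port      int
}

type Tracker struct{ byHost map[string]Upstream }

func NewTracker() *Tracker { return &Tracker{byHost: map[string]Upstream{}} }

// hostFor only substitutes {{.port}} (simplification).
func hostFor(hostTemplate string, port int) string {
	return strings.ReplaceAll(hostTemplate, "{{.port}}", strconv.Itoa(port))
}

func (t *Tracker) addPorts(svc Service) {
	for _, p := range svc.Ports {
		t.byHost[hostFor(svc.HostTemplate, p)] = Upstream{Workspace: svc.Workspace, Service: svc.Name, Port: p}
	}
}

// removePorts drops the hostnames rendered from svc, but only entries that
// still point at this service, so another service's mapping is never removed.
func (t *Tracker) removePorts(svc Service) {
	for _, p := range svc.Ports {
		host := hostFor(svc.HostTemplate, p)
		if u, ok := t.byHost[host]; ok && u.Service == svc.Name && u.Workspace == svc.Workspace {
			delete(t.byHost, host)
		}
	}
}

func (t *Tracker) OnAdd(svc Service) { t.addPorts(svc) }

// OnUpdate invalidates the mappings derived from the old object before
// registering the mappings derived from the new one.
func (t *Tracker) OnUpdate(oldSvc, newSvc Service) {
	t.removePorts(oldSvc)
	t.addPorts(newSvc)
}

func (t *Tracker) OnDelete(svc Service) { t.removePorts(svc) }

func (t *Tracker) Lookup(host string) (Upstream, bool) {
	u, ok := t.byHost[host]
	return u, ok
}

func main() {
	tr := NewTracker()
	oldSvc := Service{Name: "workspace-10-1-l2gb7g", Workspace: "workspace-10-1-l2gb7g",
		HostTemplate: "{{.port}}-workspace-10-1-l2gb7g.workspaces.localdev.me", Ports: []int{60001}}
	tr.OnAdd(oldSvc)

	newSvc := oldSvc
	newSvc.HostTemplate = "{{.port}}-workspace-10-1-l2gb7g.workspaces2.localdev.me"
	tr.OnUpdate(oldSvc, newSvc)

	_, oldOK := tr.Lookup("60001-workspace-10-1-l2gb7g.workspaces.localdev.me")
	_, newOK := tr.Lookup("60001-workspace-10-1-l2gb7g.workspaces2.localdev.me")
	fmt.Println(oldOK, newOK) // false true
}
```

Removing before adding keeps the handler idempotent when the old and new objects render the same hostnames, for example on a periodic informer resync. The sketch omits locking; a real tracker that is read by request handlers while informer callbacks write to it would need something like a `sync.RWMutex`.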
Acceptance Criteria
- The following discussion from gitlab-org/gitlab!137277 (merged) should be addressed:
@vtak started a discussion:
> I've tested the feature end to end and the following is what I've noticed:
>
> - The DB migration and rollback work fine
> - Any change to the agent's `dns_zone` updates the corresponding `dns_zone` for the non-terminated workspaces associated with the agent, and the changes are propagated to annotations in the Kubernetes manifests
> ```
> ➜ gitlab-development-kit git:(main) ✗ kubectl get svc -o yaml
> apiVersion: v1
> items:
> - apiVersion: v1
>   kind: Service
>   metadata:
>     annotations:
>       config.k8s.io/owning-inventory: workspace-10-1-l2gb7g-workspace-inventory
>       workspaces.gitlab.com/host-template: '{{.port}}-workspace-10-1-l2gb7g.workspaces2.localdev.me'
> ...
> ```
> - There is some caching of this annotation (`workspaces.gitlab.com/host-template`) in gitlab-workspaces-proxy. Check the attached video.
> - Context: the bottom 2 workspaces were created when the `dns_zone` was `workspaces.localdev.me`. These workspaces were accessed through the URL after creation. The `dns_zone` in the agent was then updated. A new workspace was created (top in the list), which has not yet been accessed through the URL.
> - In the video you can see that opening any URL doesn't open up the workspace. This is expected because we've updated the `dns_zone` in the agent configuration, but the ingress pointing to `gitlab-workspaces-proxy` has not yet been updated. You can see this by running `kubectl -n gitlab-workspaces get ingress gitlab-workspaces-proxy`. Since the ingress resource has not been updated, any traffic to `*.workspaces2.localdev.me` does not reach `gitlab-workspaces-proxy`; the ingress is still pointing to `*.workspaces.localdev.me`. This is fine and understandable.
> - When I open the last 2 workspaces after modifying the URL to point to the old DNS zone, the editor opens even though the annotations are updated in the respective Kubernetes services (as shown above). This is a problem and we need to resolve it. gitlab-workspaces-proxy should respect the annotations on the service while redirecting traffic to it. Otherwise, it is "bypassing" our mental model. It should just return 400, because the Kubernetes service annotation no longer matches `workspaces.localdev.me`; it is now `workspaces2.localdev.me`.
> - When I open the top workspace after modifying the URL to point to the old DNS zone, the request reaches gitlab-workspaces-proxy (expected), but it does not serve the editor and instead returns 400 (the right behaviour).
> - This is not a blocking problem for this MR, but we should address it as soon as we can.
>
> ![Screen_Recording_2024-01-11_at_5.28.44_PM](/uploads/51b158a629c88cfa5cd2c59535cb8c2e/Screen_Recording_2024-01-11_at_5.28.44_PM.mov)
>
> LGTM from a behavioural perspective. Once the blocking comments added above are addressed, we can move forward. Thanks @cwoolley-gitlab :clap:
- Make sure no stale data exists in the upstream tracker map by properly handling updates
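The behaviour the reviewer expects (serve only hostnames derived from the service's current annotation, return 400 otherwise) can be sketched as a request-host check. This assumes the annotation value is Go `text/template` syntax with a `port` key, as the `{{.port}}` placeholder suggests, and that the request host starts with `<port>-` and has no `:port` suffix; `renderHost` and `statusForRequest` are hypothetical helpers, not the proxy's implementation.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"text/template"
)

// renderHost expands a host-template annotation value for one port.
// Assumption: Go text/template syntax with a "port" key; the proxy's real
// rendering may differ. missingkey=error rejects unknown placeholders.
func renderHost(hostTemplate, port string) (string, error) {
	tmpl, err := template.New("host").Option("missingkey=error").Parse(hostTemplate)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := tmpl.Execute(&b, map[string]string{"port": port}); err != nil {
		return "", err
	}
	return b.String(), nil
}

// statusForRequest decides whether a request host may be served, using only
// the service's current annotation. Assumptions: the host starts with
// "<port>-" and carries no ":port" suffix.
func statusForRequest(requestHost, currentTemplate string) int {
	port, _, ok := strings.Cut(requestHost, "-")
	if !ok {
		return http.StatusBadRequest
	}
	want, err := renderHost(currentTemplate, port)
	if err != nil || !strings.EqualFold(requestHost, want) {
		return http.StatusBadRequest
	}
	return http.StatusOK
}

func main() {
	current := "{{.port}}-workspace-10-1-l2gb7g.workspaces2.localdev.me"
	fmt.Println(statusForRequest("60001-workspace-10-1-l2gb7g.workspaces2.localdev.me", current)) // 200
	fmt.Println(statusForRequest("60001-workspace-10-1-l2gb7g.workspaces.localdev.me", current))  // 400
}
```

Because the check always renders from the current annotation, a request to the old DNS zone is rejected even if an older mapping was cached earlier.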