
Stop tunnel tracker together with agent gRPC server

Relates to #386 (closed).

Screenshot_2023-08-03_at_7.39.51_am

The graph above shows a clear correlation with kas scaling up or down. What's going on?

When kas receives a shutdown signal, it starts the shutdown sequence, stopping things in stages, in the reverse of the order in which it started them. The first things stopped are the Kubernetes API proxy module and the tunnel tracker. They are stopped concurrently.

  • The tunnel tracker unregisters all tunnels for this kas instance from Redis and then stops immediately. It also stops registering incoming tunnels so that other kas instances cannot find them: we don't want new connections to this kas instance while it is shutting down.
  • The Kubernetes API proxy module may take a long time to stop: over an hour (up to 1h15m), depending on how long it takes to drain the in-flight requests.

Only after both of the above have stopped does the agent gRPC server (the one that accepts all connections from agents) start shutting down. gRPC uses HTTP/2, and HTTP/2 has a GOAWAY frame that the server sends to clients when it starts shutting down. The frame tells them to stop opening new streams, i.e. to stop making new RPCs, i.e. to stop using this TCP connection for "new traffic".

We have a situation here where an agent connected to a kas that is shutting down does not get GOAWAY for quite a while, because the gRPC server it is connected to has not started shutting down. Meanwhile, all reverse tunnels from that agent have already been unregistered. Because the agent is not getting GOAWAY, it does not reconnect to a different kas. It may even reconnect to that same kas, because it is still accepting connections while the Kubernetes API proxy has in-flight requests.

The situation got worse as we fixed issues with long-running connections, such as "Investigate kas long connection timeout" (#349, closed).


This MR changes the code to stop the tunnel tracker together with the agent gRPC server.

Edited by Mikhail Mazurskiy
