Websockets in Kubernetes

2024-05-24 8 min

Recently I’ve been working on a legacy service that was migrated from AWS Lambda to Kubernetes. Let’s call the service File Validator.

The File Validator service performs a series of steps to validate the format of files sent through the web client. The way the service communicates validation errors is via a Websocket.

I assume the decision to use a Websocket was made at the time for two main reasons:

  1. To perform the file processing in the background, since the validation steps can take anywhere from a few seconds to a minute.
  2. To avoid storing validation errors, so no additional storage is needed.

So, probably the most reasonable way to return the errors was via a Websocket.

Some time later, the entire architecture was migrated to Kubernetes. Gradually, services were adapted to the new infrastructure. But no other services were using Websockets. So when it came time for the File Validator service, some work had to be done to port the Websocket response without making a major change.

I’m not going into the details of the implemented solution, but let’s say the Websocket approach was still in place. And this is where the problems began.

The problem

One of the key aspects of Kubernetes is managing more than one instance (from now on, pods) of a service. Two or more pods can be deployed to handle incoming requests through an ingress server.

In this context, the File Validator service cannot have more than one pod deployed. When there are multiple pods, the Websocket connection is not preserved when a reconnection takes place: the ingress server is not able to route the client back to the pod that initiated the Websocket session. The following diagram illustrates this problem.


So far the File Validator has not required too many resources. The workload was holding up nicely with some vertical scaling. But the company’s expected high growth makes the File Validator a likely bottleneck: the workload will increase, so the service’s inability to scale horizontally will become a real problem sooner or later.

Sticky sessions

It is common during the lifespan of a Websocket for the connection to be interrupted and reconnected several times. To make this work properly, the client must keep reaching the same pod that initiated the session until the session is closed.

So, the main problem is how to keep Websocket connections calling the same pod in a multi-node environment like Kubernetes. The answer is sticky sessions.

In short, this is a technique that allows the ingress server to maintain a persistent link between the client and the pod that initiated the connection. The changes therefore have to be applied at the ingress server.

💡 Note that the sticky session method is not unique to Kubernetes. It is a pattern also called session persistence that emerged with the advent of multi-node architectures.

There are two ways to achieve sticky sessions: by IP or by cookies.

IP based

The IP solution maintains the session based on the client’s IP address. Thus, all connections coming from the same IP will be redirected to the same pod.

The modifications required for the ingress controller are as follows.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: your-ingress
  namespace: your-namespace
  annotations:
    nginx.ingress.kubernetes.io/configuration-snippet: |
      set $forwarded_client_ip "";
      if ($http_x_forwarded_for ~ "^([^,]+)") {
        set $forwarded_client_ip $1;
      }
      set $client_ip $remote_addr;
      if ($forwarded_client_ip != "") {
        set $client_ip $forwarded_client_ip;
      }
    nginx.ingress.kubernetes.io/upstream-hash-by: "$client_ip"

The benefit over the cookie method is that there is no need to rely on the client. It is the ingress server that maintains persistence through the variable specified in nginx.ingress.kubernetes.io/upstream-hash-by ↗️.

On the other hand, using the IP address as the session ID does not guarantee the real IP of the target device; it may instead be the IP of a proxy server that acts as a gateway for an internal network. This could cause an entire network to connect to a single pod, losing the load balancing for those clients.

The script specified in the nginx.ingress.kubernetes.io/configuration-snippet annotation minimizes the impact in these cases. In particular, it extracts the first IP address from the X-Forwarded-For header in case the request has traveled through upstream reverse proxies or API gateways.

Cookie based

A better method is to use cookies. As the name implies, the session is stored on the client via a cookie. This method ensures a “real” 1-to-1 link between the client and the pod, compared to relying on the client’s IP address at the ingress server.

The ingress annotations look as follows.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: your-ingress
  namespace: your-namespace
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "sticky-session"
    nginx.ingress.kubernetes.io/session-cookie-expires: "60"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "60"

These annotations store a cookie named sticky-session with a lifetime of one minute (the values are in seconds).

One way to set the TTL (Time To Live) is to base it on the longest expected Websocket connection. This way no connection is lost, and the client can connect to other pods once the Websocket request has been processed.

In my case, 1 minute was more than enough.

Different domains

Another important point to keep in mind when using the cookie method is that cookies will not be sent across different domains. This is an expected cross-origin (CORS) security constraint.

Both the client and the server must be configured to share credentials between domains.

The client application must set the withCredentials option to true ↗️.

import { io } from "socket.io-client";
const socket = io("https://server-domain.com", { withCredentials: true });

The same change must be made on the server, and the client’s domain must be added to the allowed origins list. Note that the wildcard * won’t work when credentials are enabled.

const SocketIO = require('socket.io');

const io = SocketIO(httpServer, {
  cors: {
    origin: ['https://client-domain.co'],
    credentials: true,
  },
});

Keep in mind that this configuration is for a Websocket server, but the same applies to an HTTP server.
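
For instance, here is a minimal sketch of the equivalent HTTP configuration, assuming an Express server with the cors middleware package (the client domain is the same placeholder used above):

const express = require('express');
const cors = require('cors');

const app = express();

// Explicitly whitelist the client origin and enable credentials so the
// sticky-session cookie is accepted on cross-origin requests.
app.use(
  cors({
    origin: ['https://client-domain.co'],
    credentials: true,
  })
);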

Constraints

Whichever solution is chosen, there are always some tradeoffs, and this is no less true for sticky sessions in Kubernetes.

Sticky sessions introduce a certain degree of dependency between the client and the pod that initiated the connection, which runs contrary to the principles of Kubernetes: Kubernetes removes and creates pods to meet the needs of different workloads at each point in time. This is why stateful endpoints are not particularly well suited to this architecture, specifically because of the shutdown policy.

The shutdown policy defines the grace period a pod has to complete its current workloads before it is terminated. During this time the pod will not receive any new requests.

It is important that during pod eviction the request is completed before the pod is removed.

Otherwise, if the workload takes longer than the grace period, its state should be stored temporarily so that another pod can take over and complete the work.
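
For workloads that do fit within the grace period, here is a minimal sketch of handling eviction gracefully, assuming the Socket.IO server (io) from the earlier snippet and a 60-second grace period (the 55-second safety timeout is an assumption):

// Hypothetical shutdown handler: when Kubernetes sends SIGTERM during eviction,
// stop accepting new connections and let in-flight validations finish.
process.on('SIGTERM', () => {
  io.close(() => {
    console.log('Socket.IO server closed, exiting');
    process.exit(0);
  });

  // Safety net: force exit shortly before the grace period runs out.
  setTimeout(() => process.exit(1), 55000).unref();
});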

Conclusion

Before making a premature decision, all the important aspects need to be taken into account.

In my particular case the Websocket connection closes relatively fast (20 seconds on average), so there is no need for session storage when the pod is evicted. The shutdown grace period is one minute, which is more than enough time to handle the workload.

Therefore, in this case, it is safe to use sticky sessions for Websockets, at least as a short-, mid-, or even long-term solution.

Future considerations

Even if sticky sessions are a valid way to keep Websocket connections properly open, given the stateless nature of Kubernetes a cleaner solution would be a more RESTful approach, replacing the Websocket with stateless HTTP requests.

The client initiates the workload with an HTTP request and the pod hands the work off to a background job. The client then polls for the result, hitting whichever pod is available. Once the background job is completed, the result is stored temporarily and handed to the client by the pod that receives the final polling request. And that’s all: no session involved.
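
A rough sketch of this flow (not the service’s actual implementation), assuming an Express API; the endpoint paths and the runValidation helper are hypothetical, and the in-memory map stands in for a shared temporary store such as Redis:

const express = require('express');
const crypto = require('crypto');

const app = express();

// Stand-in for a shared temporary store (e.g. Redis). With multiple pods,
// a shared store is required so that any pod can answer the poll.
const results = new Map();

// 1. The client starts a validation job and receives a job id.
app.post('/validations', express.json(), (req, res) => {
  const jobId = crypto.randomUUID();
  results.set(jobId, { status: 'pending' });

  // runValidation is a hypothetical helper that performs the file checks.
  runValidation(req.body)
    .then((errors) => results.set(jobId, { status: 'done', errors }))
    .catch((err) => results.set(jobId, { status: 'failed', error: err.message }));

  res.status(202).json({ jobId });
});

// 2. The client polls until the job is finished; any available pod can serve this.
app.get('/validations/:jobId', (req, res) => {
  const result = results.get(req.params.jobId);
  if (!result) return res.sendStatus(404);
  res.json(result);
});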

So, all things considered, in my case sticky sessions showed a better ROI. But one thing is for sure: stateful requests should be avoided in order to better align with the ephemeral philosophy of Kubernetes.