
I can't imagine a worse combination than Kubernetes and stateful connections.


It only hurts when you actually have meaningful load and then suddenly need to switch. Especially if the "servlets" that those stateful connections are connected to require some heavy-ish work on startup, so you're vulnerable to the "thundering herd" scenario.

But the author only uses it to keep alive a couple of IRC connections (which don't send you history or anything on re-connects) and to automatically backup their "huge" chat logs (seriously, 5 GiB is not huge, and if it's text then it can be compressed down to about 2 GiB — unless it's already compressed?).


You don't have to roll all the pods at the same time: there are built-in controls to avoid doing that, and it's the default. You will have to DIY this if you're using something else, so, in fact, the OP is wrong that k8s is somehow a bad fit for this use case.
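
A minimal sketch of the built-in controls being referred to, assuming a plain Deployment (the workload name and image are made up): the rolling-update strategy bounds how many pods get replaced at once, and it's the default.

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: irc-bouncer             # hypothetical workload
  spec:
    replicas: 4
    selector:
      matchLabels:
        app: irc-bouncer
    strategy:
      type: RollingUpdate         # the default strategy type
      rollingUpdate:
        maxUnavailable: 1         # replace at most one pod at a time
        maxSurge: 1               # allow one extra pod during the roll
    template:
      metadata:
        labels:
          app: irc-bouncer
      spec:
        containers:
        - name: bouncer
          image: example/bouncer:latest   # placeholder image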


> You dont have to roll all the pods at the same time

That's not really the problem — if, say, one of your nodes drops dead (or just drops off the network), the clients' connections also drop, and they all try to reconnect. That just happens and there is not much you can do to prepare for it except by having some idle capacity already available.

Unless you're talking about rollout strategies for deployment updates. To be fair, I don't remember the controls for that being all that useful, but that was two years ago, so perhaps things are better now.


Having idle capacity is standard industry practice, though. A secondary node in a primary/secondary setup of a typical monolith design is basically idle capacity, except more expensive, because it's 100% over-provisioning, which isn't required with k8s.
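
For what it's worth, a common way to hold idle headroom in k8s without a full standby is a low-priority "placeholder" deployment of pause pods that gets preempted as soon as real workloads need the room. A sketch below; the names and sizes are made up, not anyone's actual setup.

  apiVersion: scheduling.k8s.io/v1
  kind: PriorityClass
  metadata:
    name: overprovisioning        # hypothetical name
  value: -10                      # lower than the default priority of 0
  globalDefault: false
  description: "Placeholder pods, preempted by real workloads"
  ---
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: capacity-reservation    # hypothetical name
  spec:
    replicas: 2                   # how much headroom to hold
    selector:
      matchLabels:
        app: capacity-reservation
    template:
      metadata:
        labels:
          app: capacity-reservation
      spec:
        priorityClassName: overprovisioning
        containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "1"            # size of each reserved slot
              memory: 2Gi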


It's only a problem if your nodes go up/down often, or if you have other things causing pods to be preempted, etc.

If you have a static number of nodes and don't have to worry too much about things autoscaling, I don't see why it couldn't be really stable?


You don’t?

Check out how services, load balancers, and the majority of CNIs actually work, then.

Kubernetes was designed for stateless connections and it shows in many places.

If you want it to do stateful connections, you could use something like Agones, which intentionally bypasses a huge amount of Kubernetes to essentially use it only as a scheduler.


> You don’t?

No, why do yours? :D

If you're using cluster autoscaling with very small (or perfectly sized) nodes, I could see it being more of an issue on a busy cluster.

But even then, I wouldn't set up a database to auto-scale. A new node could get created, but it doesn't mean the db pods will be moved to it. They'd ideally stay in the same location. And on a really busy cluster, I'd prefer a separate node pool for stateful apps.
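
A minimal sketch of the separate-node-pool idea, assuming the pool's nodes carry a label and taint like the ones below (the label, taint, and pod names are made up):

  # On the stateful pool's nodes, illustratively:
  #   kubectl label node <node> pool=stateful
  #   kubectl taint node <node> dedicated=stateful:NoSchedule
  apiVersion: v1
  kind: Pod
  metadata:
    name: postgres-0              # hypothetical pod
  spec:
    nodeSelector:
      pool: stateful              # only schedule onto the stateful pool
    tolerations:
    - key: dedicated
      operator: Equal
      value: stateful
      effect: NoSchedule          # tolerate the pool's taint
    containers:
    - name: postgres
      image: postgres:16          # placeholder image/version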

Using something like Stackgres makes it relatively painless to run Postgres in k8s too; it handles setting up replicas and can do automatic failover.


A lot of the CNI/load balancer stuff was added as a band-aid for applications that don't cooperate nicely with k8s.

Applications that act "native" and don't need a lot of the extras...

Well, they arguably mostly use just the scheduler then :D


Wait, you can run Kubernetes with no CNI? My clusters have never even been able to register nodes as healthy without one.

Maybe I’m doing it wrong?


TL;DR: today the CNI itself is the interface to the network implementation, so you'd need a minimal one.

But you do not need a "complex" CNI. Originally, k8s pretty much worked on the assumption that you could route a few subnets to the cluster in good old static fashion, and that's it; it still works with that kind of approach: each node gets a /24, there's a separate shared /24 (or more) for services, etc.
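
To illustrate that model (addresses are made up): the pod and service ranges are just cluster-wide settings, the controller manager hands each node a /24 out of the pod range, and with plain static routes to each node's /24 you don't need an overlay at all. A kubeadm-style excerpt:

  apiVersion: kubeadm.k8s.io/v1beta3
  kind: ClusterConfiguration
  networking:
    podSubnet: 10.64.0.0/16       # pod CIDR, split into per-node /24s
    serviceSubnet: 10.96.0.0/12   # shared virtual IPs for Services
  controllerManager:
    extraArgs:
      allocate-node-cidrs: "true"
      node-cidr-mask-size: "24"   # each node gets a /24 out of podSubnet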

The complexity came from the fact that a lot of places that wanted to deploy Kubernetes couldn't provide such simple network infrastructure to their hosts; later, what started as a workaround got equipped with various extra bells and whistles.


I looked at Agones: the docs on architecture are non-existent, but from their ops docs it looks like a CRD extension on top of vanilla Kubernetes to automate/simplify scheduling. What specifically in CNI, or its most popular implementations, prevents long-running connections, in your opinion?


First: they force Kubernetes into a position where pods can't be evicted.

Second: they use a flavour of node ports that bypasses the CNI, so you connect directly to the process living on the node. This means there are no hiccups with the CNI if another node (or pod) that had nothing to do with your process gets descheduled.

In most cases, web services will be fine with the kinds of hiccups I'm talking about (even WebSockets); however, UDP streams will definitely lose data, and raw TCP ones may fail depending on the implementation.
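
Not Agones' actual resources, just a generic sketch of those two tricks (names, image, and port are made up): pin the connection path to the node with hostPort, so traffic skips Services/kube-proxy, and annotate the pod so the cluster autoscaler won't evict it.

  apiVersion: v1
  kind: Pod
  metadata:
    name: gameserver-abc          # hypothetical pod
    annotations:
      # ask the cluster autoscaler not to evict this pod
      cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
  spec:
    containers:
    - name: gameserver
      image: example/gameserver:1.0   # placeholder image
      ports:
      - containerPort: 7777
        hostPort: 7777            # clients hit <node IP>:7777 directly
        protocol: UDP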


What you're describing sounds like implementation bugs in the specific CNIs you've used, not anything to do with k8s networking design in general. At a former gig I ran a geo-distributed edge with long, persistent connections over Cilium, and we had no issues sustaining 12h+ RTMP connections while scaling/downscaling and rolling pods on the same nodes. I've consulted for folks who did RTP (for WebRTC), which is UDP-based, also with no issues. In fact, where we actually did have issues was the cloud load-balancing infra, which in a lot of cases is not designed for long-running streams...



