Check out how services, load balancers and the majority of CNI implementations actually work then.
Kubernetes was designed for stateless connections and it shows in many places.
If you want it to do stateful connections you could use something like Agones, which intentionally bypasses a huge amount of Kubernetes, essentially using it only as a scheduler.
If you're using cluster autoscaling with very small (or perfectly sized) nodes, I could see it being more of an issue on a busy cluster.
But even then, I wouldn't set up a database to auto-scale. A new node could get created, but that doesn't mean the db pods will be moved to it; ideally they'd stay where they are. And on a really busy cluster, I'd prefer a separate node pool for stateful apps.
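To make that concrete, here's a rough sketch of what pinning a database pod to a dedicated pool could look like, written against the upstream k8s.io/api Go types; the pool=stateful label and taint names are made up for illustration:

    package example

    import (
        corev1 "k8s.io/api/core/v1"
    )

    // statefulPodSpec pins a database pod to nodes labeled pool=stateful
    // and tolerates the taint that keeps general workloads off that pool.
    func statefulPodSpec() corev1.PodSpec {
        return corev1.PodSpec{
            // Only schedule onto nodes in the dedicated stateful pool.
            NodeSelector: map[string]string{"pool": "stateful"},
            // Tolerate the taint that repels everything else.
            Tolerations: []corev1.Toleration{{
                Key:      "pool",
                Operator: corev1.TolerationOpEqual,
                Value:    "stateful",
                Effect:   corev1.TaintEffectNoSchedule,
            }},
            Containers: []corev1.Container{{
                Name:  "postgres",
                Image: "postgres:16",
            }},
        }
    }

The taint keeps general workloads off those nodes, so autoscaling churn in the general pool never touches the database.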
Using something like Stackgres makes it relatively painless to run Postgres in k8s too: it handles setting up replicas and can do automatic failover.
TL;DR - today the CNI itself is the interface to the network implementation, so you'd need at least a minimal one.
But you do not need a "complex" CNI. Originally k8s pretty much worked on the assumption that you could route a few subnets to the cluster in good old static fashion, and it still works with that kind of approach - each node gets a /24 for pods, there's a separate shared /24 (or more) for services, etc.
The complexities came from the fact that a lot of places that wanted to deploy Kubernetes couldn't provide such simple network infrastructure to their hosts; what started as a workaround later got equipped with various extra bells & whistles.
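To illustrate the static approach: with one /24 per node, "networking" can literally be a handful of static routes pointing each pod subnet at its node, the equivalent of ip route add 10.244.1.0/24 via 192.168.0.11. A minimal sketch using Go and the vishvananda/netlink library (subnets and addresses are made up; needs root on Linux):

    package main

    import (
        "log"
        "net"

        "github.com/vishvananda/netlink"
    )

    func main() {
        // node1 owns 10.244.1.0/24 for its pods.
        _, podSubnet, err := net.ParseCIDR("10.244.1.0/24")
        if err != nil {
            log.Fatal(err)
        }
        route := &netlink.Route{
            Dst: podSubnet,                   // that node's pod subnet
            Gw:  net.ParseIP("192.168.0.11"), // that node's host address
        }
        // Install the route; repeat per node and you have pod networking.
        if err := netlink.RouteAdd(route); err != nil {
            log.Fatal(err)
        }
    }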
I looked at Agones - the docs on architecture are non-existent, but from their ops docs it looks like a CRD extension on top of vanilla Kubernetes to automate/simplify scheduling. What specifically in CNI, or in its most popular implementations, prevents long-running connections in your opinion?
First: they force kubernetes into a position where pods can’t be evicted.
Second: they use a version of node ports that bypasses the CNI, so you connect directly to the process living on the node. This means there are no hiccups with the CNI if another node (or pod) that had nothing to do with your process gets unscheduled.
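Roughly this pattern, sketched with plain Kubernetes API types (not Agones' actual code; the names, image and ports are made up): an annotation telling the cluster autoscaler the pod isn't safe to evict, plus a hostPort so clients hit the node's IP:port directly instead of going through a Service:

    package example

    import (
        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // gameServerPod sketches both tricks from the comment above.
    func gameServerPod() corev1.Pod {
        return corev1.Pod{
            ObjectMeta: metav1.ObjectMeta{
                Name: "gameserver-0",
                Annotations: map[string]string{
                    // Tell the cluster autoscaler not to evict this pod.
                    "cluster-autoscaler.kubernetes.io/safe-to-evict": "false",
                },
            },
            Spec: corev1.PodSpec{
                Containers: []corev1.Container{{
                    Name:  "gameserver",
                    Image: "example/gameserver:latest", // hypothetical image
                    Ports: []corev1.ContainerPort{{
                        ContainerPort: 7654,
                        HostPort:      7654, // exposed on the node itself
                        Protocol:      corev1.ProtocolUDP,
                    }},
                }},
            },
        }
    }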
In most cases, web services will be fine with the kinds of hiccups I'm talking about (even websockets); however, UDP streams will definitely lose data, and raw TCP ones may fail depending on the implementation.
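A toy contrast of those two failure modes (placeholder addresses, and it assumes the path breaks mid-stream): TCP at least surfaces an error you can react to, while UDP writes typically keep "succeeding" with nothing on the other end:

    package main

    import (
        "log"
        "net"
        "time"
    )

    func main() {
        // TCP: if the path breaks mid-stream, a later Read/Write returns
        // an error (reset or timeout), so the client can reconnect.
        tcp, err := net.DialTimeout("tcp", "10.0.0.5:1935", 5*time.Second)
        if err != nil {
            log.Fatal(err)
        }
        defer tcp.Close()
        buf := make([]byte, 4096)
        if _, err := tcp.Read(buf); err != nil {
            log.Println("stream broken, reconnecting:", err) // detectable
        }

        // UDP: no acknowledgements at the transport layer. Datagrams sent
        // into a stale path are simply dropped, and Write typically keeps
        // returning nil, so the application never learns data was lost.
        udp, err := net.Dial("udp", "10.0.0.5:5004")
        if err != nil {
            log.Fatal(err)
        }
        defer udp.Close()
        udp.Write([]byte("frame")) // "succeeds" even if nothing receives it
    }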
What you're describing sounds like implementation bugs in the specific CNIs you've used, not anything to do with the k8s networking design in general. At a former gig I ran a geo-distributed edge with long, persistent connections over Cilium, and we had no issues sustaining 12h+ RTMP connections while scaling/downscaling and rolling pods on the same nodes. I've consulted for folks who did RTP (for WebRTC), which is UDP-based, also with no issues. In fact, where we actually had issues was the cloud load-balancing infra, which in a lot of cases is not designed for long-running streams...