
> Actually, now I remember (it was about five years ago, so my memory is blurry) it did that not once but twice. One time I diagnosed the issue - it was a simple conntrack table overflow, so I had to bump the limit up. The other time, I have no idea what was wrong - I just lost database connectivity, but I'm certain it wasn't the application or the database but something in the infra.

Neither of these are k8s issues though. Where were you playing with `conntrack`? On the backplane?

The issues you describe here are issues you created for the most part. They are not issues people run into in production with k8s, I can assure you of that.

> I just lost database connectivity, but I’m certain it wasn’t the application or the database but something in the infra

Most likely something with your cloud provider that you did not understand fully and therefore blamed k8s, the thing you understood the least at the time.



> Neither of these are k8s issues though. Where were you playing with `conntrack`? On the backplane?

Yes, on the host (a GKE-provisioned node VM) where the application container ran.
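From memory, the diagnosis and fix looked roughly like this (the exact value below is illustrative, not what I actually set):

    # Classic symptom in the node's kernel log when the table overflows:
    #   nf_conntrack: table full, dropping packet
    dmesg | grep conntrack

    # Compare the current entry count against the limit
    sysctl net.netfilter.nf_conntrack_count
    sysctl net.netfilter.nf_conntrack_max

    # Bump the limit (value here is just an example)
    sysctl -w net.netfilter.nf_conntrack_max=262144

    # A change like this won't survive a reboot or node recreation unless
    # it's also written to /etc/sysctl.d/ or applied by a DaemonSet.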

While it's certain this is not in K8s itself, I'm not really sure where to draw the line. I mean, IIRC, K8s relies on the kernel's networking code quite a lot (e.g. kube-proxy is all about that), so... I guess it's not precisely clear whether it's in or out of scope.
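You can see how blurry that boundary is just by poking at a node - kube-proxy's Service handling is ultimately iptables rules and conntrack state in the host kernel. A rough sketch (assumes the default iptables proxy mode, root on the node, and conntrack-tools installed; port 5432 is just an example):

    # Service "load balancing" in iptables mode is plain NAT rules
    # that kube-proxy programs into the host kernel:
    iptables-save -t nat | grep KUBE-SERVICES | head

    # Every proxied connection then lives as a conntrack entry:
    conntrack -L | grep 5432

    # kube-proxy even manages the conntrack sysctls itself
    # (see its --conntrack-max-per-core / --conntrack-min flags),
    # so "the kernel's table" and "K8s" aren't cleanly separable.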

But either way, they're still certainly GKE issues, because the whole thing was provisioned as a GKE K8s cluster, where I don't think I was really supposed to SSH into individual nodes and tweak things by hand.

> The issues you describe here are issues you created for the most part. They are not issues people run into in production with k8s, I can assure you of that.

Entirely irrespective of K8s or anything else... people don't create weird issues for themselves in production? ;-) I honestly suspect that making sub-optimal decisions and reaping their unintended consequences is one of the things that makes us human :-) And I'm sure someone out there is trying some weird stuff in production right now because they thought it would be a good idea. Maybe even with K8s (though not all that likely - people hack on complex systems less than on simple ones).

By the way, if you say connectivity hiccups aren't a thing in production-grade K8s, I really wonder what kind of issues people run into?

> Most likely something with your cloud provider

I remember that node-to-node host communication worked and the database was responsive, but the container was getting connection timeouts, which is why I suspect it was something with K8s.

But, yes, of course, it's possible it wasn't exactly a K8s issue but something with the cloud itself - given that I don't know what the problem was back then, I can't really confirm or refute this.
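If I ever hit something like that again, I'd at least try to pin down which layer drops the traffic before pointing fingers. Roughly (the pod name and DB address here are made up, and this assumes the pod image ships nc):

    # From the node itself: does the database answer at all?
    nc -vz -w 3 db.internal.example 5432

    # Same check from inside the affected pod's network namespace:
    kubectl exec -it my-app-pod -- nc -vz -w 3 db.internal.example 5432

    # If the node connects but the pod times out, the problem sits somewhere
    # between the pod's netns and the node: CNI, iptables/conntrack, NAT, etc.
    # conntrack stats can show whether entries are being dropped:
    conntrack -S | grep -E 'insert_failed|drop'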



