Can't disagree, but still it feels wrong, doesn't it?
If we consider instead filesystem space exhaustion, it's easier to imagine a reasonable expectation that the system could enter a degraded service state (e.g. reads work but writes don't), then recover without restart.
In fact I've worked on products that expected to provide exactly that kind of behavior, and we (tried to) test it.
If you hit filesystem exhaustion (or a read-only filesystem), most applications should fail similarly. You want to be very, very careful to only add error handling for failure modes you know about, expect, have seen in production, and have test coverage for. Everything else should simply fail as gracefully as possible. Anything more is “safety theater” where you feel better but the protection is totally illusory.
So yes, some applications should handle a filesystem-full failure. I’ve written code where that handling itself was problematic (an infinite loop causing 100% CPU usage is common, and it matters because it’s a daemon). I’ll still take that trade, because the bug was caught early in testing and we had a clear repro case to go after: reproduce the problem, fix it, and so on.
The other huge difference is that memory allocation is everywhere. Technically, every function that might allocate should take a memory allocator as a parameter. After all, maybe I want my container to allocate its next set of nodes somewhere else, or at the very least I want to use a pool allocator for some scope of code. In practice, even this is too much ceremony, despite being infinitely more practical and useful than fallible allocations.
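For concreteness, here’s roughly what that pattern looks like in Zig (a minimal sketch against a recent-ish Zig std; repeatByte is a made-up illustrative function):

    const std = @import("std");

    // A function that might allocate takes the allocator explicitly,
    // so the caller decides where the memory comes from.
    fn repeatByte(allocator: std.mem.Allocator, byte: u8, n: usize) ![]u8 {
        const buf = try allocator.alloc(u8, n); // may fail; caller sees the error
        @memset(buf, byte);
        return buf;
    }

    pub fn main() !void {
        // Scope an arena: everything allocated through it is freed
        // in one shot when the arena is torn down.
        var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
        defer arena.deinit();

        const dashes = try repeatByte(arena.allocator(), '-', 40);
        std.debug.print("{s}\n", .{dashes});
    }

The point is that the caller, not the callee, picks the allocation strategy (arena here, but a pool or fixed buffer would slot in the same way).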
TLDR: memory allocation is qualitatively different because it’s hidden global state that’s used everywhere. If you’re not going to have a memory allocator passed into every function as an overridable parameter, fallible allocations aren’t practically interesting. This was a lesson learned about a decade ago, but it seems like there are pockets of the community that still disagree and push back against it. I wish them luck.
Zig is still relatively immature, so it’s hard to judge quite what software patterns will look like at scale, especially once it leaves the niche of language enthusiasts using it to build stuff.
Anyway, [1] shows the pros of this approach. But notice what’s not being discussed: this acknowledges that you have effectively increased the number of error paths in your program, and you have to go through quite a bit of effort to get coverage of them and some confidence in their behavior. Think of it in terms of ROI. Testing for malloc failures is comparatively expensive. It’s not logic-based, so you have to inject failures probabilistically and hope your test coverage exercises the right paths, while the issue happens infrequently (really, never) in production. If it never happens, you’ve wasted all that test effort. If it happens very rarely, you’ve still wasted the effort, because the probability that your test coverage accurately simulated the particular failure path you hit is fairly low. It also sounds like Zig’s allocator is actually quite dumb.
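To make the “more error paths” point concrete, a sketch (makePair is illustrative; if I remember right, Zig’s std.testing also ships a FailingAllocator for injecting these failures deterministically in tests):

    const std = @import("std");

    const Pair = struct { a: []u8, b: []u8 };

    // Two allocations = two distinct failure paths, each needing
    // correct cleanup and, ideally, its own test coverage.
    fn makePair(allocator: std.mem.Allocator) !Pair {
        const a = try allocator.alloc(u8, 16); // failure path #1
        errdefer allocator.free(a); // runs only if a later step fails
        const b = try allocator.alloc(u8, 32); // failure path #2
        return .{ .a = a, .b = b };
    }

Every try on an allocation adds a branch, and every errdefer is cleanup logic that only executes on one of those rarely-hit paths.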
So while requiring explicit allocators is the right call, here’s what I think are the missed opportunities:
1. The default program allocator should be passed into main / dlinit. It’s not. This seems like a bad choice.
2. In practice, I suspect, all allocating code ends up propagating failures with try. That means you’re paying a performance cost for error checks that never execute in practice (mostly in terms of icache pressure). Compare with Rust, where allocation failures typically panic, and an optimized application will have panic set to abort, leaving error reporting to an out-of-process crash handler. That means no stack unwinding and no error-handling code for malloc failures, with no change in safety guarantees. (Sketch below.)
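To illustrate point 2 in Zig terms (illustrative function names, not a benchmark): propagating with try emits an error-return branch at every allocation site and in every caller up the chain, whereas collapsing out-of-memory to a panic keeps callers infallible, which is loosely the shape Rust’s default gives you:

    const std = @import("std");

    fn propagating(allocator: std.mem.Allocator, n: usize) ![]u8 {
        // Every caller up the chain now carries an error branch for this.
        return try allocator.alloc(u8, n);
    }

    fn aborting(allocator: std.mem.Allocator, n: usize) []u8 {
        // Collapse OOM to a panic at the allocation site: callers stay
        // infallible, loosely analogous to Rust's alloc behavior with
        // panic = "abort".
        return allocator.alloc(u8, n) catch @panic("out of memory");
    }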
Zig is a neat experiment and I understand why its proponents like it. I still would never choose it for production at this time:
1. The safety issues mean that its profile isn’t thaaat different from C’s. It’s significantly easier to write safe code, but it’s not easier to audit or make guarantees about.
2. Development speed is higher than something like Rust, but when performance is a bit less critical, higher-level GC’d languages (Kotlin, Go, etc.) seem to make more sense.
So it’s hard to see what niche of professional development Zig can fill. Now, being a hobbyist language is fine, and it’s doing a lot of interesting things that show us the ergonomics of choices we wouldn’t otherwise get to see. But I would personally be very careful about making assumptions like “it seems to be working well enough for Zig” until it gets into the hands of a broader community. Things that seem to work OK early can shift drastically at scale.