Finally Linux has something that approaches pledge/unveil: landlock. Seccomp was...

shiomiru · 2025-11-16T14:32:17 1763303537

> Seccomp was never actually usable

It's barely usable by itself but I don't think it's an inherent problem of seccomp-bpf, rather the lack of libc support. Surely the task of "determine which syscalls are used for feature X" belongs in the software that decides which syscalls to use for feature X.

In fact, Cosmopolitan libc implements pledge on Linux on top of seccomp-bpf: https://justine.lol/pledge/

thomashabets2 · 2025-11-17T13:52:04 1763387524

Well, kinda.

The "what does the equivalent of pledge(stdio) actually mean?" question doesn't have to be answered on the kernel side. But it's complicated by the fact that on Linux, syscalls can be made from anywhere. On OpenBSD syscalls are now only allowed from libc code.

So even if one uses Cosmopolitan libc, if you link to some other library that library may also do direct syscalls. And which syscalls it does, and under which circumstances, is generally not part of the ABI promise. So this can still break between semver patch version upgrades.

Like if a library used to just not write debug logs by default, but then changed so that they are written, but to /dev/null, then there's no way to inform application code for that library, much less update it.

If you ONLY link to libc, then what you said will work. But if you link to anything else (including using LD_PRELOAD), then all bets are off. And at the very least you'll also be linking to libseccomp. :-)

If libc were the only library in existence, then I'd agree with you 100%.

shiomiru · 2025-11-17T15:34:52 1763393692

> So even if one uses Cosmopolitan libc, if you link to some other library that library may also do direct syscalls. And which syscalls it does, and under which circumstances, is generally not part of the ABI promise. So this can still break between semver patch version upgrades.

Well but isn't that a more general problem with pledge? I can link to libfoo, drop rpath privileges, and it'll work fine until libfoo starts lazily loading /etc/fooconf (etc.)

A nice thing about pledge is that it's modularized well enough so such problems don't occur very often, but I'd argue it's not less common of an issue than "libfoo started doing raw syscalls." The solution is also the same: a) ask libfoo not to do it, or b) isolate libfoo in an auxiliary process, or c) switch to libbar.

> And at the very least you'll also be linking to libseccomp. :-)

libseccomp proponents won't tell you this, but you can in fact use seccomp without libseccomp, as does Cosmopolitan libc. All libseccomp does is abstract away CPU architecture differences, which a libc already has to do by itself anyway.

(In my project, I got annoyed enough by the kernel header dependency that I just replaced libseccomp with a shell script: https://codeberg.org/bptato/chawan/src/commit/cad5664fc0aa10... although this might have gotten me a place reserved in hell.)

thomashabets2 · 2025-11-17T16:46:08 1763397968

> isn't that a more general problem with pledge?

No, for two reasons: 1) pledge() lets you give high level "I just want to do I/O on what I already have", and it doesn't matter if new syscalls "openat2" (should be blocked) or "getrandom" (should be allowed) are created. (see the `newfstatat` example on printf). And 2) OpenBSD limits syscalls to be done from libc, and libc & kernel are released together. Other libs need to go through libc.

Yes, if libfoo starts doing actual behavioral changes like suddenly opening files, then that's inherently indistinguishable from a compromised process. But I don't think that we need to throw out the baby with that bathwater.

And it's not just about libfoo doing raw syscalls. `unveil()` allows blocking off the filesystem. And it'll apply to open, creat, openat, openat2, unlink, io_uring versions of the relevant calls (if OpenBSD had it), etc…

But yes, if libc could ship its best-effort pledge()/unveil(), that also blocks any further syscalls (in case the kernel is newer), that'd be great. But this needs to be part of (g)libc.

Though another problem is that it doesn't help child processes with a statically compiled newer libc, that quite reasonably wants to use the newer syscalls that the kernel has. OpenBSD decided to simply not support statically linked libc, but musl (and Cosmopolitan libc?) have that as an explicit goal.

So yeah, because they mandate syscalls from libc, ironically OpenBSD should have been able to make pledge/unveil a libc feature using a seccomp-like API, or hell, implemented entirely in user space. But Linux, which has that API, kinda can't.

(ok, so I don't know how strictly OpenBSD mandates the exact system libc, so maybe what I just said would open a vulnerability)

shiomiru · 2025-11-17T19:33:41 1763408021

> 1) pledge() lets you give high level "I just want to do I/O on what I already have", and it doesn't matter if new syscalls "openat2" (should be blocked) or "getrandom" (should be allowed) are created. (see the `newfstatat` example on printf).

You can do this with seccomp if you're libc. A new syscall is of no consequence for the seccomp filter unless libc starts using it, in which case libc can just add it to the filter. (Of course the filter has to be an allow-list.)
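A minimal sketch of the allow-list idea, not anyone's actual implementation: it builds a classic-BPF seccomp program for x86-64 and interprets it in user space instead of installing it (a real filter would be loaded with seccomp(2) or prctl(2) after setting no_new_privs). The `STDIO` set, `build_allow_list` and `run` are hypothetical names chosen for illustration, and the set is not what pledge("stdio") or Cosmopolitan actually permits.

```python
import struct

# Classic BPF opcodes and seccomp constants from the Linux UAPI headers.
BPF_LD_W_ABS = 0x20   # BPF_LD | BPF_W | BPF_ABS
BPF_JEQ_K = 0x15      # BPF_JMP | BPF_JEQ | BPF_K
BPF_RET_K = 0x06      # BPF_RET | BPF_K
SECCOMP_RET_ALLOW = 0x7FFF0000
SECCOMP_RET_KILL_PROCESS = 0x80000000
AUDIT_ARCH_X86_64 = 0xC000003E
AUDIT_ARCH_I386 = 0x40000003
OFF_NR, OFF_ARCH = 0, 4   # field offsets inside struct seccomp_data

# x86-64 syscall numbers. Which calls belong in the set is an assumption
# made for this example only.
STDIO = {"read": 0, "write": 1, "close": 3, "exit_group": 231, "getrandom": 318}


def build_allow_list(syscalls):
    """Return (code, jt, jf, k) instructions that allow only `syscalls`."""
    nrs = sorted(syscalls.values())
    prog = [
        (BPF_LD_W_ABS, 0, 0, OFF_ARCH),
        (BPF_JEQ_K, 1, 0, AUDIT_ARCH_X86_64),        # right arch: skip the kill
        (BPF_RET_K, 0, 0, SECCOMP_RET_KILL_PROCESS),
        (BPF_LD_W_ABS, 0, 0, OFF_NR),
    ]
    for i, nr in enumerate(nrs):
        remaining = len(nrs) - i - 1
        # On a match, jump past the remaining compares and the default kill.
        prog.append((BPF_JEQ_K, remaining + 1, 0, nr))
    prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_KILL_PROCESS))  # default: deny
    prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW))
    return prog


def encode(prog):
    """Pack instructions as an array of struct sock_filter (8 bytes each)."""
    return b"".join(struct.pack("=HBBI", *ins) for ins in prog)


def run(prog, arch, nr):
    """Interpret the small BPF subset used above for one syscall."""
    data = struct.pack("=iI", nr, arch)   # first 8 bytes of seccomp_data
    acc, pc = 0, 0
    while True:
        code, jt, jf, k = prog[pc]
        pc += 1
        if code == BPF_LD_W_ABS:
            acc = struct.unpack_from("=I", data, k)[0]
        elif code == BPF_JEQ_K:
            pc += jt if acc == k else jf
        elif code == BPF_RET_K:
            return k


prog = build_allow_list(STDIO)
```

Because the default action is to deny, a syscall the kernel adds later (say openat2, number 437 on x86-64) stays blocked without touching the filter, and one that libc starts using only needs a new entry in the set.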

> And 2) OpenBSD limits syscalls to be done from libc, and libc & kernel are released together. Other libs need to go through libc.

That avoids one failure mode, but I think you assign too much importance to it. If your dependency uses a raw syscall (and let's be honest this isn't that common), you'll see your program SIGSYS and add it manually.

If you have so many constantly changing dependencies that you can't tell/test which ones use raw syscalls and when, you have no hope of successfully using pledge either.

> But I don't think that we need to throw out the baby with that bathwater.

We agree here, just not on which baby :)

> And it's not just about libfoo doing raw syscalls. `unveil()` allows blocking off the filesystem.

You're right, seccomp is unsuitable for implementing unveil because it can't inspect contents of pointers. I believe Cosmopolitan uses Landlock for it.
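To illustrate the pointer limitation, here is a rough sketch of what a seccomp filter is handed for an openat call. The layout follows `struct seccomp_data` from the kernel UAPI headers; the path, the zeroed instruction pointer and the `as_u64` helper are made up for the example.

```python
import ctypes
import struct

# struct seccomp_data { int nr; __u32 arch; __u64 instruction_pointer; __u64 args[6]; }
SECCOMP_DATA = struct.Struct("=iIQ6Q")

AUDIT_ARCH_X86_64 = 0xC000003E
NR_OPENAT = 257     # x86-64
AT_FDCWD = -100

# A path living somewhere in this process's memory (hypothetical file name).
path = ctypes.create_string_buffer(b"/etc/fooconf")
path_addr = ctypes.addressof(path)


def as_u64(value):
    """Represent a register value as an unsigned 64-bit integer."""
    return value & 0xFFFFFFFFFFFFFFFF


# Roughly what the filter would see for openat(AT_FDCWD, path, O_RDONLY):
# six raw register values, with the pathname present only as an address.
seen_by_filter = SECCOMP_DATA.pack(
    NR_OPENAT, AUDIT_ARCH_X86_64, 0,            # instruction pointer left as 0 here
    as_u64(AT_FDCWD), path_addr, 0, 0, 0, 0,
)

fields = SECCOMP_DATA.unpack(seen_by_filter)
print("args[1] =", hex(fields[4]))              # an address, not "/etc/fooconf"
```

The filter can compare `args[1]` against a constant, but it cannot follow the address to read the string, so a path-based rule like unveil has nothing to match on.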

> Though another problem is that it doesn't help child processes with a statically compiled newer libc

If you're trying to pledge a program written by somebody else, expect problems on OBSD too because pledge was not designed for that. (It can work in many cases, but that's kind of incidental.)

If it's your own program, fine, but that means you're compiling your binaries with different libcs and then wat.

> So yeah, because they mandate syscalls from libc, ironically OpenBSD should have been able to make pledge/unveil a libc feature using a seccomp-like API, or hell, implemented entirely in user space. But Linux, which has that API, kinda can't.

My take is "it can, with caveats that don't matter in 99% of the cases pledge is useful in." (Entirely in user space no, with seccomp yes.)

nolist_policy · 2025-11-17T09:20:13 1763371213

Chrome and Firefox have used seccomp for sandboxing for more than 15 years: https://lwn.net/Articles/346902/

thomashabets2 · 2025-11-17T14:11:40 1763388700

But only in very small sandboxes, right? Yes, seccomp could potentially be used for your JIT/interpreter sandbox. And because it inherently executes untrusted input, that's definitely the most important place.

But compare how many applications execute untrusted remote programs to how many programs have had security vulnerabilities. Or indeed, how much code.

What percentage of code runs in chrome/firefox's sandbox? 0.0001%?

Have you tried to create a seccomp ruleset for a real program? I have. There are too many variations between machines and code paths, so you'll necessarily need to leave wide open doors through your policy. Sure, the more you disable, the more "luck" you manufacture in case of a bug, preventing exploitation. But no, it's not fit for purpose outside these extremely niche use cases.

pjmlp · 2025-11-17T08:37:57 1763368677

Seccomp is heavily used on Android.

hulitu · 2025-11-16T18:08:16 1763316496

Linux is far too bloated to be run as a secure system and the attack surface of any Linux distro, due to the number of kernel modules loaded by default, is very big.

miladyincontrol · 2025-11-17T06:27:19 1763360839

And yet, countless companies do just fine.