It's barely usable by itself but I don't think it's an inherent problem of
seccomp-bpf, rather the lack of libc support. Surely the task of "determine
which syscalls are used for feature X" belongs in the software that decides which
syscalls to use for feature X.
The question of "what does the equivalent of pledge(stdio) actually mean?" doesn't have to be answered on the kernel side. But it's complicated by the fact that on Linux, syscalls can be made from anywhere. On OpenBSD syscalls are now only allowed from libc code.
So even if one uses Cosmopolitan libc, if you link to some other library that library may also do direct syscalls. And which syscalls it does, and under which circumstances, is generally not part of the ABI promise. So this can still break between semver patch version upgrades.
Say a library used to not write debug logs by default, but then changed so that they are written, but to /dev/null. There's no way to inform the application code that uses that library, much less update it.
If you ONLY link to libc, then what you said will work. But if you link to anything else (including using LD_PRELOAD), then all bets are off. And at the very least you'll also be linking to libseccomp. :-)
If libc were the only library in existence, then I'd agree with you 100%.
> So even if one uses Cosmopolitan libc, if you link to some other library
that library may also do direct syscalls. And which syscalls it does, and
under which circumstances, is generally not part of the ABI promise. So this
can still break between semver patch version upgrades.
Well but isn't that a more general problem with pledge? I can link to
libfoo, drop rpath privileges, and it'll work fine until libfoo starts
lazily loading /etc/fooconf (etc.)
A nice thing about pledge is that it's modularized well enough so such
problems don't occur very often, but I'd argue it's no less common an
issue than "libfoo started doing raw syscalls." The solution is also the
same: a) ask libfoo not to do it, or b) isolate libfoo in an auxiliary
process, or c) switch to libbar.
> And at the very least you'll also be linking to libseccomp. :-)
libseccomp proponents won't tell you this, but you can in fact use seccomp
without libseccomp, as does Cosmopolitan libc. All libseccomp does is
abstract away CPU architecture differences, which a libc already has to do
by itself anyway.
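To make the "seccomp without libseccomp" point concrete, here is a minimal sketch that builds an allow-list filter by hand as raw classic-BPF bytes. It is not how Cosmopolitan libc does it; the helper names (`insn`, `build_allowlist`) and the tiny `X86_64_NR` table are made up for illustration, and the syscall numbers are the x86-64 ones, which is exactly the per-architecture detail libseccomp would otherwise hide.

```python
import struct

# Classic-BPF opcodes (values from <linux/bpf_common.h>).
BPF_LD_W_ABS = 0x20   # BPF_LD | BPF_W | BPF_ABS: load a 32-bit word from seccomp_data
BPF_JMP_JEQ_K = 0x15  # BPF_JMP | BPF_JEQ | BPF_K: compare accumulator with a constant
BPF_RET_K = 0x06      # BPF_RET | BPF_K: return a constant verdict

# Verdicts and architecture tag (values from <linux/seccomp.h>, <linux/audit.h>).
SECCOMP_RET_KILL_PROCESS = 0x80000000
SECCOMP_RET_TRAP = 0x00030000   # unlisted syscall: deliver SIGSYS
SECCOMP_RET_ALLOW = 0x7FFF0000
AUDIT_ARCH_X86_64 = 0xC000003E

# Offsets of the first two fields of struct seccomp_data.
OFF_NR, OFF_ARCH = 0, 4

# Illustrative subset of x86-64 syscall numbers (hypothetical helper table).
X86_64_NR = {"read": 0, "write": 1, "close": 3, "fstat": 5, "exit_group": 231}


def insn(code, jt, jf, k):
    """Pack one struct sock_filter {u16 code; u8 jt; u8 jf; u32 k}."""
    return struct.pack("=HBBI", code, jt, jf, k)


def build_allowlist(syscall_numbers, arch=AUDIT_ARCH_X86_64):
    """Return filter bytes: allow the listed syscalls, SIGSYS on anything else."""
    prog = [
        insn(BPF_LD_W_ABS, 0, 0, OFF_ARCH),
        insn(BPF_JMP_JEQ_K, 1, 0, arch),       # expected arch: skip the kill
        insn(BPF_RET_K, 0, 0, SECCOMP_RET_KILL_PROCESS),
        insn(BPF_LD_W_ABS, 0, 0, OFF_NR),
    ]
    for nr in sorted(set(syscall_numbers)):
        prog.append(insn(BPF_JMP_JEQ_K, 0, 1, nr))   # no match: skip the allow
        prog.append(insn(BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW))
    prog.append(insn(BPF_RET_K, 0, 0, SECCOMP_RET_TRAP))
    return b"".join(prog)


stdio_filter = build_allowlist(X86_64_NR.values())
```

The resulting bytes are what a `struct sock_fprog` would point at when calling `prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...)` after `prctl(PR_SET_NO_NEW_PRIVS, 1, ...)`. The sketch stops short of installing the filter, since five syscalls are nowhere near enough for a Python interpreter to survive.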
No, for two reasons: 1) pledge() lets you give high level "I just want to do I/O on what I already have", and it doesn't matter if new syscalls "openat2" (should be blocked) or "getrandom" (should be allowed) are created. (see the `newfstatat` example on printf). And 2) OpenBSD limits syscalls to be done from libc, and libc & kernel are released together. Other libs need to go through libc.
Yes, if libfoo starts doing actual behavioral changes like suddenly opening files, then that's inherently indistinguishable from a compromised process. But I don't think that we need to throw out the baby with that bathwater.
And it's not just about libfoo doing raw syscalls. `unveil()` allows blocking off the filesystem. And it'll apply to open, creat, openat, openat2, unlink, io_uring versions of the relevant calls (if OpenBSD had it), etc…
But yes, if libc could ship its best-effort pledge()/unveil(), that also blocks any further syscalls (in case the kernel is newer), that'd be great. But this needs to be part of (g)libc.
Though another problem is that it doesn't help child processes with a statically compiled newer libc, which quite reasonably wants to use the newer syscalls that the kernel has. OpenBSD decided to simply not support statically linked libc, but musl (and Cosmopolitan libc?) have that as an explicit goal.
So yeah, because they mandate syscalls from libc, ironically OpenBSD should have been able to make pledge/unveil a libc feature using a seccomp-like API, or hell, implemented entirely in user space. But Linux, which has that API, kinda can't.
(ok, so I don't know how strictly OpenBSD mandates the exact system libc, so maybe what I just said would open a vulnerability)
> 1) pledge() lets you give high level "I just want to do I/O on what I
already have", and it doesn't matter if new syscalls "openat2" (should be
blocked) or "getrandom" (should be allowed) are created. (see the
`newfstatat` example on printf).
You can do this with seccomp if you're libc. A new syscall is of no
consequence for the seccomp filter unless libc starts using it, in which
case libc can just add it to the filter. (Of course the filter has to be an
allow-list.)
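A rough sketch of what "libc can just add it to the filter" could look like, assuming libc owns a table from promise names to syscalls. Everything here is hypothetical: the tables `PROMISES_V1` and `PROMISES_V2`, their contents, and the helper names are illustrative only and far smaller than any real pledge() promise.

```python
# Hypothetical promise table as shipped by one libc release.
PROMISES_V1 = {
    "stdio": {"read", "write", "close", "fstat", "exit_group"},
    "rpath": {"openat", "fstat"},
}

# A later libc release starts using newfstatat and getrandom internally,
# so it extends its own table. Application code does not change.
PROMISES_V2 = {
    "stdio": PROMISES_V1["stdio"] | {"newfstatat", "getrandom"},
    "rpath": PROMISES_V1["rpath"] | {"newfstatat"},
}


def allowed_syscalls(promises, table):
    """Union of the syscalls behind each promise; unknown promises are an error."""
    unknown = sorted(set(promises) - set(table))
    if unknown:
        raise ValueError("unknown promise(s): " + ", ".join(unknown))
    allowed = set()
    for name in promises:
        allowed |= table[name]
    return allowed


def verdict(syscall, promises, table):
    """Allow-list semantics: anything not explicitly listed is denied."""
    return "allow" if syscall in allowed_syscalls(promises, table) else "deny"
```

With these tables, `verdict("getrandom", ["stdio"], PROMISES_V1)` is "deny" and the same call against `PROMISES_V2` is "allow", while `openat2` stays denied under both because no table ever lists it. The application only ever names promises, so the syscall-level churn stays inside libc.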
> And 2) OpenBSD limits syscalls to be done from libc, and libc & kernel are
released together. Other libs need to go through libc.
That avoids one failure mode, but I think you assign too much importance to
it. If your dependency uses a raw syscall (and let's be honest this isn't
that common), you'll see your program SIGSYS and add it manually.
If you have so many constantly changing dependencies that you can't
tell/test which ones use raw syscalls and when, you have no hope of
successfully using pledge either.
> But I don't think that we need to throw out the baby with that bathwater.
We agree here, just not on which baby :)
> And it's not just about libfoo doing raw syscalls. `unveil()` allows
blocking off the filesystem.
You're right, seccomp is unsuitable for implementing unveil because it can't
inspect contents of pointers. I believe Cosmopolitan uses Landlock for it.
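To illustrate the pointer limitation, here is a small sketch of what a seccomp filter is actually handed for an `openat` call: a fixed 64-byte `struct seccomp_data` holding the syscall number, the architecture, the instruction pointer, and six raw 64-bit arguments. The helper name `seccomp_view` and the zero instruction pointer are placeholders for illustration; the syscall number is the x86-64 one.

```python
import ctypes
import struct

# struct seccomp_data { int nr; __u32 arch; __u64 instruction_pointer; __u64 args[6]; }
SECCOMP_DATA = struct.Struct("=iIQ6Q")

AUDIT_ARCH_X86_64 = 0xC000003E
NR_OPENAT = 257   # x86-64 syscall number
AT_FDCWD = -100
U64_MASK = 0xFFFFFFFFFFFFFFFF


def seccomp_view(nr, arch, ip, args):
    """Bytes a filter would see for one syscall (hypothetical helper)."""
    raw = [a & U64_MASK for a in args]   # arguments are plain 64-bit register values
    raw += [0] * (6 - len(raw))          # unused argument slots
    return SECCOMP_DATA.pack(nr, arch, ip, *raw)


# The path lives in process memory; the syscall only carries its address.
path = ctypes.create_string_buffer(b"/etc/passwd")
view = seccomp_view(
    NR_OPENAT, AUDIT_ARCH_X86_64, 0,
    [AT_FDCWD, ctypes.addressof(path), 0],   # dirfd, pathname pointer, flags (O_RDONLY)
)

# The filter input contains the pointer value, never the string behind it.
path_visible_to_filter = b"/etc/passwd" in view
```

Here `path_visible_to_filter` ends up `False`: the filter can compare the pointer value but cannot follow it, so a path-based policy like unveil needs a different mechanism.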
> Though another problem is that it doesn't help child processes with a
statically compiled newer libc
If you're trying to pledge a program written by somebody else, expect
problems on OBSD too because pledge was not designed for that. (It can work
in many cases, but that's kind of incidental.)
If it's your own program, fine, but that means you're compiling your binaries
with different libcs and then wat.
> So yeah, because they mandate syscalls from libc, ironically OpenBSD
should have been able to make pledge/unveil a libc feature using a
seccomp-like API, or hell, implemented entirely in user space. But Linux,
which has that API, kinda can't.
My take is "it can, with caveats that don't matter in 99% of the cases pledge
is useful in." (Entirely in user space no, with seccomp yes.)
But only in very small sandboxes, right? Yes, seccomp could potentially be used for your JIT/interpreter sandbox. And because it inherently executes untrusted input, that's definitely the most important place.
But compare how many applications execute untrusted remote programs to how many programs have had security vulnerabilities. Or indeed, how much code.
What percentage of code runs in chrome/firefox's sandbox? 0.0001%?
Have you tried to create a seccomp ruleset for a real program? I have. There are so many variations between machines and code paths that you'll necessarily need to leave doors wide open in your policy. Sure, the more you disable, the more "luck" you manufacture in case of a bug, preventing exploitation. But no, it's not fit for purpose outside these extremely niche use cases.
Linux is far too bloated to be run as a secure system, and the attack surface of any Linux distro, due to the number of kernel modules loaded by default, is very big.
Seccomp was never actually usable: https://blog.habets.se/2022/03/seccomp-unsafe-at-any-speed.h...