> This hypothesis doesn’t make much sense but I hadn’t realized that the root user in a Docker container is the same as the root user on the host, so I thought that was interesting
Wait what? I didn't know that. This sounds terrible. Is the root in the container the same user who runs the container, or the same root who is root on the host machine?
Without User Namespaces, if you run a container, user IDs in the container use the same pool as the host. So if your container runs as UID 0 / root, it’s the same UID 0 / root as your host.
This is one of the many reasons why giving non-root users access to the Docker daemon (so they can start containers) is dangerous: if, as a non-root user, I can start a container that’s running as UID 0, there’s a lot of possibility for misuse.
User Namespaces enable Docker to use separate UID pools for the containers, which enables a container to run as “UID 0 / root” but have the host actually map that as some arbitrary other UID, so the host can treat it differently than actual UID 0.
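To make the mapping concrete, here is a rough Python sketch. It assumes the three-column format the kernel exposes in /proc/<pid>/uid_map (first UID inside the namespace, first UID outside, length of the range). The sample values (100000, 65536) and the function names are made up for illustration and are not anything Docker itself ships.

```python
# Each line of /proc/<pid>/uid_map reads:
#   <first UID inside namespace> <first UID outside> <number of UIDs in range>
SAMPLE_UID_MAP = "0 100000 65536\n"  # hypothetical remapped container
IDENTITY_UID_MAP = "0 0 4294967295\n"  # what a process sees with no remapping


def parse_uid_map(text):
    """Return a list of (inside_start, outside_start, count) tuples."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            ranges.append(tuple(int(f) for f in fields))
    return ranges


def to_host_uid(container_uid, ranges):
    """Translate a UID seen inside the namespace to the UID the host sees."""
    for inside, outside, count in ranges:
        if inside <= container_uid < inside + count:
            return outside + (container_uid - inside)
    return None  # this UID is not mapped at all


remapped = parse_uid_map(SAMPLE_UID_MAP)
identity = parse_uid_map(IDENTITY_UID_MAP)

print(to_host_uid(0, remapped))  # container root is an unprivileged host UID
print(to_host_uid(0, identity))  # container root is host root
```

With the remapped sample, "root" in the container comes out as host UID 100000; with the identity map it stays UID 0, which is the situation described above.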
And not only for running processes, but also for files on the disk. If you mount an image on the host and make a file owned by "root", then mount that image in a container, the container's "root" now owns it too.
Edit: Note that userids are the same, but usernames are not managed by the kernel (handled by /etc/passwd or LDAP or something), so userid 1001 inside the container is userid 1001 outside, but they might have different names if you "ls -l" from different places.
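A small Python sketch of the name-vs-number point, assuming the usual colon-separated /etc/passwd layout (name, password field, UID, ...). The two passwd snippets and the user names "alice" and "appuser" are invented for the example.

```python
# Hypothetical /etc/passwd contents on the host and inside a container.
HOST_PASSWD = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "alice:x:1001:1001::/home/alice:/bin/bash\n"
)
CONTAINER_PASSWD = (
    "root:x:0:0:root:/root:/bin/sh\n"
    "appuser:x:1001:1001::/app:/bin/sh\n"
)


def name_for_uid(passwd_text, uid):
    """Look up the user name for a numeric UID in passwd-style text."""
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 3 and fields[2].isdigit() and int(fields[2]) == uid:
            return fields[0]
    return str(uid)  # like "ls -l", fall back to the bare number


# Same numeric owner, different display name depending on where you look.
print(name_for_uid(HOST_PASSWD, 1001))
print(name_for_uid(CONTAINER_PASSWD, 1001))
```

The file on disk only stores the number 1001; the name is whatever the local passwd database (or LDAP, etc.) says it is.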
Or, years before, NFS: the NFS permissions model trusts the client and server implicitly. In many environments, that meant that getting root[1] on any system quickly cascaded across all of them if you could write to, say, someone's shell profile, SSH keys, or a shared binary in a common location, which was not uncommon at all when people were trying to conserve storage costs by only installing things in one place. Mount options such as nosuid and nodev, and the various options for preventing uid=0 access, were all attempts to band-aid around the lack of a better authentication option until people started switching to Kerberos.
Docker doesn't use user namespaces by default. They have been working on adding support for a while, but it makes permissions on volumes difficult, and things kept changing in kernel land for years.
Root inside a non-userns container is the same as root on the Linux host, but it is constrained by security policies like seccomp and apparmor.
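One way to see the seccomp part from inside a process is the "Seccomp:" field in /proc/<pid>/status, where the kernel reports 0 (disabled), 1 (strict) or 2 (filter). The Python sketch below parses that field from a sample string; the sample text and the function name are assumptions for illustration, and it says nothing about AppArmor or about which syscalls a given profile blocks.

```python
# Values the kernel reports in the "Seccomp:" line of /proc/<pid>/status.
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}

# Hypothetical excerpt of a status file for a process with a seccomp filter.
SAMPLE_STATUS = "Name:\tsh\nUid:\t0\t0\t0\t0\nSeccomp:\t2\n"


def seccomp_mode(status_text):
    """Return the seccomp mode named in /proc/<pid>/status-style text."""
    for line in status_text.splitlines():
        # Match the exact key so related fields are not picked up by mistake.
        if line.startswith("Seccomp:"):
            value = int(line.split(":", 1)[1].strip())
            return SECCOMP_MODES.get(value, "unknown")
    return "unknown"  # field missing, e.g. kernel built without seccomp


print(seccomp_mode(SAMPLE_STATUS))
```

On a Linux box you could feed it the contents of /proc/self/status from inside and outside a container and compare: the UID line can read 0 in both places while the seccomp mode differs.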
Yeah, much easier to run everything as root. Also, we don't need /etc/passwd and stuff for UID/GID resolution and credentials in the container anyway; we just use ad-hoc auth and crypto from a random third-party lib (that isn't vetted, is never updated, runs in the same address space as your app, and doesn't have access to meaningful entropy since it's running in a container) and supply root credentials on the docker command line or in the Dockerfile checked in to GitHub, or both.
That's what we've been saying for years: Docker doesn't solve anything, it merely hides problems from you (and helps your cloud provider's bottom line). Good luck, and yes, PHBs should be worried for civil/criminal gross negligence if the shit hits the fan. Your cloud provider is happy to take your checks, but will shrug-away and point out they're just providing the infrastructure; it's up to you to competently configure your ever-changing 12-factor k8s.
That some things are more complicated doesn't lead to "doesn't solve anything." It solves many things. If they're not things you need solved then fine, but you can't deny that for many workflows it is actually useful.
It's the same user, but with locked down access (by default).
For example, it can't ptrace by default.
A container is a process with some kernel isolation mechanisms set up around it. Unless one of those mechanisms is a user namespace with uids/gids mapped to an unused set of users, you get the same user.
> Most production web sites today are exploitable because of this.
How so? To exploit this, you need to already have RCE on a container. But generally you get that RCE by exploiting the site (the application code) in the first place.
In which scenario does an attacker have code execution privileges in a container, but needs this root privilege to exploit the site?
This is one of the reasons I run podman instead where I can, and don't just give users access to the docker service. You can also run BuildKit in rootless mode.