> This hypothesis doesn’t make much sense but I hadn’t realized that the root us...

akerl_ · on May 4, 2020

Without User Namespaces, if you run a container, user IDs in the container use the same pool as the host. So if your container runs as UID 0 / root, it’s the same UID 0 / root as your host.

This is one of the many reasons why giving non-root users access to the Docker daemon (so they can start containers) is dangerous: if, as a non-root user, I can start a container that’s running as UID 0, there’s a lot of possibility for misuse.

User Namespaces enable Docker to use separate UID pools for the containers, which enables a container to run as “UID 0 / root” but have the host actually map that as some arbitrary other UID, so the host can treat it differently than actual UID 0.
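To make the mapping concrete, here is a minimal sketch of how a `uid_map`-style table translates a container UID into a host UID. The three-field line format (first ID inside the namespace, first ID outside, range length) is what the kernel exposes in `/proc/<pid>/uid_map`; the 100000 offset in `SAMPLE_MAP` and the helper names are assumptions chosen for illustration, not anything Docker guarantees.

```python
# Sketch: translate container UIDs to host UIDs using uid_map-style text.
# Each line has three fields: first ID inside the namespace,
# first ID outside it, and the length of the mapped range.

def parse_uid_map(text):
    """Parse uid_map-style text into (inside_start, outside_start, count) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            entries.append(tuple(int(f) for f in fields))
    return entries


def to_host_uid(container_uid, entries):
    """Return the host UID for container_uid, or None if it is unmapped."""
    for inside, outside, count in entries:
        if inside <= container_uid < inside + count:
            return outside + (container_uid - inside)
    return None


# Hypothetical remap: container UIDs 0..65535 -> host UIDs 100000..165535.
SAMPLE_MAP = "0 100000 65536\n"

# Without a user namespace the map is the identity over the whole UID range.
IDENTITY_MAP = "0 0 4294967295\n"

if __name__ == "__main__":
    print(to_host_uid(0, parse_uid_map(SAMPLE_MAP)))    # 100000
    print(to_host_uid(0, parse_uid_map(IDENTITY_MAP)))  # 0
```

With the identity map, container root is host UID 0; with the remap, the same "root" is just host UID 100000.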

dharmab · on May 4, 2020

You can enable user namespace isolation, but it comes with a significant number of tradeoffs.

https://docs.docker.com/engine/security/userns-remap/

Also, Kubernetes doesn't support it.

https://github.com/kubernetes/enhancements/issues/127

jbverschoor · on May 4, 2020

Docker, mongo, MySQL, JavaScript

Insane defaults, but easy to get started without knowledge

sp332 · on May 4, 2020

And not only for running processes, but also for files on the disk. If you mount an image on the host and make a file owned by "root", then mount that image in a container, the container's "root" now owns it too.

Edit: Note that userids are the same, but usernames are not managed by the kernel (handled by /etc/passwd or LDAP or something), so userid 1001 inside the container is userid 1001 outside, but they might have different names if you "ls -l" from different places.
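A rough illustration of the name-versus-number point: the same numeric UID resolves to different names depending on which passwd database you consult. The two passwd snippets and the `name_for_uid` helper are made up for the example.

```python
# Sketch: one numeric UID, two different names, depending on which
# passwd file does the lookup. The file contents below are hypothetical.

def name_for_uid(passwd_text, uid):
    """Look up the user name for uid in passwd-format text, or None."""
    for line in passwd_text.splitlines():
        fields = line.split(":")
        # passwd format: name:password:UID:GID:gecos:home:shell
        if len(fields) >= 3 and fields[2] == str(uid):
            return fields[0]
    return None


HOST_PASSWD = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "alice:x:1001:1001::/home/alice:/bin/bash\n"
)
CONTAINER_PASSWD = (
    "root:x:0:0:root:/root:/bin/sh\n"
    "www:x:1001:1001::/srv/www:/sbin/nologin\n"
)

if __name__ == "__main__":
    print(name_for_uid(HOST_PASSWD, 1001))       # alice
    print(name_for_uid(CONTAINER_PASSWD, 1001))  # www
```

The kernel only ever sees 1001; "alice" and "www" exist purely in userspace lookups.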

a1369209993 · on May 4, 2020

See also mount(8)'s nosuid/nodev options, because plugging in a USB stick with a setuid-root shell on it apparently used to work.
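If you want to check whether a given mount actually carries those options, a small sketch along these lines would do it. It parses `/proc/mounts`-style text; the sample devices and mount points are hypothetical, and on a real Linux box you would pass in the contents of `/proc/mounts` instead.

```python
# Sketch: check whether a mount point is mounted with nosuid and nodev.
# SAMPLE_MOUNTS is made-up data in /proc/mounts format.

def mount_options(mounts_text, mount_point):
    """Return the option set for mount_point (last matching entry wins)."""
    options = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            options = set(fields[3].split(","))
    return options


def blocks_setuid_and_devices(options):
    """True if both nosuid and nodev are present in the option set."""
    return {"nosuid", "nodev"} <= options


SAMPLE_MOUNTS = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "/dev/sdb1 /media/usb vfat rw,nosuid,nodev,relatime 0 0\n"
)

if __name__ == "__main__":
    for point in ("/", "/media/usb"):
        print(point, blocks_setuid_and_devices(mount_options(SAMPLE_MOUNTS, point)))
```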

acdha · on May 4, 2020

Or, years before, NFS: the NFS permissions model trusts the client and server implicitly. In many environments, that meant that getting root[1] on any system quickly cascaded across all of them if you could write to, say, someone's shell profile, SSH keys, or a shared binary in a common location, which was not uncommon at all when people were trying to conserve storage costs by only installing things in one place. The nosuid and nodev options, and the various options for preventing uid=0 access, were all attempts to band-aid around the lack of a better authentication option until people started switching to Kerberos.

1. Or, if they didn't require trusted ports, any account at all using https://github.com/NetDirect/nfsshell

maccam94 · on May 4, 2020

Docker doesn't use user namespaces by default. They have been working on adding support for a while, but it makes permissions on volumes difficult, and things kept changing in kernel land for years.
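One way to see why volumes get awkward, as a sketch: with a remapped range, a file owned by an ordinary host UID falls outside the container's map, and the kernel reports unmapped IDs as the overflow UID (65534 by default, per `/proc/sys/kernel/overflowuid`). The range used below and the `to_container_uid` helper are assumptions for illustration.

```python
# Sketch: how a host-owned file's UID appears inside a remapped container.
# Host UIDs outside the mapped range show up as the overflow UID.

OVERFLOW_UID = 65534  # default value of /proc/sys/kernel/overflowuid


def to_container_uid(host_uid, inside_start, outside_start, count):
    """Return the UID a container process sees for a file owned by host_uid."""
    if outside_start <= host_uid < outside_start + count:
        return inside_start + (host_uid - outside_start)
    return OVERFLOW_UID


# Hypothetical remap: container 0..65535 <-> host 100000..165535.
REMAP = (0, 100000, 65536)

if __name__ == "__main__":
    # A bind-mounted file owned by host UID 1000 is unmapped in the container.
    print(to_container_uid(1000, *REMAP))    # 65534
    # A file owned by host UID 100000 looks root-owned inside the container.
    print(to_container_uid(100000, *REMAP))  # 0
```

So a directory you own on the host is suddenly owned by "nobody" from the container's point of view, unless you chown it into the mapped range.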

Root inside a non-userns container is the same as root on the Linux host, but it is constrained by security policies like seccomp and AppArmor.

tannhaeuser · on May 4, 2020

> it makes permissions on volumes difficult

Yeah, much easier to run everything as root. Also, we don't need /etc/passwd and stuff for UID/GID resolution and credentials in the container anyway; we just use ad-hoc auth and crypto from a random third-party lib (that isn't vetted, is never updated, runs in the same address space as your app, and hasn't access to meaningful entropy since running in a container) and supply root credentials on the docker command line or the Dockerfile checked in to GitHub, or both.

That's what we've been saying for years: Docker doesn't solve anything, it merely hides problems from you (and helps your cloud provider's bottom line). Good luck, and yes, PHBs should be worried about civil/criminal liability for gross negligence if the shit hits the fan. Your cloud provider is happy to take your checks, but will shrug it off and point out they're just providing the infrastructure; it's up to you to competently configure your ever-changing 12-factor k8s.

jcranmer · on May 4, 2020

> hasn't access to meaningful entropy since running in a container

getrandom(2)/getentropy(3) should be using kernel randomness generation, which isn't affected by containers, I thought.
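For what it's worth, here is a small sketch of what that looks like from application code. Python's `os.urandom` reads from the kernel CSPRNG (on Linux it uses the getrandom() syscall when available), so the call is the same whether the process is in a container or not. The `kernel_random_bytes` wrapper name is just for the example.

```python
import os


def kernel_random_bytes(n):
    """Return n random bytes sourced from the kernel CSPRNG."""
    # os.urandom is backed by the operating system's randomness source;
    # the process's namespaces play no part in the call.
    return os.urandom(n)


if __name__ == "__main__":
    key = kernel_random_bytes(16)
    print(len(key), key.hex())
```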

eythian · on May 4, 2020

That some things are more complicated doesn't lead to "doesn't solve anything." It solves many things. If they're not things you need solved then fine, but you can't deny that for many workflows it is actually useful.

cpuguy83 · on May 4, 2020

It's the same user, but with locked-down access (by default). As an example, it can't ptrace by default.

A container is a process with some kernel isolation mechanisms set up around it. Unless one of those mechanisms is a user namespace with uids/gids mapped to an unused set of users, you get the same user.

redis_mlc · on May 5, 2020

> This sounds terrible.

Well it is. Most production web sites today are exploitable because of this.

It's even worse than it sounds.

icebraining · on May 5, 2020

> Most production web sites today are exploitable because of this.

How so? To exploit this, you need to already have RCE on a container. But generally you get that RCE by exploiting the site (the application code) in the first place.

In which scenario does an attacker have code execution privileges in a container, but needs this root privilege to exploit the site?

btashton · on May 4, 2020

This is one of the reasons I run Podman instead where I can and don't just give users access to the Docker service. You can also run BuildKit in rootless mode.

cpuguy83 · on May 4, 2020

You can also run Docker in rootless mode.