size_t-to-int vulnerability in Linux’s filesystem layer

mnw21cam · on July 20, 2021

I love the little nugget in the mitigations section. You can plug the hole for a normal filesystem, but then FUSE filesystems have an additional problem: "if an attacker FUSE-mounts a long directory (longer than 8MB), then systemd exhausts its stack, crashes, and therefore crashes the entire operating system (a kernel panic)."

If there's one place other than the kernel where truly defensive programming should be applied, it is systemd.

josefx · on July 20, 2021

What the hell is systemd doing that a 8MB long file path can exhaust its stack? Is it doing some recursive parsing or is it just doing something plain stupid like using a VLA to store user provided data?

megous · on July 20, 2021

Probably unbounded alloca() as always.

megous · on July 20, 2021

Yep: https://github.com/systemd/systemd/commit/b34a4f0e6729de292c...

strdupa(input) without any length check

Fix is to replace it with unbounded malloc() instead of checking for sane length first.

stjohnswarts · on July 20, 2021

Good find thanks for sharing. And everyone at work gripes about me carrying the size around with a lot of my variables in the form of a struct. It's strictly a reminder to always be checking the size since I'm juggling with shotguns.

thayne · on July 21, 2021

The fact that c doesn't have a native concept of an array with length and strings usually use a null byte to determine the end is, IMO c's biggest failing, and it's worst legacy on the wider software world.

nextaccountic · on July 21, 2021

This and having any pointer implicitly be nullable

GoblinSlayer · on July 21, 2021

Do they even C? It's official ideology: a good programmer compensates shortcomings of the language.

stjohnswarts · on July 21, 2021

They seem to be all about the conciseness :) . We have gigabytes of memory, a size parameter isn't going to make a difference haha. The allow me my little idiosyncrasies though so I can't complain.

sinsterizme · on July 20, 2021

Shouldn't this PR also use `free` on the duped string before returning? (I never use C so probably missing something but just based on the docs of strdupa...)

rtldg · on July 20, 2021

The variable p is now declared with "_cleanup_free_" which is using some compiler cleanup/destructor attribute stuff to run free

sinsterizme · on July 21, 2021

ah okay, thank you :)

alerighi · on July 20, 2021

This fix to me reduces the performance for nothing. In Linux (or most general on any UNIX system that I saw) a path should not be longer (total) than PATH_MAX, that is typically defined to 4096 bytes. What is the point on allocating something statically at this point?

And yes, I know that really that is only a limit of the system call path lenght, and in theory you can work with longer paths (by changing the current directory to a path and then opening a file from there), because filesystems does (stupidly in my opinion) support it.

But in reality, how many applications will break? Does it make sense to support them?

Also the code in question seems to be dealing with a filename more than a path. A file name shouldn't be longer than NAME_MAX, and that is an hard limit of many (possibly all?) filesystems, as far as I know. So why?

It would be simpler and more optimized to just truncate the name at PATH_MAX. Avoid the overflow and the crash but give an error. Why hard limits are considered that bad? We waste time supporting edge cases that no one would really use in a real system (no way someone needs a path longer than 4096 bytes...), for what? In Windows the limit is 260 characters, and nobody seems to be bothered by that, only in Windows 10 you can increase that.

wahern · on July 20, 2021

The Linux kernel doesn't have an actual path limit. Nor does Solaris. PATH_MAX is 4096 in glibc and musl libc because setting to it to something like INT_MAX or ULONG_MAX would break a lot of existing code that uses PATH_MAX to size buffers. (Though Solaris does define it as INT_MAX, IIRC.) OTOH, because of the lack of a hard limit there's also code that relies (if accidentally) on paths longer than PATH_MAX.

jwilk · on July 21, 2021

Linux does have a limit, at least for some system calls:

  $ strace -e trace=file perl -e 'open(FH, "<", "/" x 4096)'
  …
  openat(AT_FDCWD, "//////////…"..., O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENAMETOOLONG (File name too long)
  +++ exited with 0 +++

wahern · on July 21, 2021

I stand corrected. It seems that Linux copies the entire path into a kernel-allocated buffer (see getname and getname_flags in fs/namei.c as called by various syscalls in fs/open.c), rejecting paths longer than PATH_MAX.

EDIT: And on Solaris PATH_MAX is 1024 and (AFAICT) Solaris also copies paths into kernel space. It seems I was confusing things with NL_TEXTMAX, which is INT_MAX on glibc (but not Solaris).

yencabulator · on July 25, 2021

There's a limit to what path string you can make the kernel interpret. That does not limit total path length. Keep looping on mkdirat/openat and you can make very very deep trees. As opposed to your syscall that is relative to an arbitrary directory, /proc/self/mountinfo has to contain the whole absolute path to be useful.

yrro · on July 21, 2021

https://eklitzke.org/path-max-is-tricky

The Linux kernel defines upper limits for NAME_MAX (255) and PATH_MAX (4096).

The glibc doesn't enforce this limit because it was originally written to run on GNU HURD which I guess doesn't have these limits.

But systemd only runs on glibc on Linux. So I don't see why it doesn't at least sanity check the length of absolute paths with PATH_MAX...

ohazi · on July 21, 2021

> reduces the performance for nothing

Does the code in question ever run in a tight loop (e.g. on file operations after the filesystem is mounted), or just at mount time?

If it's just at mount time, "reducing performance" by one malloc vs a stack adjustment doesn't seem to me like it should be a primary concern.

alerighi · on July 25, 2021

We are talking about systemd, that is a core software. I would like systemd to do as few memory allocations as possible. The reason is that memory can run out, especially on embedded devices where you have for example 32Mb of RAM, and you have to properly manage the case that you run out of memory. Most programmers don't, and the program does crash in case you don't have memory available. That is bad for PID 1, because that would mean a kernel panic, that you don't want.

If PID 1 does need to do that kind of stuff (as it seems), I would prefer it to fork a process and do the memory allocation in that, so if that process crashes because you are out of memory the kernel doesn't panic.

josefx · on July 21, 2021

> In Linux (or most general on any UNIX system that I saw) a path should not be longer (total) than PATH_MAX, that is typically defined to 4096 bytes.

That almost sounds like the 260 character windows path limit constant used by some ancient APIs. I would assume that any API limited to that path length is dated and probably unreliable in various contexts as the wikipedia article on filesystems explicitly gives the limit as not defined for various Linux filesystems. Also given the recent talk about in kernel support for NTFS (path limit ~2^16) I assume that any historic code still relying on PATH_MAX needs to be fixed.

alerighi · on July 25, 2021

The filesystem can support path length even infinite (simple, make a symlink of a directory inside that directory, you have an infinite path).

PATH_MAX is a limit of a path that you can pass to the various path manipulating functions, open(), unlink(), etc, or returned by getcwd() (that gives an error if path is longer than PATH_MAX, and yes there are non standard system call to go around this limit but... why?)

You can however use paths longer by PATH_MAX, how? Simply chdir() PATH_MAX, then you can chdir() another PATH_MAX, then count how many software breaks...

Imposing a limit on paths makes sense and should be done. 4096 bytes seems reasonable to me. Also, in that example, it wasn't even a matter of a path! They are parsing it seems only the file name, and that is defined to be NAME_MAX, that is 255 bytes, on every system and every filesystem!

merb · on July 20, 2021

was that a guess? wtf... btw. it would probably be hard to make the same mistake in rust. unless you write your own code for strings or use strdupa, too via libc.

I also do never understand why some libraries use "faster" methods everywhere, unless safer ones. it's not like all interfaces to systemd would need to be fast. but they should be secure.

megous · on July 20, 2021

Yes. It happened before, so it was not exactly hard to guess.

https://capsule8.com/blog/exploiting-systemd-journald-part-1...

alerighi · on July 20, 2021

It's not a mistake. Allocating that string on the stack it is not a bad idea. Most of the time the string will be short, and thus an allocation on the stack is faster.

Consider that in Linux a path is defined to be a maximum length of PATH_MAX, that is defined to 4096 bytes, and a filename (and directory name) shouldn't be longer than FILE_MAX that is 255 bytes. This limits are defined in the headers and I use them always in writing my C programs (if it crashes... you are doing something really wrong!).

So how the hell do you have a directory that is more than 8Mb? You shouldn't! The filesystem doesn't support it. It's a matter of the filesystem driver that should reject a path that long in my opinion.

Systemd should be fast. It's at the base of the operating system. Also it should consume little memory. You can say, who cares about allocating dynamically a string, or allocating a static buffer of 16Mb, yes we should care, I use Linux computer with 16Mb of RAM, total. Of course they don't run systemd nowadays since it's too big, but in my opinion systemd is good, and I would like to see it more in the embedded world.

ehvatum · on July 21, 2021

> The filesystem doesn't support it.

Remember that Linux supports hierarchical mounts! You can mount anything at any depth of directory nesting. Even if it were true that MAX_PATH were an FS limitation, you could still nest mounts and encounter absolute paths exceeding MAX_PATH. MAX_PATH is simply the length in bytes of the longest string you should expect system calls to accept as a path parameter.

> I use Linux computer with 16Mb of RAM, total. Of course they don't run systemd nowadays since it's too big, but in my opinion systemd is good, and I would like to see it more in the embedded world.

It sounds like using systemd is a terrible idea for memory-constrained devices, so you really don’t want to see it in the embedded world.

WGH_ · on July 21, 2021

> It sounds like using systemd is a terrible idea for memory-constrained devices, so you really don’t want to see it in the embedded world.

On the other hand, proper event-driven init system (instead of horrible shell scripts with all sorts of fragile "sleep"s and other hacks) sounds sexy for an embedded system. I sometimes get annoyed how home routers, NAS, etc. are slow to boot up

Though the embedded systems I refer to have much more than 16 MB of RAM, more like 128 and up.

GoblinSlayer · on July 21, 2021

Is there a list of init systems that aren't made of shell scripts? Epoch is the only one I found.

zaarn · on July 21, 2021

Fun fact: PATH_MAX and FILE_MAX are glibc/muslc limitations. The Linux Kernel doesn't have a limit here and will happily let you walk into a directory with a 2GB pathname.

ext4 doesn't limit directory depth; only filename length. A filename can be 255 bytes in ext4. How deep that lies in the filesystem isn't limited.

btrfs has the same filename limit, no underlying limit on directory depth.

And I would most likely guess most filesystems don't because the obvious ways to implement directories don't place limits on that depth.

thayne · on July 21, 2021

In rust you can't currently dynamically allocate on the stack, although that's probably something that will be added in the future. And as others have pointed out, allocating on the stack is a fairly reasonable optimization here.

I don't think you could even call strdupa through libc in rust. I would guess that strdupa is either a macro that uses the alloca compiler intrinsic or is itself a compiler intrinsic. Even if it isn't, it will break assumptions the rust compiler makes about the size of the stack frame.

unmole · on July 20, 2021

Wait how does a user space daemon exhausting its stack lead to a kernel panic?

comex · on July 20, 2021

Because the kernel intentionally panics [1] if the init process would otherwise exit in any way – whether because it called exit(), it was killed, or, in this case, it crashed.

This is likely because Unix semantics treat the init process specially: any process whose parent dies is re-parented to the init process. It's not clear what should happen to these processes if the init process itself went away, so the kernel just gives up.

[1] https://elixir.bootlin.com/linux/latest/source/kernel/exit.c...

Denvercoder9 · on July 20, 2021

I wonder if just restarting PID1 could be viable alternative?

jerf · on July 20, 2021

It's generally dangerous to restart PID1 in an enviroment where, by definition, something happened to PID1 that it wasn't expecting. The state of the system is now unreliable, and it's exactly the sort of unreliable in exactly the right place that tends to lead to whopping security issues. Far too easy to end up with "Crash PID1 with 'blah blah blah', then when it restarts it ends up doing bad things X & Y".

cameronh90 · on July 20, 2021

Perhaps a distinction here can be made between running as a server OS and a desktop OS.

In a server I generally want a crash immediately. But on a desktop I'd rather it limp along and give me a chance to finish writing my Hackernews post.

bigiain · on July 21, 2021

I'd rather push the other way on that. We should treat desktops with as much security paranoia at servers.

I'd much rather _not_ have my desktop "limp along" in a poorly understood and probably exploitable fashion while the malware gets a chance to finish encrypting all my files...

If that costs the world the "benefit" of my shared wisdom in a half written Hackernews post, I'm good with that.

lanstin · on July 21, 2021

I run a a lot more untrusted code on my laptop than on my cloud servers. Likewise for work, even more so as I don’t myself trust the spyware/malware they jam on the laptops.

jerf · on July 21, 2021

Basically, there's no solution at this level of granularity. One can also argue that the desktop is where the most important stuff is that we least want hacked, e.g., your family photos, documents, other stuff with a high priority of not being backed up, so we must treat security even higher than a server at this point.

I call these the "already lost" situations. You've already lost, we're just arguing about how to distribute the lossage. While those discussions aren't completely pointless, it is important to keep it clear in our head we're arguing about how to pick up the bodies at a crash site and not how to prevent the crash in the first place; it's a different mindset.

Despite some moderately-justified mockery in the other messages in this thread, the answer really is "just don't crash and have secure code here", which is to say, "don't lose". It's exceeding hard to write and it's a very high bar, but at the same time, it's very difficult to imagine how to secure a single system when you can't even stipulate a core of trusted software exists. If you don't even have a foundation, you're not going to build a secure structure. In this case, by "secure" I don't just mean security, but also, functionality and everything else.

londons_explore · on July 20, 2021

On a server I'd usually want it to limp along too... Better fire an alert but keep happy customers than cause a massive outage just because of someone's overly strict checkfail...

It depends really what your server does, and what the consequences of it doing the wrong thing are.

bigiain · on July 21, 2021

If the consequences of one server wedging itself is a "massive outage" and "unhappy customers", then you probably don't really care about that outage or those customers. If you don't have enough redundancy and alerting and automated disaster recovery to keep your customer facing shit up when one server panics, you're just relying on luck to keep your customers happy.

Fire an alert, remove that server from the load balancer, and fix the problem without your customers even noticing.

Or make sure if you're running a hobby-project architected platform that your customer expectations and SLAs are clear up front, and let it go down until Monday morning when you'll get around to fixing it.

michaelmrose · on July 20, 2021

Or just don't put stuff that crashes in pid1

modshatereality · on July 21, 2021

seriously, if you're doing allocations in pid1 you fucked up.

afc · on July 21, 2021

Yeah, great idea: rather than worry about how to deal with software bugs, just never have bugs...

a1369209993 · on July 21, 2021

Exactly. This is why systemd is a terrible design. The fastest and most secure code, that never crashes and never needs patching, is code that doesn't exist.

michaelmrose · on July 21, 2021

runit.c is 330 lines of code

Keeping it small and simple to minimize bugs is perfectly viable and reasonable.

dirkt · on July 21, 2021

A better alternative would be to keep PID1 as simple as possible, and do anything more complex in a subprocess.

Systemd of course goes in the opposite direction: It assimilates as much functionality as possible from the OS into systemd (though to be fair, not all into PID1).

monocasa · on July 20, 2021

PID 1 is special in Unix systems. As the parent of all other processes (and child to none) it's not clear to the kernel what should happen when it exits.

mjevans · on July 20, 2021

Poorly designed security architecture and division of labor. A more idealized init / systemd would have all of the execution flow of PID 1 mathematically provably correct, and correspondingly have as small a footprint as possible there. All additional functions would run under one or more child processes (where the bulk of systemd would execute).

bigiain · on July 21, 2021

"Lets put the graphics drivers in ring 0, for better performance!" -- Windows NT architects, 1996

"Ummm, lets not do that, it's not such a great idea..." -- Windows Vista team, 2006

fomine3 · on July 21, 2021

Initial Windows NT uses user mode graphic driver, then NT 4.0 move it kernel mode.

https://docs.microsoft.com/en-us/previous-versions//cc750820...

bigiain · on July 21, 2021

Yep. In about '96.

(Well, it came out late '96, so I suppose a bunch of that was actually done in '95.)

Spivak · on July 21, 2021

Different trade offs for different eras and different constraints.

failwhaleshark · on July 21, 2021

Systemd is a swiss-army-kitchen-sink-knife monolith of brittle complexity.

A proper init system similar to runit or s6 would be written in something safer (minimum unsafe) like Rust, be modular, simpler, follow UNIX philosophy, and not try to do everything in one process. Microkernel-style.

paulddraper · on July 20, 2021

Because it's PID 1.

CrazyPyroLinux · on July 20, 2021

systemd? More like systemK! But seriously, are non-systemd systems not vulnerable to the FUSE portion of this? (CVE-2021-33910)

saurik · on July 20, 2021

FWIW, I feel like your comment is responding to an implicit critique of systemd, but even if one was warranted I didn't read that comment as implying such (as the premise would just be that systemd is a key place in the stack where you would need to be super careful, not that it is somehow less careful than other projects... even if I might claim as such for at least logging ;P); it could be that I am misinterpreting your comment, though?

CrazyPyroLinux · on July 21, 2021

Yeah, I think you and a lot of other people misinterpreted my comment, since it was oddly one of my most-downvoted-ever.

My "systemK" joke was indeed implying what you said, that systemd is "a key place in the stack where you would need to be super careful." (Almost Kernel-like.)

And my question was legitimate, although poorly-researched. Answering myself: CVE-2021-33910 only affects systemd, not all FUSE in general.

Faaak · on July 20, 2021

Anyone knows how to try the PoC (https://www.openwall.com/lists/oss-security/2021/07/20/1/1) ?

For me it crashes into the fork_userns:177

PS: don't need to downvote. Sometimes managers want you to prove that there's a need to patch. It's dumb but it's what it is

ploxiln · on July 20, 2021

Your linux distro may already have unprivileged user namespaces disabled. See the "mitigations" section of the post, and check /proc/sys/kernel/unprivileged_userns_clone

Zababa · on July 20, 2021

> Unfortunately, this size_t is also passed to functions whose size argument is an int (a signed 32-bit integer), not a size_t.

Is this the type of things that could be caught by a linter or strict compilation rules? This seems to be to be a failure of the type system.

ainar-g · on July 20, 2021

At this point, you can—and probably should—consider C and “C-with-analysers” two different languages. If you use static (and dynamic, if you have tests) code analysis, making software without these issues is way easier, and doing so without these tools is essentially impossible. Both because of the C language itself as well as because of the culture of “bytes go brrr” and “just use a debugger” that a lot of middle-level C programmers have in my experience.

kllrnohj · on July 20, 2021

Yes, it is the type of thing caught by a linter or strict compilation rules.

https://clang.llvm.org/extra/clang-tidy/checks/cppcoreguidel...

But strict compilation rules (eg, clang's -Weverything) mainly only work if you treat them as errors (so -Werror), and then some of those strict are then also questionable at best, and just outright annoyingly wrong at worst. For example, unused parameter warnings on virtual methods are a waste of time to deal with. It's not a symptom of a bug most of the time, so it being an error just generates workaround churn or you end up just disabling the warning and then maybe that bites you the few times it would have pointed out an actual issue.

Beyond the blanket ones like clang's -Weverything, it can otherwise be a job to keep up with compiler upgrades and the vast number of warning options they have.

heavenlyblue · on July 20, 2021

> For example, unused parameter warnings on virtual methods are a waste of time to deal with.

Why is that even a warning? If at least one of the implementers use a parameter and a warning is shown the warning itself is wrong. That’s just broken implementation of the warning?

gruez · on July 20, 2021

AFAIK most compilers by default will output a warning in this case.

im3w1l · on July 20, 2021

Too bad that most projects are so full of integer size and signedness warnings that people get warning fatigue and just completely ignore them.

usefulcat · on July 20, 2021

GCC's -Wconversion has some issues. For example, good luck getting gcc to /not/ emit a warning for this code, in C or C++. I have yet to find the appropriate cast to avoid a warning. Clang does not warn for this.

    typedef struct {
        unsigned value : 4;
    } S;

    void foo(S* s, unsigned value) {
        // error: conversion from 'unsigned int' to 'unsigned char:4' may change value
        s->value = value;
    }

I mean, I guess I can see the rationale.. it's just annoying to have to resort to using pragmas to turn off -Wconversion whenever I need to assign to a bitfield.

Denvercoder9 · on July 20, 2021

Does it still warn if you do `s->value = value & 0x0F`? That seems like a reasonable alternative to pragmas if it works.

usefulcat · on July 21, 2021

Thanks, that fixed it!

owl57 · on July 21, 2021

It doesn't.

loeg · on July 20, 2021

GCC and Clang (the predominant Linux and Mac compilers) mostly don’t warn by default, you need -Wall or other flags (-Wextra, -Weverything, or specific flags like -Wconversion).

jeffbee · on July 20, 2021

I don't see any warnings for this narrowing parameter conversion.

  #include "stddef.h"

  short foo(short a) { return a % 42; }
  
  size_t bar(void) {
      size_t sz = ~0UL;
      return foo(sz);
  }

https://godbolt.org/z/3ec9v8Pa4

gruez · on July 20, 2021

On MSVC, default project settings I get:

    warning C4267: 'argument': conversion from 'size_t' to 'short', possible loss of data

https://godbolt.org/z/nYeWT7zv6 (/W3 is the default warning level when creating a new project)

I saw these warnings so often that I assumed that every compiler had them.

xeeeeeeeeeeenu · on July 20, 2021

It warns if you add -Wconversion. Unfortunately, that flag generates lots of false positives (at least in gcc), so using it isn't always a good idea.

morio · on July 20, 2021

What kind of false positives are you seeing with gcc?

Personally I have never seen gcc spitting out a false positive. IMO it's always a good idea to explicitly downcast even if you know that it's 'safe'. That way someone else will see instantly what's going on. The fact that Rust requires it should tell us something.

xeeeeeeeeeeenu · on July 20, 2021

For example, I've just reported this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101537

You can find more cases in the bugtracker. To be fair, it seems many of them were fixed in recent releases.

Cloudef · on July 21, 2021

The 1 and 0 are ints. It most likely complaints because the sign isn't same.

jeffbee · on July 20, 2021

It isn't ever a good idea. C programmers are just supposed to know about the conversion and promotion rules, every time they add a line, even though the rules are actually insanely complicated. Compiler warnings can't overcome this because programmers have no way of just discovering one of GCC's 726 warning flags, only a few of which are enabled by -Wall and -Wextra, most of which are way too noisy to serve a useful purpose.

ototot · on July 20, 2021

In my own small projects, I always add -Wconversion to build configuration. I think the false positive is affordable when you start from small piece of code.

saurik · on July 20, 2021

Yeah; AFAIK you need something like clang-tidy's cppcoreguidelines-narrowing-conversions check (which everyone should definitely be using). (edit: But I'm wrong! That check is apparently similar to the -Wconversion mentioned by someone else.)

nly · on July 20, 2021

They can't warn by default, because on some platforms long->int isnt a narrowing conversion.

aaronmdjones · on July 20, 2021

The compiler knows what platform it's compiling the code for, though.

gaul · on July 21, 2021

clang's -Wshorten-64-to-32 can catch this:

https://clang.llvm.org/docs/DiagnosticsReference.html#wshort...

The more general-purpose -Wconversion has many false positives, often around int to char conversion. Some functions like int toupper(int) have an unexpected int return type to deal with special out-of-bound values like EOF.

codedokode · on July 20, 2021

Yes, this can be caught by a static analyzer and it is sad that Linux doesn't use it. I wonder, is it because the code quality is low and there would be too many warnings?

SolarNet · on July 21, 2021

They do use them, but only in a narrow fashion. A lot of people have tried to make default kernel wide static analyzers but they are mostly not useful.

And it's because the kernel does a lot of non-standard things, mostly because it has to. It is not a normal program.

cesarb · on July 20, 2021

This kind of issue is the reason why some more modern languages like Rust or Go do not have implicit narrowing conversions. For instance, on Rust, trying to simply pass an usize (Rust's equivalent of size_t) to a function which expects an i32 (Rust's equivalent of int) will not compile; the programmer has to write "size as i32" (Rust's equivalent of "(int) size"), which makes it explicit that it might truncate the value at that point.

(Some Rust developers argue that even "size as i32" should be avoided, and "size.try_into()" should be used instead, since it forces the programmer to treat an overflow explicitly at runtime, instead of silently wrapping.)

derefr · on July 20, 2021

> Some Rust developers argue that even "size as i32" should be avoided, and "size.try_into()" should be used instead, since it forces the programmer to treat an overflow explicitly at runtime, instead of silently wrapping.

It's important to still have the option for efficient truncating semantics, though; some software (e.g. emulators) needs to chunk large integers into 2/4/8 smaller ones, and rotation + truncating assignment is usually the cheapest way to do that.

But, importantly, this is a rare case. Most software that does demoting casts does not mean to achieve these semantics.

So I wonder — are there any low-level/systems languages where a demoting cast with a generated runtime check gets the simple/clean syntax-sugared semantics (to encourage/favor its use), while truncating demotion requires a clumsier syntax (to discourage its use)?

tialaramex · on July 20, 2021

Right, this is a small infelicity in Rust, it is easier to write

  let x = size as i32;

... even if what you meant was closer to

  let x: i32 = size.try_into().expect("We are 100% sure size is small enough to fit into x");

But at least it isn't C or C++ where you might accidentally write

  x = size;

... and the compiler doesn't even warn you that size is bigger than x and you need to think about what you intended.

It's really hard to fix this in C++. Some of the Epoch proponents want to do so using epochs to get there, basically you'd have a "new" epoch of C++ in which narrowing must be explicit, and old code would continue to have implicit narrowing so it doesn't break.

dkirill · on July 20, 2021

> and the compiler doesn't even warn you that size is bigger than x

That's not true tho, compiler with reasonable flags set will definitely warn you and if you really don't like this kind of code you can force compiler to issue an error instead

simias · on July 20, 2021

What flags you have in mind? Because this code doesn't generate any warning with GCC 11.1.0 with -Wall -Wextra:

    int main(void) {
        int some_int = 1234567;
        char c = some_int;

        return c;
    }

kllrnohj · on July 20, 2021

Clang-tidy has a check for it: https://clang.llvm.org/extra/clang-tidy/checks/cppcoreguidel...

aaronmdjones · on July 20, 2021

-Wall -Wextra -Wpedantic does not enable all diagnostics.

This is GNU's idea of "all".

Contrast to Clang's -Weverything, which will.

lazide · on July 20, 2021

It seems though that the point is made, right? Even 'good' approaches miss on what should be a clear 'whoa, are you sure?' type warning. There are a lot of footguns wandering around in C/C++ land.

BoorishBears · on July 20, 2021

No, the point was you want don't get a warning and it will silently wrap. You can scroll up if you've forgotten.

And it is false. My default configuration C++ project created in Clion shows it very clearly, and even pesters to use int32/int64 over int/long.

But as usual the default fallback when you're wrong about C++ is "uh yeah but lotta footguns amirite"

As if there aren't enough that we need to start making them up...

opheliate · on July 21, 2021

And yet RedHat's recommended compiler flags for GCC [0], for example, do not appear to catch the wrapping assignment in the above example code.

0: https://developers.redhat.com/blog/2018/03/21/compiler-and-l...

BoorishBears · on July 21, 2021

Ah yes, of course the goalpost was "you need to customize your settings to catch it" above.

Now that the default in the most beginner friendly of IDEs catches it, the goalpost is "my pet source of customization designed with C++98 in mind doesn't catch this"

Of course, even your pet source of customization caught up: https://developers.redhat.com/blog/2021/04/06/get-started-wi...

opheliate · on July 21, 2021

If by "caught up", you mean talked about clang-tidy in a separate post, which is definitely not a GCC compiler flag, then sure.

The goalpost, since you're insistent on being explicit about it, was whether a C/C++ compiler "with reasonable flags" will catch the implicit wrap. GCC is a very popular compiler, and to be honest, I'm still not sure how to get it to warn on the above code, if doing so is possible.

Edit: Just read the rest of the thread, it's -Wconversion, which I suppose makes sense. Ignore me, point taken.

lazide · on July 21, 2021

If said beginning friendly IDE is used by only a couple percent of the ecosystem, it seems disingenuous to use it as proof this isn't a problem in this context?

BoorishBears · on July 21, 2021

Ok so we're going to keep shifting the goalposts, now it's "there aren't enough beginners relative to total usage so beginner friendly IDE isn't enough"...

I mean MSVS uses Clang-tidy too, Clang-tidy integrates style guides provided by Mozilla and Google.

Most C++ Google projects have clang-tidy configs.

Clang-tidy is literally table-stakes for modern C++ tooling.

Github shows 970,000 commits related to setting up clang-tidy

But uh, yeah, let's see where the goalpost skitters to next.

-

The irony is I said above, C++ has enough footguns without sticking your fingers in your ears and ignoring boring, easy to setup, widely well known and well used tooling.

But in the war against C++ no stone must be left unturned.

C++ is a tiny fraction of all the code I've written in my life but it irks me to no end that people can't deal with the idea that language safety can improve, that tooling can be considered part of that safety. Or rather they can... unless they're talking about C/C++

lazide · on July 22, 2021

I’m definitely not moving any goalposts I know of!

I thought the point I had been making, as had others, is that by default this is an easy footgun.

There are all sorts of things that can be added on to all languages to help - if you know it’s a problem worth solving, etc. which is inevitably after you’ve footgunned yourself with it bad enough you felt the need to research how to prevent it.

Other languages just do the safer thing (or most compilers By default warn at least about common footguns) more - which is the whole point of this thread?

BoorishBears · on July 22, 2021

There was one point, C++ won't warn you by default .

But tooling that is incredibly common, that beginners will run into even if they take the path of least resistance, and experts will use because it enforces standards at the very least, covers it.

Like Js without linters is a minefield, but everyone accepts you should lint your Js. Why does that change when C++ is involved?

rocqua · on July 20, 2021

The reason for this decision is so that compiler upgrades with -Wall and -Werror don't break builds.

I can see the reason behind it, but I feel that this behavior is something you opt into when you use -Werror.

derefr · on July 20, 2021

> The reason for this decision is so that compiler upgrades with -Wall and -Werror don't break builds.

It feels like the "right thing" here would instead be for the compiler to allow build scripts to reference a specific point-in-time semantics for -Wall.

For example, `-Wall=9.3.0` could be used to mean "all the error checks that GCC v9.3.0 knew how to run".

Or better yet (for portability), a date, e.g. `-Wall=20210720` to mean "all the error checks built into the compiler as of builds up-to-and-including [date]."

To implement this, compilers would just need to know what version/date each of their error checks was first introduced. Errors newer than the user's specifier, could then be filtered out of -Wall, before -Wall is applied.

With such a flag, you could "lock" your CI buildscript to a specific snapshot of warnings, just like you "lock" dependencies to a specific set of resolved versions.

And just like dependency locking, if you have some time on your hands one day, you could "unlock" the error-check-suite snapshot, resolve all the new error-checks introduced, and then re-lock to the new error-check-suite timestamp.

HanyouHottie · on July 20, 2021

MSVC has this[1]. .NET code analyzers are also versioned[2].

[1]:https://docs.microsoft.com/en-us/cpp/error-messages/compiler... [2]:https://docs.microsoft.com/en-us/dotnet/fundamentals/code-an...

antoinealb · on July 20, 2021

I think it might be more of an headache: what if somebody fixes a bug in an analyzer so that it catches things it used to miss ? Should it be a breaking change ?

Personally i would vote for "Wall with Werror" means no guarantee for your build.

wheybags · on July 20, 2021

The real solution: leave Werror off by default, activate it only during CI builds

derefr · on July 21, 2021

That's even worse, because then an upgrade to the compiler in the managed CI runner (e.g. Github Actions') base-image will translate to the same version of the code failing where it previously succeeded, with nobody sure why.

At least with -Werror on at all times, devs will tend to upgrade before the very-stable CI environment does, and thereby catch the problem at development time (usually less time-pressure) rather than release-cutting time (usually more time-pressure, esp. if the release is a hotfix.)

-----

Mind you, it does work to enable -Werror only in CI, if you lock your CI environment / compiler Docker image / etc. to a specific stable version, and treat that as the thing to re-lock in place of the "error-check suite snapshot version."

This has the disadvantage, though, that you can't take advantage of newly-stable/newly-unstable language features, or of newly-introduced compiler optimizations, without biting the bullet and taking on the work of fixing the errors introduced by re-locking the base-image.

With a separate flag for locking down the error-check-suite snapshot version, you could continue to upgrade the compiler — and thereby get access to new features / optimizations — while staying on a particular build regression "scope."

account42 · on July 22, 2021

> That's even worse, because then an upgrade to the compiler in the managed CI runner (e.g. Github Actions') base-image will translate to the same version of the code failing where it previously succeeded

If you don't want your build to fail on warnings, don't use -Werror. If you want it to only fail on specific warnings, use -Werror=...

> with nobody sure why

Unless they look at the errors in the compiler output. What does it matter if it was brought on by a compiler update or a push?

> At least with -Werror on at all times, devs will tend to upgrade before the very-stable CI environment does

Nothing wrong with -Werror for devs - the problem is when you ship code to others and leave -Werror on by default.

GoblinSlayer · on July 21, 2021

Is -Werror really supposed to not break builds?

account42 · on July 22, 2021

The whole point of -Werror is to break builds and -Wall / -Wextra are definitely not frozen. If you can't handle compiler updates resulting in errors, don't use -Werror in that environment.

130e13a · on July 20, 2021

keep in mind though that -Weverything is not intended to be used in production: https://quuxplusone.github.io/blog/2018/12/06/dont-use-wever...

account42 · on July 22, 2021

-Weverything is great for CI though, in compination with lots of -Wno-... flags to disable warnings you don't want. Instead of having to manually look out for new warning flags you will get all automatically.

aaronmdjones · on July 25, 2021

Yep, this is what I do; throw in -Weverything followed by a few things like -Wno-packed -Wno-padded -Wno-unused-parameter.

gumby · on July 20, 2021

> This is GNU's idea of "all".

Unfortunately, over the years people baked the semantics of -Wall into their builds so new diagnostics could not be added to that flag.

And clang’s -Weverything shows how the opposite can fail as well

not2b · on July 20, 2021

There are some very wrong-headed warning options in gcc, such that turning them on and avoiding getting them will make your code worse. So -Wall means 'all recommended warnings'.

Also there are some warnings that won't be produced if you compile without optimization, because the needed analysis isn't performed.

account42 · on July 22, 2021

And yet we have things like -Wmaybe-uninitialized in -Wall which by definition will occasionally warn on perfectly good code.

ridiculous_fish · on July 20, 2021

-Wconversion will do it.

stjohnswarts · on July 20, 2021

-Wconversion will catch this

rualca · on July 20, 2021

>What flags you have in mind?

-Wconversion

https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html

This is common knowledge for ages. Any cursory Google search returns countless answers.

Take this post made over a decade ago.

https://stackoverflow.com/questions/1730255/gcc-shouldnt-a-w...

tialaramex · on July 20, 2021

Fair point although it seems "reasonable" varies from one platform to another, it doesn't warn out of the box for me but people have reported MSVC gets warnings here.

nly · on July 20, 2021

Narrowing within { } initialization is forbidden C++ now

jacoblambda · on July 20, 2021

It can impact compile time performance but Boost Safe Numerics provides some nice wrappers to prevent narrowing (or restrict it to specific classes of narrowing) and throw warnings or errors at compile time similar to what you see in Rust.

glaze · on July 20, 2021

If you initialize it like this you get a warning:

x = { size };

frogblast · on July 20, 2021

In swift:

    //  a is a UInt64
    let a = Int.random(in: 0..<Int.max)
    
    //  causes a fatal runtime error if out of range, halts execution
    let b = UInt32(a)
    
    //  returns a UInt32?, which will be nil if out of range
    let c =  UInt32(exactly: a)
    
    //  another approach for exact conversion:
    guard let d = UInt32(exactly: a) else {
        //  conversion failed
        //  handle error and return
        return
    }
    // 'd' is a UInt32 without any bits lost
    
    //  always succeeds, will return a UInt32, either clamped or truncated.   Truncation just cuts off the high bits.
    let e = UInt32(clamping: a)
    let f = UInt32(truncatingIfNeeded: a)

dathinab · on July 20, 2021

As a interesting side note:

The "as" operator is often considered to have been a mistake. Both because of unchecked casts and because of "doing to much".

So I wouldn't be surprised if in the (very) long term there will be a rust edition deprecating `as` casts (after we have alternatives to all cast done with `as`, which are: Pointer casts, dyn casts/explicit coercion and truncating integer casts, for some we already have alternatives for on stable for other not).

And for all who want to not have `as` today you can combine extension traits (which internally still use `as`) + clippy lint against any usage of `as`.

EDIT: I forgot widening integer casts in the list above ;-).

slashdev · on July 20, 2021

I would prefer not to see that happen, I'm fine with as and the safer options as they are currently. It would be a big job to update all the code in the wild when you want to move to the newer edition.

tialaramex · on July 20, 2021

I don't see it as a big job. Am I wrong?

Imagine there's a suitable narrow::<type>() function introduced which has the same consequence, always narrowing, if your data was too wide it may drop important stuff on the floor, and narrow() just says that's too bad.

Rust 2030 can introduce narrow::<type>(), warn for narrowing as usage and then Rust 2035 can error for as. The Rust 2030 -> 2035 conversion software can consume code that does { x as y } and write { x.narrow::<y>() } instead. This code is not better but it's still working in Rust 2035 and this explicit narrow() function is less tempting for new programmers than as IMO.

scns · on July 20, 2021

Yes, BUT the correct thing to do would be more tedious to write that way.

wizzwizz4 · on July 20, 2021

You don't need to do that. Rust editions are fully backwards-compatible, since they can depend on code from different editions.

slashdev · on July 20, 2021

I'm aware of that, which is why I specified when you want to move to the newer edition.

cdirkx · on July 20, 2021

Isn't this something that could be done automatically by rustfix?

chippiewill · on July 20, 2021

> are there any low-level/systems languages where a demoting cast with a generated runtime check gets the simple/clean syntax-sugared semantics (to encourage/favor its use), while truncating demotion requires a clumsier syntax (to discourage its use)?

Not exactly the same thing, but in a related area C++ does this a bit.

In C++ you can always still do a c-style cast `(int) some_var` (and the implicit casts obviously), but in general you're meant to use the C++ style explicit casts like `static_cast` and `const_cast`. These are generally tidy, but the most powerful and dangerous of these casts is deliberately awkwardly named as `reinterpret_cast<int>(some_var)` rather than something terse.

It's always easy to spot during a code review.

gumby · on July 20, 2021

You can have your compiler warm about c-style casts

titzer · on July 20, 2021

In Virgil, casts are written "type.!(expr)". Casts between numbers check ranges and roundability (for float<->int conversion). Reinterpreting the bits is written "type.view(expr)" and ignores signs, looks at the raw float bits, etc.

edit: a cast will throw an exception if it fails, in case that was not clear from context.

londons_explore · on July 20, 2021

It would be nice if CPU's had an instruction for "read the low 8 bits of this register, and require the high bits all be zeros (otherwise throw an exception)".

Then safety is free...

derefr · on July 20, 2021

What does "otherwise throw an exception" mean at the CPU level?

I know Erlang's BEAM VM has a "fail jump pointer register", where instructions that can fail have relative-jump offsets encoded as immediates for those instructions, and if the instruction "fails" in whatever semantic sense, it takes the jump.

But most CPUs don't have anything like that.

Would you want it to trap, like with integer division by zero?

CPU traps are pretty hard to handle in most language runtimes, such that most compilers generate runtime checks to work around them, rather than attempting to handle them.

simias · on July 20, 2021

I assume they meant throwing an exception like divisions by zero usually do, i.e. a hardware trap.

I always thought that overflows should be checked in hardware, I suppose it's not a stretch to extend that to truncation. It's controversial though, and obviously mostly a thought experiment anyway unless we manage to convince some CPU manufacturer to extend their ISA that way.

MIPS does have (optional) trapping overflow on signed add/sub overflow, so at least there's a small precedent for it.

ridiculous_fish · on July 20, 2021

Swift checks all arithmetic by default: https://swift.godbolt.org/z/rW614G5aq

It seems obvious that future Apple CPUs will have hardware support for this, if they don't already.

saagarjha · on July 20, 2021

I don’t see this happening unless it makes it into the ARM ISA.

gumby · on July 20, 2021

I don’t know the terms of Apple’s license with ARM, do you? I’m quite interested.

Given that Apple was one of the original founders of ARM it’s quite possible that their license allows much more latitude anyone else’s.

saagarjha · on July 20, 2021

Adding new instructions to userspace programs is almost certainly not going to fly. All of their extensions have been hidden behind an opaque API, or limited to use in the kernel.

masklinn · on July 21, 2021

That… makes no architectural difference at all. As far as the architecture is concerned, these are architectural extensions either way: userspace programs can observably contain and execute instructions which are not standard ARM.

If AMX is allowed under their license, there is no reason why checked extensions would not be.

ridiculous_fish · on July 21, 2021

Apple benefits from compatibility with the ARM ecosystem, but there’s no downsides (from their perspective) from extending it. Their chips are “Apple Silicon”, setting the stage for forging ahead alone. I think it’s a card they hold in reserve, to be played when the time is right, like the Intel transition.

hermitdev · on July 20, 2021

I'm not familiar with ARM ISA, but from the godbolt disassembly, it doesn't look like anything special going on here - just the ASM being generated. What's happening here is it just does the add, jumps on overflow flag set to an invalid opcode...

saagarjha · on July 20, 2021

The suggestion was a custom instruction or architectural extension to have this happen in hardware, rather than needing to write out extra code for this.

WalterBright · on July 20, 2021

D doesn't allow implicit narrowing conversions. The user has to have an explicit cast, like `cast(int) size`. Cast is made into a keyword so all those explicit conversions can be found with a simple grep.

We consider it best practice to try and organize the types such that explicit casts are minimized.

titzer · on July 20, 2021

This is a deep hole for language design.

I thought about this very, very carefully when designing Virgil[1]'s numerical tower, which has both fixed-size signed and unsigned integers, as well as floating point. Like other new language designs, Virgil doesn't have any implicit narrowing conversions (even between float and int). Also, any conversions between numbers include range/representability checks that will throw if out-of-range or rounding occurs. If you want to reinterpret the bits, then there's an operator to view the bits. But conversions that have to do with "numbers" then all make sense in that numbers then exist on a single number line and have different representations in different types. Conversion between always preserve numbers and where they lie on the number line, whereas "view" is a bit-level operation, which generally compiles to a no-op. Unfortunately, the implications of this for floating point is that -0 is not actually an integer, so you can't cast it to an int. You must round it. But that's fine, because you always want to round floats to int, never cast them.

[1] https://github.com/titzer/virgil

cjensen · on July 20, 2021

C/C++ compilers commonly have warnings for narrowing conversions, and separate warnings for mixing signed/unsigned conversions for same-sized values.

While some folks aren't too fussed about warnings like this, those folks generally aren't writing secure code like kernels. I'm very surprised that kind of conversion was permitted in the code.

ronsor · on July 20, 2021

Shout out to Zig, which requires explicit casting also.

baby · on July 20, 2021

I wish clippy had a lint against downcasts specifically. I aso like that rust has no “int” type

jamincan · on July 21, 2021

It does, you just need to enable it. There are a bunch of other cast-related lints as well that are allow by default.

https://rust-lang.github.io/rust-clippy/v0.0.212/#cast_possi...

baby · on July 21, 2021

Oh wow, didn’t know about that one. Is this new?

pjmlp · on July 20, 2021

Languages of the same age or older than C, also have explicit narrowing, but apparently that was seen as programming with a straightjacket.

mgraczyk · on July 20, 2021

That's cool! And some languages like Go don't even allow implicit widening conversions: https://play.golang.org/p/a5C5jsHypmu

est31 · on July 20, 2021

Rust doesn't allow them either: https://play.rust-lang.org/?version=stable&mode=debug&editio...

Adding .into() works though, which is the recommended method if the conversion can be statically guaranteed (otherwise try_into should be used, which will become easier in the 2021 edition as the TryInto trait will become part of the prelude).

ridiculous_fish · on July 20, 2021

into() does not work from size. It's rather frustrating in practice. https://stackoverflow.com/questions/62832438/why-is-rusts-us...

slashdev · on July 20, 2021

Yeah, that is frustrating since there are no platforms where that would fail today, and it's hard to imagine why we would ever want one with 256bit pointers.

We don't even use full 64bit pointers today on x64.

fulafel · on July 20, 2021

Maybe a fat pointer llvm target ala SoftBound running on a machine with 128 bit bare pointers, like as/400.

est31 · on July 20, 2021

When 16 bit computers went to 32 bit, people probably thought that one wouldn't ever need 64 bit computers either. That being said, by the time 128 bit run out we have probably boiled earths oceans :).

You could think of checked memory models where half of the 256 bit address is a 128 bit random key needed to access some allocation, or maybe even a decryption key. Similar things are done with the extra space of x64 as well.

Also, 128 bit numbers are still quite uncommon in Rust. Easy conversion of usize to them wouldn't be that useful if conversions of the other number types don't work.

a1369209993 · on July 21, 2021

> When 16 bit computers went to 32 bit, people probably thought that one wouldn't ever need 64 bit computers either.

People have said that about pretty much every memory size in the history of computing. The argument for 64-bit is not "2^64 bytes ought to be enough for anyone"; it's "you couldn't use more than 2^64 bytes even if you wanted to". Writing a full register worth of data every clock cycle at 4GHz works out to 32GB/s. (2^64B / 32GB/s) is just over seventeen years, to fill up a 64-bit address space, assuming you're doing no actual computation. Few computers even work for seventeen years without replacement, much less run single processes continually with no reboots.

mgraczyk · on July 20, 2021

The Ethereum virtual machine addresses it's storage with 256 bits, so there's one wild example. Although in this case you'd probably not want to use usize directly to represent storage.

slashdev · on July 20, 2021

I'm not familiar with the Ethereum virtual machine, are these really memory pointers? Surely it's represented differently?

mgraczyk · on July 20, 2021

EVM is a stack machine, and it has durable storage which is addressed using 256 bit "pointers". So you can do something like

    push x // 256 bit constant
    LOAD
    // top of stack now contains storage[x]

pornel · on July 20, 2021

Even worse, there are platforms with >=128-bit pointers but 64-bit address space. Rust has chosen usize to be uintptr_t rather than size_t, even though it mostly uses it as if it was size_t. A ton of code is going to subtly break when these two sizes ever differ. Rust is likely already doomed on >64-bit platforms, and is going to be forced to invent its own version of a LLP64 workaround.

TheCycoONE · on July 21, 2021

Sorry, I'm not sure how this is a problem. On segmented architectures size_t is smaller than uintptr_t, but that just means there needs to be an allocation limit < usize::max_value.

It would have cause more bugs if they defined it the other way and people used usize to store addresses.

dezgeg · on July 21, 2021

"Fortunately" there will be bigger problems than Rust on 128-bit machines - lots of UAPI structures in the Linux kernel have pointer size hardcoded to 64 bits.

masklinn · on July 20, 2021

> It's rather frustrating in practice.

All explicit conversions, more so fallible, are "frustrating in practice" especially when coming from a language without those foibles.

But given the semantics of usize/isize, it is perfectly reasonable, nay, a good thing, that they're considered neither widenings nor narrowings of other numeric types.

ridiculous_fish · on July 20, 2021

usize should be Into<u64> iff usize is <= 64 bit.

usize is already on that hook: `0x100000000usize` will fail to compile in 32 bit, so you already risk compile errors when switching archs.

As it is I'm just writing `as u64` which is clearly worse.

mnw21cam · on July 20, 2021

Add Java to that list.

csharptwdec19 · on July 20, 2021

Same for C#. Any narrowing truncation needs to be an explicit cast. Widening is typically allowed implicitly, although in the case of the 'decimal' (128 bit struct representing a 'higher precision' floating point) type you still need an explicit cast from a 'double', since there are cases where that conversion can still change the value or fail (i.e. Infinity/NaN)

pjmlp · on July 20, 2021

Microsoft is considering turning on by default checked arithmetic in the 2022 Visual Studio templates, by the way.

irogers · on July 20, 2021

It can catch people by surprise that Java's narrowing conversions may not preserve the sign. For example, the following is broken:

class TimestampedObject implements Comparable<TimestampedObject> { long timestamp; int compareTo(TimestampedObject other) { return (int)(timestamp - other.timestamp); } ... }

ErrorProne catches this: https://errorprone.info/bugpattern/BadComparable

paulddraper · on July 20, 2021

Noo! This idea is super new and was invented by Go/Rust.

mytailorisrich · on July 20, 2021

The compiler can issue warnings for this.

This os why in C it is a good practice to enable all compiler warnings and to have the compiler treat warnings as errors.

viraptor · on July 20, 2021

In theory but not in practice if you're distributing your apps sources.

If you write for C compiler Foo 8, there's a decent chance Foo 9 will raise a warning which didn't exist before. Now you have to handle "why doesn't this compile" issues and distributions have to patch your sources to do future releases. And that's ignoring bugs like GCC in the past where in some versions you could not satisfy specific warnings.

j1elo · on July 20, 2021

The golden rule is: use -Werror for Debug builds, so you catch and fix all warnings during development. That's fair enough.

But never ever leave -Werror enabled for building in Release mode. You'll be preventing your code from building as soon as a new compiler version goes out. Maintainers or code archeologist will have a much worse time than if this option was simply disabled to start with.

nly · on July 20, 2021

Long and int are the same size on Windows. The fact that these sizes aren't well defined is the cause of the issue.

phkahler · on July 20, 2021

This wouldn't be a problem if "int" was defined as the same size as size_t. The solution is probably to change all those functions to take a parameter of size_t instead of int.

IMHO one should always be using C99 types instead of int, but Linux predates that.

Also, shouldn't that implicit conversion cause a compiler warning?

pcwalton · on July 20, 2021

> This wouldn't be a problem if "int" was defined as the same size as size_t.

ILP64 causes a lot of problems, most notably needlessly-increased memory usage and, in C, the inconvenience of requesting a 32-bit type when int is 64-bit. It's rather uncommon to actually need the extra 64-bit range except when describing pointer addresses and memory/disk sizes, both of which benefit from an explicit intptr_t/size_t type for readability if nothing else.

saagarjha · on July 20, 2021

ILP64 also solves a lot of problems if you don’t define overflow to be UB.

tialaramex · on July 20, 2021

Linux apparently provides a similarly named set of sized integer types to Rust, ie: s8 u8 s16 u16 s32 u32 s64 u64

But of course getting C programmers to use these integer types rather than the ones they grew up with isn't easy.

foobiekr · on July 20, 2021

I did a lot of C - a lot - in the mid-90s through the mid/late 2000s and have never seen any sizable C code base where explicit sizes were not the norm throughout - u_int32_t etc. I think this is not the problem, it's the implicit conversions.

masklinn · on July 21, 2021

I don't think that stops C's implicit conversions either, does it?

dezgeg · on July 21, 2021

It doesn't. They are in the end just typedefs to the basic types like int, long, etc.

chippiewill · on July 20, 2021

> IMHO one should always be using C99 types instead of int, but Linux predates that.

"Always" is a strong way of putting it, there are often times where it makes sense to use the platform's "natural" word sizes (which is the entire point of having `int` `long` `long long` etc.)

phkahler · on July 20, 2021

>> there are often times where it makes sense to use the platform's "natural" word sizes

In those cases we probably don't care about the full range of the larger types, so it doesn't hurt to use the smallest type for a range of expected values. If it does make a difference, the program will behave differently when compiled on a different arch or even a different compiler.

But maybe "generally" instead of "always". OTOH even I am guilty of using an int to loop over an array.

cesarb · on July 20, 2021

> This wouldn't be a problem if "int" was defined as the same size as size_t.

That would lead to a hole in the type sequence (char <= short <= int <= long <= long long) for 64-bit targets (where int is the 32-bit type while size_t is 64 bits).

> IMHO one should always be using C99 types instead of int, but Linux predates that.

On the other hand, on Linux "long" has always been defined as the same size as size_t, so using "long" instead of "int" everywhere could also be an option.

kenniskrag · on July 20, 2021

so you mean int_fastX_t or int_leastX_t?

marcosdumay · on July 20, 2021

I think Rust has no language construction for that, but the best implementation of "size as i32" should fail on overflow.

In Haskell I would use an exception, and mark the function as unsafe, but the stdlib seems to disagree with me here.

jeffbee · on July 20, 2021

A fun fact is there are 196 instances of `int buflen` in Torvalds' tree today, 95 instances of `int namelen`, and 13 instances of `int pathlen`, also 3000+ `int size`.

kvathupo · on July 20, 2021

Cached mirror - https://webcache.googleusercontent.com/search?q=cache:LwH96X...

I'm clueless about security: where does this fall on the scale of non-issue to critical? It strikes me as tending towards the latter, given that it enables unprivileged users to become root. Any insight into past Linux Kernel vulnerabilities that were severe?

vbezhenar · on July 20, 2021

Local root escalation is very common vulnerability. It's almost pointless to expect that attacker will not be able to escalate his privileges given enough time, if he got user account access. One weak layer of defense at most.

McGlockenshire · on July 20, 2021

Pulling this attack off requires enough access to the machine to either run the unprivileged commands needed to create the exploit condition or to upload a binary/script that runs unprivileged that in turn creates the exploit condition.

If the attacker already has that level of unauthorized access, you're already doomed.

tptacek · on July 20, 2021

Attacks like this break multitenant computing environments. It's not a threat to your desktop computer or your phone. But it can be a very big deal for hosting environments.

It also breaks sandboxing. To whatever extent you're trying to run programs that are somehow jailed, so you can download and run them without worrying about them taking over your system, kernel LPEs break those assurances.

samus · on July 21, 2021

Android also uses user accounts (one per app) to enforce its security model. Is that what you refer to with sandboxing?

Deukhoofd · on July 20, 2021

I see in the mail that Red Hat sent out patches to resolve this. Are those patches already merged, or is this a CVE about a live exploit?

GrayShade · on July 20, 2021

Still not fixed in the mainline kernel, it seems.

megous · on July 20, 2021

It's fixed in 5.13.4.

wolf550e · on July 20, 2021

This is the fix, it seems:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux...

ddoubleU · on July 20, 2021

Afaik this should work on Android for some time now too right?

Got unrootable old (but still fully working) phone there, might try to play with it.

spijdar · on July 20, 2021

... do Android kernels normally build with and allow non-privileged users to make namespaces? I'd be really surprised.