Segfaults are our friends and teachers

kibwen · on April 26, 2016

To be absolutely clear, the behavior exhibited in the OP is indeed considered a bug by the Rust developers: see https://github.com/rust-lang/rust/issues/16012#issuecomment-... for the latest discussion. TL;DR: this currently isn't exploitable on Windows, and patches to LLVM adding support for stack probes will ideally allow this to be easily solved for non-Windows platforms as well.

Annatar · on April 26, 2016

From the article:

> The Rust program got a segmentation fault because it attempted to write to inaccessible memory, but only through the stack pointer. None of the undefined behaviors disallow this, which I think is why it’s ok for this Rust program to segfault.

What I got out of that is that Rust does not work as advertised if there are still situations where a program could segfault. The entire premise of Rust, as I understood it at least, is that it does things in a safe manner and the programmer does not have to worry about it. Now I learned that there are undefined behaviors. In my view, for a language that bills itself as safe, there should not exist such things as undefined behaviors. As far as I am concerned, then, based on the advertising of Rust, this is false advertising.

steveklabnik · on April 26, 2016

It's not false advertising; it's a bug. Rust is not perfect. It's also not undefined behavior. https://github.com/rust-lang/rust/issues/16012

(Basically, stack probes are only implemented on Windows, and we need them on the other platforms, but it hasn't been implemented yet, for various reasons.)

kalmar · on April 26, 2016

Now I want to learn about stack probes!

GolDDranks · on April 26, 2016

There were some explanations in the Reddit thread:

https://www.reddit.com/r/rust/comments/4ghy5h/blog_by_kamal_...

If I understood right, the stack probes are like this: detect stack allocations that are larger than a page so that they might go past guard pages. In those occasions, do a tight loop of memory accesses with one-page intervals that would trigger the guard pages, to prevent the access going past the guard pages going unnoticed. The probes are done only on large allocations, so the costs are minimal compared to the size of the allocation.

If the stacks grows only by small increments, like on normal recursion, guard pages are going to catch it.

Annatar · on April 26, 2016

Okay, but such a bug means Rust is not ready for prime time yet, so this situation then ends up being the case of don't believe the hype.

wofo · on April 26, 2016

I think steveklabnik is right. For instance, take a look at this Java bug, which is relatively recent: https://bugs.openjdk.java.net/browse/JDK-8046516

On the same grounds you could label Java as a language "not ready for prime time yet".

Annatar · on April 26, 2016

From a perspective of an assembler, C, shell, and AWK programmer, Java has never been, nor will it ever be ready for anything. We use it at work (we run thousands of Java applications, written externally and in house), and it is a total piece of shit: 300 GB of RAM consumption on the average(!!!), CPU intensive, slow, not backwards compatible (try changing the Java runtime environment under an application and see how well that works for you). From my experience and point of view, Java is overcomplicated and unintuitive, even for the simplest of tasks. Furthermore, I deeply regret that Java was one of my core computer science courses at the university (way back when Java was all the rage). Personally, I would rather drop dead than use Java, and if they tried to force me to write code in it again, I would quit on the spot!

If I had it my way, Java would be a criminal offense carrying a minimum prison sentence of at least 20 years, and if I ever have my own company and I catch someone using Java, I will fire them on the spot with so much gusto! It will be so awesome.

Manishearth · on April 26, 2016

`gcc` still as bugs. `clang` still has bugs. There was one recently on HN where one pass in clang caused undefined behavior in another. Are these not yet ready for prime time?

Furthermore, this is a known bug, and not too high priority probably since it's pretty rare to stack allocate something huge and then use it in a way that could cause a vulnerability (instead of crashing). The fix would involve replacing the segfault with a stack overflow, which also causes a crash.

Annatar · on April 27, 2016

> `gcc` still as bugs. `clang` still has bugs. There was one recently on HN where one pass in clang caused undefined behavior in another. Are these not yet ready for prime time?

I thought Rust was a language billing itself as safe. GCC and Clang are compilers, they have not mounted a campaign to bill themselves as safe, nor are they trying to bill themselves as the best thing since sliced bread.

dbaupp · on April 27, 2016

In this case, it's not actually that much of a bug: the segfault is entirely controlled. Threads' stacks are allocated with a guard page at the end that is marked as inaccessible (at the OS level), so any access---read or write---will make the OS kill the process, via a segfault.

This particular type of segfault differs to typical segfaults in other languages because it is guaranteed to happen and guaranteed to be a segfault/kill the program. Segfaults in C code often hint at more serious bugs (e.g. use after free) that can be exploited for remote code execution etc.

The bug is that the stack overflow detection isn't perfect: if writes to particularly large stack frames happen in just the wrong way, then a program can write beyond the guard page instead of being killed. This is obviously unfortunate, and the fix is ensuring LLVM has support for stack problems on all platforms (it currently has them on Windows).

Manishearth · on April 27, 2016

Yes, and this is a compiler bug. It is not a bug in the language, and it can be fixed once LLVM gets stack probes for platforms other than Windows (on which it already works). On Windows with rustc you can't cause this issue currently IIRC.

(It can be fixed in rustc itself by inserting stack probes on large stack allocations, but it won't be able to catch cases where LLVM moves things around and in the process creates a new large stack allocation; i.e. after tons of inlining)

Ultimately, it's very hard to weaponize a segfault caused by a stack overflow (and causing a segfault through a guard-page-skipping overflow itself is a rare thing), so as far as practical aspects are concerned, this doesn't really matter.

Like everyone has said so far, it's a bug, a bug which can and will be fixed.

steveklabnik · on April 26, 2016

All software has bugs. Of course, you can certainly decide the severity of each bug for your use-case, but you can find stuff that's just as nasty on any compiler's issue tracker.

pcwalton · on April 26, 2016

> Okay, but such a bug means Rust is not ready for prime time yet

Does the presence of miscompilations in GCC mean that GCC is not ready for prime time?

eddyb · on April 26, 2016

AFAIK this could've been fixed in LLVM a long time ago, which would also help C code compiled with clang, although I believe stack probes are opt-in there.

We can't fix it fully in rustc because we don't know the stack size, which can grow with aggressive inlining, for example.

I suppose we could summarily probe allocas we know are larger than the page size, which would solve this particular situation (one large variable), but it's not a panacea.

vvanders · on April 26, 2016

Spoiler alert: All software has bugs.

kalmar · on April 26, 2016

> In my view, for a language that bills itself as safe, there should not exist such things as undefined behaviors. As far as I am concerned, then, based on the advertising of Rust, this is false advertising.

The key thing from the reference [0]:

> "Type checking provides the guarantee that these issues are never caused by safe code."

It's subtle, but I think the situation is that "some segfaults are caused by undefined behaviour, and some undefined behaviour causes segfaults". Neither fully contains the other. One thing I was trying to get across is that a segmentation fault has a very specific meaning, and that meaning is not "bad thing was done with pointers".

[0]: http://doc.rust-lang.org/reference.html#behavior-considered-...

tatterdemalion · on April 26, 2016

It would be better if everyone would refrain from making comments like this about any project ever. You have made an inflammatory, unnuanced, and ill-informed interpretation of a complicated issue and used it to publicly thrash a lot of people's hard works. You are lowering the discourse.

I know that Rust in particular tries to be very open about the caveats to its claims, but this is the kind of commentary that causes any ambitious project to try to minimize and hide its weaknesses instead of openly and honestly discussing them. I opened this discussion knowing that there would be a comment just like yours, and it made my heart feel heavy.

Annatar · on April 27, 2016

> It would be better if everyone would refrain from making comments like this about any project ever. You have made an inflammatory, unnuanced, and ill-informed interpretation of a complicated issue and used it to publicly thrash a lot of people's hard works.

Something as core as proper stack handling is to me basic functionality. If the aim is to be a safe language, and there are undefined behaviors, or in this case such bugs, then this is at odds with the Rust public relations effort.

And, any project which resents such technical / conceptual criticism leaves one with much to ponder.

Manishearth · on April 27, 2016

> Something as core as proper stack handling is to me basic functionality.

rustc handles it just as much as other compilers do. It should be safer than that, but it's currently not due to a compiler bug, and features making it safer are not "basic stack handling".

This is not undefined behavior, as has been said multiple times in this thread; please stop repeating this lie. This is a deterministic crash that can't be exploited easily.

> And, any project which resents such technical / conceptual criticism leaves one with much to ponder.

No, their comment said "inflammatory/unnuanced/ill-informed criticism". Which yours is.

Annatar · on April 27, 2016

> This is not undefined behavior, as has been said multiple times in this thread; please stop repeating this lie.

I wrote: "and there are undefined behaviors, or in this case such bugs". "Whoever can read is clearly at an advantage".

> No, their comment said "inflammatory/unnuanced/ill-informed criticism". Which yours is.

I'll tell you what: if I had deployed Rust in production and I ran into crap like the article described, I'd be really pissed.

Manishearth · on April 27, 2016

This "crap" is a stack overflow. The "correct" way of handling this crap is to terminate the process on a stack overflow (Rust does this for most stack overflows anyway). The current incorrect way causes a segfault instead, which also terminates the process -- not much of a difference for a production user; it crashes either way.

This can lead to bad memory reads if you try really hard to write code that leads to it, and that too it will happen a small fraction of the time -- all the other times it will segfault-crash (and would be caught in testing or in deployment without being exploited).

wofo · on April 26, 2016

Of course there are cases in which you can get segfaults, and not only in Rust. The single fact of having a C FFI is enough to break all safety guarantees of your language.

What Rust does promise is that undefined behavior will never happen if you don't use code blocks marked as unsafe (which include the C FFI). In other words, if you are able to trigger undefined behavior without using unsafe code, that is a compiler or library bug.

EDIT: as noted by steveklabnik, this is not undefined behavior. Actually you should get a better error message instead of a segmentation fault (see the link to the issue on GitHub in steveklabnik's comment).

jws · on April 26, 2016

What does Rust promise if I infinitely recursively call a function which can not be tail optimized? Eventually the stack will exceed either available memory or available address space, what then?

kibwen · on April 26, 2016

This is easy to test with a program consisting of `fn main() { main() }`: https://play.rust-lang.org/

In release mode LLVM seems to be optimizing out the entire program due to dead-code elimination (or maybe sibling-call elimination?), but in debug this is what you get:

    <anon>:1:1: 3:2 warning: function cannot return without recurring, #[warn(unconditional_recursion)] on by default
    <anon>:1 fn main() {
    <anon>:2     main()
    <anon>:3 }
    <anon>:2:5: 2:11 note: recursive call site
    <anon>:2     main()
                 ^~~~~~
    <anon>:1:1: 3:2 help: a `loop` may express intention better if this is on purpose

    thread '<main>' has overflowed its stack
    fatal runtime error: stack overflow
    playpen: application terminated abnormally with signal 4 (Illegal instruction)

Even though Rust doesn't guarantee stack probes on any OS but Windows at the moment (LLVM patches pending), Rust does guarantee guard pages on all platforms (AFAIK), and assuming that a function's stack frame is always less than a page (which I think should be true), then Rust does indeed seem to guarantee that such an overflow will raise SIGILL.

dbaupp · on April 27, 2016

I don't think that example is directly relevant: it's tail recursive, and the code still compiles (so there's still runtime behaviour to consider). Additionally, the warning is emitted in both release and debug modes, but only covers "obvious" cases of infinite recursion: it won't say anything about functions that could recur deeply, or infinite recursion that is tricky to prove is infinite.

In any case, the Rust language, when it gets guarantees (i.e. a spec), will at the very least guarantee that stack overflow doesn't result in memory corruption, it may not be too opinionated on how exactly implementations make this guarantee. As you say, the rustc compiler currently handles it by guard pages and aborting but not perfectly (it's not hard to have a large stack frame: `let x = [0; 10000]`).

kibwen · on April 27, 2016

I'm confused by all your points. The fact that this function is tail recursive doesn't matter, because TCE isn't happening when compiled in debug mode (it would matter if Rust had some other memory-unsafety mitigation that was only enabled in debug mode, but it doesn't), so the stack overflow it exhibits is no different from the stack overflow in a non-tail-recursive function. As for the warning, I wasn't talking about that at all, in fact, I was considering leaving it out of the comment entirely. The relevant part of the output is the final three lines. As for `let x = [0; 10000000];`, that's missing the point of the question that I was responding to, which was concerned with the consequences of stack overflow via recursion, which is moot if the function immediately overflows its stack to begin with.

wofo · on April 26, 2016

IIRC Rust will then abort the process with a SIGILL and display a generic error message.

Ono-Sendai · on April 26, 2016

And if you read the article, that is the case. You can get crashes without using blocks marked as unsafe.

wofo · on April 26, 2016

I am not denying that. In fact, there is a GitHub issue intended to solve this (https://github.com/rust-lang/rust/issues/16012).

Sean1708 · on April 26, 2016

For those wondering about segfaults specifically in Rust (I know it's not the point of the blog post but it might be interesting to others), this thread talks about why they occur/whether they'll ever be eliminated entirely:

https://users.rust-lang.org/t/rust-guarantees-no-segfaults-w...

steveklabnik · on April 26, 2016

One small update to that thread: hitting the guard page no longer sends SIGSEV, but SIGILL: https://github.com/rust-lang/rust/pull/31333

EliRivers · on April 26, 2016

The first sample code; "This program segfaults because the entire stack is set to 0 at program start."

I'd be surprised; as a strong general rule, the stack does not get zeroed [Edit: see end of thread! It's the OS zeroing everything - learn something very day]. I'd expect it to segfault because the pointer value is whatever leftover non-zero value happens to be in that piece of memory, so it points into random memory the user program shouldn't be messing with (sticking in a printf to output the value of the pointer confirms this on at least one system). Wouldn't be surprised if some implementations took security really seriously and zero everything, or if a debug build was zero happy, but under normal circumstances, the stack doesn't get zeroed.

Annatar · on April 26, 2016

> It's the OS zeroing everything - learn something very day

No, the operating system (kernel, actually) does not zero out anything. The runtime linker would be the one initializing static memory declared in the ELF BSS sections at execution time. The rest (including the stack and heap) is setup by the prologue (crti.o, depending on the OS crt1.o, and crts.o).

dkopi · on April 26, 2016

edit: exDM69 actually makes more sense.

EliRivers · on April 26, 2016

Well, I can see the build command on the webpage there, and assuming this is the same gcc we all know and love without any cleverness going on, it's not a debug build of any kind as far as I can see:

    gcc segfault.c

Now that I think about it, I'm not aware that gcc C compiler even offers the option to deliberately zero initialise non-static local variables, so unless I've missed a switch somewhere there is no arrangement of options available for a "debug build" to do this. I recall the GCC Fortran compiler did offer it.

exDM69 · on April 26, 2016

It's the OS linker/loader that zeros the pages used by a new process to avoid leaking memory contents of previous processes (security risk).

It's zero only because this occurs at the very beginning of the program. In general it would be undefined.

Buge · on April 26, 2016

The OS will initialize the memory of the stack to 0 before the program starts. But before main is called, the compiler is free to insert other code that runs before main. gcc inserts a function named __libc_start_main that runs before main. This code will modify the content of the stack. So when main is run, the stack where the uninitialized local variable is has a decent chance of not being 0 anymore.

This is easily testable.

    #include <stdio.h>
    
    int main(void) {
        char* pointers[20];
        int i;
        for (i = 0; i < 20; ++i) {
          printf("%p\n", pointers[i]);
        }
        return 0;
    }

And yes when I run it, most of the pointers are not null.

exDM69 · on April 26, 2016

Sure, the C runtime initialization runs before main. Unless you're looking at the stack at _start, it's probably unintialized. And it depends on your libc implementation, etc.

It just happened to be zero in the author's case.

kalmar · on April 26, 2016

Oh this is cool! I should have checked more pointers. On my machine, the single pointer was always null, and I read up on stack being initialized to zero. I didn't realize the things-before-main could mess up the stack so much.

EliRivers · on April 26, 2016

Nice. Learn something every day. I edited my original post to add this; save anyone else tracing through the whole thread.

kalmar · on April 26, 2016

I should have made that clearer, thanks! If there were intervening function calls there would be garbage. Then it would only "most likely" segfault instead of always segfault.

Szpadel · on April 26, 2016

> Curiously, I found that if I had a buffer size of even 1 byte over (8 MB - 8 KB), I still got the segfault. I’m not yet sure what’s going on there!

This is because of gcc padding. Programs have to allocate whole page from OS. So if you want just 1 int, you have to get whole page for it (compilers can optimize it in some conditions). This is result of MMU that works for memory block and not for single bytes (performance issue I think) But as I know by default page size have 4KB.

Another reason may be that compiler tries to allocate 2^n bytes because of performance. and 8KB is close enough I think.

dfox · on April 26, 2016

The main problem there is that local non-static variables get placed onto stack. Stack space is allocated by generated code by just decrementing stack pointer without any explicit calls to OS. On typical modern unix, only few pages of stack are actually mapped and kernel handles page faults on neighboring pages by allocating more stack pages. Because it is possible that function needs more than one page of stack space, there is more than one such "magic" stack page, but still there is some finite number of them (othervise there would be no way to distinguish between accesses beyond the end of stack that should grow stack and accesses to random unmapped memory). Thus if you allocate some ridiculously large things on stack and access them in the direction opposite of stack growth, you may get segfault (accessing these structures in "the right order" is by no means sufficient for this to be safe because there are things like red-zones, signals, other local variables...).

This is true only for first thread, other threads have fixed stack size specified on thread creation (on Linux it's 8MB by default), but usually even thread's stack pages are really allocated only on first access.

On UNIX if you really want to bump stack by arbitrary amounts the most portable way is to preallocate your own stack of sufficient size and then use that (either by abusing sigaltstack() or via makecontext()/setcontext() or possibly by creating new thread). But generally, having large local variables is not exactly good idea.

bogomipz · on April 28, 2016

Could you elaborate on "Stack space is allocated by generated code" Are you talking about the C runtime in the binary? Thanks.

dfox · on April 28, 2016

No. Typical C function looks something like this:

  push %rbp
  mov %rsp, %rbp
  sub $something, %rsp
  ... actual code ...
  mov %rbp, %rsp
  ret

The sub $something, %rsp instruction is everything that user-space does to allocate stack memory. Actual allocation of stack pages happens in kernel in a way that is completely transparent as long as the $something does not get unreasonably large.

kalmar · on April 26, 2016

Ah so the idea is that there's already 8 KB of actual pages set for stack by this point. So when I try to get another (8 MB - 8 KB + 1 B), that blows things up? I wonder if I can watch this happen in /proc/$pid/maps or somewhere else around there.

amelius · on April 26, 2016

> Segfaults are our friends and teachers

Too bad memory is not better segmented then. For instance, when linking against a library, that library's memory ends up in the same "segment" as the program itself. Therefore, right now, you can totally screw up a library's internal data structures without even causing a segfault directly.

userbinator · on April 26, 2016

These are called guard pages. Attempts to write there would result in a segmentation fault.

...which are caught by the OS and used to either truly kill the process when the stack overflows, or to dynamically allocate more memory as the stack grows downwards. That's how it works on Windows, at least; I'm not as clear about Linux.

catern · on April 26, 2016

The robust solution to this problem is not hardcoding the pipe buffer size and changing the size of pipe buffers within your program to match your hard coded value, but rather calling fgetconf to query the pipe buffer size for the pipe FD you are working with.