The "bootstrapable.org" project that OP refers to is an interesting practical result from the infamous "Reflections on Trusting Trust."[1] If you have the source compiler, which is itself bootstrapped from source, then you effectively sidestep the problem brought up in the paper -- someone sneaks in a boobytrapped compiler somewhere in the process resulting in a chain of tainted compilers.
This is the kind of work that seems pretty thankless, but I'm glad someone is doing it.
I have no idea if it's true, but it is really not that difficult to achieve in a setting like this, where binaries are compiled from the sources that are available on the system itself. Even creating a compiler that poisons itself is not that difficult. It is the idea itself which is genius, and (unfortunately) completely doable.
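For anyone who hasn't read the paper, here is a minimal, purely illustrative sketch of the self-propagating idea. Nothing in it is a real attack; the toy "compiler", the substring checks, and the injected comments are all made up, but it shows why a clean compiler source is not enough on its own when the already-running binary decides what actually gets emitted:

    // Toy model of the "Reflections on Trusting Trust" trick.
    fn compile(source: &str) -> String {
        if source.contains("fn check_password") {
            // Pretend this is the login program: emit it with a backdoor,
            // even though its source is clean.
            return format!("{}\n// [injected] also accept the attacker's password", source);
        }
        if source.contains("fn compile(") {
            // Pretend this is the compiler itself: re-emit both of these checks,
            // so the attack survives a rebuild from pristine compiler sources.
            return format!("{}\n// [injected] re-insert both checks", source);
        }
        source.to_string()
    }

    fn main() {
        let clean_login = "fn check_password(user: &str, pw: &str) -> bool { true }";
        println!("{}", compile(clean_login));
    }

Bootstrapping from human-auditable sources targets exactly that gap: at no point do you have to take an opaque, already-running binary on faith.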
If you have the source compiler, which is itself bootstrapped from source, then you effectively sidestep the problem brought up in the paper
How does this work?
Is the source compiler bootstrapped from its source, such that the source compiler compiles the source of the source compiler and that results in the source compiler?
I think I’m getting confused because compiling source is what a compiler does anyway, so I’m guessing that “source” in “source compiler” is used as in origin rather than as in textual code, and not just a redundancy? So which source is the source compiler compiling?
I don’t see it either. The paper specifically talked about modifying the source of the “root” compiler, and then using it to compile the system compiler.
This can be generalized to N compilers, so to truly claim that your compiler is safe, you’ll need to verify the source of all compilers in the chain.
I’m not an expert, so please correct me if I’m off here!
Assembly is pretty trivial to translate into machine code (and can even be written so that it's 1-to-1 per line).
Theoretically (not very practical due to size) you could translate the compiler source you see into machine-executable code yourself, consulting an opcode reference.
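As a rough sketch of what consulting an opcode reference looks like in practice: the two x86 encodings below are real (NOP = 0x90, near RET = 0xC3), but the toy assembler around them is purely illustrative.

    // Each line of a sufficiently restricted assembly dialect maps to fixed
    // opcode bytes that a person could also look up and write down by hand.
    fn assemble_line(line: &str) -> Option<Vec<u8>> {
        match line.trim() {
            "nop" => Some(vec![0x90]), // x86 NOP
            "ret" => Some(vec![0xC3]), // x86 near RET
            _ => None, // a real hand-assembly pass covers the whole opcode table
        }
    }

    fn main() {
        for line in ["nop", "ret"] {
            if let Some(bytes) = assemble_line(line) {
                println!("{:>4} -> {:02X?}", line, bytes);
            }
        }
    }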
Apologies, maybe that is the approach we’re talking about here, and I’ve simply confused myself. :)
OP: If you have the source compiler, which is itself bootstrapped from source, then you effectively sidestep the problem brought up in the paper.
So does that translate to English as:
You have the source code for a compiler. You’ve ensured that the source code isn’t tainted in any way. If you translate that source code into an executable binary by hand, then you know that binary isn’t tainted. And subsequently, that binary won’t taint its output.
?
Hmm. That just doesn’t seem interesting. And, as you say, “Theoretically... you could...”. Emphasis mine...
I think I’ve resolved my confusion (see my updated response to coldtea.) I think you’re 100% correct.
This result just doesn’t seem interesting. Not only do you have to verify the source of all compilers in the chain, you need to ensure that - when that code is compiled - it’s the exact source that was verified.
And even then, how do you know that the disk firmware actually gives you that non-tainted compiler when you ask for it?
But: I guess it’s turtles all the way down. This at least theoretically improves how much of the chain (from atoms on a disk to the binaries produced by your Nth compiler) is non-tainted.
Presumably the drawback here is that rust is going to take an increasing amount of time to bootstrap on these platforms.
You're releasing about 8 a year (excluding bug fix releases), each of which is necessary to compile the next one, and so on down the line? That sounds like it's going to get extremely nasty, really quickly.
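To make "so on down the line" concrete, here is a toy sketch of what that chain implies for a from-source bootstrap; the version list and the starting point are illustrative, not a recipe:

    // Each stable rustc is only guaranteed to build with the release before
    // it, so a from-source bootstrap walks every intermediate version.
    fn main() {
        let chain = ["1.20.0", "1.21.0", "1.22.0", /* ...many more... */ "1.31.0"];
        let mut built_with = String::from("a stage0 compiler (e.g. an mrustc-built 1.19.0)");
        for version in chain.iter() {
            println!("build rustc {} using {}", version, built_with);
            built_with = format!("rustc {}", version);
        }
    }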
That’s assuming that someone wants to fully bootstrap. Only a very, very small number of people actually do this, and since this is all about trust anyway, it all depends on your level of paranoia.
mrustc has already bootstrapped a byte-identical rustc from the mainline; that in and of itself is good enough for many. Even Debian didn’t do a full bootstrap from the OCaml days.
It’s all about trust. If you want to be mega paranoid, then yeah it’s gonna be a lot of work. But that always is. This whole thing is only an issue for a very small number of people. Those people are important! But it’s a tradeoff, like everything.
Once you've bootstrapped a single platform a certain number of times with the exact same compiler output through a diverse-enough set of toolchains, you can start to trust that the entire toolchain (including hardware) is secure and simply release hashes of those diversely-compiled binaries as trusted roots for faster rebuilds (ie: nightlies).
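A minimal sketch of that comparison, with hypothetical build trees standing in for two independent toolchains; in practice you would publish a hash of the agreed-upon binary rather than pass the binaries around:

    // Diverse double-compiling in miniature: outputs produced through
    // independent toolchains should be bit-identical before anyone publishes
    // their hash as a trusted root. Paths are hypothetical.
    use std::fs;

    fn main() -> std::io::Result<()> {
        let via_mrustc = fs::read("build-mrustc/bin/rustc")?;
        let via_stage0 = fs::read("build-stage0/bin/rustc")?;

        if via_mrustc == via_stage0 {
            println!("bit-identical: a hash of this binary can serve as a trusted root");
        } else {
            println!("outputs differ: the build is not reproducible across these toolchains");
        }
        Ok(())
    }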
And you can build GCC-8 with GCC-4.8, whereas rust seems to require at least the previous version of rust. It would be reasonable for rust 1.19 to be able to build the latest release for the next couple of years ... rust 1.19 is less than 18 months old! Surely rust was an OK language 18 months ago?
Rust’s standard library makes heavy use of unstable features that may change from release to release.
We’ve been slowing down how quickly the bootstrap requirement moves over time; the current policy is that you can build any stable with the previous release. Maybe someday we’ll slow it down further, but there are no plans to do so any time soon.
Bootstrapping is done rarely but ongoing compiler development is done all the time. It seems like a bad idea to optimize for bootstrapping at the cost of not being able to use the best Rust has to offer in compiler development today.
The benefit is to stable software distributions, and also to from-scratch software stack builds in diverse environments. As a software engineer who has been interested in reliable and understandable software systems for a couple decades, I find the Rust trend of following the web-ecosystem and completely abandoning software support after a couple of months to be _nuts_.
Look, I'm a reasonable guy. I don't insist on being compatible with Linux 2.4, I don't insist on working with a compiler and system libraries from 2006, I don't reject all bundling completely. But working with stuff from a few years ago would be very helpful to lots of projects and efforts, and is what I've come to expect from high quality software distributions and libraries, like Debian stable, the linux kernel, GCC, libpng, sqlite3, etc etc.
(The linux kernel is a huge project and maybe the fastest-moving in existence, and you can build the very latest release with GCC-4.7!)
> I find the Rust trend of following the web-ecosystem and completely abandoning software support after a couple of months to be _nuts_.
If Rust was a "traditional" project and not following web-ecosystem practices, it would still be issuing 0.x releases though - the language itself is quite far from true maturity (the "2018" version has only just gained NLL, and there are plenty of deeply-impacting features in the pipeline, at various stages of development). So it's six of one, half a dozen of the other...
That seems perfectly reasonable for unstable software in heavy development, which Rust might very well be. Unfortunately that also means it shouldn't be used as part of anything stable, which makes using it in Firefox a terrible idea.
Think of it as LLVM declaring it has a hard dependency on the Linux kernel from last week (that isn't in a release yet).
Absolutely none of the Rustaceans I personally know want to write a single line more of pre-1.31 Rust. NLL is a major ergonomics change that makes all the simple things so much better.
I completely believe you - but think about how absurd that is. Why did any of them want to write any rust a month ago? Were all rust users really naive/dumb to think that rust 1.29 was any good at all? Only 2 months later they never want to write Rust 1.29 ever again!
I can be quite productive with C and Python from 10 years ago. Most Go programs I write happen to be compatible with Go-1.7 from 2016, but I could go back to Go-1.4 from 2014 without much trouble if I had a reason to. Coincidentally, you can build the latest Go toolchain with Go-1.4, which was the last version written in C. I guess the Golang authors are just superhuman engineers, huh ...
I think your parent is exaggerating a bit, but this is just generally true of any release with a big feature in it that people are excited to use. Now that it exists, why would you not want to use it?
For context, Rust 1.31 is sorta like the Go 2 release. It’s a much bigger deal than regular releases.
It's not always up to me which compiler version I use.
Some organizations change compilers very slowly, and you can't assume you always have the most updated compiler.
For example, when writing a library for my org I assumed that GCC 5.4 (2016) was a reasonable lower bound and used the appropriate set of supported C++11 features, and twice I had to downgrade as different teams notified me they had older and older compilers (and couldn't upgrade).
Rust 1.31 shipped a specific feature, non-lexical lifetimes, that was proposed right from the start, well before 1.0. People have wanted it since first hearing about it, but it just turned out to be much more difficult to implement correctly than first estimated.
> Were all rust users really naive/dumb to think that rust 1.29 was any good at all? Only 2 months later they never want to write Rust 1.29 ever again!
Rust 1.29 was fast and secure, but it was also annoying to write. Lexical lifetimes forced the structure of your code into a shape that sometimes required unnatural contortions. Basically everyone who wrote code in it could tell that something was definitely wrong. We still used it, because the other things Rust provided were worth the contortions. It was just a tradeoff you had to make. NLL, provided in 1.31, essentially removes the problem, making the other side of the tradeoff free. While previous versions of Rust weren't horrible, 1.31 is massively better than them.
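A small before/after illustration (my own example, not one from the thread) of the kind of contortion that went away:

    fn main() {
        let mut data = vec![1, 2, 3];
        let first = &data[0];
        println!("first = {}", first); // last use of the shared borrow
        // Under the old lexical borrow checker the push below was rejected,
        // because `first` was considered borrowed until the end of the block;
        // with NLL the borrow ends at its last use, so this compiles.
        data.push(4);
        println!("{:?}", data);
    }

Pre-NLL you had to introduce an extra scope or reorder the code to make the borrow end early; multiply that by every function you write and the annoyance adds up.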
There are many Rust compilers in the bootstrap chain. The point is to make every step in the chain human-readable. Auto-generated C is not "source code" in the sense of "the preferred form of the work for making modifications to it" (the GPL's definition of source code). A malicious code generator could hide a trusting trust attack in the generated code in such a way that it would be difficult to find. True source code is easier to audit.
[1] https://www.archive.ece.cmu.edu/~ganger/712.fall02/papers/p7...