The "bootstrapable.org" project that OP refers to is an interesting practical result from the infamous "Reflections on Trusting Trust."[1] If you have the source compiler, which is itself bootstrapped from source, then you effectively sidestep the problem brought up in the paper -- someone sneaks in a boobytrapped compiler somewhere in the process resulting in a chain of tainted compilers.
This is the kind of work that seems pretty thankless, but I'm glad someone is doing it.
I have no idea if it's true, but it is really not that difficult to achieve in a setting like this, where binaries are compiled from the sources that are available on the system itself. Even creating a compiler that poisons itself is not that difficult. It is the idea itself which is genius, and (unfortunately) completely doable.
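For anyone who hasn't read the paper, here is a minimal, purely illustrative sketch of the self-propagating idea. Nothing in it is a real attack; the toy "compiler", the substring checks, and the injected comments are all made up, but it shows why a clean compiler source is not enough on its own when the already-running binary decides what actually gets emitted:

    // Toy model of the "Reflections on Trusting Trust" trick.
    fn compile(source: &str) -> String {
        if source.contains("fn check_password") {
            // Pretend this is the login program: emit it with a backdoor,
            // even though its source is clean.
            return format!("{}\n// [injected] also accept the attacker's password", source);
        }
        if source.contains("fn compile(") {
            // Pretend this is the compiler itself: re-emit both of these checks,
            // so the attack survives a rebuild from pristine compiler sources.
            return format!("{}\n// [injected] re-insert both checks", source);
        }
        source.to_string()
    }

    fn main() {
        let clean_login = "fn check_password(user: &str, pw: &str) -> bool { true }";
        println!("{}", compile(clean_login));
    }

Bootstrapping from human-auditable sources targets exactly that gap: at no point do you have to take an opaque, already-running binary on faith.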
If you have the source compiler, which is itself bootstrapped from source, then you effectively sidestep the problem brought up in the paper
How does this work?
Is the source compiler bootstrapped from its source, such that the source compiler compiles the source of the source compiler and that results in the source compiler?
I think I’m getting confused because compiling source is what a compiler does anyway, so I’m guessing that “source” in “source compiler” is used as in origin rather than as in textual code, and not just a redundancy? So which source is the source compiler compiling?
I don’t see it either. The paper specifically talked about modifying the source of the “root” compiler, and then using it to compile the system compiler.
This can be generalized to N compilers, so to truly claim that your compiler is safe, you’ll need to verify the source of all compilers in the chain.
I’m not an expert, so please correct me if I’m off here!
Assembly is pretty trivial to translate into machine code (and can even be written so that it's 1-to-1 per line).
Theoretically (not very practical due to size) you could translate the compiler source you see into machine-executable code yourself, consulting an opcode reference.
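As a rough sketch of what consulting an opcode reference looks like in practice: the two x86 encodings below are real (NOP = 0x90, near RET = 0xC3), but the toy assembler around them is purely illustrative.

    // Each line of a sufficiently restricted assembly dialect maps to fixed
    // opcode bytes that a person could also look up and write down by hand.
    fn assemble_line(line: &str) -> Option<Vec<u8>> {
        match line.trim() {
            "nop" => Some(vec![0x90]), // x86 NOP
            "ret" => Some(vec![0xC3]), // x86 near RET
            _ => None, // a real hand-assembly pass covers the whole opcode table
        }
    }

    fn main() {
        for line in ["nop", "ret"] {
            if let Some(bytes) = assemble_line(line) {
                println!("{:>4} -> {:02X?}", line, bytes);
            }
        }
    }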
Apologies, maybe that is the approach we’re talking about here, and I’ve simply confused myself. :)
OP: If you have the source compiler, which is itself bootstrapped from source, then you effectively sidestep the problem brought up in the paper.
So does that translate to English as:
You have the source code for a compiler. You’ve ensured that the source code isn’t tainted in any way. If you translate that source code into an executable binary by hand, then you know that binary isn’t tainted. And subsequently, that binary won’t taint its output.
?
Hmm. That just doesn’t seem interesting. And, as you say, “Theoretically... you could...”. Emphasis mine...
I think I’ve resolved my confusion (see my updated response to coldtea.) I think you’re 100% correct.
This result just doesn’t seem interesting. Not only do you have to verify the source of all compilers in the chain, you need to ensure that - when that code is compiled - it’s the exact source that was verified.
And even then, how do you know that the disk firmware actually gives you that non-tainted compiler when you ask for it?
But: I guess it’s turtles all the way down. This at least theoretically improves how much of the chain (from atoms on a disk to the binaries produced by your Nth compiler) is non-tainted.
Presumably the drawback here is that rust is going to take an increasing amount of time to bootstrap on these platforms.
You're releasing about 8 a year (excluding bug fix releases), each of which is necessary to compile the next one, and so on down the line? That sounds like it's going to get extremely nasty, really quickly.
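To make "so on down the line" concrete, here is a toy sketch of what that chain implies for a from-source bootstrap; the version list and the starting point are illustrative, not a recipe:

    // Each stable rustc is only guaranteed to build with the release before
    // it, so a from-source bootstrap walks every intermediate version.
    fn main() {
        let chain = ["1.20.0", "1.21.0", "1.22.0", /* ...many more... */ "1.31.0"];
        let mut built_with = String::from("a stage0 compiler (e.g. an mrustc-built 1.19.0)");
        for version in chain.iter() {
            println!("build rustc {} using {}", version, built_with);
            built_with = format!("rustc {}", version);
        }
    }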
That’s assuming that someone wants to fully bootstrap. Only a very, very small number of people actually do this, and since this is all about trust anyway, it all depends on your level of paranoia.
mrustc has already bootstrapped a byte-identical rustc from the mainline; that in and of itself is good enough for many. Even Debian didn’t do a full bootstrap from the OCaml days.
It’s all about trust. If you want to be mega paranoid, then yeah it’s gonna be a lot of work. But that always is. This whole thing is only an issue for a very small number of people. Those people are important! But it’s a tradeoff, like everything.
Once you've bootstrapped a single platform a certain number of times with the exact same compiler output through a diverse-enough set of toolchains, you can start to trust that the entire toolchain (including hardware) is secure and simply release hashes of those diversely-compiled binaries as trusted roots for faster rebuilds (ie: nightlies).
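A minimal sketch of that comparison, with hypothetical build trees standing in for two independent toolchains; in practice you would publish a hash of the agreed-upon binary rather than pass the binaries around:

    // Diverse double-compiling in miniature: outputs produced through
    // independent toolchains should be bit-identical before anyone publishes
    // their hash as a trusted root. Paths are hypothetical.
    use std::fs;

    fn main() -> std::io::Result<()> {
        let via_mrustc = fs::read("build-mrustc/bin/rustc")?;
        let via_stage0 = fs::read("build-stage0/bin/rustc")?;

        if via_mrustc == via_stage0 {
            println!("bit-identical: a hash of this binary can serve as a trusted root");
        } else {
            println!("outputs differ: the build is not reproducible across these toolchains");
        }
        Ok(())
    }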
And you can build GCC-8 with GCC-4.8, whereas rust seems to require at least the previous version of rust. It would be reasonable for rust 1.19 to be able to build the latest release for the next couple of years ... rust 1.19 is less than 18 months old! Surely rust was an OK language 18 months ago?
Rust’s standard library makes heavy use of unstable features that may change from release to release.
We’ve been slowing down how quickly the bootstrap requirement moves over time; the current policy is that you can build any stable with the previous release. Maybe someday we’ll slow it down further, but there are no plans to do so any time soon.
Bootstrapping is done rarely but ongoing compiler development is done all the time. It seems like a bad idea to optimize for bootstrapping at the cost of not being able to use the best Rust has to offer in compiler development today.
The benefit is to stable software distributions, and also to from-scratch software stack builds in diverse environments. As a software engineer who has been interested in reliable and understandable software systems for a couple decades, I find the Rust trend of following the web-ecosystem and completely abandoning software support after a couple of months to be _nuts_.
Look, I'm a reasonable guy. I don't insist on being compatible with Linux 2.4, I don't insist on working with a compiler and system libraries from 2006, I don't reject all bundling completely. But working with stuff from a few years ago would be very helpful to lots of projects and efforts, and is what I've come to expect from high quality software distributions and libraries, like Debian stable, the linux kernel, GCC, libpng, sqlite3, etc etc.
(The linux kernel is a huge project and maybe the fastest-moving in existence, and you can build the very latest release with GCC-4.7!)
> I find the Rust trend of following the web-ecosystem and completely abandoning software support after a couple of months to be _nuts_.
If Rust was a "traditional" project and not following web-ecosystem practices, it would still be issuing 0.x releases though - the language itself is quite far from true maturity (the "2018" version has only just gained NLL, and there are plenty of deeply-impacting features in the pipeline, at various stages of development). So it's six of one, half a dozen of the other...
That seems perfectly reasonable for unstable software in heavy development, which Rust might very well be. Unfortunately that also means it shouldn't be used as part of anything stable, which makes using it in Firefox a terrible idea.
Think of it as LLVM declaring it has a hard dependency on the Linux kernel from last week (that isn't in a release yet).
Absolutely none of the Rustaceans I personally know want to write a single line more of pre-1.31 Rust. NLL is a major ergonomics change that makes all the simple things so much better.
I completely believe you - but think about how absurd that is. Why did any of them want to write any rust a month ago? Were all rust users really naive/dumb to think that rust 1.29 was any good at all? Only 2 months later they never want to write Rust 1.29 ever again!
I can be quite productive with C and Python from 10 years ago. Most Go programs I write happen to be compatible with Go-1.7 from 2016, but I could go back to Go-1.4 from 2014 without much trouble if I had a reason to. Coincidentally, you can build the latest Go toolchain with Go-1.4, which was the last version written in C. I guess the Golang authors are just superhuman engineers, huh ...
I think your parent is exaggerating a bit, but this is just generally true of any release with a big feature in it that people are excited to use. Now that it exists, why would you not want to use it?
For context, Rust 1.31 is sorta like the Go 2 release. It’s a much bigger deal than regular releases.
It's not always up to me which compiler version I use.
Some organizations change compilers very slowly, and you can't assume you always have the most updated compiler.
For example, when writing a library for my org I assumed that GCC 5.4 (2016) was a reasonable lower bound and used the appropriate set of supported C++11 features, and twice I had to downgrade as different teams notified me they had older and older compilers (and couldn't upgrade).
Rust 1.31 shipped a specific feature, non-lexical lifetimes, that was proposed right from the start, well before 1.0. People have wanted it since first hearing about it, but it just turned out to be much more difficult to implement correctly than first estimated.
> Were all rust users really naive/dumb to think that rust 1.29 was any good at all? Only 2 months later they never want to write Rust 1.29 ever again!
Rust 1.29 was fast and secure, but it was also annoying to write. Lexical lifetimes forced the structure of your code into a shape that sometimes required unnatural contortions. Basically everyone who wrote code in it could tell that something was definitely wrong. We still used it, because the other things Rust provided were worth the contortions. It was just a tradeoff you had to make. NLL, provided in 1.31, essentially removes the problem, making the other side of the tradeoff free. While previous versions of Rust weren't horrible, 1.31 is massively better than them.
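A small before/after illustration (my own example, not one from the thread) of the kind of contortion that went away:

    fn main() {
        let mut data = vec![1, 2, 3];
        let first = &data[0];
        println!("first = {}", first); // last use of the shared borrow
        // Under the old lexical borrow checker the push below was rejected,
        // because `first` was considered borrowed until the end of the block;
        // with NLL the borrow ends at its last use, so this compiles.
        data.push(4);
        println!("{:?}", data);
    }

Pre-NLL you had to introduce an extra scope or reorder the code to make the borrow end early; multiply that by every function you write and the annoyance adds up.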
There are many Rust compilers in the bootstrap chain. The point is to make every step in the chain human-readable. Auto-generated C is not "source code" in the sense of "the preferred form of the work for making modifications to it" (the GPL's definition of source code). A malicious code generator could hide a trusting trust attack in the generated code in such a way that it would be difficult to find. True source code is easier to audit.
[1] https://www.archive.ece.cmu.edu/~ganger/712.fall02/papers/p7...