Working towards a source-based bootstrapping path to a GNU+Linux system

mtzet · on Aug 23, 2021

Bootstrapping GNU userland is a major pain point, and it's great to see GNU Guix taking it so seriously. There's even a section of their manual dedicated to it [1].

This is important, even if you don't intend to run GNU Guix itself. I care about building GNU userland for embedded targets, and even with a build system like Yocto, which builds specific versions of the entire host userland, you can get errors due to, e.g., an older GNU m4 not being buildable with a modern glibc. Thus, you end up essentially requiring developers to build inside a container.

This is paving the way for having everything required to build a piece of software just checked into the git repository. If you want to change it, you just commit it like everything else.

[1] https://guix.gnu.org/manual/en/guix.html#Bootstrapping

cogburnd02 · on Aug 23, 2021

Somewhat related: https://bellard.org/tcc/tccboot.html

TCCBOOT was a project to boot the Linux kernel directly from source code, using a C compiler (TinyCC) as part of the boot process.

dane-pgp · on Aug 23, 2021

As well as the diagram on that page, there is a nice document in one of the repos listing the bootstrapping steps in order, from "stage0" to "gcc 4.7.4" and beyond:

https://github.com/fosslinux/live-bootstrap/blob/master/part...

snicker7 · on Aug 23, 2021

Bootstrappability is the reason why GNU Guix is one of the most important software projects ever. Moreover, I see Guix as the unifying force for the entire GNU project. It is the perfect playground to test and develop emerging tech (Shepherd, Hurd, etc.).

NeutralForest · on Aug 23, 2021

Can someone explain to me why bootstrapping is important and/or useful?

gravypod · on Aug 23, 2021

Either you're developing your own embedded platform, or you are attempting to gain full supply chain control of your software.

bcrl · on Aug 24, 2021

Bootstrapping gcc and libc was one of the time-honoured traditions I went through as a teenager with a Linux box. Learning how to set up gcc as a cross compiler to produce ELF binaries on an a.out system (or generating m68k binaries on my 486) was quite helpful later in my career when working on embedded systems. Granted, a lot of this was necessary as SLS and Slackware didn't exactly have very capable packaging systems at the time.

math-dev · on Aug 22, 2021

I find this fascinating and am a big supporter of the FSF and GNU.

All that said, I am not an expert, so I would like to learn more. Can somebody let me know why one cannot just take the assembly version of an existing compiler and carefully review its code to be happy with it and then build everything from that verified compiler? Why does it need so many steps?

1MachineElf · on Aug 22, 2021

I think you might find an answer to that question in the GCC 4.7 step. They target that version because all GCC versions afterwards include a C++ compiler in addition to the C one. Each successive step adds more complexity. By starting small, they have a codebase that is easier to audit than a full-blown "modern" GCC or LLVM. That's the idea, at least.

apaprocki · on Aug 23, 2021

> all GCC versions afterwards include a C++ compiler in addition to the C one

GCC 4.7 is the last version that can be built from source using only a C compiler. GCC has long shipped a C++ compiler, but it didn't require one to build itself until 4.8.

1MachineElf · on Aug 23, 2021

Thank you for the clarification.

pabs3 · on Aug 23, 2021

How do you know the Linux kernel you are running that verified compiler on isn't subverting the compiler? The only way to do bootstrapping sanely is to start from some manually written machine code (not assembler) and eventually reach Linux/GCC/etc. This is the approach being taken by Bootstrappable Builds.
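To make "manually written machine code" a bit more concrete: the very first stage of such a chain can be little more than a program that turns hand-annotated hex into raw bytes. The Python sketch below is only loosely modeled on the hex0 idea from stage0; the function name `hex0`, the `#`/`;` comment handling as implemented here, and the sample bytes (a bare x86-64 Linux `exit(42)` fragment, not a complete ELF file) are illustrative assumptions, not the project's actual code. In a real bootstrap this tool would itself be written in hex, since there is no trusted Python yet.

```python
import string


def hex0(text: str) -> bytes:
    """Turn hand-written, commented hex into raw bytes (hypothetical hex0-style sketch).

    Pairs of hex digits become bytes; anything after '#' or ';' on a line is a comment.
    """
    out = bytearray()
    high = None  # first nibble of a byte still waiting for its second nibble
    for line in text.splitlines():
        code = line.split("#", 1)[0].split(";", 1)[0]  # strip comments
        for ch in code:
            if ch not in string.hexdigits:
                continue  # whitespace and any other character is ignored
            nibble = int(ch, 16)
            if high is None:
                high = nibble
            else:
                out.append((high << 4) | nibble)
                high = None
    if high is not None:
        raise ValueError("odd number of hex digits")
    return bytes(out)


# Illustrative x86-64 Linux fragment: exit(42) via the raw syscall instruction.
EXIT_42 = """
B8 3C 00 00 00   # mov eax, 60   (SYS_exit)
BF 2A 00 00 00   # mov edi, 42   (exit status)
0F 05            ; syscall
"""

print(hex0(EXIT_42).hex(" "))  # b8 3c 00 00 00 bf 2a 00 00 00 0f 05
```

Reviewing a dozen bytes with a comment beside each instruction is feasible in a way that reviewing a compiler binary is not, which is the point of starting there and building upwards.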

fatcow · on Aug 23, 2021

> Can somebody let me know why one cannot just take the assembly version of an existing compiler and carefully review its code to be happy with it and then build everything from that verified compiler? Why does it need so many steps?

Because the OS you currently use to load the assembly code may have been poisoned to present you with a sanitized version of the compiler.

selfhoster11 · on Aug 23, 2021

It's worth noting that the above comment, while it might sound paranoid to some, is IMO entirely justified.

I'm 50/50 on whether someone, at some point, has executed a successful Trusting Trust attack (see Ken Thompson). With modern machines that carry megabytes of binary blobs, co-processors that can access RAM but can't be reprogrammed to be on the user's side, and techniques that can actually tell when sensitive operations are happening, such attacks are becoming more feasible.

pabs3 · on Aug 23, 2021

There definitely have been compromised build toolchains before:

https://en.wikipedia.org/wiki/XcodeGhost

tremon · on Aug 23, 2021

That's only half of the trusting-trust attack, though; the other half is making the compiler compromise propagate itself, i.e. not just inserting a backdoor into compiled code, but inserting itself into any compiler built using the compromised tool.
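The difference between the two halves can be shown with a toy Python model. Everything in it is hypothetical: "source files" are plain strings, a compiler is modeled as a function from source to binary, and the trojan recognises its targets by exact string match, whereas a real attack would have to pattern-match actual source code.

```python
"""Toy model of the two halves of a Trusting Trust attack (hypothetical, illustrative)."""
from typing import Callable, Union

# Hypothetical "source files"; note that COMPILER_SRC never contains the trojan.
LOGIN_SRC = "login.c: verify the user's password"
COMPILER_SRC = "cc.c: translate source into a binary"

# A "binary" is either a compiled program (a string) or a compiler (a callable).
Binary = Union[str, Callable[[str], "Binary"]]


def honest_cc(source: str) -> Binary:
    """Compiles faithfully; compiling the compiler source yields an honest compiler."""
    if source == COMPILER_SRC:
        return honest_cc
    return f"bin[{source}]"


def backdoor_only_cc(source: str) -> Binary:
    """Half one: backdoors login, but rebuilds the compiler honestly from its source."""
    if source == LOGIN_SRC:
        return f"bin[{source}]+BACKDOOR"
    return honest_cc(source)


def self_propagating_cc(source: str) -> Binary:
    """Both halves: backdoors login AND re-inserts itself into any compiler it builds."""
    if source == LOGIN_SRC:
        return f"bin[{source}]+BACKDOOR"
    if source == COMPILER_SRC:
        return self_propagating_cc  # clean source in, trojaned compiler out
    return f"bin[{source}]"


def rebuild_compiler(cc: Callable[[str], Binary], generations: int) -> Callable[[str], Binary]:
    """Rebuild the compiler from its clean source several times, each time with the previous build."""
    for _ in range(generations):
        cc = cc(COMPILER_SRC)
    return cc


for start in (backdoor_only_cc, self_propagating_cc):
    cc = rebuild_compiler(start, generations=3)
    print(start.__name__, "->", cc(LOGIN_SRC))
# backdoor_only_cc -> bin[login.c: verify the user's password]
# self_propagating_cc -> bin[login.c: verify the user's password]+BACKDOOR
```

A compiler that only backdoors login disappears as soon as the compiler is rebuilt from clean source, while the self-propagating one survives any number of rebuilds even though the compiler's source is clean, which is why auditing source alone is not enough.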

selfhoster11 · on Aug 23, 2021

Mobile and Electron apps often weigh hundreds of megabytes. That's enough to hide an entire classic-style OS in the gaps between the data. While I don't know whether anyone has actually inserted such a self-propagating compiler, it could certainly be done unobtrusively enough not to raise any suspicion.

selfhoster11 · on Aug 23, 2021

Thank you, I wasn't familiar with this case.

aninteger · on Aug 23, 2021

Super interesting, but I wonder if there is a historically accurate, well-documented account of bootstrapping. This jumps into ELF pretty quickly, but there were older formats like a.out (and OMAGIC?) before ELF. Is there good documentation on the bootstrapping of x86 BSD or Minix, since they are even older than Linux?