They are part of the psABI on aarch64 and POWER. Thanks to the link register, those architectures can even omit the frame pointer from leaf functions without losing the identity of the immediate caller.
Generally, the “compile with frame pointers” request is a bit underspecified on x86-64. For example, the perf trampoline that is cited as the reason why future Python versions are going to use frame pointers itself does not have a frame pointer. (It does not clobber it, either, but backtraces through it will skip a frame.) Similarly, if you just compile glibc with frame pointers without patching a handful of assembler files, then for typical workloads, 5% to 10% of your samples will have an incomplete backtrace because the immediate caller of glibc string functions is not recorded (they don't have a frame pointer, either).
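To make the "skips a frame" point concrete, here is a toy model of frame-pointer unwinding, not real unwinder code. The struct layout mirrors the usual x86-64 frame record (saved %rbp, then the return address), but the function names (main, parse, memcpy) and the helper sample_in_memcpy are made up for illustration, and return addresses are replaced by the name of the function they point into.

```c
#include <assert.h>
#include <stddef.h>

/* Simulated frame record: what a function built with frame pointers
   leaves on the stack after "push %rbp; mov %rsp, %rbp". */
struct frame_record {
    const struct frame_record *prev; /* saved frame pointer of the caller */
    const char *return_to;           /* function the return address points into */
};

/* Unwind the way a sampling profiler would: start with the sampled PC,
   then follow the frame pointer chain and collect return addresses. */
size_t fp_backtrace(const char *pc, const struct frame_record *fp,
                    const char **out, size_t max)
{
    assert(out != NULL);
    size_t n = 0;
    if (n < max)
        out[n++] = pc;
    for (; fp != NULL && n < max; fp = fp->prev)
        out[n++] = fp->return_to;
    return n;
}

/* Hypothetical call chain _start -> main -> parse -> memcpy, sampled
   while executing memcpy. If memcpy does not set up a frame, the frame
   pointer register still points at parse's record, so the return address
   into parse (which only lives at the top of the stack) is never seen. */
size_t sample_in_memcpy(int leaf_has_frame, const char **out, size_t max)
{
    struct frame_record main_rec   = { NULL,       "_start" };
    struct frame_record parse_rec  = { &main_rec,  "main"   };
    struct frame_record memcpy_rec = { &parse_rec, "parse"  };
    const struct frame_record *fp = leaf_has_frame ? &memcpy_rec : &parse_rec;
    return fp_backtrace("memcpy", fp, out, max);
}
```

In this model, sample_in_memcpy(1, ...) yields memcpy, parse, main, _start, while sample_in_memcpy(0, ...) yields memcpy, main, _start: the immediate caller is gone even though nothing clobbered the frame pointer.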
I expect this x86-64 debate will be obsolete Really Soon Now because everyone will simply copy out the shadow stack on every sample. It's even faster than frame pointer traversal because it's just an array copy. We are still looking at making DWARF backtraces a bit faster (or even the more complex unwinding case), but I doubt that this work will be impactful, given the politics involved. The shadow stack will have some performance impact, too, but it can be switched off on a per-process basis if necessary.
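As a rough sketch of why the shadow stack copy is cheaper: the shadow stack holds nothing but return addresses in one contiguous region, so a sample is a bounded memcpy, whereas the frame pointer walk is a dependent pointer chase through the regular stack. The code below only simulates both with plain arrays and structs; the names shadow_stack_snapshot and fp_walk are invented here, and nothing in it touches a real hardware shadow stack or any kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated shadow stack: grows downward, 'ssp' points at the most
   recent return address, 'base' is one past the oldest entry.
   Taking a sample is a single bounded array copy. */
size_t shadow_stack_snapshot(const uint64_t *ssp, const uint64_t *base,
                             uint64_t *out, size_t max)
{
    assert(ssp <= base);
    size_t depth = (size_t)(base - ssp);
    size_t n = depth < max ? depth : max;
    memcpy(out, ssp, n * sizeof *out);
    return n;
}

/* For contrast: the frame pointer walk has to load each record before
   it knows where the next one is. */
struct frame_record {
    const struct frame_record *prev; /* saved frame pointer of the caller */
    uint64_t return_address;
};

size_t fp_walk(const struct frame_record *fp, uint64_t *out, size_t max)
{
    size_t n = 0;
    for (; fp != NULL && n < max; fp = fp->prev)
        out[n++] = fp->return_address;
    return n;
}
```

Both functions return the number of return addresses written, most recent first. The shadow stack version also does not care whether any function on the way set up a frame, which is the part that makes the frame pointer question moot.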