I am curious why the TPDE paper does not mention the Copy-And-Patch paper. That ...

aengelke · 2025-09-03T12:34:12 1756902852

There's a longer paragraph on that topic in Section 8. We also previously built an LLVM back-end using that approach [1]. While that approach leads to even faster compilation, run-time performance is much worse (2.5x slower than LLVM -O0) due to more-or-less impossible register allocation for the snippets.

[1]: https://home.cit.tum.de/~engelke/pubs/2403-cc.pdf

debugnik · 2025-09-03T13:01:07 1756904467

> run-time performance is much worse (2.5x slower than LLVM -O0)

How come? The Copy-and-Patch Compilation paper reports:

> The generated code runs [...] 14% faster than LLVM -O0.

I don't have time right now to compare your approach and benchmark to theirs, but I would have expected comparable performance from what I had read back then.

aengelke · 2025-09-03T14:37:28 1756910248

The paper is rather selective about the used benchmarks and baselines. They do two comparisons (3 microbenchmarks and a re-implementation of a few (rather simple) database queries) against LLVM -- and have written all benchmarks themselves through their own framework. These benchmarks start from their custom AST data structures and they have their own way of generating LLVM-IR. For the non-optimizing LLVM back-end, the performance obviously strongly depends on the way the IR is generated -- they might not have put a lot of effort into generating "good IR" (=IR similar to what Clang generates).

The fact that they don't do a comparison against LLVM on larger benchmarks/functions or any other code they haven't written themselves makes that single number rather questionable for a general claim of being faster than LLVM -O0.

t0b1 · 2025-09-03T14:35:14 1756910114

This is in relation to their TPCH benchmark which can be due to a variety of reasons. My guess would be that they can generate stencils for whole operators which can be transformed into more efficient code at stencil generation time while LLVM-O0 gets the operator in LLVM-IR form and can do no such transformation. Though I can't verify this because their benchmark setup seems a bit more involved.

When used in a C/C++ compiler the stencils correspond to individual (or a few) LLVM-IR instructions which then leads to bad runtime performance. Also as mentioned, on larger functions register allocation becomes a problem for the Copy-and-Patch approach.

procrast33 · 2025-09-03T13:33:30 1756906410

Apologies! I did do a text search, but in pdfs... I should have known better.

Your work is greatly appriciated. With unit tests everywhere, faster compiling is more important than ever.

PoignardAzur · 2025-09-04T07:25:16 1756970716

Wait, so what's the difference between TDPE and Copy-and-Patch?

I thought they used the same technique (pre-generating machine code snippets in a high-level language)?