This is a very charitable summary of the pros and cons of e-graph approaches to code rewriting. That said, the part that your analysis misses is where it's being applied in this specific implementation. Standard optimization passes in languages like Julia/Numba/Jax need to be "general" in the sense that if they do things that violate IEEE semantics, then all programs written in that language could have hidden issues. There are usually a few knobs to tweak, like @fastmath in Julia, but those can be very heavy-handed and are not domain-specific. So the question was: can we allow users of dynamic, interactive, JIT-compiled languages to write their own optimizing passes at runtime? The mechanism for doing this is an e-graph specification, because it can be very easy for domain scientists and engineers to describe the kind of pass they want simply by defining equalities and a cost function (your description missed the cost function: it needs to be specified because equality saturation then tries to grab the element of the equivalence class that has the lowest cost). This gives a rather high-level yet robust mechanism for people who are not compiler engineers to describe compiler transformations they would like to apply to their own code.
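To make the "equalities plus a cost function" idea concrete, here is a minimal Python sketch. It is not an e-graph and not how MetaTheory.jl works internally: it simply enumerates every expression reachable through a tiny, hypothetical rule set and then picks the cheapest member, which is the same contract in miniature. The tuple encoding, the four rules, and the `OP_COST` weights are all assumptions made for illustration.

```python
# Expressions are atoms (str / int) or binary nodes (op, lhs, rhs).

def rewrites(e):
    """Yield every expression reachable from e by applying one rule once."""
    if not isinstance(e, tuple):
        return
    op, a, b = e
    if op == "*" and b == 2:
        yield ("<<", a, 1)                      # x * 2 == x << 1
    if op == "/" and isinstance(a, tuple) and a[0] == "*":
        yield ("*", a[1], ("/", a[2], b))       # (x * y) / z == x * (y / z)
    if op == "/" and a == b:
        yield 1                                 # x / x == 1 (assumes x != 0)
    if op == "*" and b == 1:
        yield a                                 # x * 1 == x
    for a2 in rewrites(a):                      # rules also apply inside children
        yield (op, a2, b)
    for b2 in rewrites(b):
        yield (op, a, b2)

def equivalence_class(start, limit=1000):
    """Collect everything equal to start under the rules (bounded search)."""
    seen, frontier = {start}, [start]
    while frontier and len(seen) < limit:
        for e2 in rewrites(frontier.pop()):
            if e2 not in seen:
                seen.add(e2)
                frontier.append(e2)
    return seen

OP_COST = {"*": 2, "/": 10, "<<": 1}            # hypothetical per-operation costs

def cost(e):
    """User-supplied cost function: total weight of the operations in e."""
    if not isinstance(e, tuple):
        return 0
    op, a, b = e
    return OP_COST[op] + cost(a) + cost(b)

def optimize(e):
    """Extract the lowest-cost member of the equivalence class."""
    return min(equivalence_class(e), key=cost)

start = ("/", ("*", "a", 2), 2)                 # (a * 2) / 2
best = optimize(start)
print(best)
```

A greedy rewriter that fires `x * 2 == x << 1` first gets stuck at `(a << 1) / 2`; keeping all equal forms around and extracting by cost afterwards is what lets the search reach plain `a`.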
As the test case, Alessandro then applied this idea to ordinary differential equation (ODE) and differential-algebraic equation (DAE) models written in Julia's ModelingToolkit.jl symbolic system (the MetaTheory.jl and SymbolicUtils.jl symbolic rewrite systems are generic over the IR; Symbolics.jl defines a symbolic computing IR built on these tools, which makes it a symbolic mathematics library with both e-graph and traditional rule-rewriting simplifiers; and ModelingToolkit.jl is a modeling framework built on that system. Note that you can also apply this directly to Julia IR and LLVM IR, as shown in other examples not in this paper). The KUKA IIWA 14 robotic arm test case is a nice example because it uses rules like a*sin(x) -> 0 if a<tol. That is definitely a rule you wouldn't want to apply in general, but when a scientist writes some generated code there might just be extra terms lying around that take a bunch of pointless compute (this is a real-world example from a real-world user). A simple set of equalities with a cost function that minimizes the floating point operations accelerates the simulation by 8.5x while changing the simulation result by ~1e-12 when the tolerance was 1e-11. That's a very useful problem-specific compilation pass that now has a 10 line syntax!
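As a rough illustration of the a*sin(x) -> 0 rule, here is a small Python sketch. It is not the robotic arm model and does not reproduce the 8.5x figure: the term list, the flop model, and the use of |a| rather than a in the comparison are all assumptions made here. It only shows why dropping tiny-coefficient terms cuts work while moving the result by less than the tolerance.

```python
import math

TOL = 1e-11  # the tolerance quoted in the text; everything else is illustrative

# A made-up "generated" expression: a sum of coeff * sin(arg) terms.
terms = [(2.0, 0.3), (3e-13, 1.1), (-0.5, 0.7), (-8e-12, 2.4)]

def prune(terms, tol):
    """Apply a*sin(x) -> 0 whenever |a| < tol by dropping the term."""
    return [(a, x) for a, x in terms if abs(a) >= tol]

def evaluate(terms):
    """Numerically evaluate the sum of a*sin(x) terms."""
    return sum(a * math.sin(x) for a, x in terms)

def flops(terms):
    """Crude cost model: one sin, one multiply, one add per term."""
    return 3 * len(terms)

pruned = prune(terms, TOL)
error = abs(evaluate(terms) - evaluate(pruned))
print(flops(terms), "->", flops(pruned), "flops; result moved by", error)
```

Since |sin(x)| <= 1, each dropped term changes the result by less than the tolerance, but the bound grows with the number of dropped terms, which is one reason a rule like this belongs in an opt-in, problem-specific pass.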
Thus we see the true application of this not as something to replace the "core" compilation passes that Julia or anything else relies on in its standard pipeline. Instead, we see it as a tool that scientists and application writers can use to selectively extend the compiler on the fly with domain knowledge. This allows for compiler passes that are too specific and/or destructive to apply generically, but that can do some interesting things when applied smartly and selectively. Something like "fuse all a*b + c into fma(a,b,c)" is rather trivial to write with this framework, so while a programming language may not make this the default behavior (since fma changes floating point results), this can make it easy to selectively enable such passes. That said, we still need to hook this into Julia's AbstractInterpreter so we can more easily apply it across Julia codes (I even wrote a Twitter thread saying our goal of 2022 should be to complete this https://twitter.com/ChrisRackauckas/status/14772748124604497...), there are many improvements to the e-graph saturation that could be implemented, there are many other applications to look into, etc., but I find it's a very promising start and I'm really impressed with how far Alessandro has come.
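For a sense of how small the fma pass is, here is a Python sketch on a toy tuple-based expression tree. It is an illustration only, not the MetaTheory.jl rule syntax. The `fuse_fma` helper and the tuple encoding are made up for this example, and the fused result is modelled with exact rational arithmetic rounded once, which is an assumption about how to emulate fma rather than a call to a hardware instruction.

```python
from fractions import Fraction

def fuse_fma(e):
    """Rewrite every a*b + c (or c + a*b) into fma(a, b, c), bottom-up."""
    if not isinstance(e, tuple):
        return e
    op, *args = e
    args = [fuse_fma(x) for x in args]
    if op == "+":
        lhs, rhs = args
        if isinstance(lhs, tuple) and lhs[0] == "*":
            return ("fma", lhs[1], lhs[2], rhs)
        if isinstance(rhs, tuple) and rhs[0] == "*":
            return ("fma", rhs[1], rhs[2], lhs)
    return (op, *args)

# Horner form (p*x + q)*x + r fuses into two nested fma calls.
horner = ("+", ("*", ("+", ("*", "p", "x"), "q"), "x"), "r")
fused = fuse_fma(horner)
print(fused)

# Why this is not a safe default: one rounding instead of two changes results.
a, b, c = 1 + 2.0**-30, 1 - 2.0**-30, -1.0
unfused_value = a * b + c                                     # product rounds to 1.0 first
fused_value = float(Fraction(a) * Fraction(b) + Fraction(c))  # exact, then rounded once
print(unfused_value, fused_value)
```

Here the fused version is the more accurate one, but it is still a different answer, which is exactly why a language may prefer to leave the choice to the user.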
I do get that, and I understand the application to the more general numerical/ML language space. I glossed over it because I think, in the short term, as a way of exploration/filling in shortcomings, it's a useful approach.
In the medium/long term, I hope that particular use case dies a horrible death (no offense!), and stays highly limited to a few national labs or others.
In general, I think you will discover over time that folks will not allow it in a production system in any way unless it is worth a billion dollars or enables something fundamentally impossible otherwise.
That's why I glossed over it: this is not the first system I've seen that tries to let normal users add compiler optimizations, for good reasons (I don't mean that sarcastically). I've even seen easy-to-use ones, and I've seen them generate the kinds of speedups you talk about, and be worth tons. All of them stopped being used as soon as that became possible.
These are most useful when the underlying infrastructure/field is new/being fleshed out, like it is now.
When the chaos starts to settle, they are incredibly hard to maintain, etc.
It also becomes much easier to make something that serves users well enough over time.
As an aside, I'll also say that outside of a few domains, or very select applications, it is also incredibly hard to get people to care about performance, and in particular, to trade perceived (not necessarily real!) reliability or standardization for performance.
I think the TL;DR is basically this: over time you will have a lot of trouble getting anyone writ large to want to allow or use those kinds of extensions in production systems, and beyond that, the inability of the infrastructure to serve the majority of users without their help is a product failure ;)
Obviously, this is just my opinion from watching these sorts of spaces for decades and seeing how they develop. I know of others who have the exact opposite opinion (i.e. that extensibility is more important than anything), so take it for whatever it's worth :)
It doesn't just have to be for application-specific performance though. You can also define the cost function to minimize the floating point error of a specific set of codes, which is what egg-herbie does. I can see throwing a custom pass on code that says "make this the most floating point stable version you can find" as a nice quick fix for cases where application scientists without a lot of numerical analysis experience hit problems that traditionally would've required a lot more thought and care.
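A small Python sketch of what "cost = floating point error" can look like. This is not how egg-herbie is implemented: the two candidate forms are written out by hand instead of being discovered by rewriting, and the sample points and the 50-digit `decimal` reference are assumptions for illustration. The extraction step is the same idea, though: pick the equivalent form with the lowest measured error.

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 50  # high-precision reference arithmetic

# Two algebraically equal forms of sqrt(x + 1) - sqrt(x).
candidates = {
    "sqrt(x+1) - sqrt(x)": lambda x: math.sqrt(x + 1) - math.sqrt(x),
    "1 / (sqrt(x+1) + sqrt(x))": lambda x: 1 / (math.sqrt(x + 1) + math.sqrt(x)),
}

samples = [10**10, 10**12, 10**14]  # illustrative inputs where cancellation bites

def reference(x):
    """The same quantity evaluated with 50 significant digits."""
    return Decimal(x + 1).sqrt() - Decimal(x).sqrt()

def error_cost(f):
    """Cost function: worst relative error of f over the sample inputs."""
    worst = Decimal(0)
    for x in samples:
        ref = reference(x)
        worst = max(worst, abs(Decimal(f(x)) - ref) / ref)
    return float(worst)

costs = {name: error_cost(f) for name, f in candidates.items()}
best = min(costs, key=costs.get)
print(best, costs)
```

The subtraction form loses most of its digits to cancellation at large x, while the reciprocal form stays near machine precision, so an error-based cost picks the second without the user having to know why.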
Another application we have in mind is to do linear algebra transformations similar to XLA's. For example, sequential matrix-vector products (A*v1 and A*v2) can be computed more efficiently as a single BLAS3 matrix-matrix call (A*[v1 v2]). Those kinds of rules can be very easily structured as an e-graph, and doing it this way would make it very inviting for mathematicians to extend and maintain the rulesets.
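The identity behind that batching rule can be checked with a few lines of plain Python. This is a sketch with hand-rolled list-of-lists helpers (`matvec`, `matmat`, `hcat` are names made up here), reading the rule as "several products with the same A can be batched by stacking the vectors as columns of one matrix". No BLAS is involved, so it demonstrates correctness of the rewrite, not the speedup.

```python
def matvec(A, v):
    """Matrix-vector product on lists of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmat(A, B):
    """Matrix-matrix product on lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def hcat(*vectors):
    """Stack vectors side by side as the columns of one matrix."""
    return [list(row) for row in zip(*vectors)]

def column(M, j):
    """Extract column j of a matrix."""
    return [row[j] for row in M]

A = [[1, 2], [3, 4]]
v1, v2 = [1, 0], [2, 5]

separate = [matvec(A, v1), matvec(A, v2)]   # two matrix-vector calls
batched = matmat(A, hcat(v1, v2))           # one matrix-matrix call

print(separate, [column(batched, 0), column(batched, 1)])
```

Column j of the batched product is exactly the j-th separate product, so a rewrite from the left-hand form to the right-hand form preserves the results while exposing a single larger kernel call to the backend.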
Both of those are cases where you may not want those passes to always be running, but they are super helpful in many numerical applications. And there's a lot more where that came from. Forward-mode AD can be implemented quite naturally this way. And there are a few more complex examples I'll hold in my back pocket until they are completed.
I think the OP's point is that running these optimizations in production as-is is dangerous, because future code changes in various places could accidentally impede the optimizer's ability to apply all the transformations that users of the codebase expect.
The obvious solution is to query the optimizer to get the final transformation as actual Julia code, and replace the pre-transform code with the post-transform optimized code, and disable any further optimization (aside from very trivial transforms that aren't worth directly including). This ensures that one doesn't accidentally lose the amazing benefits of this symbolic optimization approach on a given piece of code, and that production code always keeps its performance and correctness.
GHC rewrite directives sound less powerful than this, but they have been available for years, and "ordinary" people use them. Although, Haskell tends to attract compiler nerds the way Julia attracts numerics nerds.