This is a very charitable summary of the pros and cons to e-graph approaches to ...

DannyBee · on Jan 3, 2022

I do get that, and I understand the application to the more general numerical/ML language space. I glossed over it because I think, in the short term, as a way of exploration/filling in shortcomings, it's a useful approach.

In the medium/long term, I hope that particular use case dies a horrible death (no offense!) and stays highly limited to a few national labs or the like. In general, I think you will discover over time that folks will not allow it in a production system in any way unless it is worth a billion dollars or enables something fundamentally impossible otherwise.

That's why I glossed over it: this is not the first system I've seen try to allow normal users to add compiler optimizations for good reasons (I don't mean that sarcastically). I've even seen easy-to-use ones, and I've seen them generate the kinds of speedups you talk about and be worth tons. All of them were eventually dropped as soon as it was possible to do so.

These are most useful when the underlying infrastructure/field is new and still being fleshed out, like it is now. When the chaos starts to settle, they become incredibly hard to maintain. It also becomes much easier over time to build something that serves users well enough.

As an aside, I'll also say that outside of a few domains or very select applications, it is incredibly hard to get people to care about performance, and in particular to trade perceived (not necessarily real!) reliability or standardization for performance.

I think the TL;DR is basically this: over time you will have a lot of trouble getting anyone writ large to want to allow or use those kinds of extensions in production systems. Beyond that, the inability of the infrastructure to serve the majority of users without their help is a product failure ;)

Obviously, this is just my opinion from watching these sorts of spaces for decades and seeing how they develop. I know of others who have the exact opposite opinion (i.e., that extensibility is more important than anything), so take it for whatever it's worth :)

ChrisRackauckas · on Jan 3, 2022

It doesn't just have to be for application-specific performance though. You can also define the cost function to be to minimize the floating point error of a specific set of codes, which is what egg-herbie does. I can see throwing a custom pass on code that says "make this be the most floating point stable version it can find" as a nice quick fix for cases where application scientists without a lot of numerical analysis experience hit cases which traditionally would've required a lot more thought and care.
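To make the "cost function is floating point error" idea concrete, here is a minimal Python sketch. It is not how egg-herbie works internally; the two candidate expressions, the sample points, and the helper names (`error_cost`, `most_accurate`) are all assumptions chosen for illustration. It scores two algebraically equal forms of sqrt(x+1) - sqrt(x) against a high-precision reference and keeps the one with the lowest worst-case relative error.

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 60  # high-precision arithmetic for the reference value


def naive(x):
    # Subtracts two nearly equal numbers for large x (catastrophic cancellation).
    return math.sqrt(x + 1.0) - math.sqrt(x)


def rewritten(x):
    # Algebraically identical form that avoids the subtraction.
    return 1.0 / (math.sqrt(x + 1.0) + math.sqrt(x))


def reference(x):
    # Same expression evaluated with 60 significant digits.
    d = Decimal(x)
    return (d + 1).sqrt() - d.sqrt()


def error_cost(candidate, samples):
    # Hypothetical cost function: worst relative error over the sample points.
    worst = Decimal(0)
    for x in samples:
        exact = reference(x)
        rel = abs((Decimal(candidate(x)) - exact) / exact)
        worst = max(worst, rel)
    return worst


def most_accurate(candidates, samples):
    # Pick the candidate that minimizes the error cost.
    return min(candidates, key=lambda f: error_cost(f, samples))


SAMPLES = [1.0, 1e6, 1e12, 1e15]
best = most_accurate([naive, rewritten], SAMPLES)
print(best.__name__)
```

A real tool would generate the candidates by rewriting rather than taking them as a hand-written list; the point here is only that "most stable" can be expressed as an ordinary cost to minimize.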

Another application we have in mind is to do linear algebra transformations similar to XLA's. For example, sequential matrix-vector products (A*v1 and A*v2) can be computed more efficiently as a single BLAS3 matrix-matrix call (A*[v1 v2]). Those kinds of rules can be very easily structured as an E-graph, and doing it this way would make it very inviting to mathematicians to extend and maintain the rulesets.
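As a small illustration of the identity such a rule relies on, the following pure-Python sketch checks that two separate matrix-vector products are exactly the columns of one matrix-matrix product. It assumes the vectors are placed side by side as columns; the helper names are made up for this example, and plain lists stand in for real BLAS calls.

```python
def matvec(A, v):
    # One matrix-vector product (BLAS level 2 style).
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def matmul(A, B):
    # One matrix-matrix product (BLAS level 3 style).
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def hstack_columns(*vectors):
    # Build the matrix whose columns are the given vectors.
    return [list(row) for row in zip(*vectors)]


def column(M, j):
    # Extract column j of a matrix stored as a list of rows.
    return [row[j] for row in M]


A = [[1, 2], [3, 4]]
v1 = [1, 0]
v2 = [5, 6]

# Rewritten form: a single matrix-matrix product over the stacked vectors.
AV = matmul(A, hstack_columns(v1, v2))

# Each column of the result equals the corresponding separate product.
print(column(AV, 0) == matvec(A, v1), column(AV, 1) == matvec(A, v2))
```

The rewrite changes nothing mathematically; the gain on real hardware comes from the batched call reusing A across both vectors.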

Both of those are cases where you may not want those passes to always be running, but they are super helpful in many numerical applications. And there's a lot more where that came from. Forward-mode AD can be implemented quite naturally this way. And there are a few more complex examples I'll hold in my back pocket until they are completed.

jpsamaroo · on Jan 3, 2022

I think the OP's point is that running these optimizations in production as-is is dangerous, because future code changes in various places could accidentally impede the optimizer's ability to apply all the transformations that users of the codebase expect.

The obvious solution is to query the optimizer to get the final transformation as actual Julia code, and replace the pre-transform code with the post-transform optimized code, and disable any further optimization (aside from very trivial transforms that aren't worth directly including). This ensures that one doesn't accidentally lose the amazing benefits of this symbolic optimization approach on a given piece of code, and that production code always keeps its performance and correctness.
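A rough sketch of that workflow, in Python rather than Julia and with a toy rewriter standing in for a real symbolic optimizer (every name here is hypothetical): run the optimizer once, print the result as source, commit that source, and keep a check that flags when the optimizer's output drifts from what was committed.

```python
def simplify(expr):
    # Toy bottom-up rewriter: x*1 -> x, x+0 -> x, x*0 -> 0.
    if not isinstance(expr, tuple):
        return expr
    op, lhs, rhs = expr
    lhs, rhs = simplify(lhs), simplify(rhs)
    if op == "*" and (lhs == 0 or rhs == 0):
        return 0
    if op == "*" and rhs == 1:
        return lhs
    if op == "*" and lhs == 1:
        return rhs
    if op == "+" and rhs == 0:
        return lhs
    if op == "+" and lhs == 0:
        return rhs
    return (op, lhs, rhs)


def to_source(expr):
    # Render an expression tree as source text.
    if not isinstance(expr, tuple):
        return str(expr)
    op, lhs, rhs = expr
    return f"({to_source(lhs)} {op} {to_source(rhs)})"


# The expression as a user originally wrote it: x*1 + (y+0)*z.
ORIGINAL = ("+", ("*", "x", 1), ("*", ("+", "y", 0), "z"))

# The optimizer's output, captured once and committed as ordinary code.
FROZEN_SOURCE = "(x + (y * z))"


def frozen(x, y, z):
    # Production path: no optimizer runs here, only the committed result.
    return x + (y * z)


def optimizer_still_matches_frozen():
    # Regression check: reports drift between optimizer output and frozen code.
    return to_source(simplify(ORIGINAL)) == FROZEN_SOURCE


print(optimizer_still_matches_frozen(), frozen(2, 3, 4))
```

Production then depends only on `frozen`, while the check can run in CI to signal when a change elsewhere alters what the optimizer would have produced.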

throwaway81523 · on Jan 3, 2022

GHC rewrite directives sound less powerful than this, but they have been available for years, and "ordinary" people use them. Although, Haskell tends to attract compiler nerds the way Julia attracts numerics nerds.