Yeah, lazy evaluation/fusing operations is, I think, their best feature. Fixed-interface BLAS has really good performance, but something as simple as DAXPBY (y = alpha*x + beta*y) is an extension, not part of the reference interface. Choosing which functionality to super-optimize is hard for a fixed interface, and it involves non-technical work like figuring out which subroutines people actually want.
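To make the DAXPBY point concrete, here's a rough sketch in plain C++. It doesn't call a real BLAS or Eigen; `scal` and `axpy` are hand-written stand-ins for the reference-style DSCAL/DAXPY routines, and `axpby_two_pass` / `axpby_fused` are names I made up for illustration. With only the two reference-style routines you have to sweep over `y` twice, while a fused loop (roughly what an expression-template library can generate for you) touches it once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hand-written stand-in for DSCAL: y := beta * y
void scal(double beta, std::vector<double>& y) {
    for (std::size_t i = 0; i < y.size(); ++i) y[i] *= beta;
}

// Hand-written stand-in for DAXPY: y := alpha * x + y
void axpy(double alpha, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < y.size(); ++i) y[i] += alpha * x[i];
}

// y := alpha * x + beta * y built from the two routines above.
// This makes two full passes over y.
void axpby_two_pass(double alpha, const std::vector<double>& x,
                    double beta, std::vector<double>& y) {
    scal(beta, y);
    axpy(alpha, x, y);
}

// Same result in a single fused pass over x and y (hypothetical helper).
void axpby_fused(double alpha, const std::vector<double>& x,
                 double beta, std::vector<double>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < y.size(); ++i) y[i] = alpha * x[i] + beta * y[i];
}
```

Both versions compute the same thing; the fused one just reads and writes `y` once instead of twice, which is the kind of saving that matters for memory-bound level-1 operations.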
I'm sure someone has already looked at this, but I wonder if Eigen can just somehow nab kernels directly from BLIS, haha.