
> Or you can just use Eigen via CUDA:

The link you’ve posted doesn’t mean you can do large dense matmul in CUDA right away; it just means you can use fixed-size vector/matrix operations inside kernels (which might be useful if you’re writing graphics code and need some vec3/mat3/quats). You still have to write your own customized kernel for a large dynamic-sized GEMM computation (with tiling and shared memory and all that jazz), and at that point it’s best to just use cuBLAS.
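To make the distinction concrete, here’s a minimal sketch of the kind of thing the linked feature *does* enable: per-thread fixed-size Eigen math inside a kernel. The kernel name and sizes are my own illustration, not from the linked docs, and this assumes a reasonably recent Eigen compiled with nvcc:

```cuda
// Sketch only: per-thread fixed-size math with Eigen inside a CUDA kernel.
#include <Eigen/Dense>

__global__ void rotatePoints(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Fixed-size types live in registers/local memory -- no heap
    // allocation, which is why they work in device code while
    // dynamic-size Eigen matrices generally don't.
    Eigen::Matrix3f R;
    R = Eigen::AngleAxisf(0.5f, Eigen::Vector3f::UnitZ()).toRotationMatrix();
    Eigen::Map<const Eigen::Vector3f> p(in + 3 * i);
    Eigen::Map<Eigen::Vector3f> q(out + 3 * i);
    q = R * p;  // small fixed-size product, fine inside a kernel
}
```

Note this is one small rotation per thread, nothing like a tiled large-matrix GEMM.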



As you may have read, the documentation states:

> By default, when Eigen's headers are included within a .cu file compiled by nvcc, most Eigen functions and methods are prefixed by the `__device__ __host__` keywords, making them callable from both host and device code.

Eigen casually overloads "*" to do any kind of multiplication, hence I'd guess it also carries the GEMM routines along into the kernel during compilation, but this needs to be tested.
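For comparison, the cuBLAS route for a large dynamic-size product is quite short on the host side. A minimal sketch, assuming device buffers are already allocated and filled (error checking omitted, names illustrative):

```cuda
// Sketch only: C = A * B via cuBLAS, single precision, column-major.
#include <cublas_v2.h>
#include <cuda_runtime.h>

void gemm(const float* dA, const float* dB, float* dC,
          int m, int n, int k) {
    cublasHandle_t handle;
    cublasCreate(&handle);

    const float alpha = 1.0f, beta = 0.0f;
    // cuBLAS expects column-major storage: A is m x k (lda = m),
    // B is k x n (ldb = k), C is m x n (ldc = m).
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                m, n, k,
                &alpha, dA, m, dB, k,
                &beta, dC, m);

    cublasDestroy(handle);
}
```

The tiling and shared-memory work mentioned above is what cuBLAS does internally, which is why hand-rolling it rarely pays off.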

However, considering the speed I already get from running Eigen on the CPU, I still need to find larger problems to make that effort worthwhile.



