JAX/XLA seem quite popular, especially since you can use them in Colab on TPUs. IIRC, some people have managed to get JAX compiled with ROCm support.
TensorFlow lets you deploy to quite a few backends.
oneAPI support in GPUArrays.jl seems to be coming along.
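On the JAX point: a minimal sketch of what "same code, different backend" looks like in practice, assuming `jax` is installed. The function name `scaled_sum` is just made up for illustration; which backend you actually get depends on your install (CPU wheel, CUDA, TPU runtime in Colab, or a ROCm build).

```python
import jax
import jax.numpy as jnp

# jit-compile through XLA; the same function runs on whichever
# backend this JAX install was built for.
@jax.jit
def scaled_sum(x):  # hypothetical example function
    return jnp.sum(x * 2.0)

# Typically reports "cpu", "gpu", or "tpu", depending on the install.
backend = jax.default_backend()
devices = jax.devices()

x = jnp.arange(4.0)      # [0., 1., 2., 3.]
result = scaled_sum(x)   # sum of [0., 2., 4., 6.]

print(backend, devices, result)
```

Nothing in the function itself mentions the device, which is most of the appeal.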