This is insanely fast, obviously a game changer over time. You should try the demo!
This seems to be using custom inference-only HW. It makes a ton of sense to use different HW for inference vs. training; the requirements are different.
Nvidia, as far as I can tell, is going all-in on training and hoping the same HW will be used for inference.
Hi there, I work for Groq. That's right. We love graphics processors for training, but for inference our Language Processing Unit (LPU) is by far the fastest and lowest latency. Feel free to ask me anything.
Exciting times!