Are you sure about that?
What benchmarks does it fail when set up like-for-like with GPU drivers?
Even so, it supports constrained grammars and is really easy to set up with wrappers like Ollama and the Python server. I struggled to find support for that in other inference engines - though things change fast!
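For context, "constrained grammar" here presumably means llama.cpp-style GBNF grammars (an assumption on my part, since the engine isn't named); a minimal grammar that forces the model to answer only "yes" or "no" looks like:

```
# GBNF grammar: restrict generated tokens to a strict yes/no answer
root ::= ("yes" | "no")
```

Passed via the grammar option, the sampler then rejects any token sequence that can't parse under the grammar, which is handy for getting machine-readable output without post-processing.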