Because for anything other than CPU inference, it is inferior to TensorRT and vLLM.



Are you sure about that? What are the benchmarks it fails on when set up like-for-like with GPU drivers? Even so, it can do constrained grammar and is really easy to set up with wrappers like Ollama and the Python server. I struggled to find support for that with other inference engines, though things change fast!
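
For anyone unfamiliar with constrained grammar decoding: here's a minimal sketch using the llama-cpp-python bindings, which is one way to get at it from Python. The model path and the grammar itself are placeholders I made up for illustration, not anything from this thread.

    # Minimal sketch of GBNF constrained decoding via llama-cpp-python.
    # The model path and grammar are hypothetical placeholders.
    from llama_cpp import Llama, LlamaGrammar

    # A tiny GBNF grammar that only admits the strings "yes" or "no".
    grammar = LlamaGrammar.from_string(r'''
    root ::= "yes" | "no"
    ''')

    llm = Llama(model_path="model.gguf")  # any local GGUF model

    out = llm(
        "Is the sky blue? Answer yes or no: ",
        grammar=grammar,  # restricts sampling to strings the grammar accepts
        max_tokens=8,
    )
    print(out["choices"][0]["text"])

The sampler masks out any token that would take the output outside the grammar, so the model can only ever produce "yes" or "no" here, which is exactly the structured-output guarantee that's hard to find in some other engines.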



