SEQUOIA: Exact Llama2-70B on an RTX4090 with half-second per-token latency (infini-ai-lab.github.io)
131 points by zinccat 9 months ago | past | 61 comments
Sequoia: Speculative decoding boosting LLM inference by 8-10x (infini-ai-lab.github.io)
3 points by fgfm 11 months ago | past
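The second entry's title refers to speculative decoding, the technique Sequoia builds on. As a rough illustration (not Sequoia's actual tree-based verification algorithm), here is a minimal greedy sketch: a cheap draft model proposes several tokens, and the expensive target model keeps the longest prefix it agrees with. The models here are hypothetical stand-ins (simple lookup tables), purely to show the control flow.

```python
# Toy sketch of greedy speculative decoding (NOT Sequoia's actual
# tree-based algorithm): a cheap "draft" model proposes k tokens and
# the expensive "target" model verifies them, keeping the agreed prefix.
# Both models are hypothetical lookup-table stand-ins for illustration.

def make_model(table, default=0):
    """Return a greedy next-token function backed by a lookup table."""
    def next_token(context):
        return table.get(context[-1], default)
    return next_token

# Hypothetical token-transition tables; the draft disagrees at token 3.
target = make_model({0: 1, 1: 2, 2: 3, 3: 4, 4: 5})
draft  = make_model({0: 1, 1: 2, 2: 3, 3: 9, 9: 9})

def speculative_step(context, k=4):
    """Draft k tokens, then keep the prefix the target model agrees with.

    In a real system the target verifies all k drafted tokens in a single
    batched forward pass -- that batching is where the speedup comes from.
    """
    drafted, ctx = [], list(context)
    for _ in range(k):
        t = draft(ctx)
        drafted.append(t)
        ctx.append(t)

    accepted, ctx = [], list(context)
    for t in drafted:
        if target(ctx) == t:              # target agrees with the draft
            accepted.append(t)
            ctx.append(t)
        else:                             # first disagreement: emit the
            accepted.append(target(ctx))  # target's own token and stop
            break
    return accepted

print(speculative_step([0], k=4))  # draft proposes 1,2,3,9; target corrects 9 to 4
```

One speculative step here yields four accepted tokens for a single (batched) target verification, which is how multi-x speedups become possible when the draft model agrees with the target most of the time.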
