SEQUOIA: Exact Llama2-70B on an RTX4090 with half-second per-token latency (infini-ai-lab.github.io)
131 points by zinccat 9 months ago | 61 comments
Sequoia: Speculative decoding boosting LLM inference by 8-10x (infini-ai-lab.github.io)
3 points by fgfm 11 months ago