DeepSeek-V3/R1 Inference System Overview (github.com/deepseek-ai)
27 points by meetpateltech 14 days ago | 6 comments



Some very illuminating stats from the article:

“ Over the past 24 hours (UTC+8 02/27/2025 12:00 PM to 02/28/2025 12:00 PM), the combined peak node occupancy for V3 and R1 inference services reached 278, with an average occupancy of 226.75 nodes (each node contains 8 H800 GPUs). Assuming the leasing cost of one H800 GPU is $2 per hour, the total daily cost amounts to $87,072…

If all tokens were billed at DeepSeek-R1’s pricing (*), the total daily revenue would be $562,027, with a cost profit margin of 545%. However, our actual revenue is substantially lower for the following reasons…”
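The quoted cost figure follows directly from the stated numbers. A quick arithmetic check (pure Python, no assumptions beyond the quote):

```python
# Daily GPU-hour cost from the figures in the quote:
# 226.75 average nodes x 8 H800 GPUs per node x $2/GPU-hour x 24 hours.
avg_nodes = 226.75
gpus_per_node = 8
cost_per_gpu_hour = 2.0
hours = 24

daily_cost = avg_nodes * gpus_per_node * cost_per_gpu_hour * hours
print(daily_cost)  # 87072.0, matching the $87,072 in the article
```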


How can profit margin be more than 100%? What do they mean by “cost profit margin”?

They mean $5.45 of profit for every dollar spent, i.e. profit divided by cost. Expressed as a conventional gross margin (profit divided by revenue), that works out to roughly 85%.
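The two ways of expressing the margin can be checked against the quoted figures:

```python
# Figures from the article (USD per day).
revenue = 562_027
cost = 87_072
profit = revenue - cost  # 474,955

# "Cost profit margin" as DeepSeek reports it: profit relative to cost.
cost_profit_margin = profit / cost      # ~5.45, i.e. the 545% in the quote

# Conventional gross margin: profit relative to revenue.
gross_margin = profit / revenue         # ~0.845, i.e. about 85%

print(round(cost_profit_margin, 2), round(gross_margin, 3))
```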

Gross margin, in theory. They are using API pricing to project revenue, but not all traffic is API: chat traffic is free, so the figure is theoretical. It's a response to a raging debate over whether their gross margin is negative at their API pricing. Apparently not, if one can saturate the traffic.

Interesting that they chose to split prefill into its own independent service; I hadn't heard of that technique before. I found this paper that researches why that can be beneficial: https://arxiv.org/abs/2401.09670v1
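The idea, roughly, is that prefill (one large compute-bound pass over the whole prompt) and decode (many small memory-bound passes, one per generated token) have very different hardware profiles, so they can be scheduled on separate node pools. A toy sketch of the handoff, with all names hypothetical and the "KV cache" faked as a plain list:

```python
# Illustrative sketch of prefill/decode disaggregation, NOT DeepSeek's actual
# implementation. The prefill service runs the prompt once and ships its KV
# cache to a separate decode service that generates tokens step by step.

def prefill(prompt_tokens: list[int]) -> list[int]:
    # In a real system this is one batched forward pass on prefill nodes,
    # producing per-layer key/value tensors; here the token list stands in
    # for that cache.
    return list(prompt_tokens)

def decode(kv_cache: list[int], max_new_tokens: int) -> list[int]:
    # Each decode step reads the whole cache and appends one new token.
    # We use the current cache length as a stand-in for a sampled token.
    out = []
    for _ in range(max_new_tokens):
        tok = len(kv_cache)
        kv_cache.append(tok)
        out.append(tok)
    return out

kv = prefill([101, 7592, 2088])   # prompt handled by the prefill pool
print(decode(kv, 3))              # [3, 4, 5] from the decode pool
```

The split lets each pool batch and scale independently, at the cost of transferring the KV cache between them.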


There are a lot of shared prefixes among user prompts (system prompts, few-shot templates, etc.). You can save compute by first checking the cache for the longest cached prefix of a prompt and only prefilling the rest. Their MLA (Multi-head Latent Attention) makes the KV cache particularly compact.
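A minimal sketch of that longest-prefix lookup (function and cache names are illustrative; a production system would use a trie or block-hashed cache rather than scanning every prefix length):

```python
# Hypothetical longest-prefix KV-cache lookup. The cache maps token-tuple
# prefixes to their stored KV entries; we probe from the longest prefix down.

def longest_cached_prefix(prompt_tokens: list[int], cache: dict):
    """Return (matched_prefix, cache_entry) for the longest cached prefix,
    or ((), None) when nothing matches."""
    for n in range(len(prompt_tokens), 0, -1):
        key = tuple(prompt_tokens[:n])
        if key in cache:
            return key, cache[key]
    return (), None

cache = {(1, 2): "kv_a", (1, 2, 3): "kv_b"}
prefix, entry = longest_cached_prefix([1, 2, 3, 4], cache)
print(prefix, entry)  # (1, 2, 3) kv_b -- only token 4 still needs prefill
```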



