“ Over the past 24 hours (UTC+8 02/27/2025 12:00 PM to 02/28/2025 12:00 PM), the combined peak node occupancy for V3 and R1 inference services reached 278, with an average occupancy of 226.75 nodes (each node contains 8 H800 GPUs). Assuming the leasing cost of one H800 GPU is $2 per hour, the total daily cost amounts to $87,072…
If all tokens were billed at DeepSeek-R1’s pricing (*), the total daily revenue would be $562,027, with a cost profit margin of 545%. However, our actual revenue is substantially lower for the following reasons…”
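The quoted figures are internally consistent; a quick sanity check using only the numbers stated in the post:

```python
# Sanity check of the quoted figures (all inputs come from the post itself).
avg_nodes = 226.75        # average node occupancy over the 24 h window
gpus_per_node = 8         # H800 GPUs per node
gpu_hour_cost = 2.0       # assumed leasing cost, $ per GPU-hour
hours = 24

daily_cost = avg_nodes * gpus_per_node * gpu_hour_cost * hours
print(daily_cost)         # 87072.0, matching the stated $87,072

daily_revenue = 562_027   # theoretical revenue at R1 API pricing
margin = (daily_revenue - daily_cost) / daily_cost
print(round(margin * 100))  # 545, matching the stated 545% margin
```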
That 545% is a gross margin in theory only. They project revenue as if every token were billed at API prices, but not all traffic is API: chat traffic is free, so the figure is hypothetical. This is a response to the raging debate over whether their API pricing implies a negative gross margin. Apparently not, if you can saturate the hardware with traffic.
Interesting that they chose to split prefilling into its own independent service; I hadn't heard of that technique before. I found this paper that studies why disaggregating prefill from decoding can be beneficial: https://arxiv.org/abs/2401.09670v1
There are a lot of shared prefixes across user prompts (system prompts, few-shot templates, earlier turns of a conversation). You can save compute by first checking the cache for the longest cached prefix of a new prompt and reusing its KV cache, so only the remaining tokens need prefilling. Their MLA makes the KV cache particularly compact, which makes caching it especially cheap.
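A minimal sketch of that longest-prefix lookup idea (the class and names here are my own illustration, not DeepSeek's code; production systems typically cache at fixed-size block granularity with hashing rather than a linear scan):

```python
# Toy prefix KV cache: maps a token-id prefix to its cached KV state
# (opaque placeholder here). On a new prompt, reuse the longest cached
# prefix and prefill only the remaining suffix.

class PrefixCache:
    def __init__(self):
        self._store = {}  # tuple of token ids -> cached KV state

    def insert(self, tokens, kv_state):
        self._store[tuple(tokens)] = kv_state

    def longest_prefix(self, tokens):
        """Return (matched_len, kv_state) for the longest cached prefix."""
        for end in range(len(tokens), 0, -1):
            kv = self._store.get(tuple(tokens[:end]))
            if kv is not None:
                return end, kv
        return 0, None

cache = PrefixCache()
cache.insert([1, 2, 3], "kv-for-system-prompt")      # shared system prompt
matched, kv = cache.longest_prefix([1, 2, 3, 4, 5])  # new user prompt
print(matched)  # 3 -> only tokens 4 and 5 still need prefilling
```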