Parent is (I assume) talking about the entire budget to get to DeepSeek V3, not the cost of the final training run.
This includes salaries for ~130 ML people plus the rest of the staff; the company is two years old.
They trained DeepSeek V1 and V2 (and, after V3, R1 and R1-Zero), as well as a bunch of other less-known models.
The final V3 run cost ~$6M (at least officially... [1]), but that doesn't factor in the cost of all the failed runs, ablations, etc. that always happen when developing a new model.
You also can't get a cluster of this size on a three-week commitment just to do your training and then stop paying for it; because of supply and demand there is always a multi-month (if not year-long) commitment. Or, if it's a private cluster they own, it's already a $200M-300M+ investment just for the advertised ~2,000 GPUs used in that run.
I don't know if it really is $1B, but it certainly isn't below $100M.
[1] I personally believe they used more GPUs than stated, but simply can't be forthcoming about it for obvious reasons. I of course have no proof of that; my belief is just based on the scaling laws we have seen so far plus where the incentives lie when stating a GPU count. But even if the 2k-GPU figure is accurate, it's still $100M+.
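For context, the ~$6M headline figure is just GPU-hour arithmetic. A back-of-envelope sketch: the GPU-hour count is the one DeepSeek published for the final V3 run, while the $2/GPU-hour rental rate is an assumed market price, not an official number.

```python
# Back-of-envelope, NOT official accounting. The GPU-hour figure is the
# one DeepSeek published for the final V3 run; the rental rate is an
# assumed market price.
gpu_hours = 2_788_000          # reported H800 GPU-hours for the final run
rate_per_gpu_hour = 2.0        # assumed $/GPU-hour

final_run_cost = gpu_hours * rate_per_gpu_hour
print(f"final run: ${final_run_cost / 1e6:.1f}M")      # ~ $5.6M

# What the commitment problem looks like: a year-long reservation on the
# same advertised 2048 GPUs, before salaries, failed runs and ablations.
gpus = 2048
hours_per_year = 24 * 365
yearly_commit = gpus * hours_per_year * rate_per_gpu_hour
print(f"1-year cluster commitment: ${yearly_commit / 1e6:.1f}M")
```

The final-run number only prices the hours the run itself consumed; the commitment number is closer to what you actually sign for, and it excludes everything else in the budget.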
> that could be like a side-project for a company like that, whose blood and sweat is literally money.
From the mouth of Liang Wenfeng, co-founder of both High Flyer and DeepSeek, 18 months ago:
"Our large-model project is unrelated to our quant and financial activities. We’ve established an independent company called DeepSeek, to focus on this."
> This kinda does support the 'DeepSeek is the side project of a bunch of quants' angle
Can we stop with this nonsense?
The list of authors on the paper is public; you can just go look it up. There are ~130 people on the ML team, with the regular ML backgrounds you would find at any other large ML lab.
Their infra costs multiple millions of dollars per month to run, and salaries for a team that size land somewhere in the $20-50M per year range (I'm not very au fait with market rates in China, hence the spread).
This is not a side project.
Edit: Apparently my comment is confusing some people. I'm not arguing that ML people are good at security, just that DS is not the side project of a bunch of quant bros.
A bunch of ML researchers who were initially hired to do quant work published their first-ever user-facing project.
So maybe not a side project, but if you have ever worked with ML researchers before, the lack of engineering/security chops shouldn't surprise you.
> A bunch of ML researchers who were initially hired to do quant work
Very interesting! I'm sure you have a source for this claim?
This myth of DS being a side project literally started from one tweet.
DeepSeek the company is funded by a company whose main business is being a hedge fund, but DeepSeek itself has, from day one, been all about building LLMs to reach AGI, completely independently.
This is like saying SpaceX is the side project of a few carmaking bros, just because Elon funded and manages both. They are unrelated.
Again, you can easily google the authors' names and look at their backgrounds; you will find people with PhDs in LLMs/multimodal models, internships at Microsoft Research, etc. No trace of a background in quant work, time-series prediction, or any of that.
From the mouth of the CEO himself 2 years ago: "Our large-model project is unrelated to our quant and financial activities. We’ve established an independent company called DeepSeek, to focus on this." [0]
It's really interesting to see how, after 10 years debating the mythical 10x engineer, we have overnight created the mythical 100x Chinese quant-bro researcher, who can build models 50x better than the best U.S. people's, after 6pm, while working on his side project.
TL;DR: High-Flyer started very much as an exclusively ML/AI-focused quant investment firm, with a lot of compute for finance AI and mining. Then the CCP cracked down on mining... then finance, so Liang probably decided to pivot to LLMs/AGI. That likely started as a side project, but probably isn't one anymore, now that DeepSeek has taken off and Liang met with the PRC premier a few days ago. DeepSeek being an independent company doesn't mean it isn't Liang's side project, run on compute bought with hedge-fund money that is primarily used for hedge-fund work, and cushioned/allowed to get by on low margins by hedge-fund profits.
That's a fair distinction. IMO it should still be categorized as a side project in the sense that it's Liang's pet project, the same way Jeff Bezos spends $$$ on his forever clock through a separate org but ultimately with Amazon resources. DeepSeek/Liang are fixated on AGI, not profit-making, and the venture isn't really loss-making since hardware/capex depreciation is likely eaten by the High-Flyer quant side. There's no reason to believe DeepSeek spent $100Ms to build out a compute chain separate from High-Flyer's. The myth of seasoned finance quants using 20% time to crush US researchers is false, but the reality/narrative of a bunch of fresh-out-of-school Gen Z kids from tier-1 PRC universities destroying US researchers is kind of just as embarrassing.
The carmaking bro predates SpaceX. He had a BMW in college and got a supercar in 1997. While he wasn’t a carmaker yet he got started with cars earlier.
First ever? Their math, coding, and other models have been making a splash since 2023.
The mythologizing around deepseek is just absurd.
"DeepSeek is the tale of one lowly hedge-fund manager overcoming the wicked American AI devils." Every day I hear variations of this, and the vast majority of it is based entirely on "vibes" emanating from some unknown place.
What I find amusing is that this closely mirrors the breakout moment OpenAI had with ChatGPT. They had been releasing models for quite some time before slapping the chatbot interface on it, and then it blew up within a few days.
It's fascinating that a couple of years and a few competitors in, the DeepSeek moment parallels it so closely.
Models and security are very different uses of our synapses. Publishing any number of models is no proof of anything beyond models. Talented mathematicians and programmers though they may be.
OP means that the public API and app are the side project, which they likely are; the skills required to do ML have little overlap with the skills required to run large, complex workloads securely and at scale for a public-facing app with presumably millions of users.
The latter role also typically requires experience, not just knowledge, to do well, which is why experienced SREs command very good salaries.
> First of all, training off of data generated by another AI is generally a bad idea because you'll end up with a strictly less accurate model (usually).
That is not true at all.
We have known how to solve this for at least 2 years now.
All the latest state of the art models depend heavily on training on synthetic data.
You want to bet?
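The standard fix is to gate synthetic data with a verifier before training on it (rejection sampling). A minimal sketch, where `generate_candidates` and `score` are hypothetical stand-ins for a teacher model and a verifier, not any real API:

```python
import random

def generate_candidates(prompt, n=8):
    """Stand-in for sampling n completions from a teacher model."""
    return [f"{prompt} -> candidate_{i}" for i in range(n)]

def score(candidate):
    """Stand-in for a verifier: unit tests, an exact-match check
    against a known answer, or a reward model. Here, a random score
    in [0, 1)."""
    return random.random()

def build_synthetic_dataset(prompts, threshold=0.8):
    """Rejection sampling: keep only completions the verifier accepts,
    so the quality of the training data is gated rather than degrading."""
    kept = []
    for prompt in prompts:
        for cand in generate_candidates(prompt):
            if score(cand) >= threshold:
                kept.append((prompt, cand))
    return kept

dataset = build_synthetic_dataset(["2+2=?", "next prime after 7?"])
```

Because only verified outputs make it into the training set, the student model can end up better than naive imitation of the teacher would suggest, which is why the "strictly less accurate" intuition fails.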
The panic around deepseek is getting completely disconnected from reality.
Don’t get me wrong, what DS did is great, but anyone thinking this reshapes the fundamental trend of scaling laws and makes compute irrelevant is dead wrong.
I’m sure OpenAI doesn’t enjoy the PR right now, but guess what OpenAI/Google/Meta/Anthropic can do if you give them a recipe for 11x more efficient training? They can scale it to their 100k-GPU clusters and still blow everything away.
This will be textbook Jevons paradox.
Compute is still king and OpenAI has worked on their training platform longer than anyone.
Of course as soon as the next best model is released, we can train on its output and catch up at a fraction of the cost, and thus the infinite bunny hopping will continue.
> The panic around deepseek is getting completely disconnected from reality.
This entire hype cycle has long been completely disconnected from reality. I've watched a lot of hype waves, and I've never seen one that oscillates so wildly.
I think you're right that OpenAI isn't as hurt by DeepSeek as the mass panic would lead one to believe, but it's also true that DeepSeek exposes how blown out of proportion the initial hype waves were and how inflated the valuations are for this tech.
Meta has been demonstrating for a while that models are a commodity, not a product you can build a business on. DeepSeek proves that conclusively. OpenAI isn't finished, but they need to continue down the path they've already started and give up the idea that "getting to AGI" is a business model that doesn't require them to think about product.
In a sense it doesn't: if DeepSeek can make OpenAI-type capabilities available at Llama-type infrastructure costs, then applying OpenAI-scale infrastructure to a much more efficient training/evaluation system multiplies everything back up. I think that's where they'll have to head: using their infrastructure moat (such as it is) to apply these efficiency learnings and ship much more capable models at the top end. Yes, they can't sleepwalk into it, but I don't think that was ever the game.
> The panic around deepseek is getting completely disconnected from reality.
Couldn’t agree more! Nobody here read the manual. The last paragraph of DeepSeek’s R1 paper:
> Software Engineering Tasks: Due to the long evaluation times, which impact the efficiency of the RL process, large-scale RL has not been applied extensively in software engineering tasks. As a result, DeepSeek-R1 has not demonstrated a huge improvement over DeepSeek-V3 on software engineering benchmarks. Future versions will address this by implementing rejection sampling on software engineering data or incorporating asynchronous evaluations during the RL process to improve efficiency.
Just based on my evaluations so far, R1 is not even an improvement on V3 in terms of real world coding problems because it gets stuck in stupid reasoning loops like whether “write C++ code to …” means it can use a C library or has to find a C++ wrapper which doesn’t exist.
OpenAI's issue might be that it is extremely inefficient with money (high salaries, high compute costs, high expenses, etc.). That's fine when you have an absolute monopoly, as investors will throw money your way (OpenAI is burning cash), but once a clear alternative exists, you can no longer do that.
OpenAI doesn't have a bigger advantage in compute than Google, Microsoft, or anyone else with a few billion dollars.
Oh wow. I have been using Kagi premium for months and never noticed that their AI assistant now has all the good AIs too. I was using Kagi exclusively for search and Perplexity for AI stuff; I guess I can cut down on my subscriptions. Thanks for your hint. (Also, I noticed Kagi has a PWA for their AI assistant, which is cool.)
Compute is not king; DeepSeek just demonstrated as much. And yes, OpenAI will have to reinvent itself to copy DS, which means throwing away a lot of its investment in existing tech. They might recover, but it is not the minor hiccup you suggest.
I just don't see how this is true. OpenAI has a massive cash & hardware pile -- they'll adapt and learn from what DeepSeek has done and be in a position to build and train 10x-50x-100x (or however) faster and better. They are getting a wake-up call for sure but I don't think much is going to be thrown away.
Define "too much".
I don’t disagree that we are currently only looking for life like our own, but "too much" makes it sound like this is some form of short-sightedness, whereas trying to find something we can’t define is basically impossible.
We can define life as it is on Earth, and we have proof it’s possible.
It might be an outlier form, but it’s the only one we can effectively look for.
How would you implement that in startup world for example?
It's very common for startups to be valued at ~$20M right out of the gate at the seed stage, not because the company is worth $20M, but because a $20M valuation lets the VCs invest, say, $4M and take only 20%. No one wants the VCs to take more (not even the VCs themselves), because otherwise the founders would be left with too little equity too soon and probably wouldn't care about their business anymore.
Now, as one of the founders, maybe you own ~40% of that business, so your paper net worth is $8M and you just made $8M of unrealized gains that year. How are you going to pay tax on that?
There is no way you will find someone to buy $1M of your shares at that round's price; you probably wouldn't find anyone willing to buy your entire $8M paper stake for $1M, because again, the company isn't actually worth $20M yet.
This stays true until pretty late in a VC-backed company's life: most rounds aren't priced on how a realistic buyer would value the company, they are priced on complex dynamics. Even many unicorn founders at the Series C/D stage have paper wealth in the ~$500M range but absolutely no way to find $50M.
So you effectively have no way to pay that tax.
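Worked through with the numbers from above (the 25% tax rate is a placeholder assumption for illustration, not any real jurisdiction's rate):

```python
# Seed round from the comment above: $4M in at a $20M post-money valuation.
post_money = 20_000_000
vc_investment = 4_000_000
vc_stake = vc_investment / post_money            # 0.20, i.e. the VC's 20%

founder_stake = 0.40                             # founder's ~40%
paper_worth = founder_stake * post_money         # $8M, entirely illiquid

assumed_tax_rate = 0.25                          # placeholder assumption
tax_bill = paper_worth * assumed_tax_rate        # $2M due, zero liquidity

print(f"VC stake: {vc_stake:.0%}")
print(f"founder paper worth: ${paper_worth / 1e6:.0f}M")
print(f"tax on unrealized gains: ${tax_bill / 1e6:.1f}M")
```

A $2M cash bill against a stake nobody will buy is the whole problem in miniature.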
This system already pseudo-exists in Canada under specific conditions: if you stop being a tax resident of the country, all your assets are deemed realized the year you leave, and you must pay tax on them. That is effectively impossible for most startup founders because, again, your stock isn't actually liquid. It means you can't stop being a tax resident of Canada until your company either dies or you exit somehow. And to be clear, you can't easily just choose to remain a tax resident of Canada while living abroad: Canada gets to decide, and to maximize your chances you must prove you still have ties, e.g. keep a home there, keep your bank accounts open, visit often enough, etc.
The Canada Revenue Agency offers one alternative: you leave the country but leave your stock in their keeping, and on the day you actually realize the gains, they take what they were owed. That sounds great, except that if the company fails, or you realize gains at a lower valuation, they still consider you owe what was computed the year you left, not what your exit actually yielded, so there is a real risk of being in debt for the rest of your life.
Minimum thresholds, and exceptions for less liquid assets (private equity) - ideally, again, coupled with thresholds.
The same way we have exceptions like CA Prop 13 for increasing property taxes.
These problems aren't impossible to solve. It's wild how people will seize on any tiny excuse to give up on making the tax code more fair. If there are edge cases that a blanket change to the code makes worse, that's NOT a reason to just throw our hands up and say "welp, can't make changes"; it just means we need to add a bit more nuance.
You set a minimum threshold to trigger it, and you set certain realistic exemptions for things that would benefit society, including giving a VC time to mature.
> Of course, as your company continues to appreciate, you will be forced to continue reducing your ownership stake
Why?
In a hypothetical world where getting a loan against an asset is impossible (or taxed the same as realizing the gains), you still don't get taxed on unrealized gains.
You can leave your stock alone and you aren't forced to sell anything.
Of course if you decide that now that you are worth a billion you must live like a billionaire, then yes, you will have to sell stock, reduce your influence in the company and pay tax on the gains.
I don't see any problem with this. It offers the stock owner a choice between using the stock as power (don't touch it) or as cash (sell it), and only taxes you when you opt for the latter.
edit: I realized I might have misread your post as defending the system that lets one use unrealized gains to back a loan (hence enabling the buy/borrow/die loophole), when you are in fact arguing against taxing unrealized gains. To me the obvious fix is to prevent those loans, as discussed above: force people to choose how they want to use their assets; if they choose to use them to live like kings, they must pay tax.
The fix in your edit isn't obviously workable, though. When talking about the rich, it's best to talk about private corporations, because that's really how they operate.
Firstly, do you want to prevent corporations from taking loans against their assets? Preventing that seems like it would be quite detrimental.
Secondly, how do you differentiate legitimate corporate expenses from personal expenses? Is a billionaire having one of their corporations rent a yacht from another of their corporations for a business meeting with another CEO who just happens to also be their friend a legitimate business expense or a personal expense? What if the yacht rental company rented it to the CEO's company instead?
Lots of companies are in this exact business, and they all have insane churn rates.
The biggest use case seems to be people who want to prototype something quickly: they don’t yet want to bother with managing infra and don’t mind the extra cost, since it’s running at small scale.
But if the experiment is successful, the customer churns, as it (most of the time) makes little sense to scale ML on serverless.
And if the experiment is not successful, they are obviously also going to churn.