
Cerebras' benchmark was most likely run under ideal conditions, but I'm not sure it's even possible to test public cloud APIs under ideal conditions: they're shared infrastructure, so you don't know whether any given request is "ideal". I think you can only test these things across a large number of requests, and even that assumes shared resource usage doesn't change much over the measurement window.
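One way to make "test across significant numbers of requests" concrete is to record tokens/s per request and report the distribution rather than a single number. A minimal sketch of the aggregation step (pure arithmetic, no real API calls; the sample figures below are invented for illustration):

```python
# Aggregate per-request throughput measurements into distribution stats.
# All sample numbers are invented, not measurements of any real endpoint.

def throughput_stats(samples):
    """samples: list of (tokens_generated, seconds_elapsed) per request."""
    rates = sorted(t / s for t, s in samples)
    n = len(rates)
    return {
        "mean": sum(rates) / n,
        "p50": rates[n // 2],                      # median-ish
        "p95": rates[min(n - 1, int(n * 0.95))],   # tail throughput
    }

# e.g. five requests of 512 tokens each against a shared endpoint
samples = [(512, 0.6), (512, 0.8), (512, 0.55), (512, 1.2), (512, 0.7)]
print(throughput_stats(samples))
```

On shared infrastructure the spread between p50 and p95 is usually the interesting part, since a single lucky request can look "ideal" without being representative.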


I'm not talking about that. I and many others here have spun up 8x or more H100 clusters and run this exact model. Zero other traffic. You won't come anywhere close to this.


  I'm not talking about that. I and many others here have spun up 8x or more H100 clusters and run this exact model. Zero other traffic. You won't come anywhere close to this.

An 8x H100 cluster can also do fine-tuning, right? Does Cerebras offer fine-tuning support?


In that case I'm misunderstanding you. Are you saying it's "BS" that they're reaching ~1k tokens/s? If so, you may be misunderstanding what a Cerebras machine is. Also, an 8x H100 cluster is still roughly half the price of a single Cerebras machine, and that's even accounting for H100s being massively overpriced. You easily get twice the value in a Cerebras machine; they have nearly 1M cores on a single die.


Ha ha. He probably means "at a batch size of 1", i.e. not even using some amortization tricks to get better numbers.
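For context on the batching trick being alluded to: serving many requests per decode step raises aggregate tokens/s even though no individual stream gets faster. A toy model of that effect (all constants are made up for illustration, not measured on any hardware):

```python
# Toy decode-throughput model: per-step latency grows only slowly with
# batch size (decode is typically memory-bandwidth-bound), so aggregate
# tokens/s rises roughly with batch. Constants below are invented.

def tokens_per_second(batch, base_step_ms=20.0, per_seq_ms=0.5):
    """One token per sequence per step; step time in milliseconds."""
    step_ms = base_step_ms + per_seq_ms * batch
    return batch * 1000.0 / step_ms

for b in (1, 8, 64):
    print(f"batch={b:3d}  aggregate={tokens_per_second(b):7.1f} tok/s  "
          f"per-stream={tokens_per_second(b) / b:5.1f} tok/s")
```

Under this model, batch size 1 gives the worst aggregate number, which is why quoting single-stream speed is the "no amortization tricks" case.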


Ah! That does make more sense!



