Hacker News

How do you run this kind of model at home? On a CPU in a machine with about 1TB of RAM?




Wow, it's 690GB of downloaded data, so yeah, 1TB sounds about right. Not even my two Strix Halo machines paired together can do this, damn.

You can do it slowly with ik_llama.cpp, lots of RAM, and one good GPU. Regular llama.cpp works too, but the ik fork has some enhancements that make this sort of thing more tolerable.
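A minimal sketch of that setup, using flags that exist in both llama.cpp and the ik fork. The model path and the tensor-name pattern here are assumptions for a typical large MoE GGUF; the idea is to offload everything to the GPU except the expert weights, which stay in system RAM:

```shell
# Hypothetical model path; substitute your actual GGUF (split files work too).
# -ngl 99          : offload all offloadable layers to the GPU
# -ot "exps=CPU"   : override-tensor rule keeping MoE expert tensors in RAM,
#                    so only the dense/attention weights need to fit in VRAM
# -c 8192          : context size; -t sets CPU threads for the expert matmuls
./llama-cli \
  -m ./DeepSeek-R1-Q4_K_M.gguf \
  -ngl 99 \
  -ot "exps=CPU" \
  -c 8192 \
  -t 32 \
  -p "Hello"
```

Token generation is then bounded by RAM bandwidth rather than the GPU, which is why it's slow but workable on a single-GPU box with enough memory.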

Two 512GB Mac Studios connected with thunderbolt 5.
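One hedged way to pool two machines like that is llama.cpp's RPC backend: run `rpc-server` on the second Mac and point the first at it over the Thunderbolt network link. The IP address and port below are placeholders:

```shell
# On the second Mac Studio (reachable over the Thunderbolt bridge,
# here assumed to be 10.0.0.2):
./rpc-server -p 50052

# On the first Mac Studio, splitting the model across both machines:
./llama-cli \
  -m ./model.gguf \
  --rpc 10.0.0.2:50052 \
  -p "Hello"
```

This is a sketch, not a benchmark claim: the inter-machine link becomes the bottleneck, so it works best when each half of the model fits in one machine's unified memory.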


