Not attainable for the working class, though. "Can" is doing a lot of heavy lifting here. It seems like, after a brief period where technology was essentially class-agnostic, now only the wealthy get to take part in development while everyone else is left to be a consumer.
Not sure what you mean. Cutting-edge computing has never been cheap. And a Mac Studio is definitely within the budget of a software developer in Norway. It's not going to feel like a cheap investment, but it's definitely doable. Unlike a cluster of H100 GPUs, which would cost as much as a small apartment in Oslo.
And you can easily get a dev job in Norway without having to run an LLM locally on your computer.
The money would be better invested in an x86 build with two to four RTX 3090s than in a Mac Studio. While the Macs have a fantastic performance-per-watt ratio and decent memory support (both bus width and capacity), they are not great on raw compute. A multi-RTX-3090 build totally smokes a Mac at the same price point in inference speed.
The memory requirement for the 7B model with full context is 120GB, so you would need five 3090s, not two to four. Do you know if you can get a motherboard with room for five GPUs, and a power supply to match?
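To sanity-check the card count, here's a quick back-of-the-envelope sketch (it takes the 120GB figure above at face value; 24GB is the 3090's VRAM):

    import math

    GPU_VRAM_GB = 24      # an RTX 3090 (or 4090) has 24GB of VRAM
    footprint_gb = 120    # model weights + full-context KV cache, per the figure above

    cards_needed = math.ceil(footprint_gb / GPU_VRAM_GB)
    print(cards_needed)   # -> 5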
I bet that 5 3090s will smoke a Mac Studio. Can't find anyone in Norway with any in stock though. Or any 4090s with 24GB of memory.
You can get an NVIDIA RTX 5000 with 32GB of memory; there are two webshops that have those in stock. You'll need to wait though, because it looks like there might be one or maybe two in stock in total. And they are 63 000 NOK each, and you need four of them. At that price you could buy two Mac Studios instead.
I see people selling 3090s with 24GB secondhand for around 10 000 NOK each, but those have been running day in and day out for three years and don't come with a warranty.
If you search on r/localllama, there are people who have improvised builds with e.g. 8x GPUs. It takes multiple power supplies and server mainboards, and some let the GPUs sit openly on wooden racks - not sure that's good for longevity?
BTW a Mac wouldn't be able to run a model with a 120GB requirement: even on a 128GB configuration, the 8GB left over for everything else is likely too tight a fit.
Agreed - that concern is probably not unreasonable.
So are the M4 Macs becoming the de facto solution for running an LLM locally, thanks to the insane 800 GB/sec memory bandwidth of Apple Silicon at its best?
The advantage the Macs have is that they can share RAM between GPU and CPU, and GPU-accessible RAM is everything when you want to run a decent sized LLM.
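Back-of-the-envelope for why that bandwidth matters: during decoding, every generated token has to stream roughly the full set of active weights from memory, so tokens/sec tops out around bandwidth divided by weight footprint. A rough sketch with nominal peak bandwidths and an illustrative 40GB footprint (e.g. a ~70B model at ~4-bit quantization, purely an assumption):

    # Upper bound on decode speed for a memory-bandwidth-bound LLM:
    # tokens/sec ~= memory bandwidth / active weight footprint.
    bandwidth_gb_per_s = {
        "M2 Ultra": 800,            # "Apple Silicon at its best"
        "M4 Max": 546,
        "RTX 3090 (per card)": 936,
    }
    weight_footprint_gb = 40        # illustrative: ~70B model at ~4-bit quantization

    for chip, bw in bandwidth_gb_per_s.items():
        print(f"{chip}: ~{bw / weight_footprint_gb:.0f} tokens/sec, best case")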
The problem is that most ML models are released for NVIDIA CUDA. Getting them to work on macOS requires converting them, usually to either GGUF (the llama.cpp format) or MLX (for Apple's own MLX array framework).
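For the MLX route, the usual path is the mlx-lm package: convert (and optionally quantize) a Hugging Face checkpoint, or just grab one of the pre-converted models published under the mlx-community org. A minimal sketch - the model name is only an example and the exact flags can differ between versions:

    # pip install mlx-lm
    # CLI conversion/quantization looks roughly like:
    #   mlx_lm.convert --hf-path mistralai/Mistral-7B-Instruct-v0.3 -q
    from mlx_lm import load, generate

    # Example: a pre-converted 4-bit model from the mlx-community org
    model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
    print(generate(model, tokenizer,
                   prompt="Explain unified memory in one sentence.",
                   max_tokens=100))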
As such, as a Mac user I remain envious of people with NVIDIA/CUDA rigs with decent amounts of VRAM.
There are many such threads on Reddit.
M4 Max is incrementally faster, maybe 20%.
Even if you factor in electricity costs, a 2x 3090 setup is IMO the sweet spot, cost/benefit wise.
And it's maybe a zany line of argument, but 2x 3090s draw about 10x the power of an M4 Max. While the M4 is maybe the most efficient setup out there, it's not nearly 10x as efficient, and that, IMO, is where the lack of compute power comes from.
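The 10x figure is roughly consistent with nominal specs; a rough sketch (the M4 Max number is an approximate package-power assumption under GPU load):

    rtx_3090_watts = 350    # nominal board power per card
    m4_max_watts = 70       # approximate package power under load (assumption)

    dual_3090_watts = 2 * rtx_3090_watts
    print(dual_3090_watts / m4_max_watts)   # -> 10.0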
I am talking about the times when you were limited only by your imagination and skills. All you needed was a laptop and a few hundred bucks for servers. Now, to compete, you would need orders of magnitude more cash. You can still do some things, but you are at the mercy of AI providers, who can cut you off on a whim.