Not attainable for the working class, though. "Can" is doing a lot of heavy lifting here. It seems like, after a brief period where technology was essentially class-agnostic, now only the wealthy get to take part in development while everyone else is left to be a consumer.
Not sure what you mean. Cutting-edge computing has never been cheap. And a Mac Studio is definitely within the budget of a software developer in Norway. It's not going to feel like a cheap investment, but it's definitely doable. Unlike a cluster of H100 GPUs, which would cost as much as a small apartment in Oslo.
And you can easily get a dev job in Norway without having to run an LLM locally on your computer.
The money would be better invested in an x86 build with two to four RTX 3090s than in a Mac Studio. While the Macs have a fantastic performance-per-watt ratio and decent memory support (both bus width and capacity), they are not great on raw compute. A multi-RTX-3090 build totally smokes a Mac at the same price point in inference speed.
The memory requirement for the 7B model with full context is 120GB, so you would need five 3090s, not two to four. Do you know if you can get a motherboard with room for five GPUs, and a power supply to match?
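To sanity-check the card count, here's a quick back-of-the-envelope sketch (it takes the 120GB figure above at face value; 24GB is the 3090's VRAM):

    import math

    GPU_VRAM_GB = 24      # an RTX 3090 (or 4090) has 24GB of VRAM
    footprint_gb = 120    # model weights + full-context KV cache, per the figure above

    cards_needed = math.ceil(footprint_gb / GPU_VRAM_GB)
    print(cards_needed)   # -> 5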
I bet that 5 3090s will smoke a Mac Studio. Can't find anyone in Norway with any in stock though. Or any 4090s with 24GB of memory.
You can get an NVIDIA RTX 5000 with 32GB of memory; there are two webshops that have those in stock. You'll need to wait though, because it looks like there might be one or maybe two in stock in total. And they are 63 000 NOK each, and you need four of them. At that price you could buy two Mac Studios instead.
I see people selling 3090s with 24GB secondhand for around 10 000 NOK each, but those have been running day in and day out for three years and don't come with a warranty.
If you search on r/localllama, there are people who have improvised builds with e.g. 8x GPUs. It takes multiple power supplies and server mainboards, and some let the GPUs sit openly on wooden racks - not sure that's good for longevity?
BTW a Mac wouldn't be able to run a model with a 120GB requirement: even on a 128GB configuration, the 8GB left over for everything else is likely too tight a fit.
Agreed - that concern is probably not unreasonable.
So are the M4 Macs becoming the de facto solution for running an LLM locally, thanks to the insane 800 GB/sec memory bandwidth of Apple Silicon at its best?
The advantage the Macs have is that they can share RAM between GPU and CPU, and GPU-accessible RAM is everything when you want to run a decent sized LLM.
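Back-of-the-envelope for why that bandwidth matters: during decoding, every generated token has to stream roughly the full set of active weights from memory, so tokens/sec tops out around bandwidth divided by weight footprint. A rough sketch with nominal peak bandwidths and an illustrative 40GB footprint (e.g. a ~70B model at ~4-bit quantization, purely an assumption):

    # Upper bound on decode speed for a memory-bandwidth-bound LLM:
    # tokens/sec ~= memory bandwidth / active weight footprint.
    bandwidth_gb_per_s = {
        "M2 Ultra": 800,            # "Apple Silicon at its best"
        "M4 Max": 546,
        "RTX 3090 (per card)": 936,
    }
    weight_footprint_gb = 40        # illustrative: ~70B model at ~4-bit quantization

    for chip, bw in bandwidth_gb_per_s.items():
        print(f"{chip}: ~{bw / weight_footprint_gb:.0f} tokens/sec, best case")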
The problem is that most ML models are released for NVIDIA CUDA. Getting them to work on macOS requires converting them, usually to either GGUF (the llama.cpp format) or MLX (for Apple's own MLX array framework).
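For the MLX route, the usual path is the mlx-lm package: convert (and optionally quantize) a Hugging Face checkpoint, or just grab one of the pre-converted models published under the mlx-community org. A minimal sketch - the model name is only an example and the exact flags can differ between versions:

    # pip install mlx-lm
    # CLI conversion/quantization looks roughly like:
    #   mlx_lm.convert --hf-path mistralai/Mistral-7B-Instruct-v0.3 -q
    from mlx_lm import load, generate

    # Example: a pre-converted 4-bit model from the mlx-community org
    model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
    print(generate(model, tokenizer,
                   prompt="Explain unified memory in one sentence.",
                   max_tokens=100))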
As such, as a Mac user I remain envious of people with NVIDIA/CUDA rigs with decent amounts of VRAM.
There are many such threads on Reddit.
M4 Max is incrementally faster, maybe 20%.
Even if you factor in electricity costs, a 2x 3090 setup is IMO the sweet spot, cost/benefit wise.
And it's maybe a zany line of argument, but 2x 3090s draw about 10x the power of an M4 Max. While the M4 is maybe the most efficient setup out there, it's not nearly 10x as efficient, and that, IMO, is where the lack of compute power comes from.
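The 10x figure is roughly consistent with nominal specs; a rough sketch (the M4 Max number is an approximate package-power assumption under GPU load):

    rtx_3090_watts = 350    # nominal board power per card
    m4_max_watts = 70       # approximate package power under load (assumption)

    dual_3090_watts = 2 * rtx_3090_watts
    print(dual_3090_watts / m4_max_watts)   # -> 10.0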
I am talking about the times when you were limited only by your imagination and skills. All you needed was a laptop and a few hundred bucks for servers. Now, to compete, you would need orders of magnitude more cash. You can still do some things, but you are at the mercy of AI providers, who can cut you off on a whim.