
Nothing in particular. Setting up a scientific Python + PyTorch stack is difficult if you're unfamiliar with the Python packaging ecosystem.

If you're not on the "happy path" of "Ubuntu, NVIDIA, and Anaconda," then lots of things can go wrong when you're configuring from scratch. If you want to run these models efficiently, hardware acceleration is a must, but managing the intersection of {GPU, operating system, architecture, Python version, Python virtualenv location} is tricky.
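
For example, just picking the right PyTorch wheel already depends on that combination - roughly like this (illustrative commands only; the index URLs and supported versions on pytorch.org change over time):

    # CUDA build on Linux (pick the index that matches your installed CUDA version)
    pip install torch --index-url https://download.pytorch.org/whl/cu118

    # ROCm build for AMD GPUs
    pip install torch --index-url https://download.pytorch.org/whl/rocm5.4.2

    # CPU-only build
    pip install torch --index-url https://download.pytorch.org/whl/cpu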

That's even before you deal with hardware-specific implementation details!

- Running NVIDIA? Double-check your CUDA library version, kernel module version, (optional) cuDNN version, and PyTorch version

- Running AMD? Double-check your ROCm library version and AMD drivers, and make sure you use the PyTorch build provided by AMD with ROCm support

- On Apple machines? Double-check that your M1 hardware actually has proper hardware support, then download and install a custom PyTorch distribution linked with M1 support, and make sure that your NumPy version has been properly linked against Accelerate.framework, or else your BLAS calls run on the CPU rather than the undocumented AMX coprocessor. If you want to run on the ANE, you'll additionally need a working Xcode toolchain and a version of the CoreML model compiler that can read your serialized PyTorch model files properly. (A quick backend sanity check is sketched below.)
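
A quick way to see which of those backends your PyTorch build actually picked up - a minimal sketch, assuming PyTorch is already installed (note that AMD ROCm builds report through the CUDA APIs):

    import platform
    import torch

    print(f"Python {platform.python_version()} on {platform.system()}/{platform.machine()}")
    print(f"PyTorch {torch.__version__}")

    if torch.cuda.is_available():
        # NVIDIA CUDA builds and AMD ROCm builds both answer here
        print("GPU device:", torch.cuda.get_device_name(0))
        print("Built against CUDA:", torch.version.cuda)  # None on ROCm builds
    elif getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
        print("Apple Metal (MPS) backend is available")
    else:
        print("No GPU backend found - running on CPU")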

I think the pain of getting things working makes it easier to just throw up one's hands and pay someone else to run your model for you.



It is wild to me how hard it is to run these things GPU-accelerated on an Apple M1/M2.

The hardware support should be amazing for this, given that the CPU and GPU share the same RAM.

I mostly end up running the CPU versions and grumbling about how slow they are.
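
For what it's worth, recent PyTorch builds do expose the M1/M2 GPU through the "mps" device - a minimal sketch (assuming PyTorch >= 1.12; not every op is implemented on MPS yet, so some models still fall back to CPU):

    import torch

    # Prefer the Metal (MPS) backend on Apple Silicon, otherwise fall back to CPU
    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

    model = torch.nn.Linear(4096, 4096).to(device)  # toy layer standing in for a real model
    x = torch.randn(8, 4096, device=device)
    y = model(x)  # runs on the M1/M2 GPU when device is "mps"
    print(y.device)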


> The hardware support should be amazing for this

I mean, caveat emptor; CoreML has been barebones for years, and Apple isn't exactly known for a huge commitment to third-party APIs. The writing was on the wall with how Metal was rolled out and how fast OpenCL got dropped, so it honestly doesn't surprise me at all at this point. Even the current Apple Silicon support in llama.cpp is fudged with NEON and Metal shaders instead of Apple's "Neural Engine".


GPU acceleration is pretty easy with llama.cpp. You just run make with an extra flag and then an argument or two at runtime.
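
For reference, at the time of writing that means the Metal build flag plus the GPU-layers argument - a sketch only, since llama.cpp's build options change quickly (the model path is just a placeholder):

    # build with Metal support on Apple Silicon
    LLAMA_METAL=1 make

    # offload layers to the GPU at runtime with -ngl / --n-gpu-layers
    ./main -m ./models/7B/ggml-model-q4_0.bin -ngl 1 -p "Hello"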


Adrien Brault on Twitter gave me this recipe, which worked perfectly: https://gist.github.com/adrienbrault/b76631c56c736def9bc1bc2...


Therein lies the problem.

I want to write instructions for using my software that don't expect people to know how to run make, or even to have a C compiler installed.


Then you more or less need a GUI like the one OpenAI built for ChatGPT, so you control the whole environment. Even setting up LLM via Homebrew required me to do the whole install twice because of some arcane error.


I think I can fix that by shipping a bottle release - "brew install simonw/llm/llm" currently attempts to compile Pydantic's Rust extensions, which makes the install incredibly slow.

I built a bottle for it which installs much faster, but you currently have to download the file from https://static.simonwillison.net/static/2023/llm--0.5.arm64_... - I've not yet figured out how to get that to install when the user runs "brew install simonw/llm/llm" - issue here: https://github.com/simonw/llm/issues/102



