Nothing in particular. Setting up a scientific Python + PyTorch stack is difficult if you're unfamiliar with the Python packaging ecosystem.
If you're not on the "happy path" of "Ubuntu, NVIDIA, and Anaconda," lots of things could go wrong if you're configuring from scratch. If you want to run these models efficiently, hardware acceleration is a must, but managing the intersection of {GPU, operating system, architecture, Python version, Python virtualenv location} is tricky.
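For the non-GPU half of that intersection, the standard library will at least tell you what environment you're actually in; a minimal sketch (nothing here is PyTorch-specific):

```python
import platform
import sys

# Operating system and CPU architecture (e.g. "Linux x86_64", "Darwin arm64")
print(platform.system(), platform.machine())

# Exact interpreter version
print(sys.version)

# Where the active environment lives; prefix differs from base_prefix inside a venv
print("prefix:", sys.prefix)
print("base prefix:", sys.base_prefix)
```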
That's even before you deal with hardware-specific implementation details!
- Running NVIDIA? Double-check your CUDA library version, kernel module version, (optionally) cuDNN, and PyTorch version.
- Running AMD? Double-check your ROCm library version and AMD drivers, and make sure you use the PyTorch build that AMD provides with ROCm support.
- On Apple machines? Double-check that your M1 hardware is actually supported, then download and install a custom PyTorch distribution built with M1 support, and make sure your NumPy version has been properly linked against Accelerate.framework, or else your BLAS calls run on the CPU rather than the undocumented AMX coprocessor. If you want to run on the ANE, you'll additionally need a working Xcode toolchain and a version of the CoreML model compiler that can read your serialized PyTorch model files properly. (A quick sanity check for all three setups follows this list.)
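If it helps, here's a minimal sanity-check sketch (assuming a reasonably recent PyTorch, 1.12+ for the MPS check) that prints which of those acceleration backends your installed build actually sees, plus which BLAS NumPy was linked against:

```python
import torch
import numpy as np

print("torch:", torch.__version__)

# NVIDIA path: is a CUDA device usable, and which CUDA runtime was this wheel built against?
print("CUDA available:", torch.cuda.is_available())
print("CUDA runtime:", torch.version.cuda)        # None on CPU-only, ROCm, or Apple builds

# AMD path: ROCm builds report a HIP version here
print("HIP/ROCm:", torch.version.hip)             # None unless this is a ROCm build

# Apple path: the Metal (MPS) backend on Apple Silicon
print("MPS available:", torch.backends.mps.is_available())

# Which BLAS NumPy was linked against (Accelerate, OpenBLAS, MKL, ...)
np.show_config()
```

It won't tell you whether the ANE path works (that still goes through the CoreML tooling and Xcode), but it catches the most common failure mode of having installed a CPU-only wheel by accident.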
I think the pain of getting things working makes it easier to just throw up one's hands and pay someone else to run your model for you.
I mean, caveat emptor; CoreML has been barebones for years, and Apple isn't exactly known for a huge commitment to third-party APIs. The writing was on the wall with how Metal was rolled out and how fast OpenCL got dropped; honestly, it doesn't surprise me at all at this point. Even the current Apple Silicon support in llama.cpp is fudged with NEON and Metal Shaders instead of Apple's "Neural Engine".
Then you more or less need a GUI like the one OpenAI built with ChatGPT, so you control the whole environment. Even setting up LLM via Homebrew required me to do the whole install twice because of some arcane error.
I think I can fix that by shipping a bottle release: "brew install simonw/llm/llm" currently attempts to compile Pydantic's Rust extensions, which makes the install incredibly slow.