For those getting started, the easiest one click installer I've used is Nomic.ai's gpt4all: https://gpt4all.io/
This runs with a simple GUI on Windows/Mac/Linux, leverages a fork of llama.cpp on the backend and supports GPU acceleration, and LLaMA, Falcon, MPT, and GPT-J models. It also has API/CLI bindings.
I just saw a slick new tool https://ollama.ai/ that will let you install a llama2-7b with a single `ollama run llama2` command that has a very simple 1-click installer for Apple Silicon Mac (but need to build from source for anything else atm). It looks like it only supports llamas OOTB but it also seems to use llama.cpp (via Go adapter) on the backend - it seemed to be CPU-only on my MBA, but I didn't poke too much and it's brand new, so we'll see.
I've also been porting my own notes into a single location that tracks models, evals, and has guides focused on local models: https://llm-tracker.info/
This runs with a simple GUI on Windows/Mac/Linux, leverages a fork of llama.cpp on the backend and supports GPU acceleration, and LLaMA, Falcon, MPT, and GPT-J models. It also has API/CLI bindings.
I just saw a slick new tool https://ollama.ai/ that will let you install a llama2-7b with a single `ollama run llama2` command that has a very simple 1-click installer for Apple Silicon Mac (but need to build from source for anything else atm). It looks like it only supports llamas OOTB but it also seems to use llama.cpp (via Go adapter) on the backend - it seemed to be CPU-only on my MBA, but I didn't poke too much and it's brand new, so we'll see.
For anyone on HN, they should probably be looking at https://github.com/ggerganov/llama.cpp and https://github.com/ggerganov/ggml directly. If you have a high-end Nvidia consumer card (3090/4090) I'd highly recommend looking into https://github.com/turboderp/exllama
For those generally confused, the r/LocalLLaMA wiki is a good place to start: https://www.reddit.com/r/LocalLLaMA/wiki/guide/
I've also been porting my own notes into a single location that tracks models, evals, and has guides focused on local models: https://llm-tracker.info/