Out of curiosity, why aren't we crowdsourcing distributed training of LLMs, where anyone can join by bringing their own hardware or data? Moreover, could we find a way to incorporate this into a blockchain so there is full transparency, and also add differential privacy to protect every participant?
You can finetune with it. If you want a more generic framework, you can use hivemind[1], which is what Petals is built on, but you'll have to create your own community for whatever model you're trying to train.
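For concreteness, here's a minimal sketch of what joining such a run looks like with hivemind, based on its collaborative-optimizer quickstart. The model, the random stand-in data, and the run_id "my-community-run" are all placeholders; in practice you'd pass initial_peers to join an existing swarm rather than bootstrapping a new one.

```python
import torch
import torch.nn.functional as F
import hivemind

# Stand-in model and data; replace with whatever the community is training.
model = torch.nn.Linear(784, 10)
local_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
batches = [(torch.randn(32, 784), torch.randint(0, 10, (32,))) for _ in range(100)]

# Start a DHT node; pass initial_peers=[...] to join an existing swarm.
dht = hivemind.DHT(start=True)

# Wrap the local optimizer: updates are averaged with every peer
# training under the same run_id.
opt = hivemind.Optimizer(
    dht=dht,
    run_id="my-community-run",   # hypothetical name; all peers must agree on it
    batch_size_per_step=32,      # samples this peer contributes per step
    target_batch_size=10_000,    # global samples accumulated before averaging
    optimizer=local_opt,
    use_local_updates=True,
    verbose=True,
)

for x, y in batches:
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    opt.step()
    opt.zero_grad()
```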
The problem here is that most people just don't have suitable hardware. Ideally, you'd want to load the entire model into a GPU, and most consumer-grade GPUs have nowhere near enough video memory. You'd need something like an A100 80GB to run a node in the potential blockchain, and one of those cards costs about 15k USD. Admittedly, that's not too far off from the price of a modern bitcoin ASIC miner, but it's still a healthy chunk of change.
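To make the memory point concrete, here's a rough back-of-envelope calculation. The 70B parameter count is a hypothetical example, and the 16 bytes per parameter is the usual rule of thumb for mixed-precision Adam (fp16 weights and gradients plus fp32 master weights, momentum, and variance):

```python
params = 70e9  # hypothetical 70B-parameter model

# Inference: fp16 weights alone.
weights_gb = params * 2 / 1e9      # ~140 GB

# Training: ~16 bytes/param for mixed-precision Adam state.
training_gb = params * 16 / 1e9    # ~1,120 GB

print(f"inference weights: ~{weights_gb:.0f} GB")
print(f"training state:    ~{training_gb:.0f} GB")
print(f"A100 80GB cards for training state alone: {training_gb / 80:.0f}")
```

So even before activations, a model of that size doesn't come close to fitting on one card, which forces the multi-GPU splitting discussed below.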
And if you try to split the model across several GPUs, then you run into a bandwidth problem, as the model's parts need to talk to each other (on the order of a terabyte per second). At the moment, the only realistic way to contribute is to provide feedback data for RLHF training.
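As a rough illustration of that gap (all figures are ballpark): datacenter interconnects are within an order of magnitude of the requirement, while home internet is several orders of magnitude short.

```python
# Ballpark comparison of available link bandwidths vs. the
# ~1 TB/s figure cited above (all values approximate).
required_gb_s = 1000

links_gb_s = {
    "A100 NVLink (intra-node)": 600,
    "PCIe 4.0 x16": 32,
    "1 Gbit/s home broadband": 0.125,
}

for name, bw in links_gb_s.items():
    print(f"{name}: {bw:>7.3f} GB/s ({required_gb_s / bw:,.0f}x short of 1 TB/s)")
```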
Am I being too crazy here?