I liked this paper for attempting to make LMs small enough to pretrain on commodity hardware. I'd like to see more work in this area. My questions are:
1. What real-world uses exist for both tiny and small models?
2. What benchmarks can we scale down so that tiny models can make measurable progress on them? In other words, can we design tiny models as a proxy for assessing the performance of larger models?
3. Can we design tiny models so that we can experiment with architectures, hyperparameters, optimization algorithms, etc., and have the results tell us something useful about applying those choices to larger models? (Rough sketch below the link.)
4. If tiny models are useful, can we map them onto digital or analog hardware to run on cheap, low-power ASICs? Or just FPGAs?
https://github.com/leonguertler/supertinylanguagemodels?tab=...
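For question 3, here's the kind of experiment I have in mind, as a hypothetical sketch in plain PyTorch (not code from the linked repo): the model sizes, learning rates, and synthetic training data are all placeholders. The idea is to train the same tiny architecture at two scales, sweep one hyperparameter, and check whether the ranking of settings holds across scales.

    import torch
    import torch.nn as nn

    VOCAB, CTX = 256, 64  # byte-level vocab and a short context keep pretraining cheap

    class TinyGPT(nn.Module):
        """A minimal decoder-only transformer built from stock PyTorch layers."""
        def __init__(self, d_model: int, n_layers: int, n_heads: int):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, d_model)
            layer = nn.TransformerEncoderLayer(
                d_model, n_heads, dim_feedforward=4 * d_model,
                batch_first=True, norm_first=True)
            self.blocks = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d_model, VOCAB)

        def forward(self, tokens: torch.Tensor) -> torch.Tensor:
            mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
            h = self.blocks(self.embed(tokens), mask=mask, is_causal=True)
            return self.head(h)

    def train_and_eval(model: nn.Module, lr: float, steps: int = 200) -> float:
        """Train on synthetic byte sequences; return the final loss (lower is better)."""
        opt = torch.optim.AdamW(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(steps):
            x = torch.randint(0, VOCAB, (8, CTX + 1))  # stand-in for real pretraining text
            inp, tgt = x[:, :-1], x[:, 1:]
            loss = loss_fn(model(inp).reshape(-1, VOCAB), tgt.reshape(-1))
            opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()

    if __name__ == "__main__":
        # Two scales x three learning rates: does the best setting at the tiny
        # scale predict the best setting at the (slightly) larger scale?
        scales = {"tiny": (64, 2, 2), "small": (128, 4, 4)}  # (d_model, layers, heads)
        for name, (d, layers, heads) in scales.items():
            for lr in (3e-4, 1e-3, 3e-3):
                loss = train_and_eval(TinyGPT(d, layers, heads), lr)
                print(f"{name:5s} d_model={d:3d} lr={lr:.0e} final_loss={loss:.3f}")

If rankings like this stayed stable from tiny to small to large, cheap sweeps at the tiny scale would be a useful proxy; if they don't, that would be interesting to know too.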