
I wonder how ingesting more and more data will affect the number of parameters. Is it just going to keep getting bigger?



I don't think the current models are at "knowledge capacity". So far all the evidence points to training the same-size model on more data giving better results.


Increasing either the number of parameters or the number of training tokens improves results (more precisely: lowers training loss), and both cost compute. To get the best loss for a given training compute budget, model size and training tokens should be scaled up in roughly equal proportion. That's the Chinchilla scaling law. (Though low loss is not always the same as good results; data quality also matters.)

Further reading: https://dynomight.net/scaling/
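
For concreteness, here's a rough sketch of the parametric loss fit from the Chinchilla paper, using the approximate published coefficients and the usual ~6*N*D FLOPs rule of thumb (all values illustrative, not exact):

    # Sketch of the Chinchilla parametric loss fit (Hoffmann et al. 2022):
    # L(N, D) = E + A / N**alpha + B / D**beta, training FLOPs C ~= 6 * N * D.
    # Coefficients are the approximate published fits; treat them as illustrative.
    E, A, B = 1.69, 406.4, 410.7
    alpha, beta = 0.34, 0.28

    def loss(n_params, n_tokens):
        return E + A / n_params**alpha + B / n_tokens**beta

    def best_split(flops, steps=2000):
        """Grid-search the parameter/token split under a fixed FLOP budget."""
        best = (float("inf"), 0.0, 0.0)
        for i in range(1, steps):
            n = 10 ** (7 + 6 * i / steps)   # sweep N from ~1e7 to ~1e13
            d = flops / (6 * n)             # tokens implied by C ~= 6*N*D
            best = min(best, (loss(n, d), n, d))
        return best

    for c in (1e21, 1e23, 1e25):
        l, n, d = best_split(c)
        print(f"C={c:.0e}: N~{n:.1e}, D~{d:.1e}, loss~{l:.3f}")

If you 10x the budget and re-run this, the optimal N and the optimal D each go up by roughly 3x, which is the "scale them equally" result.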


An interesting corollary of this is that if you want to reduce the model size, you can compensate by training for longer and still reach the same accuracy. Depending on your training:inference ratio, this can be the better choice globally, reducing your total compute cost or even just your frontend latency.
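
As a rough illustration of that trade-off (same illustrative fit as in the sketch above, with the Chinchilla 70B/1.4T numbers plugged in as a hypothetical baseline): fix a target loss, then solve for how many tokens a smaller model needs to hit it.

    # Sketch: trade model size for training tokens at a fixed target loss,
    # using the same illustrative Chinchilla-style fit as above.
    E, A, B = 1.69, 406.4, 410.7
    alpha, beta = 0.34, 0.28

    def tokens_for_loss(n_params, target_loss):
        """Tokens D such that E + A/N**alpha + B/D**beta == target_loss."""
        gap = target_loss - E - A / n_params**alpha
        if gap <= 0:
            raise ValueError("model too small to reach this loss at any token count")
        return (B / gap) ** (1 / beta)

    n_big, d_big = 70e9, 1.4e12            # roughly Chinchilla's 70B params / 1.4T tokens
    target = E + A / n_big**alpha + B / d_big**beta
    for shrink in (1, 2, 4):
        n = n_big / shrink
        d = tokens_for_loss(n, target)
        print(f"N={n:.1e}: D~{d:.1e} tokens, training FLOPs~{6*n*d:.1e}, "
              f"inference FLOPs/token~{2*n:.1e}")

Training cost goes up as the model shrinks, but each inference token gets cheaper (roughly 2*N FLOPs per token), which is where the training:inference ratio comes in.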


Yeah, though I have not seen a formula that takes the expected number of inference runs into account when calculating the optimal data/parameter balance.
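
I haven't seen one in a paper either, but here's a sketch of how it could be framed (assumptions: the same illustrative fit as above, ~6*N*D training FLOPs, ~2*N FLOPs per inference token, and a made-up target loss): pick a target loss and minimise training-plus-lifetime-inference compute over the model size.

    # Sketch of an inference-aware trade-off (not a published formula):
    # minimise ~6*N*D training FLOPs + ~2*N FLOPs per expected inference token,
    # subject to hitting a fixed target loss under the illustrative fit above.
    E, A, B = 1.69, 406.4, 410.7
    alpha, beta = 0.34, 0.28

    def tokens_for_loss(n_params, target_loss):
        gap = target_loss - E - A / n_params**alpha
        return (B / gap) ** (1 / beta) if gap > 0 else float("inf")

    def best_size(target_loss, inference_tokens, steps=4000):
        best = (float("inf"), 0.0)
        for i in range(1, steps):
            n = 10 ** (9 + 3 * i / steps)   # sweep N from ~1e9 to ~1e12
            d = tokens_for_loss(n, target_loss)
            total = 6 * n * d + 2 * n * inference_tokens
            best = min(best, (total, n))
        return best

    target = 1.94                           # arbitrary target loss, for illustration only
    for lifetime_tokens in (0.0, 1e12, 1e14):
        total, n = best_size(target, lifetime_tokens)
        print(f"expected inference tokens {lifetime_tokens:.0e}: "
              f"best N~{n:.1e}, total FLOPs~{total:.1e}")

The more inference you expect, the further the optimum shifts toward a smaller, longer-trained model.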



