
You can ensure a model trains on transformative rather than derivative synthetic texts: for example, by asking for summaries, turning the source into QA pairs, or doing contrastive synthesis across multiple copyrighted works. The resulting model can never regurgitate the training set because it has never seen it. This approach takes only the abstract ideas from copyrighted sources, leaving their specific expression protected.
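For concreteness, here is a minimal sketch of that kind of pipeline in Python. llm_complete is a hypothetical stand-in for whatever completion API you would actually call; the point is just that only the derived texts, never the originals, end up in the training corpus.

    # Sketch of a "transformative-only" synthetic data pipeline: the model
    # being trained never sees the copyrighted source text, only summaries
    # and QA pairs derived from it.

    def llm_complete(prompt: str) -> str:
        """Hypothetical LLM call; replace with your provider's client."""
        raise NotImplementedError

    def to_summary(source_text: str) -> str:
        # Ask for an idea-level summary, not a paraphrase, so only the
        # abstract ideas survive, not the specific expression.
        return llm_complete(
            "Summarize the key ideas of the following text in your own "
            "words, without quoting or closely paraphrasing it:\n\n"
            + source_text
        )

    def to_qa_pairs(source_text: str, n: int = 5) -> str:
        return llm_complete(
            f"Write {n} question-answer pairs that test understanding of "
            "the ideas in the following text. Do not quote it verbatim:\n\n"
            + source_text
        )

    def build_training_set(sources: list[str]) -> list[str]:
        # Only the derived texts enter the corpus; the originals are
        # discarded, so verbatim regurgitation has nothing to copy from.
        corpus = []
        for text in sources:
            corpus.append(to_summary(text))
            corpus.append(to_qa_pairs(text))
        return corpus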

If abstract ideas were protectable, what would stop an LLM from learning not from the original source but from social commentary and follow-up works? We can't ask people not to reproduce ideas they read about. And in any case, protecting abstractions would kneecap creativity in both humans and AI.



That's an interesting argument; it makes the case for "it's what you make it do, not what it can do, that constitutes a violation" a little stronger IMO.



