
Google's advantage is mostly in historical books. Google Books has a great collection going back to the 1500s.

For modern works, anyone can just add Z-Library and Anna's Archive. Meta got caught, but I doubt they were the only ones (in fact, EleutherAI famously included the pirated Books3 dataset in the openly published dataset they used for GPT-Neo and GPT-J, and nothing really bad happened).





