
They might be willing to do things like crawl libgen which Google possibly isn't, giving them an advantage. They might also be more skilled at generating useful synthetic data, which is a bit of an art and subject to taste, and something competitors might not be as good at.


> They might be willing to do things like crawl libgen which Google possibly isn't

Are you implying big companies don't crawl libgen? Or Google specifically? I would be very surprised if OpenAI (MS) didn't crawl libgen.


OpenAI probably does. Not sure about Google; possibly not.


Google has a ton of scanned books and magazines from libraries, etc., on top of their own web crawls. If they don't have the equivalent of libgen tucked away, something's gone wrong.



