Hacker News



A lot has been written about Google's algorithms and search results, but I haven’t seen much on crawling and indexing. I imagine multiple teams at Google help maintain the integrity and completeness of its indexes, and I would love to know more about that work, considering the index might be the largest single repository of knowledge.


I believe email is significantly larger than the web. Wouldn't that be your largest repo of knowledge, in aggregate?


Yes, I agree. I just meant that the Google index is probably the largest centralized collection of knowledge. Maybe not, though; it's just a guess.



