
Hence the split. It's still likely to beat the crap out of sorting, and depending on the data it might not even be needed. I do wonder what dataset produces such short lines.
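A minimal sketch of what "the split" could look like. This assumes it means hash-partitioning the lines into bucket files, each small enough to dedupe in memory with a set (the thread doesn't spell this out). The function name, bucket count, and temp-file layout are hypothetical. Every copy of a given line hashes to the same bucket, so deduping each bucket independently dedupes the whole file in two linear passes, with no O(n log n) sort.

```python
import hashlib
from pathlib import Path


def dedupe_by_hash_split(src: Path, dst: Path, tmp_dir: Path, n_buckets: int = 16) -> None:
    """Hypothetical helper: write the distinct lines of src to dst without sorting.

    Pass 1 partitions lines by a stable hash of their content, so duplicates
    always land in the same bucket file. Pass 2 dedupes each bucket in memory
    with a set. Output is grouped by bucket, not in input order.
    """
    tmp_dir.mkdir(parents=True, exist_ok=True)
    bucket_paths = [tmp_dir / f"bucket_{i}.txt" for i in range(n_buckets)]
    buckets = [open(p, "w", encoding="utf-8") for p in bucket_paths]
    try:
        with open(src, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                # blake2b gives a hash that is stable across runs, unlike hash() on str.
                digest = hashlib.blake2b(line.encode("utf-8"), digest_size=8).digest()
                buckets[int.from_bytes(digest, "big") % n_buckets].write(line + "\n")
    finally:
        for b in buckets:
            b.close()

    with open(dst, "w", encoding="utf-8") as out:
        for path in bucket_paths:
            seen = set()  # only one bucket's distinct lines are held in memory at a time
            with open(path, encoding="utf-8") as f:
                for line in f:
                    if line not in seen:
                        seen.add(line)
                        out.write(line)
            path.unlink()  # drop the bucket once it has been processed
```

If the set of distinct lines already fits in memory, the split buys nothing: `n_buckets=1` reduces this to a single set-based pass, which matches the point that the split might not be needed.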


Given the numbers, the need to dedupe, the poster's strong motivation and relative inexperience...

I'd guess it's a gigantic spam list.


Hah, self-duh. Total failure of technical cynicism on my part.