
Judging from the name, I'd guess the function operates on 8-bit values. A 512-bit register holds 64 of those, so if the previous implementation was scalar, a double-pumped AVX-512 implementation can process 128 elements at a time, which makes the 100x speedup plausible.
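
For anyone who wants the arithmetic behind the 128 figure spelled out, here is a rough sketch. It assumes "double-pumped" means two 512-bit operations in flight per step, which is my reading and not something confirmed anywhere; `lanes_per_step` is just a made-up helper name for illustration.

```python
def lanes_per_step(register_bits: int, element_bits: int, ops_in_flight: int) -> int:
    """Elements handled per step, given register width, element width,
    and how many vector operations are assumed to run at once."""
    return (register_bits // element_bits) * ops_in_flight


# One 512-bit register of 8-bit values holds 64 elements.
single = lanes_per_step(512, 8, 1)

# Assumption: "double-pumped" = two such operations per step.
double = lanes_per_step(512, 8, 2)

print(single, double)
```

A scalar loop handles one element per step, so 128 lanes alone gives an upper bound around 128x, and a measured 100x sits comfortably under that once memory bandwidth and loop overhead are taken into account.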

