Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I’d be surprised if SIMD + branch to switch algorithms beats branchless binary search for data in cache. Were you comparing with that? Not saying it isn’t possible, god knows CPUs surprise in all kinds of wonderful ways. It’s not the outcome I would have guessed though.

4x AVX2 vectors of 32bit integers is 32 integers. So log2(32) = 5, 5 cmov instructions or a largely unpredictable branch + simd instructions and either more branches or more cmovs to get the result.

It's possible if you were testing on similar sized arrays in a loop that the unpredictable branch becomes 100% predictable, which I think would give the result you saw.



I haven't tested this either, but why would the branch be "largely unpredictable"? It's always going to evaluate to false except once, and the one time it evaluates to true the rest of the search will be finished within the branch.


I’m referring to exactly that misprediction when the branch triggers. It’s only predictable if the input arrays always has the same length log2, and it’s called in a loop so the history is there.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: