Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

You don't store data nodes, or pointers. You store the keys. If you need to store data, just store them at the same index in another array.

The pointers are wholly implicit. E.g. assuming you let the index 0 be empty, the left pointer for a node k is at 2k and the right node is at 2k+1.

In terms of benefit here, the main benefit appears to be that since you always know where the children will be, you can trade bandwidth for latency and trigger pre-fetching of the next step down before you've even decided if you're going left or right.



So binary search should actually not be binary but... octary?


Yes, these are the same issues faced by databases on spinning rust disks. The solution was to figure out how many keys fit in a disk block / cache line, let's call it k, then have a k-ary tree instead of a binary tree. This is called a B-tree.

https://en.wikipedia.org/wiki/B-tree

With caching on modern CPUs, as opposed to databases on disks, there are multiple levels of caching and you may not know the size of the caches. And the caches can be shared with other work you're doing, or other processes are doing, so the effective size can be smaller. So there's some benefit to algorithms that are independent of cache size.


But note that, as per their benchmark against B-tree's, in memory the cost of the memory fetch is sufficiently small that the cost of the comparisons might wipe out the savings.

When retrieving from disk you have far more cycles to work with before less disk-read sized blocks can get close to competitive.


I wondered about this too, there's an ongoing tradeoff there between the cost of computing which branch to take vs. how much that improves locality.

But seems they did as well. See the graph at the bottom which compares against a (Eytzinger-style) B-tree, which is basically taking that approach. You do more comparisons per node to figure out which branch to take, but fetch less memory.

Given the overall performance they show is very similar, the question if you want to scale this basically becomes whether your system saturates available cores or available memory bandwidth first.

(Would have been interesting to see comparisons of a quaternary and octonary version as well, though as their B-tree test seems to rely on 16 branches, and maybe the sweet spot is somewhere in between)


That’s what I meant re keys, but ya nice point about pointers. Forgetting details of my data structures classes. updated my post (used to count 1 bit per node for pointers)


How does it help pre-fetching?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: