You don't store data nodes, or pointers. You store the keys. If you need to stor...

WithinReason · on April 28, 2023

So binary search should actually not be binary but... octary?

martincmartin · on April 28, 2023

Yes, these are the same issues faced by databases on spinning rust disks. The solution was to figure out how many keys fit in a disk block / cache line, let's call it k, then have a k-ary tree instead of a binary tree. This is called a B-tree.

https://en.wikipedia.org/wiki/B-tree

With caching on modern CPUs, as opposed to databases on disks, there are multiple levels of caching and you may not know the size of the caches. And the caches can be shared with other work you're doing, or other processes are doing, so the effective size can be smaller. So there's some benefit to algorithms that are independent of cache size.

vidarh · on April 28, 2023

But note that, as per their benchmark against B-tree's, in memory the cost of the memory fetch is sufficiently small that the cost of the comparisons might wipe out the savings.

When retrieving from disk you have far more cycles to work with before less disk-read sized blocks can get close to competitive.

vidarh · on April 28, 2023

I wondered about this too, there's an ongoing tradeoff there between the cost of computing which branch to take vs. how much that improves locality.

But seems they did as well. See the graph at the bottom which compares against a (Eytzinger-style) B-tree, which is basically taking that approach. You do more comparisons per node to figure out which branch to take, but fetch less memory.

Given the overall performance they show is very similar, the question if you want to scale this basically becomes whether your system saturates available cores or available memory bandwidth first.

(Would have been interesting to see comparisons of a quaternary and octonary version as well, though as their B-tree test seems to rely on 16 branches, and maybe the sweet spot is somewhere in between)

srcreigh · on April 28, 2023

That’s what I meant re keys, but ya nice point about pointers. Forgetting details of my data structures classes. updated my post (used to count 1 bit per node for pointers)

rocqua · on April 29, 2023

How does it help pre-fetching?