What the hell does "probabilistically balanced" mean?

moomin · 2025-07-01T08:01:22 1751356882

It means the balance depends on the content being stored: the structure is balanced for typical inputs, but certain edge-case inputs may result in sub-optimal behaviour.

zombot · 2025-07-01T08:50:27 1751359827

What is probabilistic about that? It sounds deterministic.

judofyr · 2025-07-01T09:18:29 1751361509

The opposite of probabilistic is not deterministic in this context. This is not about «drawing a random number», but about the balancing depending on the input data. «With high probability» here means «the vast majority of possible inputs lead to a balanced structure».

If it were not probabilistic, the balancing would be guaranteed in all cases. That typically means the data structure stores balancing information somewhere so that it can detect when something is unbalanced and repair it. In this data structure we’re just hashing the content without really caring about the current balance, and it turns out that for most inputs it will be fine.
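A minimal sketch of that idea, not the actual data structure under discussion. It assumes a key ends a node whenever its salted SHA-256 hash is divisible by a target size; `is_boundary`, `chunk_sizes`, `target`, and the key format are all hypothetical names chosen for illustration. Note that nothing here tracks or repairs balance.

```python
import hashlib

def is_boundary(key: bytes, salt: bytes = b"", target: int = 32) -> bool:
    """True when the salted hash of `key` lands in a 1/target slice of hash space."""
    digest = hashlib.sha256(salt + key).digest()
    return int.from_bytes(digest[:8], "big") % target == 0

def chunk_sizes(keys, salt: bytes = b"", target: int = 32):
    """Walk keys in order; a node ends at every boundary key. Returns node sizes."""
    sizes, current = [], 0
    for key in keys:
        current += 1
        if is_boundary(key, salt, target):
            sizes.append(current)
            current = 0
    if current:  # trailing keys that never hit a boundary
        sizes.append(current)
    return sizes

# Ordinary, non-adversarial keys (assumed format).
keys = [f"key-{i:06d}".encode() for i in range(100_000)]
sizes = chunk_sizes(keys)
print("nodes:", len(sizes))
print("mean size:", sum(sizes) / len(sizes))
print("largest node:", max(sizes))
```

For keys like these the mean node size should land near `target`, even though the code never looks at how full any node is.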

mcherm · 2025-07-01T09:10:54 1751361054

What's probabilistic is the specific behavior of the hash function (including the salt you chose). Choosing a different hash function (or a different salt) would result in a different breakdown into chunks.

In principle, if your data were specially crafted to exploit the specific hash function (and salt), you could get an aberrant case like 1 million entries in a single b-tree node, or a million b-tree nodes with just one entry each. But unless you intentionally exploit the hash function, the chance of this is vanishingly small.
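Both aberrant cases can be reproduced in a toy model. The sketch below assumes a hypothetical boundary rule (a key ends a node when its salted SHA-256 hash is divisible by `TARGET`); the names `is_boundary`, `node_sizes`, and the salts are made up for illustration and are not taken from any real implementation.

```python
import hashlib

TARGET = 32  # hypothetical: aim for roughly 32 entries per node

def is_boundary(key: bytes, salt: bytes) -> bool:
    """A key ends a node when its salted hash is divisible by TARGET."""
    digest = hashlib.sha256(salt + key).digest()
    return int.from_bytes(digest[:8], "big") % TARGET == 0

def node_sizes(keys, salt: bytes):
    """Walk keys in order and return the size of each resulting node."""
    sizes, current = [], 0
    for key in keys:
        current += 1
        if is_boundary(key, salt):
            sizes.append(current)
            current = 0
    if current:
        sizes.append(current)
    return sizes

salt = b"salt-A"
keys = [f"key-{i:06d}".encode() for i in range(50_000)]

# Someone who knows the hash and salt keeps only keys that never end a node...
crafted = [k for k in keys if not is_boundary(k, salt)]
one_big_node = node_sizes(crafted, salt)

# ...or only keys that always end one.
all_singletons = node_sizes([k for k in keys if is_boundary(k, salt)], salt)

# The same crafted keys under a different salt chunk normally again.
resalted = node_sizes(crafted, b"salt-B")

print(len(one_big_node), len(all_singletons), len(resalted))
```

The crafted inputs only work against the salt they were built for, which is why keeping the salt unpredictable matters if inputs can come from an adversary.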

stonemetal12 · 2025-07-01T13:18:52 1751375932

It's like quicksort, which is O(N log N) on average but can degrade to O(N^2) in the worst case. The tree is balanced on average, but can end up far from balanced in the worst case.
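To make the quicksort half of the analogy concrete, here is a small sketch that counts the comparisons a naive quicksort would make, assuming the pivot is always the first element and charging one comparison per non-pivot element in each partition. The function name and input sizes are arbitrary choices for illustration.

```python
import random

def quicksort_comparisons(items):
    """Count comparisons for a first-element-pivot quicksort (iterative, no recursion)."""
    comparisons = 0
    stack = [list(items)]
    while stack:
        part = stack.pop()
        if len(part) < 2:
            continue
        pivot, rest = part[0], part[1:]
        comparisons += len(rest)  # each remaining element is compared to the pivot
        stack.append([x for x in rest if x < pivot])
        stack.append([x for x in rest if x >= pivot])
    return comparisons

n = 2000
shuffled = list(range(n))
random.Random(0).shuffle(shuffled)  # fixed seed so runs are repeatable

typical = quicksort_comparisons(shuffled)
worst = quicksort_comparisons(range(n))  # already sorted: every split is maximally lopsided

print("shuffled input:", typical)
print("sorted input:  ", worst)
```

The algorithm is fully deterministic in both runs; only the input differs, which is the same sense in which the tree's balance is "probabilistic".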

moomin · 2025-07-01T09:11:06 1751361066

It's a term of art. We say the same thing about quicksort being "usually" O(n log n).