(For the record, the reason why I have stared at this as much as I have just now...

jasonwatkinspdx · on Jan 16, 2011

I agree that his prose is a little confusing. Here's my understanding of his intent:

"red-black trees write out a lot of memory"

I think this refers to how R-B trees require re balancing on insert, particularly accessing siblings of ancestors. This is doubly bad on modern processors, as it introduces a chain of control hazards dependent on these accesses. That can burn a lot of cycles on cache misses. In contrast, the trie structures only access nodes along the direct line down to the appropriate leaf. This has a control hazard (ie when to end the loop) and a data dependency between iterations. The control hazard can be easily predicted (assume search recurses) and modern processors are quite good at unrolling the data hazards across the shadow registers (or equivalent). Basically, modern processors and compilers are much happier with the sort of inner loops you see in tries vs restructuring trees (and are happiest with the inner loop of accessing a hash table).

"The documentation also claims that hashtables can store arbitrarily large keys and nedtries can't, which is only as true of hashtables as it is for nedtries."

I believe his implementation assumes size_t sized keys. This limit comes from his root table, which needs as many entries as bits in the key. While you could code up a variant that used arbitrary length keys I think you'd lose most of the performance advantages he measures (remember, we're mostly talking about constant factors here, not complexity class). He doesn't care about this limit, as his primary application is storing free lists in an allocator, which will always have size_t keys.

"I'm also having a difficult time understanding why red-black trees are not an "in place algorithm" and nedtries are."

I'm also confused by this, but I'm assuming he's looking at concurrent access. Tries only ever push down toward leaves, so with careful ordering it's possible to ignore many locking concerns. Or put a different way, you're only ever allocating new nodes or filling in previously nil slots in existing nodes. Trees with structural re balancing on insert like R-B could invalidate a concurrent traversal with a rotation.

"That specific optimization, then, doesn't seem to offer anything over a patricia trie, which itself would dramatically decrease the number of non-leaf node traversals (as nedtries does not describe using any kind of path compression at all)."

I think for his purposes there are some assumptions about distribution: namely if he's storing memory addresses in a nedtrie, it's likely he's allocating them with brk() or the like, and so they're clustered in one or two slots of the root table. Likewise, if he's storing block sizes, then the root table acts as a segregated fit similar to dlmalloc and its many offspring.

I'd agree that the way he over generalizes the advantages is a bit confusing, but I do think there's something interesting here vs vanilla patricia tries.

saurik · on Jan 16, 2011

Re: vs. hash tables

Both nedtries (as specified / implemented) and hash tables use fixed length keys. Either can be implemented with an arbitrary fixed length, but both are fixed.

The reason you can store arbitrary length keys in hash tables is only because you hash the key to a fixed length and then specify some form of collision resolution.

This hashing means, of course, that you lose order (disallowing what this author calls "nearest fit" searches). But, if you are considering using hash tables for your use case, you already don't care about that feature.

In the case of nedtries, you don't even need to do anything fancy (linear probing) or use a secondary data structure (chaining): as nedtries already allow you to store the same value twice (so he said... I didn't pay attention to how; worst case you can use chaining ;P) you can make the same tradeoff.

I guess you could argue this is an unrelated data structure, a "hash nedtrie", but if you are comparing the overall abilities of these data structure concepts it seems unfair to not point this out, especially given that it doesn't require changing his implementation (the user can do the disambiguation, as they may even be used to doing anyway with hash tables).

saurik · on Jan 16, 2011

Re: assumptions about distribution

But if the assumption is that they are all using a couple slots of that root table, then the root table is not a performance advantage. It would be faster to just start the trie immediately and switch off the first bit.

The point of having a structure like that at the top is that it is a cheap-o form of level compression, and increases the fan-out of the root node, allowing you to cut the search space down much faster given a single very cheap lookup.

As for segregated fit, the bucket array itself is probably actually good enough for that. What /could/ make that interesting is if you /also/ stated that "small" (many leading zero) keys happened to be much more prevalent, and in fact twice as prevalent, as those in the next bucket up.

Otherwise, it is just further unbalancing his trie and is getting reasonably the same tradeoff as if he were using a patricia trie (which will use path compression to eat large numbers of 0s and otherwise will be reasonably as unbalanced without the cost of the bucket array, which is going to be many internal nodes worth of memory).

saurik · on Jan 16, 2011

In fact, it is even worse than that: in the case of random keys (which he admits in the documentation is important for trie balance without helping the user obtain them) he'd be much much better off just chopping the first five bits off the key and using that to index into his top-level array.

Doing that would let the top-level array segment the search space perfectly into 32 sub-buckets, avoiding the fact that the higher order array elements are increasingly meaningless (as they are exponentially smaller).

At that point, however, the datastructure is now a level-compressed trie ;P. (LPC tries seem to be at the state of the art for this kind of structure).

saurik · on Jan 16, 2011

Re: vs. red-black trees

I don't know... he keeps talking about dynamic memory allocation and total memory usage. If he means cache locality, he should just say cache locality.

Regardless, if you care about cache performance and dynamic allocation, as the goal of a red-black tree is to be reasonably balanced, you can just treat it as a complete tree and use standard binary tree packing on it (the same algorithm normally used for heaps; maybe allocate each level separately, as he is also concerned with the cost of reallocation).