Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

(For the record, the reason why I have stared at this as much as I have just now is because I study these kinds of data structures and therefore want to understand what is new and compelling about this specific one.)

So, tries have the property of being only as balanced as the distribution of their alphabets. It is very easy to end up in a situation where worst case performance of a trie is linear in the number of items in the trie. There are some researchers who were doing work on a concept called "stratified indexes" that allow you to index into deeply skew'd tries that you may find interesting.

Regardless, the claims in this article are somewhat off. For example: they state that red-black trees "write out a lot of memory". Any full binary tree with N leaf nodes has N-1 non-leaf nodes. That statement is just as true of red-black trees as it is of this data structure. It is definitely true that the STL red-black tree implementation is inefficient (allocating memory for empty nodes), but that should not be held against the algorithm.

The documentation is even weirder. "Unlike either red-black trees or most hash tables, nedtries can store as many items with identical keys as you like." That is certainly not true, and isn't even true of the STL, which has a red-black tree without unique restriction called std::multimap.

The documentation also claims that hashtables can store arbitrarily large keys and nedtries can't, which is only as true of hashtables as it is for nedtries. As hashtables aren't ordered, and in fact require fixed-length keys, you can (and do: normally this statement would be going in the other direction of causality) use hash algorithms to hash the information to fit into the key length and then verify the value when you find it. The same thing could be used for nedtries.

I'm also having a difficult time understanding why red-black trees are not an "in place algorithm" and nedtries are. I'm not even certain I understand what it would mean for a search tree to be "in place", but the documentation makes claims about "dynamic memory allocation" being its key distinction, and I see no reason why a red-black tree need require that for search or deletion. Insert requires a single allocation, but so does a trie; and, even with the STL most people use a custom pool allocator for that purpose.

To be clear, I am not at all stating that "nedtries aren't faster than other implementations", just that that has nothing to do with their algorithm. The algorithm seems to just be an array of tries, where the array is designed to help with the common case of "there are a large number of zeros at the top of the key" (caused by using small numbers as keys).

Unfortunately, that makes the distribution among the bin-level tries drastically uneven, making that first "which bucket do I end up in" question severely unfairly unbalanced.

That specific optimization, then, doesn't seem to offer anything over a patricia trie, which itself would dramatically decrease the number of non-leaf node traversals (as nedtries does not describe using any kind of path compression at all). These path-compressed tries are what people normally use for high-performance lookup like routing.



I agree that his prose is a little confusing. Here's my understanding of his intent:

"red-black trees write out a lot of memory"

I think this refers to how R-B trees require re balancing on insert, particularly accessing siblings of ancestors. This is doubly bad on modern processors, as it introduces a chain of control hazards dependent on these accesses. That can burn a lot of cycles on cache misses. In contrast, the trie structures only access nodes along the direct line down to the appropriate leaf. This has a control hazard (ie when to end the loop) and a data dependency between iterations. The control hazard can be easily predicted (assume search recurses) and modern processors are quite good at unrolling the data hazards across the shadow registers (or equivalent). Basically, modern processors and compilers are much happier with the sort of inner loops you see in tries vs restructuring trees (and are happiest with the inner loop of accessing a hash table).

"The documentation also claims that hashtables can store arbitrarily large keys and nedtries can't, which is only as true of hashtables as it is for nedtries."

I believe his implementation assumes size_t sized keys. This limit comes from his root table, which needs as many entries as bits in the key. While you could code up a variant that used arbitrary length keys I think you'd lose most of the performance advantages he measures (remember, we're mostly talking about constant factors here, not complexity class). He doesn't care about this limit, as his primary application is storing free lists in an allocator, which will always have size_t keys.

"I'm also having a difficult time understanding why red-black trees are not an "in place algorithm" and nedtries are."

I'm also confused by this, but I'm assuming he's looking at concurrent access. Tries only ever push down toward leaves, so with careful ordering it's possible to ignore many locking concerns. Or put a different way, you're only ever allocating new nodes or filling in previously nil slots in existing nodes. Trees with structural re balancing on insert like R-B could invalidate a concurrent traversal with a rotation.

"That specific optimization, then, doesn't seem to offer anything over a patricia trie, which itself would dramatically decrease the number of non-leaf node traversals (as nedtries does not describe using any kind of path compression at all)."

I think for his purposes there are some assumptions about distribution: namely if he's storing memory addresses in a nedtrie, it's likely he's allocating them with brk() or the like, and so they're clustered in one or two slots of the root table. Likewise, if he's storing block sizes, then the root table acts as a segregated fit similar to dlmalloc and its many offspring.

I'd agree that the way he over generalizes the advantages is a bit confusing, but I do think there's something interesting here vs vanilla patricia tries.


Re: vs. hash tables

Both nedtries (as specified / implemented) and hash tables use fixed length keys. Either can be implemented with an arbitrary fixed length, but both are fixed.

The reason you can store arbitrary length keys in hash tables is only because you hash the key to a fixed length and then specify some form of collision resolution.

This hashing means, of course, that you lose order (disallowing what this author calls "nearest fit" searches). But, if you are considering using hash tables for your use case, you already don't care about that feature.

In the case of nedtries, you don't even need to do anything fancy (linear probing) or use a secondary data structure (chaining): as nedtries already allow you to store the same value twice (so he said... I didn't pay attention to how; worst case you can use chaining ;P) you can make the same tradeoff.

I guess you could argue this is an unrelated data structure, a "hash nedtrie", but if you are comparing the overall abilities of these data structure concepts it seems unfair to not point this out, especially given that it doesn't require changing his implementation (the user can do the disambiguation, as they may even be used to doing anyway with hash tables).


Re: assumptions about distribution

But if the assumption is that they are all using a couple slots of that root table, then the root table is not a performance advantage. It would be faster to just start the trie immediately and switch off the first bit.

The point of having a structure like that at the top is that it is a cheap-o form of level compression, and increases the fan-out of the root node, allowing you to cut the search space down much faster given a single very cheap lookup.

As for segregated fit, the bucket array itself is probably actually good enough for that. What /could/ make that interesting is if you /also/ stated that "small" (many leading zero) keys happened to be much more prevalent, and in fact twice as prevalent, as those in the next bucket up.

Otherwise, it is just further unbalancing his trie and is getting reasonably the same tradeoff as if he were using a patricia trie (which will use path compression to eat large numbers of 0s and otherwise will be reasonably as unbalanced without the cost of the bucket array, which is going to be many internal nodes worth of memory).


In fact, it is even worse than that: in the case of random keys (which he admits in the documentation is important for trie balance without helping the user obtain them) he'd be much much better off just chopping the first five bits off the key and using that to index into his top-level array.

Doing that would let the top-level array segment the search space perfectly into 32 sub-buckets, avoiding the fact that the higher order array elements are increasingly meaningless (as they are exponentially smaller).

At that point, however, the datastructure is now a level-compressed trie ;P. (LPC tries seem to be at the state of the art for this kind of structure).


Re: vs. red-black trees

I don't know... he keeps talking about dynamic memory allocation and total memory usage. If he means cache locality, he should just say cache locality.

Regardless, if you care about cache performance and dynamic allocation, as the goal of a red-black tree is to be reasonably balanced, you can just treat it as a complete tree and use standard binary tree packing on it (the same algorithm normally used for heaps; maybe allocate each level separately, as he is also concerned with the cost of reallocation).




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: