Ignoring cache misses, a red-black tree is much faster than a B-tree. So this B-tree implementation must be tailored to the cache-line size, and it may perform badly on some CPUs.
The performance gain of the B-tree container also depends largely on the size of the value, so choose this one carefully.
> So this B-tree implementation must be tailored to the cache-line size, and it may perform badly on some CPUs.
Um, no, B-trees are always going to win here as long as the key size is relatively small compared to the number of items in the structure. For small-keyed associations, B-trees have a much better "branching factor per cache line." Because the trees are shallower, fewer cache misses are needed to find the relevant items (insert, delete, read, …), and a cache miss on a B-tree node isn't much more expensive than one on an R-B node.
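To put rough numbers on the "branching factor per cache line" point, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: a 64-byte cache line, 4-byte keys, nodes that hold nothing but keys, and a perfectly balanced binary tree as an optimistic stand-in for a red-black tree (a real R-B tree can be up to about twice as deep). The helper names `levels_needed` and `keys_per_line` are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size; 64 bytes is common but not universal.
constexpr std::size_t kCacheLineBytes = 64;

// Smallest number of levels h such that fanout^h >= n, i.e. ceil(log_fanout(n)).
// Integer arithmetic avoids floating-point rounding surprises.
inline unsigned levels_needed(std::uint64_t n, std::uint64_t fanout) {
    assert(fanout >= 2);
    unsigned levels = 0;
    std::uint64_t reach = 1;  // how many items `levels` levels can distinguish
    while (reach < n) {
        if (reach > UINT64_MAX / fanout) {
            // One more multiplication would overflow, so it certainly covers n.
            ++levels;
            break;
        }
        reach *= fanout;
        ++levels;
    }
    return levels;
}

// How many keys fit in one cache line, ignoring child pointers and node
// headers (a simplification: real nodes have less room for keys).
inline std::size_t keys_per_line(std::size_t key_bytes) {
    assert(key_bytes > 0 && key_bytes <= kCacheLineBytes);
    return kCacheLineBytes / key_bytes;
}
```

Under those assumptions, `levels_needed(1000000, 2)` gives 20 levels for the binary tree, while `levels_needed(1000000, keys_per_line(4))` gives 5 for a fanout of 16. Each level is a potential cache miss, which is the whole argument in one comparison.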
Definitely benchmark everything and tune your B-tree node size appropriately. You could even instantiate the template with several different sizes at compile time, benchmark them at run time, and then select the most efficient implementation for the hardware currently in use.
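A minimal sketch of that "instantiate several sizes, time them at run time" idea. This is not the container's real API: `BlockedSet` is a hypothetical stand-in that keeps sorted keys in fixed-size blocks, and `time_lookups` and `pick_fastest_node_size` are names invented for the example. Swap in the actual B-tree template and a workload that looks like yours.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a container templated on node size in bytes.
template <std::size_t NodeBytes>
class BlockedSet {
public:
    static constexpr std::size_t kKeysPerNode = NodeBytes / sizeof(std::uint32_t);
    static_assert(kKeysPerNode >= 1, "node must hold at least one key");

    explicit BlockedSet(std::vector<std::uint32_t> keys) : keys_(std::move(keys)) {
        std::sort(keys_.begin(), keys_.end());
        // Record the largest key of every block as a one-level index.
        for (std::size_t i = 0; i < keys_.size(); i += kKeysPerNode) {
            std::size_t last = std::min(i + kKeysPerNode, keys_.size()) - 1;
            block_max_.push_back(keys_[last]);
        }
    }

    bool contains(std::uint32_t key) const {
        // Find the first block whose maximum is >= key, then search inside it.
        auto it = std::lower_bound(block_max_.begin(), block_max_.end(), key);
        if (it == block_max_.end()) return false;
        std::size_t begin = static_cast<std::size_t>(it - block_max_.begin()) * kKeysPerNode;
        std::size_t end = std::min(begin + kKeysPerNode, keys_.size());
        return std::binary_search(keys_.begin() + begin, keys_.begin() + end, key);
    }

private:
    std::vector<std::uint32_t> keys_;
    std::vector<std::uint32_t> block_max_;
};

// Time one instantiation on the given workload; returns seconds.
template <std::size_t NodeBytes>
double time_lookups(const std::vector<std::uint32_t>& keys,
                    const std::vector<std::uint32_t>& probes) {
    BlockedSet<NodeBytes> set(keys);
    std::size_t hits = 0;
    auto start = std::chrono::steady_clock::now();
    for (std::uint32_t p : probes) hits += set.contains(p) ? 1 : 0;
    auto stop = std::chrono::steady_clock::now();
    volatile std::size_t sink = hits;  // keep the loop from being optimized away
    (void)sink;
    return std::chrono::duration<double>(stop - start).count();
}

// Instantiate every candidate size at compile time, time each at run time,
// and return the node size that was fastest on this machine.
template <std::size_t... Sizes>
std::size_t pick_fastest_node_size(const std::vector<std::uint32_t>& keys,
                                   const std::vector<std::uint32_t>& probes) {
    static_assert(sizeof...(Sizes) > 0, "need at least one candidate size");
    assert(!probes.empty());
    const std::array<std::size_t, sizeof...(Sizes)> sizes{Sizes...};
    const std::array<double, sizeof...(Sizes)> secs{time_lookups<Sizes>(keys, probes)...};
    auto best = std::min_element(secs.begin(), secs.end()) - secs.begin();
    return sizes[static_cast<std::size_t>(best)];
}
```

Usage would be something like `pick_fastest_node_size<64, 128, 256, 512>(keys, probes)` once at startup. A single timing pass is noisy, so in practice you would repeat the runs and compare medians, then dispatch to the winning instantiation through a function pointer or a small type-erased wrapper.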