
It's been a riak-y day for me, so I watched this avidly.

You're no doubt wondering what the issues were; they come 20 minutes in, and were (paraphrased):

* At scale in production, adding a new node took days to complete all the handoffs; they recommend adding new nodes as soon as it's looking like you need them, rather than waiting until you're redlining.

* 2i is slow, especially in EC2; a straight KV "get" is milliseconds-denominated; 2i index queries were taking multiple seconds. Use 2i, they say, but in background processes.

* JavaScript MapReduce is slow; this is well known. They confirm Erlang MR was adequate.

* As the LevelDB keyspace grows, there's a step function in latency: 5ms, then 15ms, then 25ms; the solution is to add nodes. (LevelDB is Google's KV store, a new option for Riak, required if you're using secondary indexes.)

* Riak Control didn't work for them over high-latency connections.

* Once, a Chef misconfiguration left the whole cluster flapping on and off, which corrupted the cluster; they recovered with Basho support. Be careful about adding and removing nodes rapidly.

* Similarly, flapping a single node caused the cluster to get into a state where it wouldn't converge again; the cluster worked but no nodes could be added until they (presumably?) restarted it.




After watching the video, the takeaway for me was don't run $database on EC2 instances.

Last night I just finished replacing 6 maxed out medium instances with one $100 box from SoftLayer. :/


At that price point, it seems that you can get FAR better specs on Honelive. Kimsufi.ie is even cheaper, if you don't mind your server being hosted in the UK.

edit:

Softlayer: Intel 4x2.40GHz, 2 GB RAM ECC, $160/mo

Honelive: Intel i7-2600 16, 16 GB RAM, $120/mo

Kimsufi.ie: Intel i7 4x 2(HT)x 2.66+ GHz, 24 GB, $60/mo


Have you heard any reviews of Honelive? I've been looking for a dedicated server provider in the US with comparable prices to Hetzner in Germany.


I get a lot of nice little extras from SoftLayer. For example I can get boxes in the 5 corners of the internet (west, central, east, euro, asia) with free bandwidth between them.


Has anyone used Kimsufi? Are they any good? I'm at Hetzner right now, but I never found any use for that box, so I'm just paying for it for nothing...


I just signed up for kimsufi. They are the budget brand of OVH. Webhostingtalk gives them favorable reviews.


And if you do, for God's sake don't run them on EBS volumes. If you don't know by now that EBS performance is highly variable, you really haven't been paying attention.


Very useful data. Do you have anything more comprehensive about your experience?


I don't work at Kiip. :)


I thought you meant to say you had additional personal experience, but perhaps you've just been reading about Riak today.

Thanks anyway.


I've had a 3-node hardware Riak cluster lying around for about 4 months now, but only just today cut over from Postgres to it. I can talk to you about how shiny Riak is on day 1, but as this presentation shows, the day 1 behavior of a Riak cluster can be a bit of a trap.


Great to hear that you're cutting over to Riak. We highly recommend that your minimum cluster size be your N value (replication value) + 2. In your case that likely means 5 nodes for your starting cluster, not 3. There are many reasons. http://basho.com/blog/technical/2012/04/27/Why-Your-Riak-Clu...
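
For what it's worth, here is that rule of thumb as a tiny sketch. It assumes Riak's default replication value of 3; the function name and the `headroom` parameter are made up for illustration and are not part of any Basho tooling.

```python
def min_cluster_size(n_val: int = 3, headroom: int = 2) -> int:
    """Suggested starting node count: replication value plus two.

    n_val    -- the bucket replication value (assumed default of 3)
    headroom -- the "+ 2" from the recommendation above (hypothetical name)
    """
    if n_val < 1:
        raise ValueError("n_val must be at least 1")
    return n_val + headroom


print(min_cluster_size())  # N = 3 gives a 5-node starting cluster
```

So a 3-node cluster with N = 3 sits two nodes under the suggested minimum.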


Obviously great to know. Thanks!


Thomas, I'm in the middle of evaluating document stores and am perhaps getting stars in my eyes from Riak's easy-to-scale story. I'd love to hear more about your experiences once your cluster has had time to marinate a bit.


Interestingly enough, this is very similar to the list you'd see for Mongo, especially in terms of overall effort.

That you'd encounter all these things at a 25mm daily ops level is pretty odd, though.


The very same folks did write about their MongoDB experience: http://blog.engineering.kiip.me/post/20988881092/a-year-with...


I think the point he was making was that the company in question never tried to scale horizontally with MongoDB because, in their words, "we believe horizontally scaling shouldn’t be necessary for the relatively small amount of ops per second we were sending to MongoDB."

Yet, they went and scaled horizontally with Riak and experienced pain.

Their opinion that it did not make sense to have to horizontally scale the "relatively small" number of ops they were sending to MongoDB is certainly their own, but then they horizontally scaled with Riak anyway and boasted about their 25MM ops per day scaling ... which, averaged out, is only about 280 ops per second.

In short, it was far from an apples to apples comparison.
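
The back-of-the-envelope math behind that figure, as a quick sketch (the function name is just for illustration, and this is only the flat daily average):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def avg_ops_per_second(ops_per_day: float) -> float:
    """Flat average rate, assuming load is spread evenly over the day."""
    return ops_per_day / SECONDS_PER_DAY


# 25MM ops per day, the figure quoted above
print(round(avg_ops_per_second(25_000_000)))  # 289
```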


It's probably significantly more than 280 ops/sec, given peak times.


"At scale in production, adding a new node took days to complete all the handoffs..."

That's a bit of a headscratcher. What is happening during those 'days' and what is the primary limiting factor?

I keep meaning to get into Riak, but then stuff like this keeps popping up, where the system has crazy moments that are impossible to reason about coherently.


It's not a crazy moment. They have a system running at real scale, and they found that while keeping the system up constantly with immense amounts of data in it, they were able to dynamically add a new node to their cluster --- just that balancing everything out took a lot of time for the system.

The operational challenge I infer from this is that they had waited to add that node until they really needed it, because their expectation was that adding the node would get them quick relief to their scaling issue. Instead, they got relief a few days later when the node was fully integrated.

Solution: don't wait to add nodes until the last minute.



