It's been a riak-y day for me, so I watched this avidly. You're no doubt wonderi...

dsl · on May 24, 2012

After watching the video, the takeaway for me was don't run $database on EC2 instances.

Last night I finished replacing six maxed-out medium instances with one $100 box from SoftLayer. :/

bravura · on May 25, 2012

At that price point, it seems that you can get FAR better specs on Honelive. Kimsufi.ie is even cheaper, if you don't mind your server being hosted in the UK.

edit:

SoftLayer: Intel 4x 2.40 GHz, 2 GB RAM ECC, $160/mo
Honelive: Intel i7-2600 16, 16 GB RAM, $120/mo
Kimsufi.ie: Intel i7 4x 2(HT)x 2.66+ GHz, 24 GB RAM, $60/mo
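To make the "FAR better specs" comparison concrete on one axis, here is a small sketch that ranks the three listed offers by monthly price per GB of RAM. It uses only the prices and RAM sizes quoted in the comment; CPU, ECC, bandwidth, and network extras like the ones SoftLayer offers are deliberately ignored, so this is a rough lens rather than a verdict.

```python
# Monthly price (USD) and RAM (GB), exactly as listed in the comment above.
offers = {
    "SoftLayer": (160, 2),
    "Honelive": (120, 16),
    "Kimsufi.ie": (60, 24),
}

def usd_per_gb_ram(price_usd: float, ram_gb: float) -> float:
    """Monthly cost of each GB of RAM; ignores every other spec."""
    return price_usd / ram_gb

# Print cheapest-per-GB first.
for name, (price, ram) in sorted(offers.items(), key=lambda kv: usd_per_gb_ram(*kv[1])):
    print(f"{name}: ${usd_per_gb_ram(price, ram):.2f}/GB/mo")
```

This prints Kimsufi.ie at $2.50, Honelive at $7.50, and SoftLayer at $80.00 per GB per month.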

moonboots · on May 25, 2012

Have you heard any reviews of Honelive? I've been looking for a dedicated server provider in the US with comparable prices to Hetzner in Germany.

dsl · on May 25, 2012

I get a lot of nice little extras from SoftLayer. For example I can get boxes in the 5 corners of the internet (west, central, east, euro, asia) with free bandwidth between them.

stavros · on May 25, 2012

Has anyone used Kimsufi? Are they any good? I'm at Hetzner right now, but I never found any use for that box, so I'm just paying for it for nothing...

bravura · on May 25, 2012

I just signed up for Kimsufi. They're OVH's budget brand, and the reviews on WebHostingTalk are favorable.

spudlyo · on May 24, 2012

And if you do, for God's sake don't run them on EBS volumes. If you don't know by now that EBS performance is highly variable, you really haven't been paying attention.

j2labs · on May 24, 2012

Very useful data. Do you have anything more comprehensive about your experience?

tptacek · on May 24, 2012

I don't work at Kiip. :)

j2labs · on May 24, 2012

I thought you meant to say you had additional personal experience, but perhaps you've just been reading about Riak today.

Thanks anyway.

tptacek · on May 25, 2012

I've had a 3-node hardware Riak cluster lying around for about 4 months now, but only just today cut over from Postgres to it. I can talk to you about how shiny Riak is on day 1, but as this presentation shows, the day 1 behavior of a Riak cluster can be a bit of a trap.

gregburd · on May 25, 2012

Great to hear that you're cutting over to Riak. We highly recommend that your minimum cluster size be your N value (replication value) + 2. In your case that likely means 5 nodes for your starting cluster, not 3. There are many reasons. http://basho.com/blog/technical/2012/04/27/Why-Your-Riak-Clu...
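The N + 2 rule is simple arithmetic: with a replication value of 3 it gives 5 nodes. The sketch below also illustrates one way a 3-node cluster can be fragile, using a deliberately simplified placement model (an assumption, not Riak's real partition-claim logic): partition `i` of a 64-partition ring belongs to node `i % nodes`, and a key's replicas go to `n_val` consecutive partitions. The function names are hypothetical.

```python
def recommended_min_nodes(n_val: int) -> int:
    """Rule of thumb from the comment above: N (replication value) + 2."""
    return n_val + 2

def short_preflists(ring_size: int, nodes: int, n_val: int) -> int:
    """Count replica sets that land on fewer than n_val distinct nodes.

    Toy model (assumption): partition i is owned by node i % nodes, and a key's
    n_val replicas live on n_val consecutive partitions, wrapping around the ring.
    """
    short = 0
    for start in range(ring_size):
        owners = {((start + k) % ring_size) % nodes for k in range(n_val)}
        if len(owners) < n_val:
            short += 1
    return short

print(recommended_min_nodes(3))   # → 5
print(short_preflists(64, 3, 3))  # → 2  (wraparound puts two replicas on one node)
print(short_preflists(64, 5, 3))  # → 0
```

In this toy model, 64 partitions don't divide evenly across 3 nodes, so the replica sets that wrap around the end of the ring reuse a node; with 5 nodes every replica set lands on three distinct machines.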

tptacek · on May 25, 2012

Obviously great to know. Thanks!

saturdayplace · on May 25, 2012

Thomas, I'm in the middle of evaluating document stores and am perhaps getting stars in my eyes from Riak's easy-to-scale story. I'd love to hear more about your experiences once your cluster has had time to marinate a bit.

mrkurt · on May 25, 2012

Interestingly enough, this is a very similar list to what you'd see for Mongo, especially in terms of overall effort.

That you'd run into all of these things at 25MM daily ops is pretty odd, though.

rogerbinns · on May 25, 2012

The very same folks did write about their MongoDB experience: http://blog.engineering.kiip.me/post/20988881092/a-year-with...

jasonmccay · on May 25, 2012

I think the point he was making was that the company in question never tried to scale horizontally with MongoDB because, in their words, "we believe horizontally scaling shouldn’t be necessary for the relatively small amount of ops per second we were sending to MongoDB."

Yet, they went and scaled horizontally with Riak and experienced pain.

They're certainly entitled to the view that the "relatively small" number of ops they were sending to MongoDB shouldn't require horizontal scaling, but then they scaled Riak horizontally anyway and boasted about handling 25MM ops per day ... which, averaged out, is only about 280 ops per second.
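The averaging is just daily ops divided by the 86,400 seconds in a day. A minimal sketch (the function name is hypothetical) shows the exact figure is a little under 290, so the rough number is in the right ballpark:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def average_ops_per_sec(ops_per_day: int) -> float:
    """Mean request rate if the day's ops were spread perfectly evenly."""
    return ops_per_day / SECONDS_PER_DAY

print(f"{average_ops_per_sec(25_000_000):.1f}")  # → 289.4
```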

In short, it was far from an apples-to-apples comparison.

tptacek · on May 25, 2012

It's probably significantly more than 280 ops/sec, given peak times.
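To see why peak load matters, here is a sketch with a made-up hourly traffic profile (purely hypothetical, not Kiip's real traffic) that still totals 25MM ops for the day. Concentrating a chunk of the load into a few busy hours pushes the busiest hour to nearly three times the daily average, and sub-hour spikes would be higher still.

```python
SECONDS_PER_HOUR = 3_600

# Hypothetical profile: 20 quiet hours at 650k ops and 4 busy hours at 3M ops (25M total).
hourly_ops = [650_000] * 20 + [3_000_000] * 4

def mean_ops_per_sec(hourly: list[int]) -> float:
    """Daily average rate: total ops over the seconds in those hours."""
    return sum(hourly) / (len(hourly) * SECONDS_PER_HOUR)

def peak_ops_per_sec(hourly: list[int]) -> float:
    """Average rate within the busiest hour only."""
    return max(hourly) / SECONDS_PER_HOUR

print(f"{mean_ops_per_sec(hourly_ops):.0f} avg, {peak_ops_per_sec(hourly_ops):.0f} peak")  # → 289 avg, 833 peak
```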

Cloven · on May 25, 2012

"At scale in production, adding a new node took days to complete all the handoffs..."

That's a bit of a head-scratcher. What is happening during those 'days' and what is the primary limiting factor?

I keep meaning to get into Riak, but then things like this keep popping up, where the system has crazy moments that are impossible to reason about coherently.

tptacek · on May 26, 2012

It's not a crazy moment. They have a system running at real scale, and they found that while keeping it up constantly with immense amounts of data in it, they were able to dynamically add a new node to the cluster; it's just that rebalancing everything took the system a long time.
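A back-of-envelope way to see how rebalancing can stretch into days: when one node joins a balanced cluster of k nodes, it ends up holding roughly 1/(k + 1) of all stored data (replicas included), so at least that much has to be copied over, and handoff is typically kept slow enough not to starve live traffic. The sketch below assumes an idealized even split, and the inputs (4 TB on disk, 5 nodes, ~5 MB/s of effective handoff throughput) are hypothetical, not figures from the presentation.

```python
SECONDS_PER_DAY = 86_400

def min_bytes_moved(total_bytes: float, nodes_before: int) -> float:
    """Lower bound on data that must move when one node joins a balanced cluster:
    the newcomer ends up with roughly 1/(nodes_before + 1) of everything."""
    return total_bytes / (nodes_before + 1)

def handoff_days(total_bytes: float, nodes_before: int, bytes_per_sec: float) -> float:
    """Days needed to move that data at a steady effective throughput."""
    return min_bytes_moved(total_bytes, nodes_before) / bytes_per_sec / SECONDS_PER_DAY

# Hypothetical inputs: 4 TB on disk across 5 nodes, handoff throttled to ~5 MB/s.
print(f"{handoff_days(4e12, 5, 5e6):.1f} days")  # → 1.5 days
```

The limiting factor in this model is simply bytes to move divided by the throughput you are willing to spend on handoff while serving production traffic.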

The operational challenge I infer from this is that they waited to add that node until they really needed it, expecting it to give them quick relief from their scaling problem. Instead, they got relief a few days later, once the node was fully integrated.

Solution: don't wait to add nodes until the last minute.
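One way to turn "don't wait until the last minute" into a rule is to compare how much runway you have before hitting a utilization threshold with how long a node takes to become useful. The sketch below assumes linear growth, and every name and number in it (85% threshold, +1 point per day, 2 days of handoff, 7 days of margin) is hypothetical.

```python
def days_until_threshold(current_pct: float, growth_pct_per_day: float, threshold_pct: float) -> float:
    """Linear projection of how many days until utilization crosses the threshold."""
    if current_pct >= threshold_pct:
        return 0.0
    if growth_pct_per_day <= 0:
        return float("inf")
    return (threshold_pct - current_pct) / growth_pct_per_day

def should_add_node_now(current_pct, growth_pct_per_day, threshold_pct, handoff_days, margin_days):
    """Start adding capacity once the runway is no longer than handoff time plus a safety margin."""
    runway = days_until_threshold(current_pct, growth_pct_per_day, threshold_pct)
    return runway <= handoff_days + margin_days

# Hypothetical numbers: 85% threshold, +1 point/day, 2 days of handoff, 7 days of margin.
print(should_add_node_now(60, 1.0, 85, 2, 7))  # → False (25 days of runway)
print(should_add_node_now(78, 1.0, 85, 2, 7))  # → True (7 days of runway)
```

The margin term is what keeps a slow rebalance from arriving after the cluster is already in trouble.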