Backbone.js and Capsule and Thoonk, oh my: A scalable realtime architecture (andyet.net)
92 points by evilpacket on Nov 17, 2011 | 12 comments



How is this scalable? It seems like this is a single-instance solution that in theory could scale, but this isn't even proven, nor is a clear path given. The author throws out a few vague ideas, and seems to vastly underestimate the complexity of scaling out. Redis Cluster is also nowhere near stable, and keeps getting pushed back, so how does this help anyone now?


This is not a single-instance solution. We can run as many instances as we want across multiple servers. It is currently a single-Redis-instance solution; however, we have some intelligent sharding planned.

I agree with you about Redis clustering, and would extend it by saying I don't think it'd be helpful for this problem regardless. Intelligent sharding with some gossip for slave promotion is where we'll have to go.

See my follow-up post tomorrow.


Yes, single Redis, but multiple instances of the application backed with a single datastore is hardly revolutionary. This is the classic reason why people go with a stateless app layer, so that it's simple to scale at least a single tier of the architecture.

I'm still not sure how this lives up to the "scalable realtime architecture" that dominates the title. People have been building Redis-backed message brokers for a while now. It seems like you have some untested ideas for "intelligent sharding" that could possibly provide scalability, but that's punting on the hard problem. Without it, this is all really pretty pedestrian.

The gossiping sounds like an interesting lead; is this going to be discussed in the post tomorrow?


Evolutionary, and probably not even a unique idea. My post does go into the intelligent sharding a bit. Essentially we shard, in &bang's case, on teams, with a simple lookup to find which server a given team is on. That allows the stateless app layer (as you rightfully call it, aside from keeping track of subscription event routing) to connect to multiple Redis servers as needed. You could be on two teams on separate servers and it would work fine.

I also agree that people have been using Redis for messaging; however, what we're doing is a step past that. Again, probably others are doing this too. We have atomic verbs for dealing with higher-level objects in Redis that use Redis pub/sub for communicating changes to the objects. This way, whenever an object (feed) is changed (publish/edit/delete/reposition), the change can bubble up to the users or processes that care. These feeds are broken down not only by topic but by subscribe-able units of interest as well. In this way, processes, users, etc. only get updates on data that is relevant to them.

I'm working on converting these verbs to Redis-Lua scripts as my tests have shown it decreases CPU time and reduces atomicity code (especially in Node.js where watch->multi->exec callback stacks can be interrupted by other events). I also expect it to make supporting Thoonk in multiple languages easier as the core code will be shared.
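As a rough illustration of that conversion, here is what a feed-publish verb might look like as a single server-side Lua script: the revision bump, the write, and the notification all run atomically inside Redis, instead of as a node.js watch -> multi -> exec callback chain that other events can interleave with. The key names and the helper are assumptions for illustration, not Thoonk's real code.

```javascript
// Hypothetical feed-publish verb as one atomic Lua script.
const publishScript = `
local rev = redis.call('INCR', KEYS[1])          -- feed revision counter
redis.call('HSET', KEYS[2], ARGV[1], ARGV[2])    -- store the item
redis.call('PUBLISH', KEYS[3], ARGV[1])          -- notify subscribers
return rev
`;

// Pairs the script with its keys/args for a client's EVAL call,
// e.g. client.eval(script, keys.length, ...keys, ...args).
function publishCall(feed, id, item) {
  return {
    script: publishScript,
    keys: [`feed:${feed}:rev`, `feed:${feed}:items`, `feed.publish:${feed}`],
    args: [id, JSON.stringify(item)],
  };
}
```

Because the script is just a string handed to EVAL, the same core logic could indeed be shared by Thoonk clients in any language.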

The gossiping is currently being discussed in the Redis mailing list and may end up taking a similar approach to Hadoop's intelligent clients. Antirez would like to provide some tools to make this easier as others have rolled their own Redis gossiping. I'll be researching that more after implementing the team sharding. HA, while related to scaling, is a bit off topic, and I admit to hand-waving here. For now we're focusing more on not losing/corrupting user data.

Perhaps Henrik's title is hyperbole. I doubt the approaches we're taking are revolutionary, and I imagine similar things have been done before. However, we believe it is a good approach, and we believe in the direction we're taking enough to share it.

Thanks for the intelligent discussion.


Cool, thanks for taking the time to go a bit more in depth on this. This is a topic I'm particularly interested in, so I'm looking forward to the follow-up.


To clarify, by "Intelligent Sharding" I meant application specific sharding to keep objects together that are related to each other.


Looks quite promising. I've played a lot with node.js, Redis, and pub/sub wrappers around socket.io, and I came to the conclusion that this kind of architecture scales well horizontally until Redis becomes the bottleneck.

I'll dig into the source code, as I'm very interested in how it's implemented.


My post for tomorrow has a quick section on scaling beyond a single instance of Redis.

Essentially intelligent sharding (in &bang's case, by team). Each node.js process (or other process) can look up which Redis server-set owns a team, and can probably just stay connected to most of the Redis instances. For HA: slave each shard, use AOF, and take off-server backups every 15m, along with a gossip protocol for giving up on masters and promoting slaves.

According to our tests, we don't have to worry about it for a while, but we've got a plan regardless and will start implementing it soon.


I'm working on a post for tomorrow that goes into the details of making a feed-driven single-page app and a bit of the philosophy of our design choices and Thoonk itself.


I don't see anything to deal with conflicts and operations crossing each other on the wire. The sort of thing that Operational Transformation deals with.


Depends what you're trying to keep concurrent, and on your concurrency policy. OT is great for optimistic concurrency on complex structured documents, but probably overkill for simple models (like to-do items) with simple attributes (like title, is-done, etc.). For those you could get away with, e.g., a revision counter on the model.
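The revision-counter alternative is small enough to sketch: every update names the revision it was based on, and an update loses if the model has moved on in the meantime. The function and field names here are illustrative, not from any of the libraries discussed.

```javascript
// Optimistic concurrency via a per-model revision counter (sketch).
function makeModel(attrs) {
  return { rev: 0, attrs };
}

function applyUpdate(model, update) {
  // Reject updates based on a stale revision; the client must refetch.
  if (update.baseRev !== model.rev) {
    return { ok: false, conflict: true };
  }
  Object.assign(model.attrs, update.attrs);
  model.rev += 1;
  return { ok: true, rev: model.rev };
}
```

This detects a crossed-on-the-wire edit rather than merging it, which is usually enough for flat attributes; OT earns its complexity only when edits to the same structured document need to be merged rather than rejected.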

Because of the performance implications of different concurrency strategies, you probably don't want one baked too deeply into any framework: there's some stuff you definitely want strict conflict/error resolution on, and other things you don't.


Essentially, that's our take as well. However, we do have a plan for dealing with conflicts and missed updates soon -- probably after Thoonk.js 1.0, which is slated for late December. Here's how I replied to this question in the blog comments:

The data in the thoonk feeds is never edited locally on the client. Any user action that changes data goes out as a websocket RPC call and gets placed in a job queue. The workers validate the data, check the ACL, and then update the corresponding feeds, and the changes then bubble back up to the user. This happens nearly instantly, and the feed updates are atomic. If two users edit an object, the last edit wins. For the data we have, this isn't a problem. However, we've got a plan for dealing with concurrency and conflict resolution in the future, so that we can handle being offline for periods of time and can detect and deal with conflicts.
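That flow can be sketched in miniature: edits never touch the feed directly but go into a job queue, and a worker checks the ACL, validates, and applies the change, with the last write winning. Everything named here (the queue, the ACL shape, the feed map) is an assumption about shape for illustration, not &bang's actual code.

```javascript
// In-memory sketch of the websocket-RPC -> job queue -> worker flow.
const jobs = [];
const feed = new Map();                 // item id -> item attributes
const acl = { alice: ['task:1'] };      // who may edit which items

// Called from the websocket RPC layer; the client never writes the feed.
function enqueueEdit(user, id, attrs) {
  jobs.push({ user, id, attrs });
}

function runWorker() {
  while (jobs.length) {
    const job = jobs.shift();
    if (!acl[job.user]?.includes(job.id)) continue;  // ACL check failed
    if (typeof job.attrs !== 'object' || job.attrs === null) continue; // invalid
    // Apply the edit; with no revision check, the last edit simply wins.
    feed.set(job.id, { ...feed.get(job.id), ...job.attrs });
    // A real worker would now publish the feed change back to subscribers.
  }
}
```

The atomicity claim in the comment comes from the real feed update happening inside Redis, not from this queue; the queue just serializes who gets to ask.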

For &bang in particular, we'll probably just let the user who owns the task resolve the conflicts: "You queued an update to this task while offline, but Bob edited it as well. Which version would you like to keep?" In general, you can't edit each other's data, but you can add to it, so it isn't much of a problem.

Thoonk 1.0 will have update history and incrementing revision numbers for feeds, which will give you enough information to resolve your conflicts (whatever method you choose) and will help your app recognize and retrieve missed updates.





